git@vger.kernel.org mailing list mirror (one of many)
* empty directories
@ 2007-04-23 15:40 Yakov Lerner
  2007-04-23 16:19 ` Alex Riesen
  0 siblings, 1 reply; 156+ messages in thread
From: Yakov Lerner @ 2007-04-23 15:40 UTC (permalink / raw
  To: Git Mailing List

When I git-add an empty directory (mkdir d1; git-add d1),
git refuses to add it [1].

I was told on the #git channel that git cannot store empty dirs.
But when I do
         git-add -f emptyDir # where emptyDir is an empty dir
emptyDir is added and then cloned. What does that mean?

Does it mean that if I git-add emptyDir with -f, it may break
something in the repo? That I should not try it? Or is it OK?

Thanks
Yakov
.....................................................................
[1] $ mkdir emptyDir
$ git-add emptyDir
The following paths are ignored by one of your .gitignore files:
emptyDir (directory)
Use -f if you really want to add them.
$
Note that the printed warning is misleading.
The name (emptyDir) is not in any .gitignore files.
It would be better to print the real reason for ignoring, no ?


* Re: empty directories
  2007-04-23 15:40 Yakov Lerner
@ 2007-04-23 16:19 ` Alex Riesen
  2007-04-23 16:49   ` Yakov Lerner
  0 siblings, 1 reply; 156+ messages in thread
From: Alex Riesen @ 2007-04-23 16:19 UTC (permalink / raw
  To: Yakov Lerner; +Cc: Git Mailing List

On 4/23/07, Yakov Lerner <iler.ml@gmail.com> wrote:
> When I git-add empty directory (mkdir d1;git-add d1),
> git refuses to add it [1].
>
> I was told on #git chan that git cannot store empty dirs.

It can, it just refuses to, which is considered good by most.

> But when I do
>          git-add -f emptyDir # where emptyDir is empty dir
> , emptyDir is added and then cloned. What does it mean ?

$ git add -f emptyDir
fatal: unable to index file emptyDir

> Does it mean that if i git-add emptyDir with -f, it may break
> something in the repo ? That I shall not try it ? Or it is ok ?

It is not OK, and it does not break anything. Which git version do
you have? I apparently cannot reproduce it.


* Re: empty directories
  2007-04-23 16:19 ` Alex Riesen
@ 2007-04-23 16:49   ` Yakov Lerner
  0 siblings, 0 replies; 156+ messages in thread
From: Yakov Lerner @ 2007-04-23 16:49 UTC (permalink / raw
  To: Alex Riesen; +Cc: Git Mailing List

On 4/23/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> On 4/23/07, Yakov Lerner <iler.ml@gmail.com> wrote:
> > When I git-add empty directory (mkdir d1;git-add d1),
> > git refuses to add it [1].
> >
> > I was told on #git chan that git cannot store empty dirs.
>
> It can, it just refuses to, which is considered good by most.
>
> > But when I do
> >          git-add -f emptyDir # where emptyDir is empty dir
> > , emptyDir is added and then cloned. What does it mean ?
>
> $ git add -f emptyDir
> fatal: unable to index file emptyDir
>
> > Does it mean that if i git-add emptyDir with -f, it may break
> > something in the repo ? That I shall not try it ? Or it is ok ?
>
> It is not OK, and it does not break anything. Which git version do
> you have? I apparently cannot reproduce it.

It was 1.5.1.
In 1.5.1.2, it refuses to add, yes.
The message is:
fatal: emptyDir/../emptyDir: can only add regular files or symbolic links
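
For comparison, here is a minimal sketch of the same experiment in a throwaway repository. It assumes a reasonably recent git and was not checked against the 1.5.x releases discussed here; the directory name is just the one used in this thread. Whatever the command prints, the point is what ends up in the index afterwards.

```shell
#!/bin/sh
# Sketch: try to add an empty directory, then inspect the index.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir emptyDir

# Older releases reported an error here, so tolerate either outcome.
git add emptyDir 2>/dev/null || true

# List whatever made it into the index (nothing is expected to show up).
staged=$(git ls-files --stage)
printf 'index entries: [%s]\n' "$staged"
```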

Yakov


* Empty directories...
@ 2007-07-18  0:13 David Kastrup
  2007-07-18  0:35 ` Johannes Schindelin
                   ` (3 more replies)
  0 siblings, 4 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-18  0:13 UTC (permalink / raw
  To: git


GIT(7) -- 03/05/2007

NAME
	git - the stupid content tracker


Well, I use git for tracking contents.  That means, for example,
installation trees for some application.  Let's take a typical TeXlive
tree as an example.  Those trees contain, among other things,
directories where new fonts/formats/whatever get placed as things run.
Quite a few of them start out empty, but their permissions have to
correspond to their purpose (for example, some are world-writable).

I see little chance to get this achieved without doing something like

find . -name .git -prune -o -type d -empty -exec touch {}/.git-this-is-empty \;

before every checkin and

find -name .git-this-is-empty -exec rm -- {} +

after every checkout.  Which is pretty stupid.

As some anecdotal stuff, I did something like

mkdir test
cd test
git-init
touch README
git-add README # another peeve: why is no empty reference point possible?
git-commit -a -m "Initial branch"
git checkout -b newbranch master
unzip ../somearchive -d subdir
git add subdir
git commit -a -m "Add subdir"
git checkout -b newbranch2 master

and expect to have a clean slate.  No such luck: without warning, all
empty directories in the zip file are still remaining within subdir,
which as a consequence has not been cleaned up.
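
The scenario above can be reproduced without a zip archive. The following is a sketch only: the file and directory names are made up, a plain mkdir stands in for the unzip step, and the identity passed to git commit is a placeholder.

```shell
#!/bin/sh
# Sketch: empty directories are left behind after switching branches.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Helper so the demo does not depend on a configured identity.
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q "$@"; }

base=$(git symbolic-ref --short HEAD)   # name of the initial branch
touch README
git add README
commit -m "Initial branch"

git checkout -q -b newbranch
# Stand-in for the unzipped archive: one file plus one empty directory.
mkdir -p subdir/docs subdir/fonts/empty
echo hello > subdir/docs/file.txt
git add subdir
commit -m "Add subdir"

# Back to the branch that knows nothing about subdir.
git checkout -q "$base"

# The tracked file and its directory are gone; the empty skeleton is not.
leftover=$(find subdir -type d | sort)
printf '%s\n' "$leftover"
```

Note that git status reports a clean tree at this point, which matches the complaint that git-diff sees nothing wrong.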

So even if one is of the opinion that empty directories are not worth
putting into the repository: if I check in an entire subdirectory
hierarchy and then switch to a branch where this subdirectory is not
existent, I expect the subdirectory to be _gone_, and not have some
littering of empty directories lying around.

And that git-diff can see nothing wrong with that does not really
improve things.

So if git is supposed to be a content tracker, I can't see a way
around it actually being able to track content, and empty directories
_are_ content.  It can't leave them lying around with arbitrary
permissions on them when I switch branches or tags.  And the
workaround using "touch" mentioned above is really awful to do
manually all the time.

Could git technically track a file with a zero-length filename in
empty directories if one tells it explicitly to include it, like with
git-add \! -x "" subdir
or has somebody a better idea or interface or rationale?  I understand
that there are use cases where one does not bother about empty
directories, but for a _content_ tracker, not tracking directories
because they are empty seems quite serious.

Ok, kill me.  This must likely be the most common FAQ/rant/whatever
concerning git.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: Empty directories...
  2007-07-18  0:13 Empty directories David Kastrup
@ 2007-07-18  0:35 ` Johannes Schindelin
  2007-07-18  6:07   ` David Kastrup
  2007-07-18  0:39 ` Matthieu Moy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 156+ messages in thread
From: Johannes Schindelin @ 2007-07-18  0:35 UTC (permalink / raw
  To: David Kastrup; +Cc: git

Hi,

On Wed, 18 Jul 2007, David Kastrup wrote:

> This must likely be the most common FAQ/rant/whatever concerning git.

If you had the idea already, I wonder why you did not find it.  It's not 
really anything like hard to find:

http://git.or.cz/gitwiki/GitFaq#head-1fbd4a018d45259c197b169e87dafce2a3c6b5f9

Ciao,
Dscho


* Re: Empty directories...
  2007-07-18  0:13 Empty directories David Kastrup
  2007-07-18  0:35 ` Johannes Schindelin
@ 2007-07-18  0:39 ` Matthieu Moy
  2007-07-18  6:16   ` David Kastrup
  2007-07-18  2:23 ` Junio C Hamano
  2007-07-26 23:33 ` Robin Rosenberg
  3 siblings, 1 reply; 156+ messages in thread
From: Matthieu Moy @ 2007-07-18  0:39 UTC (permalink / raw
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> or has somebody a better idea or interface or rationale?  I understand
> that there are use cases where one does not bother about empty
> directories, but for a _content_ tracker, not tracking directories
> because they are empty seems quite serious.

,----[ http://www.spinics.net/lists/git/msg30730.html ]
| From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
| 
| I wouldn't personally mind if somebody taught git to just track empty
| directories too.
| 
| There is no fundamental git database reason not to allow them: it's in
| fact quite easy to create an empty tree object. The problems with
| empty directories are in the *index*, and they shouldn't be
| insurmountable.
| 
| [...]
`----
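
The claim that an empty tree object is easy to create can be checked with plumbing commands. This is a small sketch in a scratch repository; hashing /dev/null as a tree is a common idiom rather than anything proposed in this thread.

```shell
#!/bin/sh
# Sketch: write a tree object with zero entries into the object database.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# An empty input, hashed as type "tree", is a valid tree with no entries.
# In a SHA-1 repository the resulting id starts with 4b825dc6.
empty_tree=$(git hash-object -w -t tree /dev/null)
echo "empty tree: $empty_tree"

# Confirm the object type and that listing it yields nothing.
git cat-file -t "$empty_tree"
entries=$(git ls-tree "$empty_tree")
```

So the object database has no trouble with the object itself; the open question in the quoted mail is how the index would represent it.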

-- 
Matthieu


* Re: Empty directories...
  2007-07-18  0:13 Empty directories David Kastrup
  2007-07-18  0:35 ` Johannes Schindelin
  2007-07-18  0:39 ` Matthieu Moy
@ 2007-07-18  2:23 ` Junio C Hamano
  2007-07-18  5:56   ` David Kastrup
  2007-07-26 23:33 ` Robin Rosenberg
  3 siblings, 1 reply; 156+ messages in thread
From: Junio C Hamano @ 2007-07-18  2:23 UTC (permalink / raw
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> or has somebody a better idea or interface or rationale?  I understand
> that there are use cases where one does not bother about empty
> directories, but for a _content_ tracker, not tracking directories
> because they are empty seems quite serious.

No objections as long as a patch is cleanly made without
regression.  It's just nobody agreed that it is "quite serious"
yet so far, and no fundamental reason against it.


* Re: Empty directories...
  2007-07-18  2:23 ` Junio C Hamano
@ 2007-07-18  5:56   ` David Kastrup
  2007-07-18  6:34     ` Wincent Colaiuta
  2007-07-18  6:53     ` Junio C Hamano
  0 siblings, 2 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-18  5:56 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> or has somebody a better idea or interface or rationale?  I understand
>> that there are use cases where one does not bother about empty
>> directories, but for a _content_ tracker, not tracking directories
>> because they are empty seems quite serious.
>
> No objections as long as a patch is cleanly made without
> regression.  It's just nobody agreed that it is "quite serious"
> yet so far, and no fundamental reason against it.

Thanks.  It certainly is not serious for the Linux kernel source, but
seems awkward for quite a few situations.  Anyway, what is your take
on the situation I described?

That creating some directory hierarchy (happening to contain empty
directories) with some external program, adding and committing it,
then switching to a different branch (or maybe doing a git-reset
--hard) leaves a skeleton of empty directories around?

I find this almost worse than not being able to put them into the
repository: you can't get rid of them anymore either!

I'd be tempted to propose that git should remove empty subdirectories
when cleaning up a removed tree in the working directory, even though
that violates the principle to not delete anything it isn't tracking.
But since you can't get it to track the stuff in the first place...

But the real fix would be to track them.

Does some trick work possibly at checkin time, like putting an empty
file into every empty directory, adding to the index, then removing
all empty files explicitly from the index and then checking in, or is
this hopeless to work around with from the user side without affecting
the repository itself?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: Empty directories...
  2007-07-18  0:35 ` Johannes Schindelin
@ 2007-07-18  6:07   ` David Kastrup
  2007-07-18 10:26     ` Johannes Schindelin
  2007-07-18 16:23     ` Linus Torvalds
  0 siblings, 2 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-18  6:07 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Wed, 18 Jul 2007, David Kastrup wrote:
>
>> This must likely be the most common FAQ/rant/whatever concerning git.
>
> If you had the idea already, I wonder why you did not find it.  It's not 
> really anything like hard to find:
>
> http://git.or.cz/gitwiki/GitFaq#head-1fbd4a018d45259c197b169e87dafce2a3c6b5f9

The FAQ answer is weaseling on several accounts:

a) No, git only cares about files, or rather git tracks content and
   empty directories have no content.

In the same manner as empty regular files have no contents, and git
tracks those.  Existence and permissions are important.

b) The problem is not just that empty directories don't get added into
the repository.  They also don't get removed again when switching to a
different checkout.  When git-diff returns zero, I expect a subsequent
checkout to not leave complete empty hierarchies around because git
can't delete any empty leaves which it chose not to track.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: Empty directories...
  2007-07-18  0:39 ` Matthieu Moy
@ 2007-07-18  6:16   ` David Kastrup
  2007-07-18  6:30     ` Shawn O. Pearce
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-18  6:16 UTC (permalink / raw
  To: git

Matthieu Moy <Matthieu.Moy@imag.fr> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> or has somebody a better idea or interface or rationale?  I understand
>> that there are use cases where one does not bother about empty
>> directories, but for a _content_ tracker, not tracking directories
>> because they are empty seems quite serious.
>
> ,----[ http://www.spinics.net/lists/git/msg30730.html ]
> | From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> | 
> | I wouldn't personally mind if somebody taught git to just track empty
> | directories too.
> | 
> | There is no fundamental git database reason not to allow them:
> | it's in fact quite easy to create an empty tree object.
> | The problems with empty directories are in the *index*, and they
> | shouldn't be insurmountable.

Stop right here: does that mean that I can script some "put empty
directories into the last commit manually" procedure bypassing the
index?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: Empty directories...
  2007-07-18  6:16   ` David Kastrup
@ 2007-07-18  6:30     ` Shawn O. Pearce
  0 siblings, 0 replies; 156+ messages in thread
From: Shawn O. Pearce @ 2007-07-18  6:30 UTC (permalink / raw
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> wrote:
> > ,----[ http://www.spinics.net/lists/git/msg30730.html ]
> > | From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > | 
> > | I wouldn't personally mind if somebody taught git to just track empty
> > | directories too.
> > | 
> > | There is no fundamental git database reason not to allow them:
> > | it's in fact quite easy to create an empty tree object.
> > | The problems with empty directories are in the *index*, and they
> > | shouldn't be insurmountable.
> 
> Stop right here: does that mean that I can script some "put empty
> directories into the last commit manually" procedure bypassing the
> index?

Yes.  But when you read that tree into the index later (by say
checking out a branch that points to it) the empty directories
will not be created, as they have no files to cause their creation.
Committing changes on that branch will remove the empty directories.
;-)

Oh, and the above question from you sounds like you think you can
modify the last commit to include new directories that weren't
there before.  You cannot do that without changing the tree SHA-1,
which will cause the commit SHA-1 to change.  That in turn means you
are not actually adding to the last commit but instead are creating
an entirely different commit.  History in Git is always immutable.
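
A rough sketch of such a script, using only plumbing commands (hash-object, ls-tree, mktree, commit-tree). The entry name "emptydir" and the commit identity are placeholders invented for the example, and it only handles an empty directory at the top level of the tree.

```shell
#!/bin/sh
# Sketch: create a commit whose tree contains an empty directory entry,
# without going through the index.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity so the demo works without any configuration.
ident="-c user.name=demo -c user.email=demo@example.invalid"

echo hello > README
git add README
git $ident commit -q -m "Initial commit"
old=$(git rev-parse HEAD)

# The empty tree object must exist before mktree can reference it.
empty_tree=$(git hash-object -w -t tree /dev/null)

# New top-level tree: everything in HEAD plus one entry for "emptydir".
new_tree=$( { git ls-tree HEAD
              printf '040000 tree %s\temptydir\n' "$empty_tree"
            } | git mktree )

# A different tree gives a different commit; the old commit is untouched.
new=$(git $ident commit-tree "$new_tree" -p "$old" -m "Add emptydir")

echo "old commit: $old"
echo "new commit: $new"
git ls-tree "$new"
```

The new commit is not on any branch until a ref is pointed at it, and as described above, reading that tree back into the index is expected to drop the empty entry again.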

-- 
Shawn.


* Re: Empty directories...
  2007-07-18  5:56   ` David Kastrup
@ 2007-07-18  6:34     ` Wincent Colaiuta
  2007-07-18  6:53     ` Junio C Hamano
  1 sibling, 0 replies; 156+ messages in thread
From: Wincent Colaiuta @ 2007-07-18  6:34 UTC (permalink / raw
  To: David Kastrup; +Cc: Junio C Hamano, git

El 18/7/2007, a las 7:56, David Kastrup escribió:

> That creating some directory hierarchy (happening to contain empty
> directories) with some external program, adding and committing it,
> then switching to a different branch (or maybe doing a git-reset
> --hard) leaves a skeleton of empty directories around?
>
> I find this almost worse than not being able to put them into the
> repository: you can't get rid of them anymore either!
>
> I'd be tempted to propose that git should remove empty subdirectories
> when cleaning up a removed tree in the working directory, even though
> that violates the principle to not delete anything it isn't tracking.
> But since you can't get it to track the stuff in the first place...
>
> But the real fix would be to track them.

Although I haven't yet been "bitten" by this issue I understand where  
you're coming from. This could confuse users and appear inconsistent  
to them (seeing as empty *files* can be tracked). I think it's  
probably worth tackling for that reason alone, but it will have the  
additional benefit of enabling other workflows like the one you  
describe ("installation trees for some application").

> Does some trick work possibly at checkin time, like putting an empty
> file into every empty directory, adding to the index, then removing
> all empty files explicitly from the index and then checking in, or is
> this hopeless to work around with from the user side without affecting
> the repository itself?

I wouldn't recommend any "tricks" here. I think the real solution is  
to allow the tracking of empty trees; everything else seems like a  
kludge. And then, as you've noted already that will allow Git to  
handle the "skeleton of empty directories" left behind problem that  
you describe.

Cheers,
Wincent


* Re: Empty directories...
  2007-07-18  5:56   ` David Kastrup
  2007-07-18  6:34     ` Wincent Colaiuta
@ 2007-07-18  6:53     ` Junio C Hamano
       [not found]       ` <867ioyqhgc.fsf@lola.quinscape.zz>
                         ` (2 more replies)
  1 sibling, 3 replies; 156+ messages in thread
From: Junio C Hamano @ 2007-07-18  6:53 UTC (permalink / raw
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> No objections as long as a patch is cleanly made without
>> regression.  It's just nobody agreed that it is "quite serious"
>> yet so far, and no fundamental reason against it.
>
> Thanks.  It certainly is not serious for the Linux kernel source, but
> seems awkward for quite a few situations.  Anyway, what is your take
> on the situation I described?

Didn't I say I do not have an objection for somebody who wants
to track empty directories, already?  I probably would not do
that myself but I do not see a reason to forbid it, either.

The right approach to take probably would be to allow entries of
mode 040000 in the index.  Traditionally, we allowed only 100644
(blobs as regular files) and 120000 (blobs as symlinks).  We
recently added 160000 (commit from outer space, aka subproject).

And we do that for all directories, not just empty ones.  So if
you have fileA, empty/, sub/fileB tracked, your index would
probably have these four entries, immediately after read-tree
of an existing tree object:

	100644 15db6f1f27ef7a... 0	fileA
	040000 4b825dc642cb6e... 0	empty
	040000 e125e11d3b63e3... 0	sub
	100644 52054201c2a872... 0	sub/fileB

Making sure that empty/ directory exists in the working tree is
probably done in entry.c; we have been touching that area in an
unrelated thread in the past few days.
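
For contrast with the hypothetical listing above, this sketch shows what the index holds for the same layout without such a change: only blob entries, and no 040000 lines. The file contents are arbitrary.

```shell
#!/bin/sh
# Sketch: the index as it is today, for fileA, empty/ and sub/fileB.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir empty sub
echo a > fileA
echo b > sub/fileB
git add .

# Each line is: mode, object name, stage number, path.
stage=$(git ls-files --stage)
printf '%s\n' "$stage"
```

Only fileA and sub/fileB are listed; neither "empty" nor "sub" appears as an entry of its own.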

If you add sub/fileC, with "update-index" (and "add"), you
invalidate the SHA-1 object name you stored for "sub" (because
there is no point recomputing the tree object until you know you
need a subtree for "sub" part, which does not happen until the
next "write-tree"), and end up with something like:

	100644 15db6f1f27ef7a... 0	fileA
	040000 4b825dc642cb6e... 0	empty
	040000 00000000000000... 0	sub
	100644 52054201c2a872... 0	sub/fileB
	100644 705bf16c546f32... 0	sub/fileC

These "missing" SHA-1 would need to be recomputed on-demand.

We have had necessary infrastructure to do this "keeping
untouched tree object names in the index" for quite some time,
but it is not a part of the index proper (it is stored in an
extension section in the index file, to keep the index
compatible with older versions of git).

Having made it sound so easy, here are the issues I would expect
to be nontrivial (but probably not rocket surgery either).

 * unpack-trees, which is the workhorse for twoway merge (aka
   "switching branches") and threeway merge, has a convoluted
   logic to avoid D/F conflicts; it can probably be cleaned up
   once we do the above conversion so that the index starts
   saying "Hey, I have a directory here" more explicitly.  The
   end result would probably be code that is easier to follow.

 * status, update-index --refresh, and diff-files care about
   the information cached in the index from the last time
   lstat(2) is run on each entry.  What we should store there
   for "tree" entries is very unclear to me, but probably we
   should teach them to ignore the stat-matching logic for
   these entries.

 * diff-index walks the index and a tree in parallel but does
   not currently expect to see a tree object in the index.  It
   needs to be taught to ignore these "tree" entries.

 * merge-recursive and merge-index walk the index, coming up
   with the merge results one path at a time.  They also need to
   be taught to ignore these "tree" entries.

 * diff-index and "read-tree -m" should be taught to take
   advantage of the "tree" entries in the index.  For example,
   if diff-index finds the "tree" entry in the index and the
   subtree found from the tree object exactly match, it does not
   even have to descend into the tree, which would be a huge
   performance win (because you do not have to open the subtree
   and its subtrees from the tree side; you already have read
   everything on the index side, and still have to skip the
   entries in the directory).  "read-tree -m" also should be
   able to optimize two identical subtrees in the 2 or 3 trees
   involved.

   Even if we follow the "lazy invalidate" strategy to maintain
   the "tree" entries in the normal codepath, we could have a
   special operation that says "now update all the tree entries
   by recomputing the tree object names as needed".  Perhaps we
   might want to initiate such an operation before "read-tree
   -m" automatically.


* Re: Empty directories...
  2007-07-18  6:07   ` David Kastrup
@ 2007-07-18 10:26     ` Johannes Schindelin
       [not found]       ` <86tzs2m1h7.fsf@lola.quinscape.zz>
  2007-07-18 16:23     ` Linus Torvalds
  1 sibling, 1 reply; 156+ messages in thread
From: Johannes Schindelin @ 2007-07-18 10:26 UTC (permalink / raw
  To: David Kastrup; +Cc: git

Hi,

On Wed, 18 Jul 2007, David Kastrup wrote:

> The FAQ answer is weaseling on several accounts:
> 
> a) No, git only cares about files, or rather git tracks content and
>    empty directories have no content.
> 
> In the same manner as empty regular files have no contents, and git
> tracks those.  Existence and permissions are important.

We do not track permissions of directories at all.  This is because Git is 
primarily meant to track source code, and most "permissions" (i.e. 
restrictions) do not make any sense there.

> b) The problem is not just that empty directories don't get added into
> the repository.  They also don't get removed again when switching to a
> different checkout.  When git-diff returns zero, I expect a subsequent
> checkout to not leave complete empty hierarchies around because git
> can't delete any empty leaves which it chose not to track.

I _like_ the behaviour that Git does not remove a directory it added, when 
I put some untracked file into it.  And switching back to that branch, Git 
has no problems, because it sees that the directory is already there.  In 
case of a file, it would complain, and rightfully so.

See the fundamental difference between a file and a directory now?  I 
think it boils down to "an empty directory has _no_ contents, but an empty 
file has an _empty_ content".

Ciao,
Dscho


* Re: Empty directories...
       [not found]       ` <86tzs2m1h7.fsf@lola.quinscape.zz>
@ 2007-07-18 11:24         ` Johannes Schindelin
  2007-07-18 11:40           ` Matthieu Moy
  0 siblings, 1 reply; 156+ messages in thread
From: Johannes Schindelin @ 2007-07-18 11:24 UTC (permalink / raw
  To: David Kastrup; +Cc: git

Hi,

On Wed, 18 Jul 2007, David Kastrup wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Wed, 18 Jul 2007, David Kastrup wrote:
> >
> >> The FAQ answer is weaseling on several accounts:
> >> 
> >> a) No, git only cares about files, or rather git tracks content and
> >>    empty directories have no content.
> >> 
> >> In the same manner as empty regular files have no contents, and git
> >> tracks those.  Existence and permissions are important.
> >
> > We do not track permissions of directories at all.
> 
> Ok, this seems like something that should be done as well, even if we
> can stipulate at first that a directory should have rwx for the user
> in question if you hope to track it.

No, no, no.  It should not be tracked.  It is the responsibility of the 
_user_ to set it to something sane, be that by a umask or by sticky 
groups, or by setting the permissions of the parent directory.

It is _nothing_ we want to put into the repository.  That is the _wrong_ 
place to put it.

> > This is because Git is primarily meant to track source code,
> 
> Tell that to the man page.  It declares git to be "a content tracker" 
> right at the front.

Why don't you?  I have no problems with the title.

> > and most "permissions" (i.e.  restrictions) do not make any sense
> > there.
> 
> So why are permissions for files being tracked, then?

This question is invalid.  Git only tracks the _executable_ bit.  And 
again, it is the users' responsibility, by setting the umask, to have the 
appropriate bits set for group and others.
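
A quick way to see this is to stage files with different permissions and look at the modes recorded in the index. This sketch assumes a filesystem where git honours the executable bit (core.fileMode enabled, the usual case on Linux); the file names are made up.

```shell
#!/bin/sh
# Sketch: only the executable bit survives into the index.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

echo data > private.txt
chmod 600 private.txt            # owner-only
echo data > shared.txt
chmod 664 shared.txt             # group-writable
printf '#!/bin/sh\n' > run.sh
chmod 755 run.sh                 # executable

git add .

# Print the recorded mode next to each path.
modes=$(git ls-files --stage | awk '{print $1, $4}')
printf '%s\n' "$modes"
```

Both non-executable files are recorded as 100644 regardless of their group and other bits, and the script as 100755.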

> >> b) The problem is not just that empty directories don't get added 
> >> into the repository.  They also don't get removed again when 
> >> switching to a different checkout.  When git-diff returns zero, I 
> >> expect a subsequent checkout to not leave complete empty hierarchies 
> >> around because git can't delete any empty leaves which it chose not 
> >> to track.
> >
> > I _like_ the behaviour that Git does not remove a directory it
> > added, when I put some untracked file into it.
> 
> But it does not remove a directory it _refused_ to add when there were
> no files at all in it ever.  You probably have not read the problem
> description carefully.

I have.  But that does not apply here, because I used the term "to add a 
directory" in the sense of "mkdir".

> > And switching back to that branch, Git has no problems, because it 
> > sees that the directory is already there.  In case of a file, it would 
> > complain, and rightfully so.
> 
> And if you switch to a branch where the directory it did not remove now 
> is a file?

Git already throws an error, and rightfully so.  I am pleased by the 
current behaviour.

> > See the fundamental difference between a file and a directory now?
> 
> Condescension is not really solving a problem.

Hey, I only tried to help clarify things.

But since I seem to be unable to, I'll end my efforts with this 
suggestion:

If you want to track empty directories, the best thing would be to

- teach git-add to automatically create an empty .gitignore (and error out 
  if that already exists), and

- teach git-archive to not put .gitignore files into the output by default 
  (but the directories).  This might be a sensible change regardless if 
  you want to add empty directories to the repository or not.
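
Until git-add does anything like this automatically, the first suggestion can be done by hand. The sketch below assumes a directory named var/cache purely for illustration and checks that the directory survives a clone.

```shell
#!/bin/sh
# Sketch: keep an otherwise empty directory by tracking an empty .gitignore.
set -eu

work=$(mktemp -d)
repo="$work/repo"
clone="$work/clone"

mkdir "$repo"
cd "$repo"
git init -q

mkdir -p var/cache
: > var/cache/.gitignore         # empty placeholder file
git add var/cache/.gitignore
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "Keep var/cache via an empty .gitignore"

# The directory is recreated on clone because it now contains a tracked file.
git clone -q "$repo" "$clone"
ls -A "$clone/var/cache"
```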

Ciao,
Dscho


* Re: Empty directories...
  2007-07-18 11:24         ` Johannes Schindelin
@ 2007-07-18 11:40           ` Matthieu Moy
  2007-07-18 12:12             ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Matthieu Moy @ 2007-07-18 11:40 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: David Kastrup, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > We do not track permissions of directories at all.
>> 
>> Ok, this seems like something that should be done as well, even if we
>> can stipulate at first that a directory should have rwx for the user
>> in question if you hope to track it.
>
> No, no, no.  It should not be tracked.  It is the responsibility of the 
> _user_ to set it to something sane, be that by a umask or by sticky 
> groups, or by setting the permissions of the parent directory.
>
> It is _nothing_ we want to put into the repository.  That is the _wrong_ 
> place to put it.

I'm not sure it's wrong to be able to track permissions, but it's
definitely wrong to track them by default.

GNU Arch had some permission tracking, and I got hit by it several
times. You have several things you might have wanted to track:

* read/write for the user. But I can't imagine a case where you
  wouldn't want to be able to read and write your own files.

* permissions for group. But that doesn't make any sense when several
  persons work on the same project, and don't share the same
  /etc/group.

* permissions for others. But that, again, doesn't make sense when
  several persons work on the same project with different setups. I
  sometimes work at home, where I'm basically the only user, I don't
  care at all about permissions for others. At work, it's totally
  different, since it's a big NFS shared by all the lab. And I might
  very well disclose my work to the rest of the lab, and work with
  someone who does not want to do so.

* Execute bit. This one is relevant. Indeed, it's more a kind of
  metadata than really a permission (you can still execute the file
  with /lib/ld-linux.so.2 /path/to/file or such kind of things).

Using GNU Arch, I got the cases in real life of a project in which
some files had group read permission, some other not, because they
were created by developers having different umask. Worse than this, I
got some group-writable files in my $HOME without noticing it, which
is basically a security hole.

-- 
Matthieu


* Re: Empty directories...
  2007-07-18 11:40           ` Matthieu Moy
@ 2007-07-18 12:12             ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-18 12:12 UTC (permalink / raw
  To: git

Matthieu Moy <Matthieu.Moy@imag.fr> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>> > We do not track permissions of directories at all.
>>> 
>>> Ok, this seems like something that should be done as well, even if we
>>> can stipulate at first that a directory should have rwx for the user
>>> in question if you hope to track it.
>>
>> No, no, no.  It should not be tracked.  It is the responsibility of the 
>> _user_ to set it to something sane, be that by a umask or by sticky 
>> groups, or by setting the permissions of the parent directory.
>>
>> It is _nothing_ we want to put into the repository.  That is the _wrong_ 
>> place to put it.
>
> I'm not sure it's wrong to be able to track permissions, but it's
> definitely wrong to track them by default.

I am not sure about "definitely", but there certainly are applications
where it is appropriate.

> * Execute bit. This one is relevant. Indeed, it's more a kind of
>   metadata than really a permission (you can still execute the file
>   with /lib/ld-linux.so.2 /path/to/file or such kind of things).

Please spare us the sophistry.  Probably the most flexible approach
would be to be able to specify a checkout umask, defaulting to 700
(the other bits are then filled in from the normal user umask).  For
archival purposes, one would then set it to 777 instead.

There is the question of how to deal with checkins.  While there is no
harm in checking in the full permissions in case one would need them,
it would likely be a nuisance to track the individual contributor's
settings.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18  6:07   ` David Kastrup
  2007-07-18 10:26     ` Johannes Schindelin
@ 2007-07-18 16:23     ` Linus Torvalds
  2007-07-18 16:33       ` Linus Torvalds
                         ` (2 more replies)
  1 sibling, 3 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-18 16:23 UTC (permalink / raw
  To: David Kastrup; +Cc: Johannes Schindelin, git



On Wed, 18 Jul 2007, David Kastrup wrote:
> 
> In the same manner as empty regular files have no contents, and git
> tracks those.  Existence and permissions are important.

Yes, but directories really are different.

First off, git wouldn't track the permissions anyway (git tracks execute 
bits, but for directories that _has_ to be set or git couldn't use them 
itself, so that's not going to happen).

Second, and much more important, the directories will exist or not 
*regardless* of what git does.

> b) The problem is not just that empty directories don't get added into
> the repository.  They also don't get removed again when switching to a
> different checkout.

Bzzt. Wrong.

We *do* remove directories when all files under them go away.

HOWEVER (and this is where one of the reasons for not tracking them comes 
in):

   ** YOU CANNOT REMOVE A DIRECTORY IF IT HAS SOME UNTRACKED CONTENTS **

Think about that for five seconds, then think about it some more. Ponder 
it.

So the fact is, git *already* does as good a job as it could possibly 
do wrt directories that go away: it tries to remove them if all the files 
that are tracked in it have gone away.

But that leaves a very common case, namely switching to another branch 
without those files, and the directory still having stale object files etc 
build crud in it.
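
As a rough analogy only (plain rmdir, not git's actual code path), the
"remove it only if nothing is left" rule can be sketched like this. The
directory and file names are made up for illustration:

```shell
#!/bin/sh
# Sketch: a directory can only be removed once it is truly empty.
# "main.c" stands in for a tracked file, "main.o" for untracked build output.
set -e
work=$(mktemp -d)
mkdir "$work/build"
: > "$work/build/main.c"
: > "$work/build/main.o"

rm "$work/build/main.c"          # the tracked file goes away

# rmdir refuses to remove a non-empty directory, so main.o protects it.
if rmdir "$work/build" 2>/dev/null; then first=removed; else first=kept; fi

rm "$work/build/main.o"          # only now is the directory really empty
if rmdir "$work/build" 2>/dev/null; then second=removed; else second=kept; fi

echo "with untracked file: $first; once empty: $second"
```

The point of the sketch is that the untracked file, not the SCM, decides
whether the directory can go away.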

A SCM *must*not* just remove that directory. It would be horrible. The 
fact that it has untracked files in it does not make those untracked files 
"unimportant". Maybe you feel that way about object files, but what about 
tracking some important parts of your home directory - does the fact that 
you don't necessarily track *all* of it mean that the rest is totally 
unimportant and that git should just remove it? HELL NO!

So directories really _are_ problematic. You cannot (and should not) track 
them the same way as you track a file.

And the difference is very fundamental indeed: when you track a regular 
file, you track *all* of its content. But when you track a directory, 
you don't track its content *at*all*.

Think about that, and then think about the fact that git is defined as a 
"content tracker", and it's not "weasely" at all to say that you don't 
track directories.

So your argument is totally bogus. When you track an empty file, you very 
much track the *content* of that file, and "empty" just happens to be a 
very valid content.

But when you track a "directory", you don't actually track its content at 
all, you track its *existence*, which is a very very very different 
thing. I hope you understand from the above what is so different.

(A true "directory content" tracker by definition would have to track 
every single file under that directory. You can claim that for the case of 
an empty directory the "existence tracking" is 100% equivalent with 
"content tracking", but that's simply not true. It becomes non-true the 
moment there are any files at all inside that directory, and be honest 
now: the only _point_ of an empty directory is that you expect it to 
potentially get files under it).

So "existence" != "content". Git very much does not track "existence" of 
files, it tracks the total content of them too.

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 16:23     ` Linus Torvalds
@ 2007-07-18 16:33       ` Linus Torvalds
  2007-07-18 17:38         ` David Kastrup
  2007-07-18 16:39       ` Matthieu Moy
  2007-07-18 17:34       ` David Kastrup
  2 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-18 16:33 UTC (permalink / raw
  To: David Kastrup; +Cc: Johannes Schindelin, git



On Wed, 18 Jul 2007, Linus Torvalds wrote:
> 
> So "existence" != "content". Git very much does not track "existence" of 
> files, it tracks the total content of them too.

Btw, don't get me wrong: I think that in order to be better at tracking 
other SCM's idiotic choices, we could (and I foresee that we eventually 
have to) try to track empty directories as a special case too.

So I'm not _against_ the notion of tracking empty directories, and I would 
welcome patches that do so. As I mentioned in some earlier thread when 
this came up a few weeks ago, I actually suspect that the "subproject" 
support probably ended up making it easier, because in many ways an "empty 
directory" is very close to an "anonymous subproject" from a low-level 
plumbing standpoint (even if it is *not* so from a high-level standpoint).

So I suspect that adding support for empty directories ends up being about 
just slightly extending the places that now have subproject support to 
know about a new situation.

But I do want to point out that "tracking a directory" is not at all the 
same thing as "tracking a file", no matter how much you try to argue 
otherwise. The semantics are totally different, and it all boils down to 
the fact that when you track a file, you are always talking about the 
*full* content of the file, while tracking a directory is always about 
tracking just a *subset* of the contents of the directory.

Of course, with directories, there's the trivial case where the subset 
happens to be everything, but that is neither the common nor the 
interesting case. All the interesting and complex cases happen exactly 
when the directory has untracked files in it, and at that point 

 - you really aren't tracking "contents" any more
 - you can no longer recreate the directory from the data you have (so you 
   cannot remove it on branch switches etc)
 - ergo: you're not a content tracker any more, you're a "container" 
   tracker.

And really, the "nontracked files in a directory" is the *default* thing, 
not some really unusual thing that we could disallow.

But I'm not against adding support for "container tracking". I just want 
people to understand that it's something totally different from what we do 
now. It's much more like subproject support than tracking files.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 16:23     ` Linus Torvalds
  2007-07-18 16:33       ` Linus Torvalds
@ 2007-07-18 16:39       ` Matthieu Moy
  2007-07-18 17:06         ` Linus Torvalds
  2007-07-18 17:34       ` David Kastrup
  2 siblings, 1 reply; 156+ messages in thread
From: Matthieu Moy @ 2007-07-18 16:39 UTC (permalink / raw
  To: Linus Torvalds; +Cc: David Kastrup, Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

>> b) The problem is not just that empty directories don't get added into
>> the repository.  They also don't get removed again when switching to a
>> different checkout.
>
> Bzzt. Wrong.
>
> We *do* remove directories when all files under them go away.
>
> HOWEVER (and this is where one of the reasons for not tracking them comes 
> in):
>
>    ** YOU CANNOT REMOVE A DIRECTORY IF IT HAS SOME UNTRACKED CONTENTS **

I believe David's point was different.

If you checkout a branch, create an empty directory in this branch
(probably a placeholder, either for future versioned files, or for
generated files), you cannot tell git "this empty directory is in this
branch, but not in other ones" without adding a file in it.

So, doing "git-checkout anotherbranch", this empty directory doesn't
go away. It's just unversioned in both branches, git won't touch it.

-- 
Matthieu

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 16:39       ` Matthieu Moy
@ 2007-07-18 17:06         ` Linus Torvalds
  2007-07-18 21:37           ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-18 17:06 UTC (permalink / raw
  To: Matthieu Moy; +Cc: David Kastrup, Johannes Schindelin, git



On Wed, 18 Jul 2007, Matthieu Moy wrote:
> 
> If you checkout a branch, create an empty directory in this branch
> (probably a placeholder, either for future versioned files, or for
> generated files), you cannot tell git "this empty directory is in this
> branch, but not in other ones" without adding a file in it.

Right. Which is the suggested setup: add an empty ".gitignore" file to the 
directory, and you're done. It now acts "as if" git tracked the directory 
(git will remove the directory when switching branches), but without the 
lie that we really track any directory contents.
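
A minimal sketch of that setup, assuming a throwaway repository. The
names "spool" and "plain" and the inline demo identity are illustrative,
not anything git requires:

```shell
#!/bin/sh
# Sketch of the ".gitignore placeholder" workaround in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Pass an identity inline so the sketch does not depend on global config.
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q "$@"; }

echo hello > README
git add README
commit -m "base"
git branch plain                 # a branch without the placeholder directory

mkdir spool
: > spool/.gitignore             # empty placeholder file
git add spool/.gitignore
commit -m "add placeholder for spool/"

git checkout -q plain            # the only tracked file in spool/ goes away
[ -d spool ] && on_plain=present || on_plain=absent

git checkout -q -                # back to the branch that has the placeholder
[ -d spool ] && on_orig=present || on_orig=absent

echo "plain: $on_plain; original branch: $on_orig"
```

The directory follows the branch only because the placeholder file does.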

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 16:23     ` Linus Torvalds
  2007-07-18 16:33       ` Linus Torvalds
  2007-07-18 16:39       ` Matthieu Moy
@ 2007-07-18 17:34       ` David Kastrup
  2 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-18 17:34 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 18 Jul 2007, David Kastrup wrote:
>
>> b) The problem is not just that empty directories don't get added
>> into the repository.  They also don't get removed again when
>> switching to a different checkout.
>
> Bzzt. Wrong.
>
> We *do* remove directories when all files under them go away.

But empty directories which were empty to start with don't go away
since they are not tracked.  And that means that their parents don't
go away.

Git will remove directories which _had_ git-tracked content prior to
the checkout.  But it will not register empty directories created
outside of git, and consequently will not remove them.

> HOWEVER (and this is where one of the reasons for not tracking them
> comes in):
>
>    ** YOU CANNOT REMOVE A DIRECTORY IF IT HAS SOME UNTRACKED CONTENTS **
>
> Think about that for five seconds, then think about it some
> more. Ponder it.

Linus, condescension is all very nice, but I already told you: I had a
directory hierarchy created outside of git's control (every file comes
into being first outside of git).  This hierarchy contained empty
directories.  The whole hierarchy was committed into git.  git
silently skipped registering empty directories.  Then a different
version got checked out which did not contain the directory hierarchy
in question.  And git left the (unregistered) empty directories in, as
well as all their parent directories.

And that is just plain wrong.
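
The scenario can be reproduced with a sketch along these lines. The
"pkg" hierarchy, the branch name "before" and the inline demo identity
are made up for illustration:

```shell
#!/bin/sh
# Sketch: an empty directory inside a committed hierarchy is never
# registered, so it (and its parent) is left behind by a later checkout.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q "$@"; }

echo hello > README
git add README
commit -m "base"
git branch before                # a version without the hierarchy

mkdir -p pkg/data pkg/cache      # pkg/cache stays empty
echo payload > pkg/data/file
git add pkg                      # no complaint about pkg/cache
commit -m "import pkg"
tracked=$(git ls-files pkg)      # what git actually registered

git checkout -q before           # a version that does not contain pkg/
[ -d pkg/data ]  && data=present  || data=absent
[ -d pkg/cache ] && cache=present || cache=absent

echo "registered: $tracked; pkg/data: $data; pkg/cache: $cache"
```

Only pkg/data/file is registered, so pkg/data is cleaned up while
pkg/cache and its parent pkg stay in the working tree.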

> So the fact is, git *already* does as good a job as it could
> possibly do wrt directories that go away: it tries to remove them if
> all the files that are tracked in it have gone away.

But I told git to track the whole directory tree recursively.  There
were no uncommitted files it complained about.  It is not reasonable
that it is afterwards unable to remove this when I checkout some other
tag.

> A SCM *must*not* just remove that directory. It would be
> horrible. The fact that it has untracked files in it does not make
> those untracked files "unimportant".

Sure.  But that it refuses to track the files makes the total behavior
an annoyance.  I don't complain _how_ git handles not being able to
track empty directories.  I complain about it not being able to track
them in the first place.  The consequences are hideous.

> Maybe you feel that way about object files, but what about tracking
> some important parts of your home directory - does the fact that you
> don't necessarily track *all* of it mean that the rest is totally
> unimportant and that git should just remove it? HELL NO!

When I tell it to track it, it should not refuse.  Even if it is
empty.  Because if it _stayed_ empty, git can then remove it (and
possibly the parents) when I checkout something else.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 16:33       ` Linus Torvalds
@ 2007-07-18 17:38         ` David Kastrup
  2007-07-18 18:05           ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-18 17:38 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But I do want to point out that "tracking a directory" is not at all
> the same thing as "tracking a file", no matter how much you try to
> argue otherwise.

Since I did not try to argue this, could you beat another strawman?
I have seen this prepackaged rant already, but it does not really
address the problem I have been experiencing.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 17:38         ` David Kastrup
@ 2007-07-18 18:05           ` Linus Torvalds
  0 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-18 18:05 UTC (permalink / raw
  To: David Kastrup; +Cc: git



On Wed, 18 Jul 2007, David Kastrup wrote:
> 
> Since I did not try to argue this, could you beat another strawman?

How about a bit of honesty?

Here's the quote:

 "The FAQ answer is weaseling on several accounts:

  a) No, git only cares about files, or rather git tracks content and
     empty directories have no content.

  In the same manner as empty regular files have no contents, and git
  tracks those.  Existence and permissions are important."

You called it "weaselly" to say that git tracks only content, and then 
very much tried to equate "existence and permissions" with content.

That's the part I answered.

So it wasn't a strawman, it was a direct answer to your assertion. Now go 
away and either come back with the patch to implement it (that I have 
encouraged you to do), or add a ".gitignore" file to the directory (that 
others have told you will solve your problems).

Don't bother talking crap.

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 17:06         ` Linus Torvalds
@ 2007-07-18 21:37           ` David Kastrup
  2007-07-18 21:45             ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-18 21:37 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Matthieu Moy, Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 18 Jul 2007, Matthieu Moy wrote:
>> 
>> If you checkout a branch, create an empty directory in this branch
>> (probably a placeholder, either for future versioned files, or for
>> generated files), you cannot tell git "this empty directory is in this
>> branch, but not in other ones" without adding a file in it.
>
> Right. Which is the suggested setup: add an empty ".gitignore" file
> to the directory, and you're done.

That implies that every directory in a versioned tree will exclusively
be created under manual and conscious control.  Not by running some
installer or script, unpacking some archive and so on.  But if every
content on a disk was created and put there under manual control of
the disk owner, we could still get along with floppy disks quite fine.
In practice, much more content gets sent around and juggled than what
is under immediate supervision of the user.

This is getting silly: you don't need to pull out rabbits out of your
head.  You said that you are not inclined to do any work in that area
since it does not touch _your_ use cases (well, at least not to a
degree that you consider worth bothering about) but that is no reason
to get into ridiculous arguments about other usage.  No code will come
of that.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 21:37           ` David Kastrup
@ 2007-07-18 21:45             ` Linus Torvalds
  2007-07-18 23:13               ` David Kastrup
  2007-07-18 23:16               ` [RFC PATCH] " Linus Torvalds
  0 siblings, 2 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-18 21:45 UTC (permalink / raw
  To: David Kastrup; +Cc: Matthieu Moy, Johannes Schindelin, git



On Wed, 18 Jul 2007, David Kastrup wrote:
>
> You said that you are not inclined to do any work in that area
> since it does not touch _your_ use cases (well, at least not to a
> degree that you consider worth bothering about) but that is no reason
> to get into ridiculous arguments about other usage.

How hard is it for you to admit that I also said "please send in a patch".

I don't need it. You do. You do the work. I'm just explaining why the work 
hasn't been done.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18 21:45             ` Linus Torvalds
@ 2007-07-18 23:13               ` David Kastrup
  2007-07-18 23:16               ` [RFC PATCH] " Linus Torvalds
  1 sibling, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-18 23:13 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Matthieu Moy, Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 18 Jul 2007, David Kastrup wrote:
>>
>> You said that you are not inclined to do any work in that area
>> since it does not touch _your_ use cases (well, at least not to a
>> degree that you consider worth bothering about) but that is no reason
>> to get into ridiculous arguments about other usage.
>
> How hard is it for you to admit that I also said "please send in a
> patch".

Yup, that was one sentence in about 5 pages of bile.  In contrast,
Junio gave a good overview of the technical areas involved here, and
estimates about what to do there best.

That's a constructive way to incite somebody to delve into the task
and try to see whether he can come up with something.

But 5 pages of what amounts to "you are an idiot, come up with a
patch" is not leading anywhere.

> I don't need it. You do. You do the work. I'm just explaining why
> the work hasn't been done.

No, you are _defending_ why the work has not been done.  This
rationalizing around the bush is a waste of time.  You probably have
spent quite a bit more time with your venting than Junio did with his
technical analysis, and the latter has been much more helpful.

So why waste all that time and adrenaline on something where you have
already said all you consider relevant?  The arguments don't get any
stronger by shouting, and it is not like you are inconvenienced in any
manner if somebody takes a look at the matter.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [RFC PATCH] Re: Empty directories...
  2007-07-18 21:45             ` Linus Torvalds
  2007-07-18 23:13               ` David Kastrup
@ 2007-07-18 23:16               ` Linus Torvalds
  2007-07-18 23:40                 ` Linus Torvalds
                                   ` (2 more replies)
  1 sibling, 3 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-18 23:16 UTC (permalink / raw
  To: Junio C Hamano, David Kastrup
  Cc: Matthieu Moy, Johannes Schindelin, Git Mailing List


Gaah.

I'm a damn softie (and soft in the head too, for writing the code).

Ok, here's a trivial patch to start the ball rolling. I'm really not 
interested in taking this patch any further personally, but I'm hoping 
that maybe it can make somebody else who is actually _interested_ in 
tracking empty directories (hint hint) decide that it's a good enough 
start that they can fill in the details.

This really updates three different areas, which are nicely separated into 
three different files, so while it's one single patch, you can actually 
follow along the changes by just looking at the differences in each file, 
which directly translate to separate conceptual changes:

 - builtin-update-index.c

   This simply contains the changes to update the index file. As usual, 
   there are multiple different cases, and they boil down to:

	(a) No index entry existed at all previously. If so, a directory 
	    will first go through the "index_path()" logic, which tries to 
	    create a GITLINK entry for it, if the subdirectory is a git 
	    directory. However, the new thing is that if that fails, it 
	    will instead just create a fake empty tree entry for it, and 
	    set the index mode to S_IFDIR.

	(b) It was a gitlink entry before. It stays as a gitlink entry, 
	    even if it cannot be indexed, and a file/symlink entry in 
	    the working tree is a conflict error.

	(c) It was an empty directory entry before. A directory stays as an 
	    empty directory entry, and a file/symlink entry in the working 
	    tree is a conflict error.

   Somebody should check that we properly delete the directory entry if we 
   add a file under it, I honestly didn't bother to go through all the 
   logic. I *think* we do it correctly just thanks to all the previous 
   code for gitlinks. Whatever.

   What I'm trying to say is that the changes are fairly straightforward, 
   but if somebody decides to push this, they need to think about it a lot 
   more than I'm ready to right now.

 - read-cache.c: match the new index type with the filesystem.

   This is pretty damn obvious. An S_ISDIR() always matches, and nothing 
   else matches at all. 

 - unpack-trees.c: unpack empty directories not by unpacking them 
   recursively into the index, but by adding them directly to the index as 
   a S_IFDIR entry instead.

   This one almost certainly needs more work, in particular when merging 
   trees where one has an empty directory, and the other has files _in_ 
   that directory! But the trivial approach makes a simple "git read-tree"
   with an empty directory unpack it into the index as a S_IFDIR entry, so 
   now doing git-write-tree + git-read-tree should result in the original 
   index contents.

I think the patch itself is pretty simple, but the subtle interactions 
that flow out of this all are anything but. It may "just work" almost 
as-is, but quite frankly, I think people need to think about all the 
issues that can happen a lot!

So see this as a basis for further work. The "further work" may be pretty 
simple, or it may not be. I'm personally not that interested, but like my 
original "subprojects" series, hopefully somebody else ends up running 
with this (or alternatively just proving that trying to track empty 
directories is a total nightmare).

			Linus

---
 builtin-update-index.c |   33 +++++++++++++++++++++++----------
 read-cache.c           |    4 ++++
 unpack-trees.c         |   12 +++++++++---
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 509369e..2eb2a46 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -94,8 +94,16 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
 	fill_stat_cache_info(ce, st);
 	ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
 
-	if (index_path(ce->sha1, path, st, !info_only))
-		return -1;
+	if (index_path(ce->sha1, path, st, !info_only)) {
+		/*
+		 * If we weren't able to index the directory as a GITLINK,
+		 * see if we can just add it as a plain directory instead.
+		 */
+		if (!S_ISDIR(st->st_mode))
+			return -1;
+		ce->ce_mode = htonl(S_IFDIR);
+		pretend_sha1_file(NULL, 0, OBJ_TREE, ce->sha1);
+	}
 	option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
 	option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
 	if (add_cache_entry(ce, option))
@@ -134,6 +142,11 @@ static int process_directory(const char *path, int len, struct stat *st)
 	/* Exact match: file or existing gitlink */
 	if (pos >= 0) {
 		struct cache_entry *ce = active_cache[pos];
+
+		/* Was it a directory before? */
+		if (S_ISDIR(ntohl(ce->ce_mode)))
+			return 0;
+
 		if (S_ISGITLINK(ntohl(ce->ce_mode))) {
 
 			/* Do nothing to the index if there is no HEAD! */
@@ -162,12 +175,8 @@ static int process_directory(const char *path, int len, struct stat *st)
 		return error("%s: is a directory - add individual files instead", path);
 	}
 
-	/* No match - should we add it as a gitlink? */
-	if (!resolve_gitlink_ref(path, "HEAD", sha1))
-		return add_one_path(NULL, path, len, st);
-
-	/* Error out. */
-	return error("%s: is a directory - add files inside instead", path);
+	/* No match - try to just add it as-is */
+	return add_one_path(NULL, path, len, st);
 }
 
 /*
@@ -178,8 +187,12 @@ static int process_file(const char *path, int len, struct stat *st)
 	int pos = cache_name_pos(path, len);
 	struct cache_entry *ce = pos < 0 ? NULL : active_cache[pos];
 
-	if (ce && S_ISGITLINK(ntohl(ce->ce_mode)))
-		return error("%s is already a gitlink, not replacing", path);
+	if (ce) {
+		if (S_ISGITLINK(ntohl(ce->ce_mode)))
+			return error("%s is already a gitlink, not replacing", path);
+		if (S_ISDIR(ntohl(ce->ce_mode)))
+			return error("%s is already a directory entry, not replacing", path);
+	}
 
 	return add_one_path(ce, path, len, st);
 }
diff --git a/read-cache.c b/read-cache.c
index a363f31..d3d2cc0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -142,6 +142,10 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 		    (has_symlinks || !S_ISREG(st->st_mode)))
 			changed |= TYPE_CHANGED;
 		break;
+	case S_IFDIR:
+		if (!S_ISDIR(st->st_mode))
+			changed |= TYPE_CHANGED;
+		return changed;
 	case S_IFGITLINK:
 		if (!S_ISDIR(st->st_mode))
 			changed |= TYPE_CHANGED;
diff --git a/unpack-trees.c b/unpack-trees.c
index 89dd279..22e452b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -181,9 +181,13 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 				any_dirs = 1;
 				parse_tree(tree);
 				subposns[i] = create_tree_entry_list(tree);
-				posns[i] = posns[i]->next;
-				src[i + o->merge] = o->df_conflict_entry;
-				continue;
+
+				/* If it wasn't empty, recurse into it */
+				if (subposns[i]) {
+					posns[i] = posns[i]->next;
+					src[i + o->merge] = o->df_conflict_entry;
+					continue;
+				}
 			}
 
 			if (!o->merge)
@@ -197,6 +201,8 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 
 			ce = xcalloc(1, ce_size);
 			ce->ce_mode = create_ce_mode(posns[i]->mode);
+			if (posns[i]->directory)
+				ce->ce_mode = htonl(S_IFDIR);
 			ce->ce_flags = create_ce_flags(baselen + pathlen,
 						       ce_stage);
 			memcpy(ce->name, base, baselen);

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: Empty directories...
       [not found]       ` <867ioyqhgc.fsf@lola.quinscape.zz>
@ 2007-07-18 23:34         ` Junio C Hamano
  0 siblings, 0 replies; 156+ messages in thread
From: Junio C Hamano @ 2007-07-18 23:34 UTC (permalink / raw
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Having made it sound so easy, here are the issues I would expect
>> to be nontrivial (but probably not rocket surgery either).
>> ...
> This would seem to imply that the index does not need to be
> upwards-compatible: simplifying the code means that old indexes won't
> be treated all too well.

I did not imply any such thing, by the way.  These are off the
top of my head technical issues and there probably are more, but
I limited the list to technical side of the things.

You of course have social side to take care of.  If you are
breaking everybody else's index, you would need to tell
everybody: "I am sorry but if you upgrade your git to this
version that does what I want, you have to nuke your index and
start over, so commit all changes first, and then update the
git.  Sorry for causing you a minor inconvenience".  Everybody
at this point involves (obviously) the kernel folks, wine,
x.org, among many others.

I suspect your saying that to them is probably not good enough
for them to forgive the minor inconveniences, which means you
need to convince _me_ to join you in defending, in the release
notes, that this is a feature worth having even though there is
a minor inconvenience to redo everybody's index files.  Which I
suspect is quite unlikely to happen at this moment, though...

A much less troublesome approach might be to do things
differently from what I outlined, to keep the index compatible
as long as it does not contain an empty directory, which is what
we did for subprojects support.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-18 23:16               ` [RFC PATCH] " Linus Torvalds
@ 2007-07-18 23:40                 ` Linus Torvalds
  2007-07-18 23:42                 ` David Kastrup
  2007-07-21  4:29                 ` David Kastrup
  2 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-18 23:40 UTC (permalink / raw
  To: Junio C Hamano, David Kastrup
  Cc: Matthieu Moy, Johannes Schindelin, Git Mailing List



On Wed, 18 Jul 2007, Linus Torvalds wrote:
>
> +		if (!S_ISDIR(st->st_mode))
> +			return -1;
> +		ce->ce_mode = htonl(S_IFDIR);
> +		pretend_sha1_file(NULL, 0, OBJ_TREE, ce->sha1);

Oh, one word of warning: that whole "pretend_sha1_file()" thing won't 
create the object itself, and when I did the limited testing that I did, I 
actually made sure I had a magic zero-sized tree object in my object 
directory.

If you don't, some things will complain, because they end up getting a 
SHA1 that they cannot look up, because *they* didn't create that pretend 
entry.

I didn't know which way I wanted to go with that thing. I was kind of 
thinking that maybe we would just have the zero-sized OBJ_BLOB and 
OBJ_TREE objects as special magical things, and have all git programs just 
do that "pretend" at the beginning.

But that kind of thing is probably just a totally unnecessary special 
case, and instead, that "pretend_sha1_file()" should have just been a

	write_sha1_file(NULL, 0, "tree", ce->sha1);

instead.

Anyway, if there are issues with not finding an object called 
4b825dc642cb6eb9a060e54bf8d69288fbee4904, then that's the empty tree 
object, and that pretend thing was the cause.
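
For reference, that name is just the SHA-1 of the object header for a
zero-length tree ("tree 0" followed by a NUL byte) with no content after
it. A small sketch, assuming a sha1sum utility is available (on some
systems the equivalent tool has a different name):

```shell
#!/bin/sh
# Sketch: compute the empty tree's object name without touching a repository.
# An object name is the SHA-1 of "<type> <size>\0" followed by the content;
# for the empty tree the content is zero bytes long.
empty_tree=$(printf 'tree 0\0' | sha1sum | cut -d' ' -f1)
echo "$empty_tree"
```

Inside a git installation, "git hash-object -t tree /dev/null" should
report the same name.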

(The git repo itself has the empty tree as an object in it, because one of 
the commits has that - probably as a result of a bug, but there you have 
it)

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-18 23:16               ` [RFC PATCH] " Linus Torvalds
  2007-07-18 23:40                 ` Linus Torvalds
@ 2007-07-18 23:42                 ` David Kastrup
  2007-07-19  0:22                   ` Linus Torvalds
       [not found]                   ` <alpine.LFD.0.999.0707181710271.27353@woody.linux-foundation.org>
  2007-07-21  4:29                 ` David Kastrup
  2 siblings, 2 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-18 23:42 UTC (permalink / raw
  To: Linus Torvalds
  Cc: Junio C Hamano, Matthieu Moy, Johannes Schindelin,
	Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Gaah.
>
> I'm a damn softie (and soft in the head too, for writing the code).
>
> Ok, here's a trivial patch to start the ball rolling. I'm really not 
> interested in taking this patch any further personally, but I'm hoping 
> that maybe it can make somebody else who is actually _interested_ in 
> tracking empty directories (hint hint) decide that it's a good enough 
> start that they can fill in the details.

Well, kudos.  Together with the analysis from Junio, this seems like a
good start.  Would you have any recommendations about what stuff one
should really read in order to get up to scratch about git internals?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-18 23:42                 ` David Kastrup
@ 2007-07-19  0:22                   ` Linus Torvalds
  2007-07-19  5:28                     ` Junio C Hamano
       [not found]                   ` <alpine.LFD.0.999.0707181710271.27353@woody.linux-foundation.org>
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-19  0:22 UTC
  To: David Kastrup
  Cc: Junio C Hamano, Matthieu Moy, Johannes Schindelin,
	Git Mailing List



On Thu, 19 Jul 2007, David Kastrup wrote:
> 
> Well, kudos.  Together with the analysis from Junio, this seems like a
> good start.  Would you have any recommendations about what stuff one
> should really read in order to get up to scratch about git internals?

Well, you do need to understand the index. That's where all the new 
subtlety happens.

The data structures themselves are trivial, and we've supported empty 
trees (at the top level) from the beginning, so that part is not anything 
new.

However, now having a new entry type in the index (S_IFDIR) means that 
anything that interacts with the index needs to think twice. But a lot of 
that is just testing what happens, and so the first thing to do is to have 
a test-suite.

There's also the question about how to show an empty tree in a diff. We've 
never had that: the only time we had empty trees was when we compared a 
totally empty "root" tree against another tree, and then it was obvious. 
But what if the empty tree is a subdirectory of another tree - how do you 
express that in a diff? Do you care? Right now, since we always recurse 
into the tree (and then not find anything), empty trees will simply not 
show up _at_all_ in any diffs.

And what about usability issues elsewhere? With my patch, doing something 
like a

	git add directory/

still won't do anything, because the behaviour of "git add" has always 
been to recurse into directories. So to add a new empty directory, you'd 
have to do

	git update-index --add directory

and that's not exactly user-friendly.

So do you add a "-n" flag to "git add" to tell it to not recurse? Or do 
you always recurse, but then if you notice that the end result is empty, 
you add it as a directory?
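
The second option can be sketched outside git. This is a toy directory walk,
not the dir.c code: collect_for_add is a hypothetical name, ".gitignore" rules
are not modelled at all, and the (path, kind) pairs are an assumption made for
illustration. It always recurses, and records a directory itself only when
nothing was found beneath it.

```python
import os
import tempfile


def collect_for_add(root: str, rel: str = "") -> list:
    """Return (path, kind) pairs that a recursing 'add' might record."""
    found = []
    here = os.path.join(root, rel) if rel else root
    with os.scandir(here) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.name == ".git":
            continue  # never look inside the repository metadata
        child = f"{rel}/{entry.name}" if rel else entry.name
        if entry.is_dir(follow_symlinks=False):
            found.extend(collect_for_add(root, child))
        else:
            found.append((child, "file"))
    # Recursion produced nothing: record the directory itself instead.
    if not found and rel:
        found.append((rel, "dir"))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "empty"))
        os.mkdir(os.path.join(top, "full"))
        open(os.path.join(top, "full", "f"), "w").close()
        print(collect_for_add(top))
```

A "-n" style flag would correspond to skipping the recursive call and
recording the directory unconditionally.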

All the logic for that whole directory lookup is in git/dir.c, and that 
code takes various flags because different programs want different things 
(show "ignored" files, or ignore them? Show empty directories or ignore 
them? etc).

So primarily, I think the job is:

 - thinking about the index, and the interactions when adding a directory 
   or adding files under a directory that already exists.

   I *think* we get all the corner cases right, because they should be 
   exactly the same as with subprojects, but hey, maybe there's some piece 
   that tests S_ISGITLINK() and now needs a S_ISDIR() test too..

 - adding test cases

 - thinking about the user interfaces for this, and adding code to handle 
   directories where needed (eg the above "git add" issue).

 - thinking about merges (which is largely about the index too, but is a 
   whole 'nother set of issues, with multiple stages in the same index at 
   the same time)

It might all be trivial. The directory traversal already knows that empty 
directories are special, so getting the right behaviour to "git add" may 
be really really easy. Or maybe it's not. I think a lot of it is just 
finding what needs to be done, seeing if we already do it, and if not, 
seeing how to do it. Boring test-cases, in other words.

		Linus


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  0:22                   ` Linus Torvalds
@ 2007-07-19  5:28                     ` Junio C Hamano
  2007-07-19  5:38                       ` Shawn O. Pearce
  2007-07-19  5:59                       ` David Kastrup
  0 siblings, 2 replies; 156+ messages in thread
From: Junio C Hamano @ 2007-07-19  5:28 UTC
  To: Linus Torvalds
  Cc: David Kastrup, Matthieu Moy, Johannes Schindelin,
	Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 19 Jul 2007, David Kastrup wrote:
>> 
>> Well, kudos.  Together with the analysis from Junio, this seems like a
>> good start.  Would you have any recommendations about what stuff one
>> should really read in order to get up to scratch about git internals?
>
> Well, you do need to understand the index. That's where all the new 
> subtlety happens.
>
> The data structures themselves are trivial, and we've supported empty 
> trees (at the top level) from the beginning, so that part is not anything 
> new.
>
> However, now having a new entry type in the index (S_IFDIR) means that 
> anything that interacts with the index needs to think twice. But a lot of 
> that is just testing what happens, and so the first thing to do is to have 
> a test-suite.
>
> There's also the question about how to show an empty tree in a diff. We've 
> never had that: the only time we had empty trees was when we compared a 
> totally empty "root" tree against another tree, and then it was obvious. 
> But what if the empty tree is a subdirectory of another tree - how do you 
> express that in a diff? Do you care? Right now, since we always recurse 
> into the tree (and then not find anything), empty trees will simply not 
> show up _at_all_ in any diffs.
>
> And what about usability issues elsewhere? With my patch, doing something 
> like a
>
> 	git add directory/
>
> still won't do anything, because the behaviour of "git add" has always 
> been to recurse into directories. So to add a new empty directory, you'd 
> have to do
>
> 	git update-index --add directory
>
> and that's not exactly user-friendly.
>
> So do you add a "-n" flag to "git add" to tell it to not recurse? Or do 
> you always recurse, but then if you notice that the end result is empty, 
> you add it as a directory?

Another issue I thought about was what you would do in the step
3 in the following:

 1. David says "mkdir D; git add D"; you add S_IFDIR entry in
    the index at D;

 2. David says "date >D/F; git add D/F"; presumably you drop D
    from the index (to keep the index more backward compatible)
    and add S_IFREG entry at D/F.

 3. David says "git rm D/F".

Have we stopped keeping track of the "empty directory" at this
point?
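
The three steps can be traced with a toy model of the index. This is only a
sketch under stated assumptions: the index is reduced to a path-to-mode
dictionary, and add_dir, add_file and rm are hypothetical helpers that follow
the behaviour presumed in step 2, not anything git implements.

```python
import stat


def add_dir(index: dict, path: str) -> None:
    """Step 1: record an empty directory as an S_IFDIR entry."""
    index[path] = stat.S_IFDIR


def add_file(index: dict, path: str) -> None:
    """Step 2: record a file, dropping S_IFDIR entries for its parents."""
    parts = path.split("/")
    for depth in range(1, len(parts)):
        parent = "/".join(parts[:depth])
        if index.get(parent) == stat.S_IFDIR:
            del index[parent]
    index[path] = stat.S_IFREG


def rm(index: dict, path: str) -> None:
    """Step 3: drop the entry; nothing remembers the parent directory."""
    index.pop(path, None)


if __name__ == "__main__":
    index = {}
    add_dir(index, "D")
    add_file(index, "D/F")
    rm(index, "D/F")
    # The S_IFDIR entry for D was dropped in step 2 and is never restored.
    print(index)
```

In this model the index is empty after step 3, so the directory is no longer
tracked unless step 2 keeps the S_IFDIR entry.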


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  5:28                     ` Junio C Hamano
@ 2007-07-19  5:38                       ` Shawn O. Pearce
  2007-07-19  6:08                         ` David Kastrup
                                           ` (2 more replies)
  2007-07-19  5:59                       ` David Kastrup
  1 sibling, 3 replies; 156+ messages in thread
From: Shawn O. Pearce @ 2007-07-19  5:38 UTC
  To: Junio C Hamano
  Cc: Linus Torvalds, David Kastrup, Matthieu Moy, Johannes Schindelin,
	Git Mailing List

Junio C Hamano <gitster@pobox.com> wrote:
> Another issue I thought about was what you would do in the step
> 3 in the following:
> 
>  1. David says "mkdir D; git add D"; you add S_IFDIR entry in
>     the index at D;
> 
>  2. David says "date >D/F; git add D/F"; presumably you drop D
>     from the index (to keep the index more backward compatible)
>     and add S_IFREG entry at D/F.
> 
>  3. David says "git rm D/F".
> 
> Have we stopped keeping track of the "empty directory" at this
> point?

Sadly yes.  But I don't think that's what the folks who want to
track empty directories want to have happen here.

Which is why I'm thinking we just need to track the directory, as a
node in the index, even if there are files in it, and even if we got
that directory and its contained files there by just unpacking trees.

-- 
Shawn.


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  5:28                     ` Junio C Hamano
  2007-07-19  5:38                       ` Shawn O. Pearce
@ 2007-07-19  5:59                       ` David Kastrup
  2007-07-19  9:54                         ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-19  5:59 UTC
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> Another issue I thought about was what you would do in the step
> 3 in the following:
>
>  1. David says "mkdir D; git add D"; you add S_IFDIR entry in
>     the index at D;
>
>  2. David says "date >D/F; git add D/F"; presumably you drop D
>     from the index (to keep the index more backward compatible)
>     and add S_IFREG entry at D/F.

I don't think that one should drop D here.  Operation 1 _is_ not
backward compatible, so if you want to revert it, you should
explicitly remove D.  And we can't "keep" the index backward
compatible if it isn't so after step 1.

>  3. David says "git rm D/F".
>
> Have we stopped keeping track of the "empty directory" at this
> point?

The case I am worrying about is rather

mkdir D
mkdir D/E
touch D/E/file
git add D
[*]
git rm D/E/file

From a user perspective, E should be registered still.  Compare this
with

mkdir D
mkdir D/E
touch D/E/file
git add D/E/file
[*]
git rm D/E/file

Where likely both D and E should now be considered unregistered.  So
the situation is different between the first or the second [*], and
the difference might be impossible to express completely in the frame
of a backwards-compatible index, even though we don't track an empty
directory at the point [*] at all, and the only registered _file_ is
D/E/file.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  5:38                       ` Shawn O. Pearce
@ 2007-07-19  6:08                         ` David Kastrup
  2007-07-19  7:10                           ` Geoff Russell
  2007-07-19  6:09                         ` Shawn O. Pearce
       [not found]                         ` <9436820E-53D1-425D-922E-D4C76578E40A@silverinsanity.com>
  2 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-19  6:08 UTC
  To: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Sadly yes.  But I don't think that's what the folks who want to
> track empty directories want to have happen here.
>
> Which is why I'm thinking we just need to track the directory, as a
> node in the index, even if there are files in it, and even if we got
> that directory and its contained files there by just unpacking
> trees.

I have come to about the same conclusion.  So if
backward-compatibility is any concern, one needs to work with some
sort of extension records, and designing them in a way that

new-git add tree
old-git rm tree

will not leave empty subdirectories in the index will be tricky, to
say the least.  One will likely have to add an extension record
"directory" for each directory as well as "my containing dir takes
care of itself" to each file that has been added with new-git and has
had its parent directory entered by other means.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  5:38                       ` Shawn O. Pearce
  2007-07-19  6:08                         ` David Kastrup
@ 2007-07-19  6:09                         ` Shawn O. Pearce
  2007-07-19  8:13                           ` Matthieu Moy
       [not found]                         ` <9436820E-53D1-425D-922E-D4C76578E40A@silverinsanity.com>
  2 siblings, 1 reply; 156+ messages in thread
From: Shawn O. Pearce @ 2007-07-19  6:09 UTC
  To: Junio C Hamano
  Cc: Linus Torvalds, David Kastrup, Matthieu Moy, Johannes Schindelin,
	Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
> > Another issue I thought about was what you would do in the step
> > 3 in the following:
> > 
> >  1. David says "mkdir D; git add D"; you add S_IFDIR entry in
> >     the index at D;
> > 
> >  2. David says "date >D/F; git add D/F"; presumably you drop D
> >     from the index (to keep the index more backward compatible)
> >     and add S_IFREG entry at D/F.
> > 
> >  3. David says "git rm D/F".
> > 
> > Have we stopped keeping track of the "empty directory" at this
> > point?
> 
> Sadly yes.  But I don't think that's what the folks who want to
> track empty directories want to have happen here.
> 
> Which is why I'm thinking we just need to track the directory, as a
> node in the index, even if there are files in it, and even if we got
> that directory and its contained files there by just unpacking trees.

I take this back.  I really don't want that behavior.

If I do:

  mkdir -p foo/bar
  echo hello >foo/bar/world
  git add foo
  git rm -f foo/bar/world

I never asked for foo/bar or foo to stay.  In fact I want them
to disappear from Git entirely, as foo/bar is now empty and has
no content.


But we also cannot do a special --mkdir option for update-index
either, because how do we know that the user designated subtree is
a directory we must always keep in the index?

So I think the only way this works is to have a new mode that we use
in tree (04755 ?) that tells us not only is this thing a subtree,
but also that the user wants it to stay here, even if it is empty.
Those trees are always in the index as a real tree entry, even if
there are files contained in it.

And as far as getting that directory entry created/removed from
the index, well, I think a special flag to update-index would be
in order, much like --chmod=[+-]x.

Just my $0.0002 USD, which really ain't worth much at all.

-- 
Shawn.


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  6:08                         ` David Kastrup
@ 2007-07-19  7:10                           ` Geoff Russell
  0 siblings, 0 replies; 156+ messages in thread
From: Geoff Russell @ 2007-07-19  7:10 UTC
  To: git

Dear gits,

When I first started using git, I naively did

           $ mkdir NEWDIR && chmod BLAH NEWDIR
           $ git add NEWDIR

I just expected that this was content in the current directory that I
wanted tracked together with the permissions.

It wasn't ... I spent a day or 2 thinking I was stupid, my version of git was
corrupt, my machine was busted, .... etc.  Eventually of course, I read the
documentation (when all else fails) and realised that this perfectly obvious
behaviour was not supported.  The behaviour was obviously so obvious that
eventually an error message was added telling all the people who hadn't
read the documentation that trying to add a directory was 'fatal'.

I put up with and work around this behaviour because git is so bloody
brilliant at everything else.  But it would be nice if it worked.

Cheers,
Geoff Russell


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  6:09                         ` Shawn O. Pearce
@ 2007-07-19  8:13                           ` Matthieu Moy
  2007-07-19 10:51                             ` Tomash Brechko
  0 siblings, 1 reply; 156+ messages in thread
From: Matthieu Moy @ 2007-07-19  8:13 UTC
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Linus Torvalds, David Kastrup,
	Johannes Schindelin, Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> writes:

> If I do:
>
>   mkdir -p foo/bar
>   echo hello >foo/bar/world
>   git add foo
>   git rm -f foo/bar/world
>
> I never asked for foo/bar or foo to stay.

Well, outside git, if you do

$ mkdir -p foo/bar
$ echo hello > foo/bar/world
$ rm -f foo/bar/world

You didn't ask foo/bar to stay either, and still, it's quite natural
to have it stay in your filesystem. So, the same way you'd have run
"rm -r foo", it seems reasonable to me to ask for "git-rm -r foo" if
the user wants to get rid of foo/ itself.

-- 
Matthieu


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  5:59                       ` David Kastrup
@ 2007-07-19  9:54                         ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-19  9:54 UTC
  To: git

David Kastrup <dak@gnu.org> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Another issue I thought about was what you would do in the step
>> 3 in the following:
>>
>>  1. David says "mkdir D; git add D"; you add S_IFDIR entry in
>>     the index at D;
>>
>>  2. David says "date >D/F; git add D/F"; presumably you drop D
>>     from the index (to keep the index more backward compatible)
>>     and add S_IFREG entry at D/F.
>
> I don't think that one should drop D here.  Operation 1 _is_ not
> backward compatible, so if you want to revert it, you should
> explicitly remove D.  And we can't "keep" the index backward
> compatible if it isn't so after step 1.
>
>>  3. David says "git rm D/F".
>>
>> Have we stopped keeping track of the "empty directory" at this
>> point?
>
> The case I am worrying about is rather
>
> mkdir D
> mkdir D/E
> touch D/E/file
> git add D
> [*]
> git rm D/E/file
>
> From a user perspective, E should be registered still.  Compare this
> with
>
> mkdir D
> mkdir D/E
> touch D/E/file
> git add D/E/file
> [*]
> git rm D/E/file

Let's take this through the motions with my last proposal: at the
first [*], the index now contains

D/.        [dir]
D/E/.      [dir]
D/E/file   [file]

After git rm D/E/file, it contains

D/.        [dir]
D/E/.      [dir]

Compared with the second, where we just have in the index

D/E/file   [file]

and it is gone again after the remove.
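
The two cases can be replayed with a toy model. This is a sketch only: the
index is a plain dictionary, the "X/." keys follow the notation used above,
and add_dir_recursive, add_file and rm are hypothetical helpers, not git
functions.

```python
def add_dir_recursive(index: dict, directory: str, worktree_files: list) -> None:
    """Model `git add D`: register D, every directory below it, and the files."""
    index[directory + "/."] = "dir"
    base_depth = len(directory.split("/"))
    for path in worktree_files:
        if not path.startswith(directory + "/"):
            continue
        parts = path.split("/")
        # Each directory between `directory` and the file gets its own "." entry.
        for depth in range(base_depth + 1, len(parts)):
            index["/".join(parts[:depth]) + "/."] = "dir"
        index[path] = "file"


def add_file(index: dict, path: str) -> None:
    """Model `git add D/E/file`: only the file is registered."""
    index[path] = "file"


def rm(index: dict, path: str) -> None:
    """Model `git rm path`: drop that one entry, leave "." entries alone."""
    index.pop(path, None)


if __name__ == "__main__":
    worktree = ["D/E/file"]

    first = {}
    add_dir_recursive(first, "D", worktree)
    rm(first, "D/E/file")
    print(sorted(first))   # the directory entries survive the removal

    second = {}
    add_file(second, "D/E/file")
    rm(second, "D/E/file")
    print(sorted(second))  # nothing was registered besides the file
```

The difference between the two histories is carried entirely by the "."
entries, which is why no empty [tree] is needed in this model.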

After committing in the first case, we have in the repository
D          [tree]
D/.        [dir]
D/E        [tree]
D/E/.      [dir]
D/E/file   [file]

Now we do
git rm D/E, and the index contains

D/E/.      [remove dir]
D/E/file   [remove file]

If we commit now,
D/E        [tree]
becomes empty and is removed.  All that stays is

D          [tree]
D/.        [dir]

So we still have [tree] items only in the repository, not in the
index, and there is no such thing as an empty tree.  But directories
have a presence in index and repository.  They are not containers of
files, that role is retained by trees.  Rather they are siblings of
the files in their associated tree.

As an aside: if one wanted to track directory permissions, one
would track them in the [dir] entries, not in the [tree] entries.
Trees remain abstract structuring entities in the repository that
don't have an outside representation.  Directories will be
auto-created and deleted as necessary in the work directory to
facilitate having a place for checking tree elements out and in.

This means that
git add D/E/file
would _not_ track permissions of D and E (nor their existence).

However, Linus is right that permissions are something to be discussed
separately.  But separating [tree] and [dir] makes for a plausible and
understandable way of treating them.

-- 
David Kastrup


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19  8:13                           ` Matthieu Moy
@ 2007-07-19 10:51                             ` Tomash Brechko
  2007-07-19 11:31                               ` David Kastrup
  2007-07-19 12:16                               ` Johannes Schindelin
  0 siblings, 2 replies; 156+ messages in thread
From: Tomash Brechko @ 2007-07-19 10:51 UTC
  To: Git Mailing List

Dear Git fellows,

A year or so ago I too would strongly advocate the need of tracking
empty directories, permissions et al., it seemed so "natural" and
"plain obvious" to me back then.  But since that time I learned to
appreciate the "contents tracking" approach, and now view directories
(paths in general) only as the means for Git to know where to put the
contents on checkout.  This, BTW, is consistent with how Git figures
container copies/renames.

No doubt mighty Git developers can add support for empty directories,
manage to stay backward compatible, think out consistent user
interface etc.  But there's no end to how much information one may
want to store in Git to make it "_file system_ contents tracking
software".  Starting with empty directories, one may argue then that
certain installation trees also need particular file ownership, so
let's store user/group names like tar does.  It was mentioned already
in this thread that in addition to 'rwx' we also would have to store
ACLs (some OSes have only one of these concepts, some both), SELinux
security contexts, perhaps other arbitrary file attributes that may be
part of file system state.

Wouldn't it be better to preserve Git as a contents tracking system,
and add some tools on top of it that can translate file system state
into textual (or binary) form, so it can be stored in current Git?
And then use this textual representation to restore actual file system
attributes/layout on checkout?  And the only change in Git itself
would be some more hooks, for instance one hook before checking out
over the old work tree, and one after the checkout.  Or one can simply
wrap certain Git commands to implement such hooks.

In any case, no one is going to be against the new feature if it won't
break anything for those of us who find the pure contents tracking the
right thing.  And storing empty directories by default may not be
natural for everyone.  So before going into technical details of how
this can possibly be implemented, could someone answer the following:

1 Is Git going to track directories _always_?  Looks like not, because
  in this thread there seems to be a distinction between 'git add DIR'
  and 'git add DIR/FILE', i.e. not everyone is sure if in the last
  case Git should track DIR or not.

2 If Git will track only explicitly mentioned directories, then what
  about recursive operations?  Will it add only files by default, or
  directories too?  Perhaps there will be some --add-dirs option to
  'git add'.

3 Since in certain recursive operations one will want to affect
  directories too, how will .gitignore look?  Most files have a notion
  of extension, so we may say '*.o', but with directories things are a
  bit more complicated.  One would want to say "exclude DIR2 only if
  under DIR1 at any hierarchy depth", i.e. exclude paths matching
  qr%DIR1/(.+/)?DIR2/%, and shell wildcards aren't that expressive,
  '*' doesn't cross hierarchy.  Note that we live without this now,
  but this will be the next "natural" demand once directories become
  first class citizens.
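
The rule in item 3 can be tried out as an ordinary regular expression. This
is a minimal sketch, assuming the last component of the pattern is meant to be
DIR2, as the sentence describes; is_excluded is a hypothetical helper and has
nothing to do with how git reads .gitignore.

```python
import re

# Perl's qr%...% is an ordinary regex that uses '%' as its delimiter.
# Assumption: the final component is DIR2 ("exclude DIR2 only if under DIR1").
# Like the original, the pattern is not anchored to a path component boundary.
UNDER_DIR1 = re.compile(r"DIR1/(.+/)?DIR2/")


def is_excluded(path: str) -> bool:
    """True when the path matches the DIR1/.../DIR2/ pattern."""
    return UNDER_DIR1.search(path) is not None


if __name__ == "__main__":
    for candidate in ("DIR1/DIR2/x.o", "src/DIR1/a/b/DIR2/", "DIR2/x.o"):
        print(candidate, is_excluded(candidate))
```

The optional group "(.+/)?" is what lets the match cross any number of
intermediate directories, which a single shell '*' does not do.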


This list is surely incomplete.  The point is that before we go into
technical details, let's consider what exactly we are going to
implement, how this will affect current usage model, how (empty)
directory handling will extend to future similar demands, etc.  My
fear is that once some patch is around, it's very tempting to accept
it.  And once it is in, it's almost impossible to remove the feature
later.


Regards,

-- 
   Tomash Brechko


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 10:51                             ` Tomash Brechko
@ 2007-07-19 11:31                               ` David Kastrup
  2007-07-19 12:32                                 ` Tomash Brechko
  2007-07-19 12:38                                 ` David Kastrup
  2007-07-19 12:16                               ` Johannes Schindelin
  1 sibling, 2 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-19 11:31 UTC
  To: git

Tomash Brechko <tomash.brechko@gmail.com> writes:

> Dear Git fellows,
>
> A year or so ago I too would strongly advocate the need of tracking
> empty directories, permissions et al., it seemed so "natural" and
> "plain obvious" to me back then.  But since that time I learned to
> appreciate the "contents tracking" approach, and now view
> directories (paths in general) only as the means for Git to know
> where to put the contents on checkout.  This, BTW, is consistent
> with how Git figures container copies/renames.

I'll answer this based on my proposal of adding "A/B/. [dir]" as a
separate entity to index and repository, keeping "[tree]" out of
indices, and not allowing an empty "[tree]" into repositories.

This is a very natural abstraction.

> But there's no end to how much information one may want to store in
> Git to make it "_file system_ contents tracking software".  Starting
> with empty directories, one may argue then that certain installation
> trees also need particular file ownership, so let's store user/group
> names like tar does.  It was mentioned already in this thread that
> in addition to 'rwx' we also would have to store ACLs (some OSes
> have only one of these concepts, some both), SELinux security
> contexts, perhaps other arbitrary file attributes that may be part
> of file system state.

A [dir] entry may eventually be made to track any of this, like a
[file] entry could.  If one wished to do this.

> Wouldn't it be better to preserve Git as a contents tracking system,
> and add some tools on top of it that can translate file system state
> into textual (or binary) form, so it can be stored in current Git?
> And then use this textual representation to restore actual file
> system attributes/layout on checkout?  And the only change in Git
> itself would be some more hooks, for instance one hook before
> checking out over the old work tree, and one after the checkout.  Or
> one can simply wrap certain Git commands to implement such hooks.

This is not good since "tracking" means "tracking".  With your model,
the metainformation would be dissociated from the information.
Renames and moves would make ground beef of the metadata.

> In any case, no one is going to be against the new feature if it
> won't break anything for those of us who find the pure contents
> tracking the right thing.

My proposal would allow setting an option to track or not track
directories implicitly by default.

> And storing empty directories by default may not be natural for
> everyone.  So before going into technical details of how this can
> possibly be implemented, could someone answer the following:

I'll answer assuming the proposed model.

> 1 Is Git going to track directories _always_?  Looks like not, because
>   in this thread there seems to be a distinction between 'git add DIR'
>   and 'git add DIR/FILE', i.e. not everyone is sure if in the last
>   case Git should track DIR or not.

Let's have a variable

core.adddirs

If you set core.adddirs to false, git will not enter directories into
the index for addition.  Consequently, they will not end up in the
repository.  If you git-rm a directory, the index will contain a
notice to delete the directory along with deletion notices for all
registered other elements of the directory.  Committing this means
that the directory will no longer be separately controlled by git,
even if for some reason the repository has other files remaining in
the tree.

Something like the Linux kernel repository which may be accessed by
ancient git versions would naturally contain "core.adddirs: false" in
its default configuration file, and this would be passed around when
cloning.  So directory elements would stay out of it.

> 2 If Git will track only explicitly mentioned directories, then what
>   about recursive operations?  Will it add only files by default, or
>   directories too?  Perhaps there will be some --add-dirs option to
>   'git add'.

There could be a commandline override for "core.adddirs".

> 3 Since in certain recursive operations one will want to affect
>   directories too, how will .gitignore look?  Most files have a notion
>   of extension, so we may say '*.o', but with directories things are a
>   bit more complicated.  One would want to say "exclude DIR2 only if
>   under DIR1 at any hierarchy depth", i.e. exclude paths matching
>   qr%DIR1/(.+/)?DIR2/%, and shell wildcards aren't that expressive,
>   '*' doesn't cross hierarchy.  Note that we live without this now,
>   but this will be the next "natural" demand once directories become
>   first class citizens.

Huh?  I don't get this.  It's like "we can't allow people to buy
chocolate, or they'll demand next to have nuclear weapons delivered at
their house".  Deal with the demands as they come up.  If a directory
has a tree-local name ".", it can be dealt with in patterns if really
needed.  I don't see much of a necessity however.

Although it would be natural to have
core.adddirs: false
be equivalent to
core.excludefile: .

And so it might be possible to actually not need a separate
core.adddirs option at all, technically.

-- 
David Kastrup


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 10:51                             ` Tomash Brechko
  2007-07-19 11:31                               ` David Kastrup
@ 2007-07-19 12:16                               ` Johannes Schindelin
  2007-07-19 12:24                                 ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: Johannes Schindelin @ 2007-07-19 12:16 UTC
  To: Tomash Brechko; +Cc: Git Mailing List

Hi,

On Thu, 19 Jul 2007, Tomash Brechko wrote:

> A year or so ago I too would strongly advocate the need of tracking 
> empty directories, permissions et al., it seemed so "natural" and "plain 
> obvious" to me back then.  But since that time I learned to appreciate 
> the "contents tracking" approach, and now view directories (paths in 
> general) only as the means for Git to know where to put the contents on 
> checkout.  This, BTW, is consistent with how Git figures container 
> copies/renames.

Thank you.  It is my impression, too, that after a while it becomes 
obvious what is good and what is not.

FWIW I just whipped up a proof-of-concept patch (so at least _I_ cannot be 
accused of chickening out of writing code):

This adds the command line option "--add-empty-dirs" to "git add", which 
does the only sane thing: putting a placeholder into that directory, and 
adding that.  Since ".gitignore" is already a reserved file name in git, 
it is used as the name of this place holder.
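
The placeholder idea can also be sketched on its own, outside git. This is an
illustration and not the logic of the patch: add_placeholders is a hypothetical
helper that only touches the work tree, and adding the created files to the
index is left to a normal "git add".

```python
import os
import tempfile


def add_placeholders(root: str, placeholder: str = ".gitignore") -> list:
    """Create an empty placeholder in every empty directory under root."""
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        is_empty = not dirnames and not filenames
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into repository metadata
        if is_empty:
            target = os.path.join(dirpath, placeholder)
            # O_EXCL mirrors the patch: never clobber an existing file.
            fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
            os.close(fd)
            created.append(os.path.relpath(target, root))
    return sorted(created)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "empty"))
        os.mkdir(os.path.join(top, "full"))
        open(os.path.join(top, "full", "f"), "w").close()
        print(add_placeholders(top))
```

Running it a second time creates nothing, because a directory that already
holds a placeholder is no longer empty.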

---

	It is probably not fool-proof yet, needs documentation and a test 
	case.  But I am really sick and tired of this discussion.

 builtin-add.c |   25 +++++++++++++++++++++----
 dir.c         |   16 +++++++++++++++-
 dir.h         |    3 ++-
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 7345479..1294840 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -47,7 +47,7 @@ static void prune_directory(struct dir_struct *dir, const char **pathspec, int p
 }
 
 static void fill_directory(struct dir_struct *dir, const char **pathspec,
-		int ignored_too)
+		int ignored_too, int substitute_empty_dirs)
 {
 	const char *path, *base;
 	int baselen;
@@ -63,6 +63,7 @@ static void fill_directory(struct dir_struct *dir, const char **pathspec,
 		if (!access(excludes_file, R_OK))
 			add_excludes_from_file(dir, excludes_file);
 	}
+	dir->substitute_empty_directories = substitute_empty_dirs;
 
 	/*
 	 * Calculate common prefix for the pathspec, and
@@ -143,7 +144,8 @@ static const char ignore_warning[] =
 int cmd_add(int argc, const char **argv, const char *prefix)
 {
 	int i, newfd;
-	int verbose = 0, show_only = 0, ignored_too = 0;
+	int verbose = 0, show_only = 0, ignored_too = 0,
+		substitute_empty_dirs = 0;
 	const char **pathspec;
 	struct dir_struct dir;
 	int add_interactive = 0;
@@ -191,6 +193,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			take_worktree_changes = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--add-empty-dirs")) {
+			substitute_empty_dirs = 1;
+			continue;
+		}
 		usage(builtin_add_usage);
 	}
 
@@ -206,7 +212,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	}
 	pathspec = get_pathspec(prefix, argv + i);
 
-	fill_directory(&dir, pathspec, ignored_too);
+	fill_directory(&dir, pathspec, ignored_too, substitute_empty_dirs);
 
 	if (show_only) {
 		const char *sep = "", *eof = "";
@@ -231,8 +237,19 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	for (i = 0; i < dir.nr; i++)
+	for (i = 0; i < dir.nr; i++) {
+		const char *name = dir.entries[i]->name;
+		const char *slash;
+		if (substitute_empty_dirs && (slash = strrchr(name, '/')) &&
+				!strcmp(slash, "/.gitignore") &&
+				access(name, R_OK)) {
+			int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0666);
+			if (fd < 0)
+				return error("Could not create %s", name);
+			close(fd);
+		}
 		add_file_to_cache(dir.entries[i]->name, verbose);
+	}
 
  finish:
 	if (active_cache_changed) {
diff --git a/dir.c b/dir.c
index 8d8faf5..b0b4628 100644
--- a/dir.c
+++ b/dir.c
@@ -456,11 +456,11 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 {
 	DIR *fdir = opendir(path);
 	int contents = 0;
+	char fullname[PATH_MAX + 1];
 
 	if (fdir) {
 		int exclude_stk;
 		struct dirent *de;
-		char fullname[PATH_MAX + 1];
 		memcpy(fullname, base, baselen);
 
 		exclude_stk = push_exclude_per_directory(dir, base, baselen);
@@ -536,6 +536,20 @@ exit_early:
 		pop_exclude_per_directory(dir, exclude_stk);
 	}
 
+	if (!contents && dir->substitute_empty_directories) {
+		const char *name = ".gitignore";
+		int len = strlen(name);
+		/* Ignore overly long pathnames! */
+		if (len + baselen + 8 > sizeof(fullname))
+			return 0;
+		memcpy(fullname + baselen, name, len+1);
+		if (simplify_away(fullname, baselen + len, simplify)
+				|| excluded(dir, fullname))
+			return 0;
+		dir_add_name(dir, fullname, baselen + len);
+		return 1;
+	}
+
 	return contents;
 }
 
diff --git a/dir.h b/dir.h
index ec0e8ab..0099718 100644
--- a/dir.h
+++ b/dir.h
@@ -34,7 +34,8 @@ struct dir_struct {
 		     show_other_directories:1,
 		     hide_empty_directories:1,
 		     no_gitlinks:1,
-		     collect_ignored:1;
+		     collect_ignored:1,
+		     substitute_empty_directories:1;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
 

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 12:16                               ` Johannes Schindelin
@ 2007-07-19 12:24                                 ` David Kastrup
  2007-07-19 14:44                                   ` Brian Gernhardt
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-19 12:24 UTC (permalink / raw
  To: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Thu, 19 Jul 2007, Tomash Brechko wrote:
>
>> A year or so ago I too would strongly advocate the need of tracking 
>> empty directories, permissions et al., it seemed so "natural" and "plain 
>> obvious" to me back then.  But since that time I learned to appreciate 
>> the "contents tracking" approach, and now view directories (paths in 
>> general) only as the means for Git to know where to put the contents on 
>> checkout.  This, BTW, is consistent with how Git figures container 
>> copies/renames.
>
> Thank you.  It is my impression, too, that after a while it becomes 
> obvious what is good and what is not.
>
> FWIW I just whipped up a proof-of-concept patch (so at least _I_ cannot be 
> accused of chickening out of writing code):
>
> This adds the command line option "--add-empty-dirs" to "git add", which 
> does the only sane thing: putting a placeholder into that directory, and 
> adding that.  Since ".gitignore" is already a reserved file name in git, 
> it is used as the name of this placeholder.

But that means that checkout will create a file .gitignore in
previously empty directories, doesn't it?

I think that the placeholder name should rather be ".".

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 11:31                               ` David Kastrup
@ 2007-07-19 12:32                                 ` Tomash Brechko
  2007-07-19 12:46                                   ` David Kastrup
  2007-07-19 12:38                                 ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: Tomash Brechko @ 2007-07-19 12:32 UTC (permalink / raw
  To: git

Hi David,

On Thu, Jul 19, 2007 at 13:31:50 +0200, David Kastrup wrote:
> core.excludefile: .

Really nice idea to give directories the name 'DIR/.'.  I'm sure there are
several other ways to implement your proposal.  But why put it in
Git itself?  The principle of decomposition and abstraction tells me that
this should go somewhere else.

Please consider this: I myself use Git to track my own local projects,
and for this usage your proposal has no value for me, i.e. as a
_Source_ Code Management system Git is rather complete.  But I also
track /etc and ~/ in Git, and for this I'd love to have directories,
permissions, ownership, and other attributes tracked.  I have a Perl
script wrapping Git that allows me to filter tracked paths by full
regexps instead of Git's file globs, and also to filter out files that
are too big, assuming that they are binary anyway.  Most people solving
the same problem moved further and implemented tools to store part of
the file system state (permissions and ownership) in a textual
representation, to track that in Git.  I'm sure you've seen such posts
on the list.  And my point is that rather than building the support for
all of it into core Git, and then implementing sophisticated
configuration to disable parts of it, wouldn't it be better to have
separate tools orthogonal to Git itself?

At the extreme (probably not really seriously), consider the
following design: there are two layers, a file system layer and a
contents layer.  On checkout the file system layer creates (or examines
the existing) directory tree along with all files and their file system
state (permissions, ownership, ACLs, attributes, ...), and then asks
the contents layer to update the contents.  This way the layers are
independent, and the file system layer may be implemented on top of pure
contents tracking.  The file system layer may be extended to be
OS/FS dependent if some development team wishes so.  Even
hard links may be supported: since the file system layer may decide to
remember that two paths really reference the same inode
(i.e. contents), the contents layer may be asked to update the data only
once with either file name/descriptor.

This, BTW, is why I think not tracking file attributes when
versioning, say, /etc, is not a big loss.  When I will move to the new
system, I will mostly be interested in contents diffs of the same
configuration files in /etc.  I will trust their new attributes, and
will not want to restore them to what they were on the old system.

So the essence of my objection is that we should not pollute core Git
with file system state tracking beyond what is required to know where
to put the contents.  Everything else should go elsewhere.

Again, I'd love to see your proposal implemented, but only in a
way that won't interfere with pure SCM operations.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 11:31                               ` David Kastrup
  2007-07-19 12:32                                 ` Tomash Brechko
@ 2007-07-19 12:38                                 ` David Kastrup
  2007-07-19 13:21                                   ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-19 12:38 UTC (permalink / raw
  To: git

David Kastrup <dak@gnu.org> writes:

> Although it would be natural to have
> core.adddirs: false
> be equivalent to
> core.excludefile: .
>
> And so it might be possible to actually not need a separate
> core.adddirs option at all, technically.

To follow up on myself here:

A project such as the linux kernel which presumably does not want to
have directories tracked will put the single pattern
.
into its top-level .gitignore file.  That is all.  At least if it does
not confuse current versions of git to do ugly things.

A separate option core.adddirs is still necessary because
man gitignore
states:

       When deciding whether to ignore a path, git normally  checks  gitignore
       patterns from multiple sources, with the following order of precedence:

       ·  Patterns read from the file specified by the configuration  variable
          core.excludesfile.

       ·  Patterns read from $GIT_DIR/info/exclude.

       ·  Patterns  read  from  a .gitignore file in the same directory as the
          path, or in any parent directory, ordered from the deepest such file
          to  a file in the root of the repository. These patterns match rela‐
          tive to the location of the  .gitignore  file.  A  project  normally
          includes  such  .gitignore  files in its repository, containing pat‐
          terns for files generated as part of the project build.

The priority for "core.adddirs", however, should be below that so that
preferences set in the repository's .gitignore files take precedence.
So core.excludesfile seems to be the wrong place.

A project with the policy of always tracking directories would place
!.
into its top-level .gitignore file.
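
The precedence rule argued for above can be sketched in C.  This is
only an illustration under stated assumptions: "core.adddirs" and the
"." / "!." patterns are proposals from this thread, not existing git
features, and the names pattern_source, decide_in_source and
track_directories are hypothetical.  Sources are assumed to be passed
from highest to lowest precedence, and the core.adddirs value is
consulted only when no source mentions the marker at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Outcome of consulting one pattern source about the "." directory marker. */
enum dir_decision { DIR_UNDECIDED, DIR_IGNORE, DIR_TRACK };

/* One source of ignore patterns (hypothetical type, for illustration only). */
struct pattern_source {
	const char *const *patterns;
	size_t nr;
};

/*
 * Scan the patterns of a single source.  Within one source the last
 * pattern that mentions the marker wins: "." ignores directories,
 * "!." tracks them.  All other patterns are irrelevant here.
 */
enum dir_decision decide_in_source(const char *const *patterns, size_t nr)
{
	enum dir_decision decision = DIR_UNDECIDED;
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(patterns[i] != NULL);
		if (!strcmp(patterns[i], "."))
			decision = DIR_IGNORE;
		else if (!strcmp(patterns[i], "!."))
			decision = DIR_TRACK;
	}
	return decision;
}

/*
 * Walk the sources from highest to lowest precedence; the first source
 * with an opinion decides.  adddirs_default stands in for the proposed
 * core.adddirs setting and is used only as the final fallback.
 * Returns 1 if directories should be tracked, 0 otherwise.
 */
int track_directories(const struct pattern_source *sources, size_t nr,
		      int adddirs_default)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		enum dir_decision d =
			decide_in_source(sources[i].patterns, sources[i].nr);
		if (d != DIR_UNDECIDED)
			return d == DIR_TRACK;
	}
	return adddirs_default;
}
```

With a repository .gitignore containing "!." listed ahead of a personal
exclude file containing ".", track_directories() reports that
directories are tracked, which is the behaviour asked for: the
project's own policy overrides both the personal file and the
configured default.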

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 12:32                                 ` Tomash Brechko
@ 2007-07-19 12:46                                   ` David Kastrup
  2007-07-23 20:18                                     ` Nix
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-19 12:46 UTC (permalink / raw
  To: git

Tomash Brechko <tomash.brechko@gmail.com> writes:

> Hi David,
>
> On Thu, Jul 19, 2007 at 13:31:50 +0200, David Kastrup wrote:
>> core.excludefile: .
>
> Really nice idea to give directories the name 'DIR/.'.  I'm sure there are
> several other ways to implement your proposal.  But why put it in
> Git itself?  The principle of decomposition and abstraction tells me that
> this should go somewhere else.

Because of a fundamental law of computation: information maintained in
two separate places will get out of synch eventually.

> Please consider this: I myself use Git to track my own local
> projects, and for this usage your proposal has no value for me,
> i.e. as a _Source_ Code Management system Git is rather complete.
> But I also track /etc and ~/ in Git, and for this I'd love to have
> directories, permissions, ownership, and other attributes
> tracked.  I have a Perl script wrapping Git that allows me to filter
> tracked paths by full regexps instead of Git's file globs, and also
> to filter out files that are too big, assuming that they are binary
> anyway.

Look, git _tracks_ contents.  Your permissions management needs to be
told explicitly when and how things change.  So you end up with git
_tracking_ material and your permissions/directory management needing
the level of manual handholding Subversion demands.

> And my point is that rather than building the support for all of it
> into core Git, and then implementing sophisticated configuration to
> disable parts of it, wouldn't it be better to have separate tools
> orthogonal to Git itself?

And my personal answer to that is "no".  We don't want orthogonality
for intimately related things, because it forces us to work the
"orthogonal" things in lockstep.  And if you force git to operate in
lockstep with manual explicit tracking, then git becomes useless for
tracking stuff automatically.

> So the essence of my objection is that we should not pollute core
> Git with file system state tracking beyond what is required to know
> where to put the contents.  Everything else should go elsewhere.
>
> Again, I'd love to see your proposal implemented, but only in a
> way that won't interfere with pure SCM operations.

Tell git to ignore "." and it won't "interfere".

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 12:38                                 ` David Kastrup
@ 2007-07-19 13:21                                   ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-19 13:21 UTC (permalink / raw
  To: git

David Kastrup <dak@gnu.org> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> Although it would be natural to have
>> core.adddirs: false
>> be equivalent to
>> core.excludefile: .
>>
>> And so it might be possible to actually not need a separate
>> core.adddirs option at all, technically.
>
> To follow up on myself here:
>
> A project such as the linux kernel which presumably does not want to
> have directories tracked will put the single pattern
> .
> into its top-level .gitignore file.  That is all.  At least if it does
> not confuse current versions of git to do ugly things.

Another followup: it doesn't.  I placed a single line
.
into a .gitignore file.  This did not cause git to ignore the contents
of ., and even
git-add .
worked as previously, namely adding the contents of the current
directory and subdirectories to the index.

In short: the gitignore idea for policing directory management is
perfectly upwards-compatible with current versions of git.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 12:24                                 ` David Kastrup
@ 2007-07-19 14:44                                   ` Brian Gernhardt
  2007-07-19 15:43                                     ` Johannes Schindelin
  0 siblings, 1 reply; 156+ messages in thread
From: Brian Gernhardt @ 2007-07-19 14:44 UTC (permalink / raw
  To: David Kastrup; +Cc: git


On Jul 19, 2007, at 8:24 AM, David Kastrup wrote:

> I think that the placeholder name should rather be ".".

For what it's worth, the more this gets discussed, the more I think  
your idea is a good one.

~~ Brian

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
       [not found]                           ` <863azk78yp.fsf@lola.quinscape.zz>
@ 2007-07-19 15:08                             ` Brian Gernhardt
  2007-07-19 15:27                               ` David Kastrup
  2007-07-20  0:01                               ` Junio C Hamano
  0 siblings, 2 replies; 156+ messages in thread
From: Brian Gernhardt @ 2007-07-19 15:08 UTC (permalink / raw
  To: David Kastrup
  Cc: Shawn O.Pearce, Junio C Hamano, Linus Torvalds, Matthieu Moy,
	Johannes Schindelin, Git Mailing List


On Jul 19, 2007, at 10:40 AM, David Kastrup wrote:

> Have you synched with the current state of my proposals posted to the
> mailing list before posting this note?  Perhaps your concerns have
> already been addressed in them.

Mail.app split the thread into two or three pieces.  I wrote this  
after reading the first part, but had missed the rest.  I very much  
like the proposals of separating trees from directories and the "."  
entries.

My apologies for the wasted bandwidth arguing for things that had  
already been decided.

~~ Brian

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 15:08                             ` Brian Gernhardt
@ 2007-07-19 15:27                               ` David Kastrup
  2007-07-19 15:50                                 ` Brian Gernhardt
  2007-07-20  0:01                               ` Junio C Hamano
  1 sibling, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-19 15:27 UTC (permalink / raw
  To: git

Brian Gernhardt <benji@silverinsanity.com> writes:

> On Jul 19, 2007, at 10:40 AM, David Kastrup wrote:
>
>> Have you synched with the current state of my proposals posted to the
>> mailing list before posting this note?  Perhaps your concerns have
>> already been addressed in them.
>
> Mail.app split the thread into two or three pieces.  I wrote this
> after reading the first part, but had missed the rest.  I very much
> like the proposals of separating trees from directories and the "."
> entries.
>
> My apologies for the wasted bandwidth arguing for things that had
> already been decided.

"decided"!  Now that's a strong word for my wild brainstorming if I
ever heard one, in particular considering my well-near non-existent
record of contributions and popularity here: most of the recent
"discussion" has been me following up on myself.

Anyway, thanks for the heads-up: very much appreciated.  I'll probably
badly need it when people in Pacific Standard Time get to work again
and tear me to pieces.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 14:44                                   ` Brian Gernhardt
@ 2007-07-19 15:43                                     ` Johannes Schindelin
  2007-07-19 16:06                                       ` Brian Gernhardt
                                                         ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: Johannes Schindelin @ 2007-07-19 15:43 UTC (permalink / raw
  To: Brian Gernhardt; +Cc: David Kastrup, git

Hi,

On Thu, 19 Jul 2007, Brian Gernhardt wrote:

> 
> On Jul 19, 2007, at 8:24 AM, David Kastrup wrote:
> 
> > I think that the placeholder name should rather be ".".
> 
> For what it's worth, the more this gets discussed, the more I think your 
> idea is a good one.

I do not like it at all. "." already has a very special meaning.  It is a 
_directory_, no place holder.

More and more I get the impression that this thread is just not worth it.  
The problem was solved long ago, and all that is talked about here is how 
to complicate things.

Unhappy,
Dscho

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 15:27                               ` David Kastrup
@ 2007-07-19 15:50                                 ` Brian Gernhardt
  0 siblings, 0 replies; 156+ messages in thread
From: Brian Gernhardt @ 2007-07-19 15:50 UTC (permalink / raw
  To: David Kastrup; +Cc: git


On Jul 19, 2007, at 11:27 AM, David Kastrup wrote:

> Brian Gernhardt <benji@silverinsanity.com> writes:
>
>> My apologies for the wasted bandwidth arguing for things that had
>> already been decided.
>
> "decided"!  Now that's a strong word for my wild brainstorming if I
> ever heard one, in particular considering my well-near non-existent
> record of contributions and popularity here: most of the recent
> "discussion" has been me following up on myself.

Meh.  I suppose I meant "talked about" or "brought up" here.  Trying  
to be quick and terse, and ended up losing meaning like usual.

~~ Brian

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 15:43                                     ` Johannes Schindelin
@ 2007-07-19 16:06                                       ` Brian Gernhardt
  2007-07-19 16:17                                         ` Johannes Schindelin
  2007-07-19 16:17                                       ` Matthieu Moy
  2007-07-19 16:21                                       ` David Kastrup
  2 siblings, 1 reply; 156+ messages in thread
From: Brian Gernhardt @ 2007-07-19 16:06 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: David Kastrup, git


On Jul 19, 2007, at 11:43 AM, Johannes Schindelin wrote:

> I do not like it at all. "." already has a very special meaning.  It is a
> _directory_, no place holder.

And we're talking about using it to describe the directory.

> More and more I get the impression that this thread is just not worth it.
> The problem was solved long ago, and all that is talked about here is how
> to complicate things.

By solved, you mean ignored?  There is no reason for git not to track  
empty directories other than "we don't like it".

Some projects I work on require certain directories to exist in order  
to run properly, but tend to occasionally do things like delete all  
files in this required directory.  So far, it hasn't been an issue  
because I'm working solo and using git just to guard against  
stupidity.  Git's policy of "don't touch things I don't know about"  
works.  But if I ever had to have someone clone it, they'd need to  
re-create the directories.  In this case, empty directories are part of  
the content I care about.  Yes, I could have a script do it, but  
that's a workaround, not a solution.

In another case, I'm creating a git repository out of source  
that is distributed as occasional tarballs with patches in between.   
Git's inability to track the empty directories means that I can  
NOT re-create appropriate tarballs for the states distributed only as  
patches.  Yes, I could add placeholder files, but then the state is  
not identical.

There are use cases for tracking directories.  I'll agree that it  
shouldn't be used for every source tree.  But there are cases where  
it is useful and there's no reason to simply forbid it.

~~ Brian

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 16:06                                       ` Brian Gernhardt
@ 2007-07-19 16:17                                         ` Johannes Schindelin
  2007-07-19 16:28                                           ` David Kastrup
  2007-07-19 16:34                                           ` Brian Gernhardt
  0 siblings, 2 replies; 156+ messages in thread
From: Johannes Schindelin @ 2007-07-19 16:17 UTC (permalink / raw
  To: Brian Gernhardt; +Cc: git

Hi,

On Thu, 19 Jul 2007, Brian Gernhardt wrote:

> On Jul 19, 2007, at 11:43 AM, Johannes Schindelin wrote:
> 
> > I do not like it at all. "." already has a very special meaning.  It 
> > is a _directory_, no place holder.
> 
> And we're talking about using it to describe the directory.
> 
> > More and more I get the impression that this thread is just not worth 
> > it. The problem was solved long ago, and all that is talked about here 
> > is how to complicate things.
> 
> By solved, you mean ignored?  There is no reason for git not to track 
> empty directories other than "we don't like it".

No, no, no, no, no!

You are really trying to annoy me, right?

Here is a short description, which you should read until you understand it 
and then leave me alone:

To add a directory to the tracked content, you have to _mark_ it as 
tracked.  So that when you remove the _real_ content of the directory, Git 
will not remove it.

Alas, we already have such a marker.  It is called ".gitignore", and has 
been ignored by _you_.  There is _nothing_ wrong, from a technical 
standpoint, to call this marker ".gitignore", and it is _also_ not wrong 
to put this marker into the file system _in addition_ to the index.

So go and add your directories via that marker, and _be done with it_.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 15:43                                     ` Johannes Schindelin
  2007-07-19 16:06                                       ` Brian Gernhardt
@ 2007-07-19 16:17                                       ` Matthieu Moy
  2007-07-19 16:21                                       ` David Kastrup
  2 siblings, 0 replies; 156+ messages in thread
From: Matthieu Moy @ 2007-07-19 16:17 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Brian Gernhardt, David Kastrup, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> More and more I get the impression that this thread is just not worth it.  
> The problem was solved long ago, and all that is talked about here is how 
> to complicate things.

The problem was not _solved_, it was _worked around_.

Adding a .gitignore or whatever other file to mean "the directory
exists" is clearly a good workaround, but still, you have to use
"git-add $dir/.gitignore" where you really _mean_ "git-add $dir/". I
can see no reason for the presence of this .gitignore file other than
"err, I've put it here because git doesn't manage empty directories".

The fact that you need a FAQ entry for that actually shows there is a
problem. You don't have a FAQ for "Q: How do I add a file? A: Use
git-add file", you shouldn't need a FAQ for "How do I add a
directory", it should just work as expected.

You claim it "solves" the problem, but have you ever used an importer
like git-svn on a project that uses empty directories as placeholders
(I do have this problem in daily life because my colleagues still use
SVN)? What is the meaning of this .gitignore file the day you export
it to anything outside git?

If you ignore problems because they have a workaround, then even CVS
can be usable. People have been working around CVS's problems for
years, and many people are happy with CVS because they didn't realise
that solving problems is better than working around them (See the
OpenCVS project ...). Fortunately, git doesn't have as many problems
to work around as CVS ;-).

I'm happy with the answer "it should be done, but not by me, send a
patch", and I can't really complain myself since I did not send a
patch, but here, you're complaining about someone who actually starts
volunteering to solve the problem, which I can't agree with.

-- 
Matthieu

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 15:43                                     ` Johannes Schindelin
  2007-07-19 16:06                                       ` Brian Gernhardt
  2007-07-19 16:17                                       ` Matthieu Moy
@ 2007-07-19 16:21                                       ` David Kastrup
  2 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-19 16:21 UTC (permalink / raw
  To: git; +Cc: Johannes Schindelin

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Thu, 19 Jul 2007, Brian Gernhardt wrote:
>
>> 
>> On Jul 19, 2007, at 8:24 AM, David Kastrup wrote:
>> 
>> > I think that the placeholder name should rather be ".".
>> 
>> For what it's worth, the more this gets discussed, the more I think your 
>> idea is a good one.
>
> I do not like it at all. "." already has a very special meaning.  It is a 
> _directory_, no place holder.

And this is what it will be under my scheme: a directory.  It is just
that "directory" is differentiated from a "tree".  Both are tracked in
the repository (directory tracking is optional), and there is no such
thing as an empty tree, a tree being defined by its contents and
nothing else, as previously.  A "directory" has no contents, but only
existence in index and repository.  A "tree" only exists in the
repository, not in index or work directory.  It is mapped to physical
directories in the work directory.  If no corresponding "directory"
exists in index and/or repository, the work directories are created
and deleted on the fly as before in order to represent the state of
the "tree" in the repository.  So here are the concepts:

entity     working directory        index           repository
--------------------------------------------------------------
file       mapped to files          file            [blob]
dir        mapped to dir existence  dir             [dir]
tree       mapped to dir tree       unrepresented   [tree] (non-empty container)
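
The table can be restated as a small C sketch.  This is only an
illustration of the proposed scheme, not of how git works today: the
"dir" entity does not exist in git, and the names entity_kind,
represented_in_index and workdir_should_exist are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The three kinds of entity in the proposed scheme. */
enum entity_kind { ENTITY_FILE, ENTITY_DIR, ENTITY_TREE };

/*
 * Files and explicit "dir" entries are recorded in the index.
 * Trees are not: they are derived from their contents and live
 * only in the repository.
 */
int represented_in_index(enum entity_kind kind)
{
	switch (kind) {
	case ENTITY_FILE:
	case ENTITY_DIR:
		return 1;
	case ENTITY_TREE:
		return 0;
	}
	assert(!"unknown entity kind");
	return 0;
}

/*
 * Should a directory exist in the work tree?
 * nr_tracked_files: number of tracked files at or below the directory
 * has_dir_entry:    nonzero if an explicit "dir" entry records it
 *
 * Without a "dir" entry the directory comes and goes with its
 * contents, as before; with one, it survives becoming empty.
 */
int workdir_should_exist(size_t nr_tracked_files, int has_dir_entry)
{
	return nr_tracked_files > 0 || has_dir_entry != 0;
}
```

The second function captures the only behavioural difference: a
directory with no tracked files is kept in the work tree exactly when
a "dir" entry records its existence.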

> More and more I get the impression that this thread is just not
> worth it.  The problem was solved long ago, and all that is talked
> about here is how to complicate things.

I disagree on both counts: that the problem has been solved (the
existence of a workaround involving constant manual intervention is
not a solution for me), and that my proposal will constitute a
complication for the user.

For projects setting a "." into the top level .gitignore, nothing at
all will change, even when "core.adddirs: true" will become the
default at some point of time.  Once this is the default, new users
with new projects will not notice anything surprising, at least until
the time that they pull from somebody with a repository with different
non-explicit conventions.

This is something which may still require thought in order to result
in the least complicated handling of cooperation.  But with regard to
the internals itself, I don't see that there is too much non-obvious
complexity involved here, and the framework appears very consistent,
logical, and compatible with git's ideas to me.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 16:17                                         ` Johannes Schindelin
@ 2007-07-19 16:28                                           ` David Kastrup
  2007-07-19 16:34                                           ` Brian Gernhardt
  1 sibling, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-19 16:28 UTC (permalink / raw
  To: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Here is a short description, which you should read until you understand
> it and then leave me alone:
>
> To add a directory to the tracked content, you have to _mark_ it as
> tracked.  So that when you remove the _real_ content of the
> directory, Git will not remove it.

Correct.  That is what my proposal is about.

> Alas, we already have such a marker.  It is called ".gitignore", and
> has been ignored by _you_.  There is _nothing_ wrong, from a
> technical standpoint, to call this marker ".gitignore", and it is
> _also_ not wrong to put this marker into the file system _in
> addition_ to the index.

Uh, then the directories are no longer empty.

> So go and add your directories via that marker, and _be done with
> it_.

But one is not done before running

find -name .gitignore -delete

and then the next recursive add will remove the .gitignore "markers".
The idea of "." is to have a marker that does _not_ appear in the work
directory.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 16:17                                         ` Johannes Schindelin
  2007-07-19 16:28                                           ` David Kastrup
@ 2007-07-19 16:34                                           ` Brian Gernhardt
  2007-07-19 17:30                                             ` Johannes Schindelin
       [not found]                                             ` <Pine.LNX.4.64.070719 1829530.14781@racer.site>
  1 sibling, 2 replies; 156+ messages in thread
From: Brian Gernhardt @ 2007-07-19 16:34 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git


On Jul 19, 2007, at 12:17 PM, Johannes Schindelin wrote:

> Alas, we already have such a marker.  It is called ".gitignore", and has
> been ignored by _you_.  There is _nothing_ wrong, from a technical
> standpoint, to call this marker ".gitignore", and it is _also_ not wrong
> to put this marker into the file system _in addition_ to the index.
>
> So go and add your directories via that marker, and _be done with it_.

But this alters the content of the directory away from what I want it  
to be, namely empty.  You aren't addressing the concept of tracking  
an empty directory, you're just saying you won't do it.

~~ Brian

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 16:34                                           ` Brian Gernhardt
@ 2007-07-19 17:30                                             ` Johannes Schindelin
       [not found]                                             ` <Pine.LNX.4.64.0707191829530.14781@racer.site>
  1 sibling, 0 replies; 156+ messages in thread
From: Johannes Schindelin @ 2007-07-19 17:30 UTC (permalink / raw
  To: Brian Gernhardt; +Cc: git

Hi,

On Thu, 19 Jul 2007, Brian Gernhardt wrote:

> On Jul 19, 2007, at 12:17 PM, Johannes Schindelin wrote:
> 
> > Alas, we already have such a marker.  It is called ".gitignore", and has
> > been ignored by _you_.  There is _nothing_ wrong, from a technical
> > standpoint, to call this marker ".gitignore", and it is _also_ not wrong
> > to put this marker into the file system _in addition_ to the index.
> > 
> > So go and add your directories via that marker, and _be done with it_.
> 
> But this alters the content of the directory away from what I want it to be,
> namely empty.  You aren't addressing the concept of tracking an empty
> directory, you're just saying you won't do it.

OMG last time I checked, my _empty_ directory contained "." and "..".  
What do I do now?

Really,
Dscho

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
       [not found]                                             ` <Pine.LNX.4.64.0707191829530.14781@racer.site>
@ 2007-07-19 17:47                                               ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-19 17:47 UTC (permalink / raw
  To: git; +Cc: Johannes Schindelin

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Thu, 19 Jul 2007, Brian Gernhardt wrote:
>
>> On Jul 19, 2007, at 12:17 PM, Johannes Schindelin wrote:
>> 
>> > Alas, we already have such a marker.  It is called ".gitignore", and has
>> > been ignored by _you_.  There is _nothing_ wrong, from a technical
>> > standpoint, to call this marker ".gitignore", and it is _also_ not wrong
>> > to put this marker into the file system _in addition_ to the index.
>> > 
>> > So go and add your directories via that marker, and _be done with it_.
>> 
>> But this alters the content of the directory away from what I want it to be,
>> namely empty.  You aren't addressing the concept of tracking an empty
>> directory, you're just saying you won't do it.
>
> OMG last time I checked, my _empty_ directory contained "." and "..".  
> What do I do now?

If you have a suitable Solaris system, you could try

sudo unlink .
sudo unlink ..

and have a chance that this will work until the next file system
check.

I don't think that adding tracking of ".." would be easy to implement
in git, but I seem to remember that somebody recently proposed a plan
of at least tracking "." which would seem better than nothing and
possibly more useful than "sudo unlink .".

All the best,

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 15:08                             ` Brian Gernhardt
  2007-07-19 15:27                               ` David Kastrup
@ 2007-07-20  0:01                               ` Junio C Hamano
  2007-07-20  0:15                                 ` Linus Torvalds
  1 sibling, 1 reply; 156+ messages in thread
From: Junio C Hamano @ 2007-07-20  0:01 UTC (permalink / raw
  To: Brian Gernhardt
  Cc: David Kastrup, Shawn O.Pearce, Linus Torvalds, Matthieu Moy,
	Johannes Schindelin, Git Mailing List

Brian Gernhardt <benji@silverinsanity.com> writes:

> My apologies for the wasted bandwidth arguing for things that had
> already been decided.

Sorry, who decided what?

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  0:01                               ` Junio C Hamano
@ 2007-07-20  0:15                                 ` Linus Torvalds
  2007-07-20  0:33                                   ` Linus Torvalds
  2007-07-20 10:19                                   ` Olivier Galibert
  0 siblings, 2 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-20  0:15 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy,
	Johannes Schindelin, Git Mailing List



On Thu, 19 Jul 2007, Junio C Hamano wrote:
>
> Brian Gernhardt <benji@silverinsanity.com> writes:
> 
> > My apologies for the wasted bandwidth arguing for things that had
> > already been decided.
> 
> Sorry, who decided what?

I think people who didn't know how the world works decided that 
directories that were added manually as directories would stay as 
directories even after the last file was removed.

That's physically impossible with the git data-structures (since there is 
no way of saving "this directory was added empty" in the tree structures, 
nor any point to it), so I think it's just insane rambling.

I dunno. I think empty directories are worth supporting, mainly to be able 
to capture other SCM's notion of what _they_ track, but quite frankly, the 
level of discussion about them hasn't been exactly inspiring. It seems to 
be more about "this is what we'd like to see, without really having a 
reason for it, nor necessarily understanding what we're talking about" 
than "this is realistic and useful and here are patches".

I *do* think that it's a very valid argument that if you import something 
from SVN that has an empty directory, the git import should show that.  

That's about the only valid argument I've ever seen for them, though, and 
I think that's totally irrelevant to such issues as to whether "git rm 
file/in/directory" should remove the directory(*) from being tracked by 
git when the file goes away or not.

			Linus
 
(*) And, for anybody confused about the issue, the answer to the latter 
question is an emphatic: "Yes it should, live with it, and if you want the 
directory back, you had better add it back as an empty directory"

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  0:15                                 ` Linus Torvalds
@ 2007-07-20  0:33                                   ` Linus Torvalds
  2007-07-20  2:24                                     ` Junio C Hamano
                                                       ` (2 more replies)
  2007-07-20 10:19                                   ` Olivier Galibert
  1 sibling, 3 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-20  0:33 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy,
	Johannes Schindelin, Git Mailing List



On Thu, 19 Jul 2007, Linus Torvalds wrote:
> 
> That's physically impossible with the git data-structures (since there is 
> no way of saving "this directory was added empty" in the tree structures, 
> nor any point to it), so I think it's just insane rambling.

Of course, it's physically *possible* to have a tree that contains two 
entries for the same name: first the "empty tree" and then the "real 
tree", and yeah, in theory you could track things that way.

So I guess the "physically impossible" was a bit strong. You'd have to 
have a totally insane format, and you'd have to violate deeply seated 
rules about what trees look like (and the index too, for that matter: we'd 
have to do the same for the index, and keep the S_IFDIR entry alive 
despite having other entries that are children of it), but it's 
*possible*.

It's just a really bad idea.

So to be sane, when you add files, the empty directory entry has to go 
away. Otherwise you could have two very different trees that encode the 
same *content* (just with different ways of getting there - depending on 
whether you have a history with empty trees or not), and that's very much 
against the philosophy of git, and breaks some fundamental rules (like the 
fact that "same content == same SHA1").
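
The "same content == same SHA1" rule can be checked by hand. The sketch below assumes the classic SHA-1 object naming (the hash of a "<type> <size>\0" header followed by the payload); object_id is a name chosen for this illustration, not a git function.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    """Name an object the way git does: SHA-1 over '<kind> <size>\\0' + payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# Two empty trees reached through different histories are the same object:
# nothing but the payload goes into the name.
emptied_by_rm = object_id("tree", b"")
added_as_empty = object_id("tree", b"")
print(emptied_by_rm == added_as_empty, emptied_by_rm)
```

Because the name depends only on the bytes of the object, any record of "how we got here" would have to be stored as part of those bytes, and would then change the name.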

In fact, that may be the best way to explain why it's *not* an option to 
have "empty trees remain empty trees if we remove the last file from 
them": git fundamentally tracks "content snapshots", and anything that 
implies the content containing any history is against the rules.

So the whole notion of "remembering" whether a directory was added 
explicitly as an empty directory or not is just not a sensible concept in 
git. 

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  0:33                                   ` Linus Torvalds
@ 2007-07-20  2:24                                     ` Junio C Hamano
  2007-07-20  2:31                                       ` Linus Torvalds
  2007-07-20  5:58                                       ` David Kastrup
  2007-07-20  5:35                                     ` David Kastrup
       [not found]                                     ` <7vir8f24o2.fsf@assigned-by-dhcp.cox.net>
  2 siblings, 2 replies; 156+ messages in thread
From: Junio C Hamano @ 2007-07-20  2:24 UTC (permalink / raw
  To: Linus Torvalds
  Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy,
	Johannes Schindelin, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So the whole notion of "remembering" whether a directory was added 
> explicitly as an empty directory or not is just not a sensible concept in 
> git. 

That is true if it is implemented as David suggested, to have a
phony "." entry in the tree object itself.  The object name of
such a tree (when it contains blobs and trees underneath) will
be different from a tree that contains the same set of blobs and
trees.  It would destroy the fundamental concepts of git.

But you _could_ treat that "should-be-kept-even-when-empty"-ness
just like we treat executable bit on blobs, I think.

When blobs with the same contents but of different type (REG vs
LNK) and regular file with or without executable bit are entered
in git, they all get the same SHA-1 but we can still tell them
apart because the index and the tree entry have mode bits.  So
hypothetically, you could introduce "sticky" directory in tree
entries to mark "this will not go away when emptied".

In a 'tree' object, they might appear as:

        40000 ordinary-directory '\0' 20-byte SHA-1
        41000 directory-dontremove-even-if-empty '\0' 20-byte SHA-1

In 'index', as your "I'm soft" patch, we do not have to add
nonsticky kind of tree nodes, but for "empty" ones, we can add:

	041000 directory-dontremove-even-if-empty '\0' 20-byte SHA-1

in the index and (unlike your patch) keep it there even after a
blob or a tree is added underneath it.

The "sticky" bit on such a directory would have to obey the
usual rule of 3-way merge, which would be a huge change to do
so, but I do not see there is anything fundamental that prevents
you from doing this.  Other than the fact that probably no git
long timer is interested in spending time on such a feature,
that is.

Obviously, this "sticky" bit will cascade up and make your
otherwise equivalent parent trees different, but I think that
is just as sane a behaviour as two trees that contain the same
blob with only executable-bit differences have different names.
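
A rough sketch of the point about mode bits, assuming the tree entry layout shown above ("<mode> <name>\0" followed by the 20-byte binary SHA-1). The helper names are invented for this example, the sort is a simplification of git's real entry ordering, and mode 41000 is only the hypothetical "sticky" mode from this message, not something git implements.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' + payload, as for classic git objects."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    """entries are (octal mode, name, hex object id) triples.

    Sorting by plain name is a simplification; it is enough for the
    single-entry trees used below.
    """
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return object_id("tree", body)


blob = object_id("blob", b"echo hi\n")       # one blob, one name
plain = tree_id([("100644", "run", blob)])
executable = tree_id([("100755", "run", blob)])

empty_tree = object_id("tree", b"")
ordinary = tree_id([("40000", "dir", empty_tree)])
sticky = tree_id([("41000", "dir", empty_tree)])  # hypothetical mode

print(plain != executable, ordinary != sticky)
```

The blob keeps a single name in both cases; only the containing tree changes, which is the sense in which the bit "cascades up".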

This will involve a lot of changes, so I would not recommend
anybody doing so, though.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  2:24                                     ` Junio C Hamano
@ 2007-07-20  2:31                                       ` Linus Torvalds
  2007-07-20  5:55                                         ` David Kastrup
  2007-07-20  5:58                                       ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-20  2:31 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy,
	Johannes Schindelin, Git Mailing List



On Thu, 19 Jul 2007, Junio C Hamano wrote:
> 
> But you _could_ treat that "should-be-kept-even-when-empty"-ness
> just like we treat executable bit on blobs, I think.

True. Or you could make it a path attribute and/or a per-repository 
decision, so that while the data wouldn't necessarily be in the database 
itself, the user could specify the behaviour he wanted.

> This will involve a lot of changes, so I would not recommend
> anybody doing so, though.

Agreed. The upside just isn't there.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  0:33                                   ` Linus Torvalds
  2007-07-20  2:24                                     ` Junio C Hamano
@ 2007-07-20  5:35                                     ` David Kastrup
  2007-07-20  9:27                                       ` Simon 'corecode' Schubert
       [not found]                                     ` <7vir8f24o2.fsf@assigned-by-dhcp.cox.net>
  2 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-20  5:35 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 19 Jul 2007, Linus Torvalds wrote:
>> 
>> That's physically impossible with the git data-structures (since
>> there is no way of saving "this directory was added empty" in the
>> tree structures, nor any point to it), so I think it's just insane
>> rambling.
>
> Of course, it's physically *possible* to have a tree that contains
> two entries for the same name: first the "empty tree" and then the
> "real tree", and yeah, in theory you could track things that way.
>
> So I guess the "physically impossible" was a bit strong. You'd have
> to have a totally insane format, and you'd have to violate deeply
> seated rules about what trees look like (and the index too, for that
> matter: we'd have to do the same for the index, and keep the S_IFDIR
> entry alive despite having other entries that are children of it),
> but it's *possible*.

Excuse me?  You don't need a "totally insane format".  You need an
entry "." of a new type "directory" that can be part of the current
concept of a "tree".  This new type does _not_ have children.  It is
not a container for files.  It would be the thing that would carry
permissions or other properties if git were to store them.  It can be
put into .gitignore files like other files.

One drawback is that adding and removing it alone is not supported
with the current git-add and git-rm commands: they would require
an additional argument "-d" like "ls" does.

All of this is a straightforward extension fitting very well the
current paradigms and also existing file systems and their usage.

> It's just a really bad idea.

> So to be sane, when you add files, the empty directory entry has to
> go away.

You really have not followed the discussion at all.  This is not
possible since otherwise you could not distinguish the cases

mkdir A
touch A/B
git-add A
git-rm A/B

where A was added and not removed and should stay and

mkdir A
touch A/B
git-add A/B
git-rm A/B

where a single file was added and removed and nothing should stay.
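
The two cases can be told apart in a toy model of the proposal. This is only a sketch under the assumption that adding a directory records a "." entry next to its files; ToyIndex and its methods are invented here and have nothing to do with git's real index code.

```python
class ToyIndex:
    """Set of tracked paths; a directory entry is spelled '<dir>/.'."""

    def __init__(self) -> None:
        self.entries: set[str] = set()

    def add_dir(self, directory: str, files: list[str]) -> None:
        # Adding the directory tracks the directory itself plus its files.
        self.entries.add(f"{directory}/.")
        for name in files:
            self.entries.add(f"{directory}/{name}")

    def add_file(self, path: str) -> None:
        # Adding a single file does not track its parent directory.
        self.entries.add(path)

    def rm_file(self, path: str) -> None:
        self.entries.discard(path)

    def dirs_to_keep(self) -> list[str]:
        """Directories that still directly hold at least one tracked entry."""
        return sorted({p.rsplit("/", 1)[0] for p in self.entries if "/" in p})


first = ToyIndex()
first.add_dir("A", ["B"])    # git-add A
first.rm_file("A/B")         # git-rm A/B

second = ToyIndex()
second.add_file("A/B")       # git-add A/B
second.rm_file("A/B")        # git-rm A/B

print(first.dirs_to_keep(), second.dirs_to_keep())
```

In the first history the "A/." entry survives the removal of the file, in the second nothing is left, and no special case for emptiness is needed.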

> Otherwise you could have two very different trees that encode the
> same *content* (just with different ways of getting there -
> depending on whether you have a history with empty trees or not),
> and that's very much against the philosophy of git, and breaks some
> fundamental rules (like the fact that "same content == same SHA1").

No, the content is _different_.  One tree contains a tracked
directory, the other does not.  That means that the trees behave
_differently_ when you manipulate them, and that means that they are
_not_ the same tree.

> In fact, that may be the best way to explain why it's *not* an
> option to have "empty trees remain empty trees if we remove the last
> file from them": git fundamentally tracks "content snapshots", and
> anything that implies the content containing any history is against
> the rules.
>
> So the whole notion of "remembering" whether a directory was added
> explicitly as an empty directory or not is just not a sensible
> concept in git.

Certainly.  That is why we instead remember whether or not a directory
entry "." was added or not.  It will be added (unless the defaults and
gitignore settings ask "." to be non-tracked) when git adds the
corresponding tree or subtree, and it will get removed when git
removes the corresponding tree or subtree.  Emptiness is not a special
case, and it can't be.  Currently, the main information associated
with "." is "stay around even if tree becomes empty".

Now you can do

    unlink .

in Solaris and have the name "." vanish while the directory still
works as a container by other names.

I don't propose that git be able to track this difference, though, and
I doubt that most file archivers would.

But git can or cannot ignore files, and in a similar way it can or
cannot ignore what a directory has more than being an abstract
container.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
       [not found]                                     ` <7vir8f24o2.fsf@assigned-by-dhcp.cox.net>
@ 2007-07-20  5:53                                       ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-20  5:53 UTC (permalink / raw
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So the whole notion of "remembering" whether a directory was added 
>> explicitly as an empty directory or not is just not a sensible concept in 
>> git. 
>
> That is true if it is implemented as David suggested, to have a
> phony "." entry in the tree object itself.  The object name of such
> a tree (when it contains blobs and trees underneath) will be
> different from a tree that contains the same set of blobs and trees.
> It would destroy the fundamental concepts of git.

How so?

> But you _could_ treat that "should-be-kept-even-when-empty"-ness
> just like we treat executable bit on blobs, I think.
>
> When blobs with the same contents but of different type (REG vs LNK)
> and regular file with or without executable bit are entered in git,
> they all get the same SHA-1 but we can still tell them apart because
> the index and the tree entry have mode bits.  So hypothetically, you
> could introduce "sticky" directory in tree entries to mark "this
> will not go away when emptied".

A tree containing files with and without executable bits will show
different SHA-1 sums.  There is no reason that this should be
different for a tree containing the conceptual "." or not.  I won't
fight for a specific implementation but if I am going to implement
this (and the current lack of enthusiasm points to that) I will not go
and duplicate the entire ignore/add/rm/index/repository machinery in
order to have a bit rather than an actual "." directory entry.

Most Unix file systems have an honest, physical, down-to-Earth
directory entry "." even on disk because it _simplifies_ matters, even
though one could special-case "." all throughout and make do without a
physical entry in theory.

And, as I explained, "." lends itself perfectly to the gitignore
machinery as a way to set per-project policy on whether or not
directories are tracked.

> In a 'tree' object, they might appear as:
>
>         40000 ordinary-directory '\0' 20-byte SHA-1
>         41000 directory-dontremove-even-if-empty '\0' 20-byte SHA-1
>
> In 'index', as your "I'm soft" patch, we do not have to add
> nonsticky kind of tree nodes,

It does not work, since then you can't distinguish

mkdir A
touch A/B
git-add A/B

from

mkdir A
touch A/B
git-add A

It is very clear that git-rm A/B _mustn't_ leave an empty directory in
the first case, and _must_ leave an empty directory in the second case
_if_ and only if one tracks directories.

> Obviously, this "sticky" bit will cascade up and make your otherwise
> equivalent parent trees different,

No, it must not "cascade up".  After

mkdir -p A/B
touch A/B/C
git-add A/B
git-rm A/B

there must be nothing tracked by git.  The "sticky" bit does not
"cascade up".  Its upward effect is only changing the SHA-1 of the
tree, like any change below does.

> This will involve a lot of changes, so I would not recommend anybody
> doing so, though.

Neither would I.  Why people want to complicate the code base
everywhere by refusing to treat "." like a legitimate entry (as Unix
file systems do for a _reason_) is simply a mystery to me.

The framework is pretty much _there_.  There is no point in not making
use of it and duplicating the whole machinery because we want a "bit
set" implementation instead of a file name.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  2:31                                       ` Linus Torvalds
@ 2007-07-20  5:55                                         ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-20  5:55 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 19 Jul 2007, Junio C Hamano wrote:
>> 
>> But you _could_ treat that "should-be-kept-even-when-empty"-ness
>> just like we treat executable bit on blobs, I think.
>
> True. Or you could make it a path attribute and/or a per-repository
> decision, so that while the data wouldn't necessarily be in the
> database itself, the user could specify the behaviour he wanted.

No, one can't.  One can decide per repository whether one wants to
permit this kind of information in.  But if one does, the information
needs to be there for _every_ tree.  And a "." entry is a natural and
intuitive way to do that.  "." has been used as a directory entry for
decades in Unix.

>> This will involve a lot of changes, so I would not recommend
>> anybody doing so, though.
>
> Agreed. The upside just isn't there.

It is a good thing that you did not design the Unix file systems.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  2:24                                     ` Junio C Hamano
  2007-07-20  2:31                                       ` Linus Torvalds
@ 2007-07-20  5:58                                       ` David Kastrup
  2007-07-20 15:31                                         ` Linus Torvalds
  1 sibling, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-20  5:58 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Linus Torvalds, Brian Gernhardt, Shawn O.Pearce, Matthieu Moy,
	Johannes Schindelin, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So the whole notion of "remembering" whether a directory was added 
>> explicitly as an empty directory or not is just not a sensible concept in 
>> git. 
>
> That is true if it is implemented as David suggested, to have a
> phony "." entry in the tree object itself.

Unix file systems contain a phony "." entry in the directory itself,
and have survived in spite of this.

> The object name of such a tree (when it contains blobs and trees
> underneath) will be different from a tree that contains the same set
> of blobs and trees.  It would destroy the fundamental concepts of
> git.

Like "." destroyed the fundamental concepts of Unix filesystems.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18  6:53     ` Junio C Hamano
       [not found]       ` <867ioyqhgc.fsf@lola.quinscape.zz>
@ 2007-07-20  8:29       ` Johan Herland
  2007-07-20  8:41         ` David Kastrup
  2007-07-22 21:35       ` David Kastrup
  2 siblings, 1 reply; 156+ messages in thread
From: Johan Herland @ 2007-07-20  8:29 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, David Kastrup

On Wednesday 18 July 2007, Junio C Hamano wrote:
> Didn't I say I do not have an objection for somebody who wants
> to track empty directories, already?  I probably would not do
> that myself but I do not see a reason to forbid it, either.
> 
> The right approach to take probably would be to allow entries of
> mode 040000 in the index.  Traditionally, we allowed only 100644
> (blobs as regular files) and 120000 (blobs as symlinks).  We
> recently added 160000 (commit from outer space, aka subproject).
> 
> And we do that for all directories, not just empty ones.  So if
> you have fileA, empty/, sub/fileB tracked, your index would
> probably have these four entries, immediately after read-tree
> of an existing tree object:

Sorry for jumping in late...

Why do you want to add _all_ directories, and not just the ones we want to 
explicitly track (independent of whether they're empty or not).

Basically, add a "--dir" flag to git-add, git-rm and friends, to tell them 
you're acting on the directory itself (rather than its (recursive) 
contents). "git-add --dir foo" will add the "040000 123abc... 0 foo" to the 
index/tree whether or not foo is an empty directory. "git-rm --dir foo" will 
remove that entry (or fail if it doesn't exist), but _not_ the contents of 
foo.

Since we're making directory tracking _explicit_, this should all be trivially 
backward-compatible.
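
A small sketch of the explicit variant, to show what "acting on the directory itself" would mean. It assumes the index is just a mapping from path to mode; the --dir flag is the proposal above, not an existing git option, and ToyIndex, add_dir and rm_dir are names made up for the example.

```python
DIR_MODE = 0o040000    # mode proposed above for directory entries
FILE_MODE = 0o100644   # regular file


class ToyIndex:
    """Path -> mode mapping standing in for the index."""

    def __init__(self) -> None:
        self.entries: dict[str, int] = {}

    def add(self, path: str) -> None:
        self.entries[path] = FILE_MODE

    def add_dir(self, path: str) -> None:
        """Model of 'git-add --dir': track the directory, empty or not."""
        self.entries[path] = DIR_MODE

    def rm_dir(self, path: str) -> None:
        """Model of 'git-rm --dir': drop the directory entry, keep contents."""
        if self.entries.get(path) != DIR_MODE:
            raise KeyError(f"{path} is not an explicitly tracked directory")
        del self.entries[path]


index = ToyIndex()
index.add("foo/a.txt")
index.add_dir("foo")
index.rm_dir("foo")            # contents of foo stay tracked
remaining = sorted(index.entries)

try:
    index.rm_dir("bar")        # never added as a directory
    failed = False
except KeyError:
    failed = True

print(remaining, failed)
```

Existing entries are untouched unless a directory entry is asked for by name, which is where the backward-compatibility claim comes from.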


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-20  8:29       ` Johan Herland
@ 2007-07-20  8:41         ` David Kastrup
  2007-07-20 10:20           ` Johan Herland
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-20  8:41 UTC (permalink / raw
  To: git

Johan Herland <johan@herland.net> writes:

> On Wednesday 18 July 2007, Junio C Hamano wrote:
>> Didn't I say I do not have an objection for somebody who wants
>> to track empty directories, already?  I probably would not do
>> that myself but I do not see a reason to forbid it, either.
>> 
>> The right approach to take probably would be to allow entries of
>> mode 040000 in the index.  Traditionally, we allowed only 100644
>> (blobs as regular files) and 120000 (blobs as symlinks).  We
>> recently added 160000 (commit from outer space, aka subproject).
>> 
>> And we do that for all directories, not just empty ones.  So if
>> you have fileA, empty/, sub/fileB tracked, your index would
>> probably have these four entries, immediately after read-tree
>> of an existing tree object:
>
> Sorry for jumping in late...

It could have given you a chance to read up on what has already been
discussed.

> Why do you want to add _all_ directories, and not just the ones we
> want to explicitly track (independent of whether they're empty or
> not).

Because the problematic cases are more often than not the _implicit_
cases.  Do you check a directory tree for empty directories before you
archive it?  In order to archive every empty directory explicitly?

If you did that, you could equally maintain a script that manually
does mkdir/rmdir.

> Basically, add a "--dir" flag to git-add, git-rm and friends, to
> tell them you're acting on the directory itself (rather than its
> (recursive) contents). "git-add --dir foo" will add the "040000
> 123abc... 0 foo" to the index/tree whether or not foo is an empty
> directory. "git-rm --dir foo" will remove that entry (or fail if it
> doesn't exist), but _not_ the contents of foo.

There is nothing wrong with implementing something like this in
_addition_ to treating directory entries implicitly.  For example, ls
has an option -d which does just that, and even git-ls-files has an
option --directory.  Heck, I even have

rm --help
Usage: rm [OPTION]... FILE...
Remove (unlink) the FILE(s).

  -d, --directory       unlink FILE, even if it is a non-empty directory
                          (super-user only; this works only if your system
                           supports `unlink' for nonempty directories)
[...]

which works on just the directory and not on the contents.

So a --directory option for appropriate commands would be natural for
_explicit_ manipulation of such entries.

But the important, the _really_ important thing are the implicit
behaviors.  If I have to hassle with every directory myself, I don't
need a content tracking system.

The --directory options, in contrast, are nice to have when the
framework is in place (and may be even necessary for some direct
manual maintenance tasks), but they don't really concern the
framework.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  5:35                                     ` David Kastrup
@ 2007-07-20  9:27                                       ` Simon 'corecode' Schubert
  2007-07-20 10:11                                         ` David Kastrup
  2007-07-20 10:34                                         ` Junio C Hamano
  0 siblings, 2 replies; 156+ messages in thread
From: Simon 'corecode' Schubert @ 2007-07-20  9:27 UTC (permalink / raw
  To: David Kastrup; +Cc: git

David Kastrup wrote:
>> Otherwise you could have two very different trees that encode the
>> same *content* (just with different ways of getting there -
>> depending on whether you have a history with empty trees or not),
>> and that's very much against the philosophy of git, and breaks some
>> fundamental rules (like the fact that "same content == same SHA1").
> 
> No, the content is _different_.  One tree contains a tracked
> directory, the other does not.  That means that the trees behave
> _differently_ when you manipulate them, and that means that they are
> _not_ the same tree.

You are mistaking things.  Like the executable bit on a file is not content, the fact that a directory should be kept despite being empty is also an *attribute* of the directory.  This is meta-data, not actual data (content).  So no matter how elegant tracking the "." entry might be (and I think it is, because it covers a lot of corner cases already), it puts the information at the wrong place.

That's sad, because otherwise it would be really elegant.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  9:27                                       ` Simon 'corecode' Schubert
@ 2007-07-20 10:11                                         ` David Kastrup
  2007-07-20 10:34                                         ` Junio C Hamano
  1 sibling, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-20 10:11 UTC (permalink / raw
  To: git

Simon 'corecode' Schubert <corecode@fs.ei.tum.de> writes:

> David Kastrup wrote:
>>> Otherwise you could have two very different trees that encode the
>>> same *content* (just with different ways of getting there -
>>> depending on whether you have a history with empty trees or not),
>>> and that's very much against the philosophy of git, and breaks some
>>> fundamental rules (like the fact that "same content == same SHA1").
>>
>> No, the content is _different_.  One tree contains a tracked
>> directory, the other does not.  That means that the trees behave
>> _differently_ when you manipulate them, and that means that they are
>> _not_ the same tree.
>
> You are mistaking things.

No, I am redefining them, or rather the view on them.  Subtle
difference.

> Like the executable bit on a file is not content, the fact that a
> directory should be kept despite being empty is also an *attribute*
> of the directory.  This is meta-data, not actual data (content).

We need to track it, anyway.  So there is little point in not using
the existing infrastructure for handling named entities.

> So no matter how elegant tracking the "." entry might be (and I
> think it is, because it covers a lot of corner cases already), it
> puts the information at the wrong place.

I don't see that the place is wrong: after all, that is where Unix
places "." too, and for good reason.  I was arguing for _separating_
the concept of "directory" and "tree" in the repository.  The tree is
a container entity defined exclusively by its contents (which
determine its hash).  That is how git already does things.  There is
_no_ connection with the physical existence of a directory: in the
work directory, git creates and deletes directories as a _side-effect_
of storing and removing trees.  But git itself does not track
directories as a physical entity at _all_.  If you had a flat
filesystem allowing slashes in filenames, git would get along better
than it does now, without ever creating or removing a directory.
Trees are just a convenient selection and pattern matching mechanism
for files as far as git is concerned.  The correspondence to physical
directories in the work directory is a nuisance rather than an asset
as far as git is concerned.

In a recent thread here, tags with slashes were supported by
essentially doing

    mkdir -p "`dirname $TAG`"
    touch $TAG

where directory creation is just a side effect of supporting slashes.
And that, if you look closely, is git's current relation with
directories altogether.  The directories in the work file system are
created by git just as a side effect for representing slashes, which
in turn facilitate a certain manner of pattern matching.

And "." seems perfectly well suited to bring across the point that
there actually is _physical_ existence associated with a directory,
existence that remains when the rest of the tree is gone and _makes_ a
difference to what the tree is, because it has a _different_
representation in the work file system.

Storing it as an _attribute_ of the tree is a bad idea, since then the
simple rule "a tree without contents is empty" needs an exception.
And a tree stops becoming just a container of its contents and all
sort of new exceptions creep up.

There are some systems where the difference between directory as a
file and directory as a structuring method are more apparent than
under Unix (some utilities like rsync differentiate between A/B and
A/B/ to bring across that difference).

Here is an example for some Emacs function concerned with the concept:

    directory-file-name is a built-in function in `C source code'.
    (directory-file-name DIRECTORY)

    Returns the file name of the directory named DIRECTORY.
    This is the name of the file that holds the data for the directory DIRECTORY.
    This operation exists because a directory is also a file, but its name as
    a directory is different from its name as a file.
    In Unix-syntax, this function just removes the final slash.
    On VMS, given a VMS-syntax directory name such as "[X.Y]",
    it returns a file name such as "[X]Y.DIR.1".

> That's sad, because otherwise it would be really elegant.

If something is not elegant because of the angle of view, change the
view.  And it is not like the different angle has no predecessors or
no consistency.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  0:15                                 ` Linus Torvalds
  2007-07-20  0:33                                   ` Linus Torvalds
@ 2007-07-20 10:19                                   ` Olivier Galibert
  1 sibling, 0 replies; 156+ messages in thread
From: Olivier Galibert @ 2007-07-20 10:19 UTC (permalink / raw
  To: Linus Torvalds
  Cc: Junio C Hamano, Brian Gernhardt, David Kastrup, Shawn O.Pearce,
	Matthieu Moy, Johannes Schindelin, Git Mailing List

On Thu, Jul 19, 2007 at 05:15:28PM -0700, Linus Torvalds wrote:
> (*) And, for anybody confused about the issue, the answer to the latter 
> question is an emphatic: "Yes it should, live with it, and if you want the 
> directory back, you had better add it back as an empty directory"

Wouldn't it be perfectly reasonable for git rm to re-add emptied
directories as empty transparently if the appropriate
flag/configuration is set?  rm is porcelain after all.

  OG.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-20  8:41         ` David Kastrup
@ 2007-07-20 10:20           ` Johan Herland
  2007-07-20 10:54             ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Johan Herland @ 2007-07-20 10:20 UTC (permalink / raw
  To: git; +Cc: David Kastrup

On Friday 20 July 2007, David Kastrup wrote:
> Johan Herland <johan@herland.net> writes:
> > Sorry for jumping in late...
> 
> It could have given you a chance to read up on what has already been
> discussed.

I have tried to keep on top of the discussion so far.

> > Why do you want to add _all_ directories, and not just the ones we
> > want to explicitly track (independent of whether they're empty or
> > not).
> 
> Because the problematic cases are more often than not the _implicit_
> cases.  Do you check a directory tree for empty directories before you
> archive it?  In order to archive every empty directory explicitly?

No, of course I don't. But then archiving (as in tar) is intended to recreate 
the "working copy" exactly as it was. Git (and other SCMs), however, is only 
interested in recreating the part of the working copy it explicitly tracks.

Given the following working copy:
/
/tracked/
/tracked/file
/tracked/dir/
/untracked/
/untracked/file
/untracked/dir/

and the following commands:
$ git add tracked

$ git clone

The cloned result could be any of the following:

(1)
/
/tracked/
/tracked/file

This is the current behaviour; directories are not tracked at all, but only 
added as necessary to support files.

(2)
/
/tracked/
/tracked/file
/tracked/dir/
/untracked/
/untracked/dir/

i.e. implicitly tracking _all_ directories. This is what you literally ask 
for, but I think most would find this unreasonable.

(3)
/
/tracked/
/tracked/file
/tracked/dir/

i.e. recursively tracking directories (and files). This seems useful, but 
there is nothing _implicit_ about this.


I have a feeling that you're actually arguing for doing (3) by default. What I 
am arguing is to do (1) by default, and (3) if given a suitable command-line 
option (i.e. "git add --with-dirs tracked").
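
The difference between (1) and (3) can be sketched as a small model.
This is only an illustration of the two outcomes listed above, not git
code: the function name clone_result and the track_dirs switch are
made up for this example, and paths ending in "/" stand for
directories.

```python
def parents(path):
    """Return every ancestor directory of path, root first."""
    parts = path.strip("/").split("/")
    out = ["/"]
    for i in range(1, len(parts)):
        out.append("/" + "/".join(parts[:i]) + "/")
    return out


def clone_result(working_copy, added, track_dirs=False):
    """Model what a clone contains after adding one directory.

    track_dirs=False models option (1): only files are tracked.
    track_dirs=True models option (3): directories below the added
    path are tracked as well.
    """
    prefix = "/" + added.strip("/") + "/"
    tracked = set()
    for path in working_copy:
        if not path.startswith(prefix):
            continue  # outside the added directory, never tracked
        if path.endswith("/") and not track_dirs:
            continue  # a directory, and directories are not tracked
        tracked.add(path)
    # Ancestor directories appear only as a side effect of holding
    # tracked entries.
    result = set(tracked)
    for path in tracked:
        result.update(parents(path))
    return sorted(result)


working_copy = [
    "/", "/tracked/", "/tracked/file", "/tracked/dir/",
    "/untracked/", "/untracked/file", "/untracked/dir/",
]
option_1 = clone_result(working_copy, "tracked")
option_3 = clone_result(working_copy, "tracked", track_dirs=True)
```

In this model the empty "/tracked/dir/" shows up only in option_3,
and nothing under "/untracked/" shows up in either.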

Note that this is really an interface question. How these entries are actually 
stored in the repo is a different discussion.


Finally, let's look at the case of "git add tracked/file" followed by "git rm 
tracked/file". I'm arguing that "tracked/" should be automatically removed, 
since I never asked for it to be tracked by git. On the other 
hand, "git-add --non-recursive tracked" followed by the above two commands, 
should of course leave "tracked/" in place, since I now actually asked 
explicitly for the directory to be tracked.

My point is fundamentally that selectively tracking directories is a more 
powerful concept than just tracking _all_ directories by default. Note that 
if we support selectively tracking directories, tracking _everything_ (like 
you seem to want) is trivially implemented by _always_ supplying the 
appropriate option to git-add. If we track everything by design, we don't 
have the option of selectively tracking some directories.


> > Basically, add a "--dir" flag to git-add, git-rm and friends, to
> > tell them you're acting on the directory itself (rather than its
> > (recursive) contents). "git-add --dir foo" will add the "040000
> > 123abc... 0 foo" to the index/tree whether or not foo is an empty
> > directory. "git-rm --dir foo" will remove that entry (or fail if it
> > doesn't exist), but _not_ the contents of foo.
> 
> There is nothing wrong with implementing something like this in
> _addition_ to treating directory entries implicitly.

I don't agree. By _selectively_ tracking directories you can implement any 
policy you want on top of it.

> For example, ls 
> has an option -d which does just that, and even git-ls-files has an
> option --directory.  Heck, I even have

Yes, having commandline options for explicitly specifying directories (and not 
their contents) is _exactly_ what I want.

> But the important, the _really_ important thing are the implicit
> behaviors.  If I have to hassle with every directory myself, I don't
> need a content tracking system.

I disagree. Just as you have to decide which files to track, you similarly 
should have to decide which directories to track. Of course, the tools make 
this easier for you by being able to recursively handle files. In the same 
way they should be able to do the same thing for directories.


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  9:27                                       ` Simon 'corecode' Schubert
  2007-07-20 10:11                                         ` David Kastrup
@ 2007-07-20 10:34                                         ` Junio C Hamano
  2007-07-20 13:23                                           ` David Kastrup
  2007-07-20 19:24                                           ` Linus Torvalds
  1 sibling, 2 replies; 156+ messages in thread
From: Junio C Hamano @ 2007-07-20 10:34 UTC (permalink / raw
  To: Simon 'corecode' Schubert; +Cc: David Kastrup, git

Simon 'corecode' Schubert <corecode@fs.ei.tum.de> writes:

> You are mistaking things.  Like the executable bit on a file
> is not content, the fact that a directory should be kept
> despite being empty is also an *attribute* of the directory.
> This is meta-data, not actual data (content).  So no matter
> how elegant tracking the "." entry might be (and I think it
> is, because it covers a lot of corner cases already), it puts
> the information at the wrong place.

Actually, I do not think there is an absolute right or wrong here.
The difference is not that the information is at the "right" or
"wrong" place, but one approach places the information at a more
efficient-to-use place than the other.  In that sense, the
attribute approach _is_ the more elegant solution of the two.

Making it an attribute has a huge practical advantage.

By treating the executable bit as a piece of metadata, we can compare
the "contents" quickly.  If you "chmod +x" a blob without
changing anything else, we can detect that fact, because blob
object names are equal.  At the philosophical level, you _could_
argue that the executable-ness is one bit of content and include
that in the object name computation for the blob.  There is
nothing fundamentally wrong about that approach, but that
destroys the nice "cheap comparability" between blobs that
differ only by executable-ness.

David's "." in tree is essentially the same argument as treating
the executable-ness as one extra bit of content.  The fact that
a particular tree wants to stay even after emptied can be
treated as part of contents (thereby reflected in its object
name).  There is nothing fundamentally wrong there, either.  But
that means two trees that contain an otherwise identical set of
blobs and subtrees, but differ only in the behaviour of when
they are emptied, would get different object names, hence you
need to descend into them to see if they are different.

Using an attribute that is detached from the content itself allows
you to hoist that one bit one level up.  By treating
executable-ness not as part of content, we can compare two blobs
with different executable bits cheaply.  You can avoid
descending into such a tree when comparing it with another tree
that is different only by the "will-stay-when-emptied"-ness the
same way.
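
A rough Python sketch of the comparison cost, for illustration only.
The object-name computation (SHA-1 over a "<type> <size>\0" header
plus the body) follows git's object format; the "." marker entry and
the KEEP_MODE value are hypothetical stand-ins for the two proposals,
and the entry sort is simplified to plain byte order.

```python
import hashlib


def object_id(kind, body):
    """Raw SHA-1 name of a git object of the given kind."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def blob_id(content):
    return object_id(b"blob", content)


def tree_id(entries):
    """entries: list of (mode, name, raw_id) tuples."""
    body = b"".join(
        mode + b" " + name + b"\0" + raw_id
        for mode, name, raw_id in sorted(entries, key=lambda e: e[1])
    )
    return object_id(b"tree", body)


KEEP_MODE = b"40001"  # hypothetical "stay when emptied" tree mode

sub_entries = [(b"100644", b"file", blob_id(b"data\n"))]
sub_plain = tree_id(sub_entries)

# "." as content: the marker lives inside the subtree, so the
# subtree itself gets a new name.
sub_dot = tree_id(sub_entries + [(b"100644", b".", blob_id(b""))])

# Attribute approach: the marker lives in the parent's entry, so the
# subtree keeps its name and only the parent changes.
parent_plain = tree_id([(b"40000", b"dir", sub_plain)])
parent_attr = tree_id([(KEEP_MODE, b"dir", sub_plain)])
parent_dot = tree_id([(b"40000", b"dir", sub_dot)])
```

With the attribute, both parents point at the same sub_plain, so a
diff can stop at the parent level; with ".", the subtree names differ
and the diff has to open both subtrees to find out why.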

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-20 10:20           ` Johan Herland
@ 2007-07-20 10:54             ` David Kastrup
  2007-07-20 12:18               ` Johan Herland
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-20 10:54 UTC (permalink / raw
  To: git

Johan Herland <johan@herland.net> writes:

> On Friday 20 July 2007, David Kastrup wrote:
>> Johan Herland <johan@herland.net> writes:
>> > Sorry for jumping in late...
>> 
>> It could have given you a chance to read up on what has already been
>> discussed.
>
> I have tried to keep on top of the discussion so far.
>
>> > Why do you want to add _all_ directories, and not just the ones we
>> > want to explicitly track (independent of whether they're empty or
>> > not).
>> 
>> Because the problematic cases are more often than not the
>> _implicit_ cases.  Do you check a directory tree for empty
>> directories before you archive it?  In order to archive every empty
>> directory explicitly?
>
> No, of course I don't. But then archiving (as in tar) is intended to
> recreate the "working copy" exactly as it was. Git (and other SCMs),
> however, is only interested in recreating the part of the working
> copy it explicitly tracks.

Yes, and
git-add some-dir
tells it to track _everything_ inside some-dir.  Which means that the
included files are tracked _implicitly_.  The included directories
(including some-dir itself) are not.

> Given the following working copy:
> /
> /tracked/
> /tracked/file
> /tracked/dir/
> /untracked/
> /untracked/file
> /untracked/dir/
>
> and the following commands:
> $ git add tracked
>
> $ git clone
>
> The cloned result could be any of the following:
>
> (1)
> /
> /tracked/
> /tracked/file
>
> This is the current behaviour; directories are not tracked at all, but only 
> added as necessary to support files.

And so your case (1) actually rather is a single line:

/tracked/file

Everything else is just part of representing /tracked/file and
disappears as soon as /tracked/file disappears.

> (2)
> /
> /tracked/
> /tracked/file
> /tracked/dir/
> /untracked/
> /untracked/dir/
>
> i.e. implicitly tracking _all_ directories. This is what you literally ask 
> for,

I don't see how you can possibly conclude that from what I have been
writing.

> but I think most would find this unreasonable.

And it is.  So please _don't_ put words into my mouth.  In my
proposal, the following (and nothing else) would get tracked:

/tracked/.
/tracked/file

and that's it.  That is what was requested, and that is what is
tracked.  There will be, incidentally, a tree "/tracked/" and a tree
"/" in the _repository_, but those collapse as soon as they are empty.
They are just an _abstract_ data structuring tool in the repository
that is _mapped_ to directories on checkout.

> /
> /tracked/
> /tracked/file
> /tracked/dir/
>
> i.e. recursively tracking directories (and files). This seems useful, but 
> there is nothing _implicit_ about this.

You did not ask for "/tracked/file" and you did not ask for
"/tracked/dir/" (whatever they may be).  That you wanted to track them
was _implied_ by your request of "/tracked/".

> I have a feeling that you're actually arguing for doing (3) by
> default.  What I am arguing is to do (1) by default, and (3) if
> given a suitable command-line option (i.e. "git add --with-dirs
> tracked").
>
> Note that this is really an interface question.

Not at all.  It is a _conceptual_ question: in order for this to work
at _all_ (instead of being an inconsistent heap of ugly surprises),
directories need a representation in the repo.  This representation,
as opposed to in the work file system, is _optional_: the repository
got perfectly well along without it up to now, and the fallback is
already implemented when there is a tree without corresponding
directory.

> How these entries are actually stored in the repo is a different
> discussion.

Sure.  But anything that requires four dozens of special cases instead
of four because one wanted to keep "things that are under some
specialized view separate" is not something I am going to
implement.  I am too old to juggle with complexity for the sake of
complexity.  I can make much more use of the existing infrastructure
by actually making file and directory entries quite similar.

ls -la
also has no special cases for "." and ".." because they are, at a very
fundamental level, very special in achieving a special purpose
_without_ being special-cased.

> Finally, let's look at the case of "git add tracked/file" followed
> by "git rm tracked/file". I'm arguing that "tracked/" should be
> automatically removed, since I never asked for it to be tracked by
> git.

Sure.  And nobody ever said otherwise.  In fact, I gave about a dozen
examples in that line and more special in the thread up to now.

> On the other hand, "git-add --non-recursive tracked" followed by the
> above two commands, should of course leave "tracked/" in place,
> since I now actually asked explicitly for the directory to be
> tracked.

Sure.  Use "--directory" instead of "--non-recursive" and you have a
somewhat more special option for that.

> My point is fundamentally that selectively tracking directories is a
> more powerful concept than just tracking _all_ directories by
> default.

Perhaps you might read up on some of the past discussion before
beating dead horses.  This has been covered already, and more than
once.  I never asked for "all directories" to be tracked.  I outlined
cases where they are tracked and where not, and I tested that the
mechanisms in "man gitignore" already work _perfectly_ with the
pattern "." for configuring the _implied_ tracking at directory,
repository, project, and user preference level.

> Note that if we support selectively tracking directories, tracking
> _everything_ (like you seem to want) is trivially implemented by
> _always_ supplying the appropriate option to git-add. If we track
> everything by design, we don't have the option of selectively
> tracking some directories.

But that means manual intervention all of the time.  It is fine when a
tool provides an option to shoot you in the arm instead of in the foot
as usual, but that's not really a fix, but an exacerbation of the
problem.

>> > Basically, add a "--dir" flag to git-add, git-rm and friends, to
>> > tell them you're acting on the directory itself (rather than its
>> > (recursive) contents). "git-add --dir foo" will add the "040000
>> > 123abc... 0 foo" to the index/tree whether or not foo is an empty
>> > directory. "git-rm --dir foo" will remove that entry (or fail if
>> > it doesn't exist), but _not_ the contents of foo.
>> 
>> There is nothing wrong with implementing something like this in
>> _addition_ to treating directory entries implicitly.
>
> I don't agree. By _selectively_ tracking directories you can
> implement any policy you want on top of it.

No, you can't.  Because a "policy" means that things are _implied_.
Being able to do everything manually is not a policy.  It may be a
lifesaver at times, but then you have little business drifting in the
river in the first place.

>> But the important, the _really_ important thing are the implicit
>> behaviors.  If I have to hassle with every directory myself, I
>> don't need a content tracking system.
>
> I disagree. Just as you have to decide which files to track, you
> similarly should have to decide which directories to track. Of
> course, the tools make this easier for you by being able to
> recursively handle files. In the same way they should be able to do
> the same thing for directories.

--directory _explicitly_ is not working recursively, so it does not
solve that problem.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-20 10:54             ` David Kastrup
@ 2007-07-20 12:18               ` Johan Herland
       [not found]                 ` <86odi7utdj.fsf@lola.quinscape.zz>
  0 siblings, 1 reply; 156+ messages in thread
From: Johan Herland @ 2007-07-20 12:18 UTC (permalink / raw
  To: David Kastrup; +Cc: git

On Friday 20 July 2007, David Kastrup wrote:
> Johan Herland <johan@herland.net> writes:
> > My point is fundamentally that selectively tracking directories is a
> > more powerful concept than just tracking _all_ directories by
> > default.
> 
> Perhaps you might read up on some of the past discussion before
> beating dead horses.  This has been covered already, and more than
> once.  I never asked for "all directories" to be tracked.  I outlined
> cases where they are tracked and where not, and I tested that the
> mechanisms in "man gitignore" already work _perfectly_ with the
> pattern "." for configuring the _implied_ tracking at directory,
> repository, project, and user preference level.

It seems our discussion is based on so many misunderstandings of each other 
that it's not very useful to reply to specific parts of it.

AFAICS, from a high-level POV, we're pretty much in agreement on the following 
points:

1. Git should be able to track directories.

2. Tracked directories should be kept alive, even if empty.

3. Git need not necessarily track _all_ directories.


Conversely, we seem to disagree on these points:

4. Whether or not git should track directories by default. You say yes, I say 
no.

5. How the tracking of directories should be implemented in git's object 
database. I want to keep the index/tree as-is except for adding directory 
entries (w/mode 040000) for the tracked directories only. You seem to want to 
add directory entries for _all_ directories and then additional "." entries 
for directories you don't want deleted if/when empty.


Am I making sense, or have I misunderstood our misunderstandings?


...Johan


-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
       [not found]                 ` <86odi7utdj.fsf@lola.quinscape.zz>
@ 2007-07-20 13:20                   ` Johan Herland
  2007-07-20 13:33                     ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Johan Herland @ 2007-07-20 13:20 UTC (permalink / raw
  To: David Kastrup; +Cc: git

On Friday 20 July 2007, David Kastrup wrote:
> Johan Herland <johan@herland.net> writes:
> 
> > AFAICS, from a high-level POV, we're pretty much in agreement on the
> > following points:
> >
> > 1. Git should be able to track directories.
> >
> > 2. Tracked directories should be kept alive, even if empty.
> >
> > 3. Git need not necessarily track _all_ directories.
> >
> >
> > Conversely, we seem to disagree on these points:
> >
> > 4. Whether or not git should track directories by default. You say
> > yes, I say no.
> 
> Element of least surprise.  But since my proposal allows easy and
> intuitive declaration of the preference at user, project, and
> directory level without one choice messing with the choice of other
> projects and contributors with mixed preferences, this is quite
> unimportant.
> 
> We are in agreement that adding or removing the tracking explicitly
> for a single directory might be useful to have.  But it can't be the
> only way.

As long as you can add/remove tracking recursively for a whole (sub)tree, I 
don't see what's the problem. Of course, if you want to change the default 
behaviour, you should be able to either set a config variable somewhere, or - as 
a last resort - alias git-add and git-rm to always supply the appropriate 
command-line option.

> > 5. How the tracking of directories should be implemented in git's
> > object database. I want to keep the index/tree as-is except for
> > adding directory entries (w/mode 040000) for the tracked directories
> > only. You seem to want to add directory entries for _all_
> > directories and then additional "." entries for directories you
> > don't want deleted if/when empty.
> 
> No.  I don't want to change _anything_ for untracked directories.
> They are, as previously, implied by the contents and have a "tree"
> entry for efficiency reasons.  Nothing new here.
> 
> The directory mode entries are named "." and are for tracked
> directories only.

Ok. So our difference in opinion on implementation is even smaller than I 
imagined; basically only whether the directory is tracked by a mode "040000" 
entry, or by a "." entry.

> > Am I making sense, or have I misunderstood our misunderstandings?
> 
> The latter.  You are violently arguing for what I outlined.  Which
> probably shows that I am not the best at explaining my ideas, and that
> it reflects badly upon them.

That probably goes for both of us :)


Well, as long as we have this clarified, I don't see much point in continuing 
this part of the thread. I feel confident that the git community as a whole 
will converge on the best technical solution, once it surfaces.


Have fun!

...Johan


-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20 10:34                                         ` Junio C Hamano
@ 2007-07-20 13:23                                           ` David Kastrup
  2007-07-20 19:24                                           ` Linus Torvalds
  1 sibling, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-20 13:23 UTC (permalink / raw
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> Actually, I do not think there is absolute right or wrong here.  The
> difference is not that the information is at the "right" or "wrong"
> place, but one approach places the information at more
> efficient-to-use place than the other.

Agreed.

> In that sense, the attribute approach _is_ a more elegant solution
> between the two.

Disagreed.  See below.

> Making it an attribute has a huge practical advantage.
>
> By treating the executable bit as a piece of metadata, we can compare the
> "contents" quickly.  If you "chmod +x" a blob without changing
> anything else, we can detect that fact, because blob object names
> are equal.  At the philosophical level, you _could_ argue that the
> executable-ness is one bit of content and include that in the object
> name computation for the blob.  There is nothing fundamentally wrong
> about that approach, but that destroys the nice "cheap
> comparability" between blobs that differ only by executable-ness.
>
> David's "." in tree is essentially the same argument as treating the
> executable-ness as one extra bit of content.  The fact that a
> particular tree wants to stay even after emptied can be treated as
> part of contents (thereby reflected in its object name).

Small nit here: the tree does not want to stay after emptied, since it
is not empty as long as it contains ".".

> There is nothing fundamentally wrong there, either.  But that means
> two trees that contain otherwise identical set of blobs and
> subtrees, but differ only in the behaviour of when they are emptied,
> would get different object names, hence you need to descend into
> them to see if they are different.

And here we disagree in our assessment, and where I find the example
of the execute bit unfitting.  We are talking about _trees_ here, not
files.  So this is only relevant if we have a _huge_, _flat_ tree with
_lots_ of entries at _bottom_ level.

How often does it occur in practice that a _large_ tree has "."  added
or removed and nothing else changes?  Never, because the normal use
case is that a directory is either tracked from the start, or not
tracked at all.  And even if you change the tracking for a whole
project at once (which is a one-time job): the cost difference is
looking at all _tree_ leaf entries, not at all the involved files.

> Using attribute that is detached from the content itself allows you
> to hoist that one bit one level up.  By treating executable-ness not
> as part of content, we can compare two blobs with different
> executable bits cheaply.  You can avoid descending into such a tree
> when comparing it with another tree that is different only by the
> "will-stay-when-emptied"-ness the same way.

But changing the executable bit of a file will happen often during
development.  Adding or removing "." will usually never be done
except when the tree is first created or removed, and then the cost is
negligible.

So "performance" is not an issue for making this an attribute or a
flat entry.  While the user level abstraction need not match the
actual representation, I think that it will make for a lot fewer special
cases and problematic behavior to pull through with "." as a directory
entry that mostly behaves like other files and, like other files,
requires git to create a directory to contain it.  All the logic for
creating and deleting directories and creating and adding and ignoring
files can _perfectly_ stay the same.

There are just two differences:

a) git always sees "." as a file in every directory in the work tree
   and considers it a file.
b) when it comes to actually creating or modifying or reading the
   actual file in the work directory, it silently skips the
   operation.

It would not even be necessary to give the directory entry any special
attributes or permissions to make this scheme work: declaring it a
normal file and just special-casing the name "." on those operations
would lead to consistent and working behavior, with no change of
format in index and repository at all.
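
As a rough illustration of (b) and of the unchanged pruning logic,
here is a toy Python model.  It is not git code: the index is a plain
dict from path to content, and checkout and remove are made-up helper
names.  The only special case is the name ".".

```python
import os
import tempfile


def checkout(root, index):
    """Write every index entry below root."""
    for path in sorted(index):
        target = os.path.join(root, path)
        # Directories are created only as a side effect of an entry.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if os.path.basename(path) == ".":
            continue  # (b): never write an actual file called "."
        with open(target, "w") as fh:
            fh.write(index[path])


def remove(root, index, path):
    """Drop one entry, then prune directories left without entries."""
    del index[path]
    if os.path.basename(path) != ".":
        os.remove(os.path.join(root, path))
    directory = os.path.dirname(path)
    while directory and not any(
        p.startswith(directory + "/") for p in index
    ):
        os.rmdir(os.path.join(root, directory))
        directory = os.path.dirname(directory)


with tempfile.TemporaryDirectory() as root:
    index = {"kept/.": "", "kept/file": "x", "implied/file": "y"}
    checkout(root, index)
    remove(root, index, "kept/file")
    remove(root, index, "implied/file")
    kept_survives = os.path.isdir(os.path.join(root, "kept"))
    implied_survives = os.path.isdir(os.path.join(root, "implied"))
```

The directory "kept" survives because its "." entry is still in the
index, while "implied" is pruned by the same rule that applies today.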

Possibly even a) alone would suffice, at the cost of letting git
complain and continue at every operation (or making a _really_ royal
mess for Solaris root users).

I might be tempted to make a proof-of-concept patch for that.

But for backward-compatibility, it will be better to use an entry type
which old versions of git will be able to ignore when checking out or
in.  And for user-friendliness, one does not really want to list such
entries as regular files.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-20 13:20                   ` Johan Herland
@ 2007-07-20 13:33                     ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-20 13:33 UTC (permalink / raw
  To: git

Johan Herland <johan@herland.net> writes:

> On Friday 20 July 2007, David Kastrup wrote:
>> Johan Herland <johan@herland.net> writes:
>
>> > 4. Whether or not git should track directories by default. You
>> > say yes, I say no.
>> 
>> Element of least surprise.  But since my proposal allows easy and
>> intuitive declaration of the preference at user, project, and
>> directory level without one choice messing with the choice of other
>> projects and contributors with mixed preferences, this is quite
>> unimportant.
>> 
>> We are in agreement that adding or removing the tracking explicitly
>> for a single directory might be useful to have.  But it can't be
>> the only way.
>
> As long as you can add/remove tracking recursively for a whole
> (sub)tree, I don't see what's the problem.

Neither do I.  But a --directory option never is recursive.  That is
the whole point.

Probably we are in violent agreement again.

> Of course, if you want to change the default behaviour, you should
> be able to either set a config variable somewhere, or - as a last
> resort - alias git-add and git-rm to always supply the appropriate
> command-line option.

Or declare diverging behaviors using a !. or . entry in the gitignore
mechanisms.  Which work everywhere where we need them.

>> > 5. How the tracking of directories should be implemented in git's
>> > object database. I want to keep the index/tree as-is except for
>> > adding directory entries (w/mode 040000) for the tracked
>> > directories only. You seem to want to add directory entries for
>> > _all_ directories and then additional "." entries for directories
>> > you don't want deleted if/when empty.
>> 
>> No.  I don't want to change _anything_ for untracked directories.
>> They are, as previously, implied by the contents and have a "tree"
>> entry for efficiency reasons.  Nothing new here.
>> 
>> The directory mode entries are named "." and are for tracked
>> directories only.
>
> Ok. So our difference in opinion on implementation is even smaller
> than I imagined; basically only whether the directory is tracked by
> a mode "040000" entry, or by a "." entry.

Actually, even smaller: I'd track them by a "." entry with mode
1777755755 or whatever is the natural expression for "this is a
directory".  The mode would be different from the existing "this is a
tree".

_If_ one wants at some point to track permissions of files apart from "x",
the "." entry would be natural for carrying directory permissions.
Without ".", you basically tell git "I don't care about the existence
of this directory.  Just do what is necessary for checking out my
files".

>> > Am I making sense, or have I misunderstood our misunderstandings?
>> 
>> The latter.  You are violently arguing for what I outlined.  Which
>> probably shows that I am not the best at explaining my ideas, and
>> that it reflects badly upon them.
>
> That probably goes for both of us :)
>
> Well, as long as we have this clarified, I don't see much point in
> continuing this part of the thread. I feel confident that the git
> community as a whole will converge on the best technical solution,
> once it surfaces.

I'll probably crank out some insolently primitive proof of concept
eventually.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20  5:58                                       ` David Kastrup
@ 2007-07-20 15:31                                         ` Linus Torvalds
  0 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-20 15:31 UTC (permalink / raw
  To: David Kastrup
  Cc: Junio C Hamano, Brian Gernhardt, Shawn O.Pearce, Matthieu Moy,
	Johannes Schindelin, Git Mailing List



On Fri, 20 Jul 2007, David Kastrup wrote:
> 
> Like "." destroyed the fundamental concepts of Unix filesystems.

David, I'd suggest you just be quiet and learn, instead of spouting 
idiotic nonsense.

When Junio talks about fundamental concepts of git, you should sit back, 
relax, and ponder. And maybe realize that the git filesystem isn't a "unix 
filesystem". It's a content-addressable one, it's not POSIX, and yes, it 
really does have totally different fundamental concepts.

So your arguments are just inane and stupid, and show that you aren't 
worth discussing with, because you don't even understand what you are 
talking about.

So here's a suggestion: how about trying to *understand* git first. After 
that, you can talk.

In fact, at this point, I have an even better suggestion: how about you 
just shut the hell up until you have a tested patch? Code talks, bullshit 
walks. And right now you are nothing but bullshit.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20 10:34                                         ` Junio C Hamano
  2007-07-20 13:23                                           ` David Kastrup
@ 2007-07-20 19:24                                           ` Linus Torvalds
  2007-07-20 21:02                                             ` Johan Herland
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-20 19:24 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Simon 'corecode' Schubert, David Kastrup, git



On Fri, 20 Jul 2007, Junio C Hamano wrote:
> 
> Using attribute that is detached from the content itself allows
> you to hoist that one bit one level up.  By treating
> executable-ness not as part of content, we can compare two blobs
> with different executable bits cheaply.  You can avoid
> descending into such a tree when comparing it with another tree
> that is different only by the "will-stay-when-emptied"-ness the
> same way.

Having thought about it a bit more, I would absolutely *detest* any kind 
of "executable bit" like behaviour.

Why? 

Merging. I think one of the fundamental issues in merging is that you do 
it "in the working tree". This is something that pretty much *everybody* 
else gets wrong, and it's something where git absolutely shines.

But git shines here exactly because git never tracks "history" or the 
state in the tree, and only ever tracks things that are indubitably real 
content. Which is why you never *ever* have to tell git about "I moved 
file X to file Y" - because git only tracks things that it can see right 
in front of it, in the tree.

The "sticky directory" bit simply would not be something like that. It 
simply isn't "content", and as such, it should not be tracked. It's as 
easy as that. We don't want a merge of two branches to have to specify any 
extra data "outside" the tree as to how it should be merged.

So the issue about whether a directory *exists* or not can be merged (just 
look at the tree), but the issue about whether the directory is supposed 
to be sticky is something that you'd have to tell git about *outside* of 
the tree, and that violates the whole point of working tree merges.

I do realize that if you use inferior operating systems, we already have 
these kinds of "outside the tree" data entries, thanks to issues like 
symlinks and normal file executable bits that you would have to explicitly 
tell git about when you're working in a broken environment. So in that 
sense, it wouldn't be anything technically new for git. 

But that doesn't change the fundamental issue: the limitation with 
executable bits and symlinks is a limitation of the broken environment, 
not of git. But "directories stay around after the last file is gone" is 
not that, it would simply be a design mistake in git itself.

There are other reasons to not do it. What about file renames? Maybe the 
directory got *renamed*. From a pure content angle, this is "all the files 
in that directory went away". If you have stupid rules like "directories 
stay around even though all the files went away", you would again have 
problems with this common case.

In other words: I don't care one whit about the whiners. What's MUCH more 
important than some random whiny person saying "Daddy, daddy, I want a 
pony" is whether you can afford to maintain that pony in the future. And 
this pony is just stupid.

So here:

	No, you cannot have a pony. NOT YOURS.

but I still think we should support the concept of importing things from 
other systems, and thus eventually support empty directories. Just not any 
crazy semantics with sticky histories.

			Linus

PS. As usual, per-user or per-repository *local* attributes are something 
else. They aren't "sticky history", they are just purely behavioural 
defaults. Those kinds of things may make sense. But that's not a "tracking 
content" issue.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20 19:24                                           ` Linus Torvalds
@ 2007-07-20 21:02                                             ` Johan Herland
  2007-07-20 21:48                                               ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Johan Herland @ 2007-07-20 21:02 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

On Friday 20 July 2007, Linus Torvalds wrote:
> [...]
> 
> But that doesn't change the fundamental issue: the limitation with 
> executable bits and symlinks is a limitation of the broken environment, 
> not of git. But "directories stay around after the last file is gone" is 
> not that, it would simply be a design mistake in git itself.
> 
> There are other reasons to not do it. What about file renames? Maybe the 
> directory got *renamed*. From a pure content angle, this is "all the files 
> in that directory went away". If you have stupid rules like "directories 
> stay around even though all the files went away", you would again have 
> problems with this common case.
> 
> In other words: I don't care one whit about the whiners. What's MUCH more 
> important than some random whiny person saying "Daddy, daddy, I want a 
> pony" is whether you can afford to maintain that pony in the future. And 
> this pony is just stupid.
> 
> So here:
> 
> 	No, you cannot have a pony. NOT YOURS.
> 
> but I still think we should support the concept of importing things from 
> other systems, and thus eventually support empty directories. Just not any 
> crazy semantics with sticky histories.

Does this mean that you are firmly opposed to the concept of storing 
directories in the index/tree as such, or that you are only opposed to 
(some of) the implementation ideas that have been discussed so far?

If the former is the case, does this mean that there will be no support for 
empty directories in git, alternatively that such support is limited to 
incorporating e.g. Dscho's .gitignore workaround into porcelain commands 
(i.e. "git add --directory some_dir" will be mangled/transformed 
into "touch some_dir/.gitignore && git add some_dir/.gitignore")?

(Granted, Dscho's .gitignore workaround is fairly elegant as workarounds go, 
but it still reeks of inheriting a CVS misfeature.)


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20 21:02                                             ` Johan Herland
@ 2007-07-20 21:48                                               ` Linus Torvalds
  2007-07-20 22:36                                                 ` Julian Phillips
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-20 21:48 UTC (permalink / raw
  To: Johan Herland; +Cc: git



On Fri, 20 Jul 2007, Johan Herland wrote:
> 
> Does this mean that you are firmly opposed to the concept of storing 
> directories in the index/tree as such, or that you are only opposed to 
> (some of) the implementation ideas that have been discussed so far?

I've already sent out a *patch* to do so, for chissake. It handled all 
these cases perfectly fine, as far as I know, but I didn't test it all 
that deeply (and made it clear when I sent that patch out).

In fact, in this whole pointless discussion, I think I'm so far the only 
one to have done anything constructive at all. Sad.

So here's my standpoint:

 - people who use git natively might as well use the ".gitignore" trick. 
   It really *does* work, and there really aren't any downsides. Those 
   directories will stay around forever, until you decide that you don't 
   want them any more. Problem solved.

   Sure, if you export the git archive into some other format, you might 
   well want to do something about the ".gitignore" files (like just 
   delete them, since they won't be meaningful in an SVN environment, for 
   example, but you might also just convert them into SVN's "attributes" 
   or whatever it is that SVN uses to ignore files).

 - If you don't use git natively, but just to track another thing, you 
   could easily use the patches that I already sent out. Yes, they need 
   more testing. Yes, you'd also probably like some user interface updates 
   (notably "git add/rm" should be taught about directories).

   And yes, I probably (almost certainly) didn't handle all cases, but the 
   patch I sent out was actually a working one. It really *did* pass my 
   trivial tests.
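
For illustration, a minimal sketch of the ".gitignore" trick from the
first point above. The temporary repository and the directory name
some_dir are placeholders chosen for this sketch, not anything taken
from the patch:

```shell
set -e
# Throwaway repository in a temporary location (placeholder for this sketch).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# An empty directory on its own gives git nothing to record ...
mkdir some_dir
# ... so put an empty .gitignore inside it and track that file instead.
touch some_dir/.gitignore
git add some_dir/.gitignore

# The index now holds one path, which is what keeps some_dir around.
tracked=$(git ls-files)
echo "$tracked"
```

Removing the directory later is then an ordinary "git rm" of that one
file.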

But once you start tracking empty directories *without* a .gitignore file, 
some things fall out of that:

 - git really *really* is designed to track "snapshots in time". You 
   generate history from these snapshots. This is a very fundamental 
   issue, and a lot of people seem to have trouble understanding the 
   deeper implications.

   For example, git and hg may look similar, but git tracks "snapshots in 
   time", and hg tracks "file histories tied together in snapshots". That 
   really is a fundamentally different thing. 

   And one of the fundamental results of git's approach is that content is 
   content. There is *never* any notion of "history".  A snapshot really 
   is just that: it's a standalone thing. It *has* no history. The history 
   comes entirely from outside.

   This means that the whole notion of "this directory will not go away 
   because I added it explicitly" is a totally broken notion in git. It 
   has a notion of "history" - something that simply DOES NOT EXIST, 
   unless you seriously break the whole notion of "snapshots in time".

   In other words, when I say that git is a "content tracker", I'm 
   serious. It tracks nothing *but* content. If some concept doesn't exist 
   in the working tree, git doesn't track it. If it cannot be seen in the 
   filesystem, it doesn't exist.

 - Contrast this with a lot of totally broken SCM's, that track "history" 
   of files. As a result, they have absolutely *horrid* merge problems, 
   because you can no longer just merge things in the working directory, 
   and "the result" is the result. No, if you track history, you now have 
   to tell the SCM about how the *history* moved, not just the content.

So this is why git MUST NOT make the difference between

 - a directory that was created explicitly and then had a few files added 
   to it, and then had those files deleted from it

and

 - we added a few files, we removed them

The end result MUST BE the same, because the  state IN THE WORKING TREE is 
the same!

If the contents are the same, the end result must be the same. It's that 
simple. And it all comes down to: "git tracks contents".

Now, having said that, it doesn't matter *what* the end result is, as long 
as it's the same for both cases. What we do now is that when the files go 
away, the directory is no longer tracked.

But we *could* say that when we remove files, we always add back the 
directory they were in if that directory still exists in the filesystem.

See? Both are consistent with the "git tracks contents" notion. The only 
thing that is *not* consistent with that notion is to have a flag that we 
carry along that says "keep this directory". That's no longer content, and 
now you'd be tracking some internal SCM history instead. And that is a 
mistake. It may sound like a small mistake (and it is), but down that path 
lies madness. It's much better to teach people _why_ git doesn't do it, 
than to say "ok, git tracks content, but we have this special case where 
we also track something else, namely a git internal "stickiness" notion".

SCM is too important to play games with. Git gets things right, and I 
doubt people really _realize_ that the "tracks content" is why git is so 
much better, and why git can do merges so much faster and more reliably 
than anybody else.

So the rule really *must* be:

 - if two trees look the same in the filesystem, they *must* have the same 
   git SHA1, because by definition, they have the same content.

Anything that breaks that very simple statement is fundamentally broken.
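
A rough way to see that rule with the current behaviour, assuming
nothing more than stock plumbing commands: two scratch repositories
reach the same staged content by different routes, and "git write-tree"
should report the same tree id for both. The names a, b, kept.txt and
scratch are made up for this sketch:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/a"
git init -q "$work/b"

# Repository a: stage a single file and nothing else.
cd "$work/a"
echo hello > kept.txt
git add kept.txt
tree_a=$(git write-tree)

# Repository b: reach the same end state by a detour through a directory
# whose only file is added and then removed again.
cd "$work/b"
echo hello > kept.txt
mkdir scratch
echo temp > scratch/tmp.txt
git add kept.txt scratch/tmp.txt
git rm -q -f scratch/tmp.txt
rm -rf scratch
tree_b=$(git write-tree)

# The two ids are computed from content only, not from how it got there.
echo "a: $tree_a"
echo "b: $tree_b"
```

This only demonstrates today's semantics, where the directory stops
being tracked once its files are gone. The argument above is that
whichever result is chosen, it has to be the same for both routes.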

			Linus

PS. I realize that nobody actually seems to be writing code, and that this 
is a "paint the bike shed" discussion for everybody else, but just in case 
there are people who don't just masturbate about the color of the shed, 
I'd like to point out that we really *do* need to enhance the "diff" rules 
too, so that you can express the changes in a tree as a diff too. Because 
if we track empty directories, then we need to be able to also *show* the 
difference between a tree that has an empty directory, and one that does 
not.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20 21:48                                               ` Linus Torvalds
@ 2007-07-20 22:36                                                 ` Julian Phillips
  2007-07-21  0:18                                                   ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Julian Phillips @ 2007-07-20 22:36 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Johan Herland, git

On Fri, 20 Jul 2007, Linus Torvalds wrote:

>
>
> On Fri, 20 Jul 2007, Johan Herland wrote:
>>
>> Does this mean that you are firmly opposed to the concept of storing
>> directories in the index/tree as such, or that you are only opposed to
>> (some of) the implementation ideas that have been discussed so far?
>
> I've already sent out a *patch* to do so, for chissake. It handled all
> these cases perfectly fine, as far as I know, but I didn't test it all
> that deeply (and made it clear when I sent that patch out).
>
> In fact, in this whole pointless discussion, I think I'm so far the only
> one to have done anything constructive at all. Sad.

There was Dscho's .gitignore based patch too ...

>
> So here's my standpoint:
>
> - people who use git natively might as well use the ".gitignore" trick.
>   It really *does* work, and there really aren't any downsides. Those
>   directories will stay around forever, until you decide that you don't
>   want them any more. Problem solved.
>
>   Sure, if you export the git archive into some other format, you might
>   well want to do something about the ".gitignore" files (like just
>   delete them, since they won't be meaningful in an SVN environment, for
>   example, but you might also just convert them into SVN's "attributes"
>   or whatever it is that SVN uses to ignore files).

Personally I quite like this approach - I'm going to use it to keep all 
the empty directories from Subversion in my importer.  It seems to address 
everything quite neatly.

I don't really understand the objections ... especially since I can't see 
why you want an empty directory if you're not going to put _something_ in 
it - in which case, presumably you want to ignore it (so maybe a 
.gitignore containing * would be better than an empty one)?  However, I'm 
sure that if people want it, they have a reason.
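
A sketch of that variant, with one assumption beyond what is said
above: a bare "*" pattern would also ignore the .gitignore file itself,
so this version adds a "!.gitignore" line to keep the file trackable
without "git add -f". The directory name cache is made up:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir cache
# Ignore everything in the directory except the .gitignore file itself.
printf '*\n!.gitignore\n' > cache/.gitignore
git add cache/.gitignore

# Anything dropped into the directory later stays out of git's sight.
echo scratch > cache/junk.tmp
tracked=$(git ls-files)
untracked=$(git ls-files --others --exclude-standard)
echo "tracked: $tracked"
echo "untracked and not ignored: ${untracked:-none}"
```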

> SCM is too important to play games with. Git gets things right, and I
> doubt people really _realize_ that the "tracks content" is why git is so
> much better, and why git can do merges so much faster and more reliably
> than anybody else.

This is the thing that made me interested in git back in April '05.  I 
couldn't see what we were going to end up with at that point - but I was 
_convinced_ that due to the underlying design it was worth watching. 
Being a python type (sorry ... :$) hg looked interesting when it sprang up 
- but they threw away what I considered to be one of the most compelling 
features of git (at the time there wasn't the wealth of really nice tools 
that we now have).

In fact, I really should say "Thank you Linus", since I came that close to 
writing an SCM from scratch myself - having been using Subversion with 
branches for quite some time (and CVS before that - and yes I do mean 
branches + CVS).  Now I no longer feel the need to write an SCM - just a 
longing to use git.  git is probably better than anything I would have 
come up with too. :D

-- 
Julian

  ---
She is descended from a long line that her mother listened to.
 		-- Gypsy Rose Lee

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-20 22:36                                                 ` Julian Phillips
@ 2007-07-21  0:18                                                   ` Linus Torvalds
  2007-07-21  1:23                                                     ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-21  0:18 UTC (permalink / raw
  To: Julian Phillips; +Cc: Johan Herland, git



On Fri, 20 Jul 2007, Julian Phillips wrote:

> On Fri, 20 Jul 2007, Linus Torvalds wrote:
> > 
> > So here's my standpoint:
> > 
> > - people who use git natively might as well use the ".gitignore" trick.
> >   It really *does* work, and there really aren't any downsides. Those
> >   directories will stay around forever, until you decide that you don't
> >   want them any more. Problem solved.
> 
> Personally I quite like this approach - I'm going to use it to keep all the
> empty directories from Subversion in my importer.  It seems to address
> everything quite neatly.

The really sad part about this discussion is that the ".gitignore trick" 
is really technically no different at all from the one that David Kastrup 
has been advocating a few times, except he calls his ".gitignore" just 
".", and seems to think that it's somehow different.

It is true that ".gitignore" and "." _are_ different.

But they are actually different in the sense that the ".gitignore" thing 
is something you can control, while the "." thing is something that is in 
all directories on UNIX, which is exactly why it _must_not_ be used by git 
to mark existence. Exactly because it has thus lost its ability to be 
something you can tune per-directory in the working tree!

That said, I actually like my patch, because the git tree structures 
actually lend themselves very naturally to the "empty tree", and I know 
people have even built up those kinds of trees on purpose, even if the 
index doesn't support that notion.

So in that sense, teaching the index about an empty tree is in some ways 
the "right thing" to do, if only because it means that the index can 
finally express something that the tree objects themselves have always 
been able to validly encode.
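
To make the "tree objects can already encode it" point concrete, here
is a small sketch that bypasses the index and builds the objects with
plumbing commands only. It is not the patch itself, and the entry name
emptydir is invented for the example:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Write a tree object with no entries at all: the "empty tree".
empty=$(git mktree </dev/null)

# Build a parent tree whose only entry is that empty tree, under the
# made-up name "emptydir".
parent=$(printf '040000 tree %s\temptydir\n' "$empty" | git mktree)

# List the parent tree: one directory entry pointing at the empty tree.
listing=$(git ls-tree "$parent")
echo "$listing"
```

The object database accepts this without complaint; what is missing is
a way for the index to carry such an entry.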

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21  0:18                                                   ` Linus Torvalds
@ 2007-07-21  1:23                                                     ` David Kastrup
  2007-07-21  3:54                                                       ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-21  1:23 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> The really sad part about this discussion is that the ".gitignore
> trick" is really technically no different at all from the one that
> David Kastrup has been advocating a few times, except he calls his
> ".gitignore" just ".", and seems to think that it's somehow
> different.

Oh no, I don't think at all that it is somehow different: actually
this is _exactly_ the reason why I think that the implementation will
be doable even by an idiot like myself, and that is because at least
in my first iteration, "."  will appear as an empty regular file to
git, just like ".gitignore".  The main worry I had was that putting
"." inside of a gitignore entry might stop "git add ." from working
like previously.  But I tried it, and it works just like it would with
".gitignore".  Or rather like it would with ".notignore" since
".gitignore" _is_ specially treated by git, after all.

> It is true that ".gitignore" and "." _are_ different.
>
> But they are actually different in the sense that the ".gitignore"
> thing is something you can control, while the "." thing is something
> that is in all directories on UNIX, which is exactly why it
> _must_not_ be used by git to mark existence.

But I don't plan to have it used by git to mark existence.  The
_existence_ can be taken for granted.  But what can't be taken for
granted, like with any other file, is that the file is actually being
tracked by git.  To have it tracked, you need to add it, and it must
not be covered by gitignore.

> Exactly because it has thus lost its ability to be something you can
> tune per-directory in the working tree!

But the user should not lose the ability to decide whether or not git
tracks the file.

> That said, I actually like my patch, because the git tree structures
> actually lend themselves very naturally to the "empty tree", and I
> know people have even built up those kinds of trees on purpose, even
> if the index doesn't support that notion.

And that is the reason I will be working with the "empty file ."
metaphor: it would be way above my head to make the index support new
file types or even structures, and change the evaporate-when-empty
semantics of trees and so on, while catching all special cases.

I have no chance in hell to implement a new feature with a reasonable
amount of time and work.  That's a task for people with a larger brain
than mine who have my full admiration and respect.  The best I can
hope to achieve is a clever hack.

And if that works, people can still pile exceptions on it and redo it
as a "proper feature".

You are _perfectly_ correct that my proposal is _not_ a jot different
from registering a regular empty file ".notignore", and it is on
_purpose_, because I could not handle the complications if it were.

The only difference is that I am calling the file ".".  Which is in
_all_ respects nothing more than a naming convention.

However, this convention has distinct advantages over ".notignore":

a) I don't have to depart as far from reality.  Whenever I try
registering ".", I can rely on the work directory actually _having_
"." as a _real_, not a pseudofile.  It will not actually be a
_regular_ file as I'll tell git: that's a wart of my prototype
implementation which will, no doubt, eventually be fixed by others
_if_ the code does its job fine apart from being ugly to look at.  It
may not be even necessary internally to think of "." other than as an
empty regular file, but git should probably not talk too loud about it
lest people laugh at it.

b) it already means something to people.  Now this is a two-edged
sword, since "almost, but not quite, entirely unlike" concepts are not
necessarily helpful in computing.  In this case, however, I think the
match is close enough to help people understand what is going on
rather than the other way round.  "." was introduced because people
wanted to have a good way to refer to a directory as an element of
itself.  So using "." as a self-reference for a directory is quite in
the spirit of that name.

> So in that sense, teaching the index about an empty tree is in some
> ways the "right thing" to do, if only because it means that the
> index can finally express something that the tree objects themselves
> have always been able to validly encode.

If you define the tree objects by the physical in-memory or
in-repository data structures encoding them, then you are correct.  I
am somewhat reluctant to parade around another red cape, but in this
particular case, the size of the wet spot in my pants does not as much
relate to the physical layout of the data structure (big deal,
probably 30 lines of code all around), but rather to the extent and
assumptions of functions accessing it.  Namely, data layout and
accessor functions _together_ constitute a tree object.  So for me the
"evaporate-when-empty" property, while not inherent in the physical
layout of the object, is still an inherent part of its structure which
I would not want to touch: finding and fixing and debugging all code
elements which explicitly or implicitly rely on that assumption is
something I would not entrust myself with.

I might have been more inclined to dabble with that approach if the
tree stuff were written in something more object-oriented, say, clean
and concise C++, except that clean and concise C++ code in the wild is
even more of a mythical beast than clean and concise TeX code, and C++
itself is such a mind-bogglingly complex contraption...  I digress.

All the best,

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21  1:23                                                     ` David Kastrup
@ 2007-07-21  3:54                                                       ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-21  3:54 UTC (permalink / raw
  To: git

David Kastrup <dak@gnu.org> writes:

> The only difference is that I am calling the file ".".  Which is in
> _all_ respects nothing more than a naming convention.
>
> However, this convention has distinct advantages over ".notignore":
>
> a) I don't have to depart as far from reality.  Whenever I try
> registering ".", I can rely on the work directory actually _having_
> "." as a _real_, not a pseudofile.  It will not actually be a
> _regular_ file as I'll tell git: that's a wart of my prototype
> implementation which will, no doubt, eventually be fixed by others
> _if_ the code does its job fine apart from being ugly to look at.

Update: well, I am still digging through the code, but this is all so
well factored that it might be perfectly feasible to have S_ISDIR
entries after all without too much of a hassle.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-18 23:16               ` [RFC PATCH] " Linus Torvalds
  2007-07-18 23:40                 ` Linus Torvalds
  2007-07-18 23:42                 ` David Kastrup
@ 2007-07-21  4:29                 ` David Kastrup
  2007-07-21  4:51                   ` Linus Torvalds
       [not found]                   ` <alpine.LFD.0.999.0707202135450.27249@woody.linux-foundation.org>
  2 siblings, 2 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-21  4:29 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This really updates three different areas, which are nicely
> separated into three different files, so while it's one single
> patch, you can actually follow along the changes by just looking at
> the differences in each file, which directly translate to separate
> conceptual changes:

Ok, I have now acquired enough passing familiarity with the code that
I find part of my way around it.  Most of your patch looks like it
caters for the S_ISDIR type not previously in use in the index (how
about the repository?).  So that makes for quite a bit of nicer looks.
The disadvantage is that it introduces a new data type and thus one
has to check all the code paths to see how older versions of git will
cope with newer data.

My idea of a fake zero-length file would have had predictable side
effects:

For checking out, git would have created the directory it needed to
place the "file", then try to write an empty file called "." and
fail.  Apart from an error message (if we aren't root on Solaris),
this would have worked exactly as intended.

For deletion on checking out, git would have tried deleting "." and
failed.  I have not checked the code to see whether git takes this as
a clue not to attempt deleting the containing directory.  If not,
again stuff would have worked as intended.  If yes, well, the user
needs to clean up manually.

I am not sure what code paths are executed when using S_ISDIR now in
unmodified git.  As a theoretical question for now: do git
repositories carry some versioning inside them?  Something like "don't
touch me if you are not at least version x"?

Anyway, the code becomes much less of a dirty hack by using that data
type, so I am pretty much taking your code (which has no overlap to
the work I have done already) as is.  Seems like it should play
together quite nicely with my own stuff.

So thanks for doing the heavy lifting in a difficult area.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21  4:29                 ` David Kastrup
@ 2007-07-21  4:51                   ` Linus Torvalds
  2007-07-21  5:08                     ` Linus Torvalds
       [not found]                   ` <?= =?ISO-8859-1?Q?alpine.LFD.0.999?= =?ISO-8859-1?Q?.07072=0402135450.?= =?ISO-8859-1?Q?27249@woody.linu?= =?ISO-8859-1?Q?x-foundation.org?= =?ISO-8859-1?Q?>
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-21  4:51 UTC (permalink / raw
  To: David Kastrup; +Cc: git



On Sat, 21 Jul 2007, David Kastrup wrote:
> 
> Ok, I have now acquired enough passing familiarity with the code that
> I find part of my way around it.  Most of your patch looks like it
> caters for the S_ISDIR type not previously in use in the index (how
> about the repository?).

The object database has always had S_ISDIR (well, "always" is since very 
early on, when I realized that flat trees didn't cut it).

> The disadvantage is that it introduces a new data type and thus one
> has to check all the code paths to see how older versions of git will
> cope with newer data.

Take a look at the "subproject" patches - those did the same (adding the 
notion of a gitlink to the index), except those also changed how the tree 
object looked, since now a tree could contain pointers to commits too. 

> My idea of a fake zero-length file would have had predictable side
> effects:

As far as I can tell, it would have been exactly the same thing as the 
S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the 
end of the filename for being '/'.

Otherwise? Exactly the same.

Except for the fact that we already supported S_IFGITLINK for subprojects 
(and there it matches the "struct tree" entry, so it really *does* make 
more sense that way), so supporting S_IFDIR was actually easier.

But hey, that's an implementation detail. I don't actually care all that 
much. In many ways, the "long-term" data structures are much more 
important than the index, the index is a purely temporary - and even more 
importantly - a purely local datastructure.

The more important thing is in many ways the object storage, and that's 
also the reason for doing the index the way I did - it more closely 
matches what the object storage does (ie the "index" ends up mirroring a 
linearized and unpacked "tree" object).

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21  4:51                   ` Linus Torvalds
@ 2007-07-21  5:08                     ` Linus Torvalds
  2007-07-21  5:28                       ` David Kastrup
  2007-07-28  8:44                       ` David Kastrup
  0 siblings, 2 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-21  5:08 UTC (permalink / raw
  To: David Kastrup; +Cc: git



On Fri, 20 Jul 2007, Linus Torvalds wrote:
> 
> As far as I can tell, it would have been exactly the same thing as the 
> S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the 
> end of the filename for being '/'.

BTW, there is actually one big difference, and the '/' at the end actually 
has one huge advantage.

Why? Because my preliminary patches sort the index entries wrong. A 
directory should always sort *as*if* it had the '/' at the end.

See base_name_compare() for details.

And we've never done that for the index, because the index has never had 
this issue (since it never contained directories). So sit down and compare 
base_name_compare (for tree entries) with cache_name_compare() (for index 
entries), and see how the latter doesn't care about the type of names.

This was actually something that I hit already with subproject support, 
and one of my very first patches even had some (aborted) code to start 
sorting subprojects in the index the way we sort directories.

And I *should* have done it that way, but I never did. It now makes the 
S_ISDIR handling harder, because directories really do have to be sorted 
as if they had the '/' at the end, or "git-fsck" will complain about bad 
sorting.

Sad, sad, sad. It effectively means that S_IFGITLINK is *not* quite the 
same as S_IFDIR, because they sort differently. Duh.

Of course, it seldom matters, but basically, you should test a directory 
structure that has the files

	dir.c
	dir/test

in it, and the "dir" directory should always sort _after_ "dir.c".

And yes, having the index entry with a '/' at the end would handle that 
automatically.
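
To make the dir.c / dir/test case concrete, here is a small sketch in
Python (not git's C code; the variable names are made up for
illustration). It only shows where a plain byte-wise sort, which is what
memcmp() on the names amounts to, places a directory entry with and
without the trailing '/':

```python
# Plain byte-wise ordering, as memcmp() on the names would give.
without_slash = sorted([b"dir.c", b"dir"])
with_slash = sorted([b"dir.c", b"dir/"])

# "dir" is a strict prefix of "dir.c", so it sorts first,
# which is the wrong place for a directory.
print(without_slash)

# With the '/' appended, '.' (0x2e) < '/' (0x2f) puts "dir.c" first.
print(with_slash)
```

Nothing here touches a real index; it is only the ordering argument
spelled out on two names.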

As it is, with the "mode" difference, it instead needs to fix up 
"cache_name_compare()". Admittedly, that would actually be a cleanup 
(since it would now match base_name_compare() in logic, and could actually 
use that to do the name comparison!), but it's a damn painful cleanup 
because we don't even pass in the mode to "cache_name_compare()", since we 
never needed it.

Gaah.

cache_name_compare itself isn't used in that many places, but it's used 
by "index_name_pos()/cache_name_pos()", which *is* used in many places. 
And again, that one doesn't even have the mode, so it cannot pass it down.

So it probably *is* easier to add the '/' at the end of the name instead, 
to make directories sort the right way in the index. I'd still suggest you 
*also* make the mode be S_IFDIR, though (and preferably make git-fsck 
actually verify that the mode and the last character of the name 
matches!).
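
The suggested consistency check could look roughly like the following
Python sketch. This is an assumption about what such a check might do,
not code from git-fsck; mode_matches_name() and bad_entries() are
hypothetical helpers, and entries are modelled as simple (mode, name)
pairs:

```python
import stat


def mode_matches_name(mode: int, name: bytes) -> bool:
    """True when "is a directory" and "name ends in '/'" agree."""
    return stat.S_ISDIR(mode) == name.endswith(b"/")


def bad_entries(entries):
    """Return the (mode, name) pairs that violate the proposed rule."""
    return [(m, n) for m, n in entries if not mode_matches_name(m, n)]


index = [
    (0o100644, b"dir.c"),  # regular file, no slash: fine
    (0o040000, b"dir/"),   # directory with slash: fine
    (0o040000, b"oops"),   # directory without slash: flagged
    (0o100644, b"bad/"),   # file with slash: flagged
]
print(bad_entries(index))
```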

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
       [not found]                   ` <?= =?ISO-8859-1?Q?alpine.LFD.0.999?= =?ISO-8859-1?Q?.07072=0402135450.?= =?ISO-8859-1?Q?27249@woody.linu?= =?ISO-8859-1?Q?x-foundation.org?= =?ISO-8859-1?Q?>
@ 2007-07-21  5:15                     ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-21  5:15 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 21 Jul 2007, David Kastrup wrote:
>> 
>> Ok, I have now acquired enough passing familiarity with the code
>> that I find part of my way around it.  Most of your patch looks
>> like it caters for the S_ISDIR type not previously in use in the
>> index (how about the repository?).
>
> The object database has always had S_ISDIR (well, "always" is since
> very early on, when I realized that flat trees didn't cut it).

Then I think I have a bit of a problem: I should think that S_ISDIR in
the repository presumably marks a tree object (still very fuzzy around
the concepts here).  An explicitly checked-in directory (under my
scheme always named "." inside of its tree) would presumably also have
S_ISDIR in the repository but behave quite differently.

> As far as I can tell, it would have been exactly the same thing as the 
> S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the 
> end of the filename for being '/'.

Relative file name of ".", more or less.  Both names satisfy S_IFDIR
in the filesystem, though.

> Otherwise? Exactly the same.

> The more important thing is in many ways the object storage, and
> that's also the reason for doing the index the way I did - it more
> closely matches what the object storage does (ie the "index" ends up
> mirroring a linearized and unpacked "tree" object).

I still have to get enough of a clue about the object store to see how
this pans out.  I would not want to have the "." objects marked as
type "tree" and empty if I can avoid it.  It seems unclean, would need
extra case separations all over the place, violate the "empty trees
evaporate" property and also waste a good place for tracking
permissions or other attributes in future.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21  5:08                     ` Linus Torvalds
@ 2007-07-21  5:28                       ` David Kastrup
  2007-07-21 15:53                         ` Linus Torvalds
  2007-07-28  8:44                       ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-21  5:28 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, 20 Jul 2007, Linus Torvalds wrote:
>> 
>> As far as I can tell, it would have been exactly the same thing as the 
>> S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the 
>> end of the filename for being '/'.
>
> BTW, there is actually one big difference, and the '/' at the end actually 
> has one huge advantage.
>
> Why? Because my preliminary patches sort the index entries wrong. A 
> directory should always sort *as*if* it had the '/' at the end.

Hm, that's bad.  The thing is that the directory names I am tracking
are called "." (that's what I was currently trying to reconcile your
code with).

> And I *should* have done it that way, but I never did. It now makes
> the S_ISDIR handling harder, because directories really do have to
> be sorted as if they had the '/' at the end, or "git-fsck" will
> complain about bad sorting.

Hm, I'll have to check what git-fsck does.

> Of course, it seldom matters, but basically, you should test a directory 
> structure that has the files
>
> 	dir.c
> 	dir/test
>
> in it, and the "dir" directory should always sort _after_ "dir.c".
>
> And yes, having the index entry with a '/' at the end would handle
> that automatically.

You completely lost me here.  I guess I'll be able to pick this up
only after investing considerably more time into the data structures.
And I have to go to bed right now.

> As it is, with the "mode" difference, it instead needs to fix up
> "cache_name_compare()". Admittedly, that would actually be a cleanup
> (since it would now match base_name_compare() in logic, and could
> actually use that to do the name comparison!), but it's a damn
> painful cleanup because we don't even pass in the mode to
> "cache_name_compare()", since we never needed it.
>
> Gaah.
>
> cache_name_compare itself isn't used in that many places, but it's
> used by "index_name_pos()/cache_name_pos()", which *is* used in many
> places.  And again, that one doesn't even have the mode, so it
> cannot pass it down.
>
> So it probably *is* easier to add the '/' at the end of the name instead, 
> to make directories sort the right way in the index. I'd still suggest you 
> *also* make the mode be S_IFDIR, though (and preferably make git-fsck 
> actually verify that the mode and the last character of the name 
> matches!).

The _flattened_ directory name would end in /. in my scheme.  I would
not want to use "xxx/" for a directory name, and "xxx" for a tree:
that would be completely backwards.  And I also don't like the
duplication of xxx when listing objects.

Sure, that's an implementation detail, but I don't like
implementations hurting my eyes...

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21  5:28                       ` David Kastrup
@ 2007-07-21 15:53                         ` Linus Torvalds
  2007-07-21 17:38                           ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-21 15:53 UTC (permalink / raw
  To: David Kastrup; +Cc: git



On Sat, 21 Jul 2007, David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > Of course, it seldom matters, but basically, you should test a directory 
> > structure that has the files
> >
> > 	dir.c
> > 	dir/test
> >
> > in it, and the "dir" directory should always sort _after_ "dir.c".
> >
> > And yes, having the index entry with a '/' at the end would handle
> > that automatically.
> 
> You completely lost me here.  I guess I'll be able to pick this up
> only after investing considerably more time into the data structures.

So the basic issue is that not only does git obviously think that only 
content matters, but it describes it with a single SHA1. 

That's not an issue at all for a single file, but if you want to describe 
*multiple* files with a single SHA1 (which git obviously very much wants 
to do), the way you generate the SHA1 matters a lot.

In particular, the order.

So git is very very strict about the ordering of tree structures. A tree 
structure is not just a random list of

	<ASCII mode> + <space> + <filename> + <NUL> + <SHA1>

it's very much an _ordered_ list of those things, because we want the SHA1 
of the tree to be well-specified by the contents, and that means that the 
contents of a tree object has to have absolutely _zero_ ambiguity.
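
As a rough illustration of why the order matters, here is a Python
sketch that serializes entries in the layout above and hashes them.
The "tree <size>" plus NUL header is git's general object header and is
not spelled out in this message; tree_object_id() is a made-up helper,
and it deliberately does no sorting so the effect of order is visible:

```python
import hashlib


def tree_object_id(entries):
    """SHA-1 of a tree built from (mode, name, raw_sha1) in the given order."""
    body = b"".join(
        mode + b" " + name + b"\0" + raw_sha1
        for mode, name, raw_sha1 in entries
    )
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Both files are empty, so they share the well-known empty-blob SHA-1.
EMPTY_BLOB = bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
a_c = (b"100644", b"a.c", EMPTY_BLOB)
abc = (b"100644", b"abc", EMPTY_BLOB)

# Same two entries, different order: two different object names.
print(tree_object_id([a_c, abc]) != tree_object_id([abc, a_c]))
```

Without one canonical order, the same directory contents would end up
with more than one SHA1.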

This means, for example, that git is very fundamentally case sensitive. 
There's no sane way *not* to be, because if you're case insensitive in any 
way at all, you'll end up having two trees that are "the same", but end up 
having different SHA1's.

It also means that git objects have absolutely zero "localization". There 
is no locale at all, and there very fundamentally *must*not* be. Again, 
for the same reason: if you can describe the same filename with two 
different encodings, you'd have two different SHA1's for the same content.

So git filenames are very much a "stream of bytes", not anything else. And 
they need to sort 100% reliably, always the same way, and never with any 
localized meaning.

And, partly because it seemed most natural, and partly for historical 
reasons, the way git sorts filenames is by sorting by *pathname*. So if 
you have three files named

	a.c
	a/c
	abc

then they sort in that exact order, and no other! They sort as a "memcmp" 
in the full pathname, and that's really nice when you see whole 
collections of files, and you know the list is globally sorted.

So that "global pathname sorting" has nice properties, and it seems 
"obvious", but it means that because git actually *encodes* those three 
files hierarchically as two different trees (because there's a 
subdirectory there), the tree objects themselves sort a bit oddly. The 
tree objects themselves will look like

 top-level tree:
	100644 a.c -> blob1
	040000 a   -> tree2
	100644 abc -> blob3

 sub-tree:
	100644 c    -> blob2

and notice how the *tree* is not sorted alphabetically at all. It has a 
subtly different sort, where the entry "a" sorts *after* the entry "a.c", 
because we know that it's a tree entry, and thus will (in the *global* 
order) sort as if it had a "/" at the end!
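
The rule can be sketched in Python as below. It is modelled loosely on
what base_name_compare() is described as doing, not copied from git;
the tuple layout and the use of Python's stat module are assumptions
made for the example:

```python
from functools import cmp_to_key
import stat


def base_name_compare(name1, mode1, name2, mode2):
    """Compare two tree entry names; directories act as if they end in '/'."""
    n = min(len(name1), len(name2))
    if name1[:n] != name2[:n]:
        return -1 if name1[:n] < name2[:n] else 1
    # Past the common prefix: a directory pretends to continue with '/',
    # anything else has simply ended (treated as 0).
    c1 = name1[n] if len(name1) > n else (ord("/") if stat.S_ISDIR(mode1) else 0)
    c2 = name2[n] if len(name2) > n else (ord("/") if stat.S_ISDIR(mode2) else 0)
    return (c1 > c2) - (c1 < c2)


entries = [(b"a", 0o040000), (b"abc", 0o100644), (b"a.c", 0o100644)]
in_tree_order = sorted(
    entries,
    key=cmp_to_key(lambda x, y: base_name_compare(x[0], x[1], y[0], y[1])),
)
tree_names = [name for name, _ in in_tree_order]
plain_names = sorted(name for name, _ in entries)

print(tree_names)   # directory "a" lands after "a.c"
print(plain_names)  # plain byte order puts "a" first
```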

Traditionally, when we have the index, the index sorting has been very 
simple: you just sort the names as memcmp() would sort them. But note how 
that changes, if "a" is an empty directory. Now the index needs to sort as

	file a.c
	dir  a
	file abc

because when we create the tree entry, it needs to be sorted the same way 
all tree entries are always sorted - as if "a" had a slash at the end!

[ Yeah, yeah, we could make a special case and just say "the empty tree 
  sorts differently", but that actually results in huge problems when 
  doing a "diff" between two trees: our diff machinery very much depends 
  on the fact that the index and the trees always sort the same way, and 
  if we sorted the "a" entry (when it is an empty directory) differently 
  from the "a" entry (when it has entries in it), that would just be 
  insane and cause no end of trouble for comparing two trees - one with an 
  empty directory and one with content added to that directory.

  So the sorting is doubly important: it's what makes "one content" always 
  have the same SHA1, but it is also much easier and efficient to compare 
  directories when we know they are sorted the same way. ]
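
A simplified Python sketch of why a shared sort order matters for
comparison: a single pass over two name lists only works if both lists
use the same ordering. compare_sorted() is a hypothetical helper, far
simpler than git's diff machinery, and it compares names as plain bytes:

```python
def compare_sorted(old, new):
    """Walk two sorted name lists once; yield (status, name) pairs."""
    i = j = 0
    while i < len(old) or j < len(new):
        if j == len(new) or (i < len(old) and old[i] < new[j]):
            yield ("removed", old[i])
            i += 1
        elif i == len(old) or new[j] < old[i]:
            yield ("added", new[j])
            j += 1
        else:
            yield ("same", old[i])
            i += 1
            j += 1


byte_order = [b"a", b"a.c"]   # "a" sorted like a file
tree_order = [b"a.c", b"a"]   # "a" sorted like a directory

# Same ordering on both sides: everything lines up.
consistent = list(compare_sorted(byte_order, byte_order))
# Mixed orderings: identical names are misreported as a change.
mixed = list(compare_sorted(tree_order, byte_order))
print(consistent)
print(mixed)
```

With mixed orderings the walk reports "a" as both added and removed even
though the two lists hold exactly the same names.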

In other words, introducing tree entries in the index ended up also 
introducing all the issues that we already had with the tree objects since 
they got split up hierarchically, but that the code didn't use to have to 
care about.

The easiest way to solve this really does seem to be to add the rule that 
the index entry for an empty directory has to have the "/" at the end of 
the name - then the "sort mindlessly by name" will just continue to work.

But that was what I said was broken: my patches I sent out didn't actually 
do that.

It's *probably* just a few lines of code, and it actually would result in 
some nice changes ("git ls-files" would show a '/' at the end of an empty 
directory entry, for example), so this is not a big deal, but it's an 
example of how subtly different a directory is from a file when it comes 
to git.

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21 15:53                         ` Linus Torvalds
@ 2007-07-21 17:38                           ` David Kastrup
  2007-07-21 17:52                             ` Simon 'corecode' Schubert
                                               ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-21 17:38 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 21 Jul 2007, David Kastrup wrote:
>
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> 
>> > Of course, it seldom matters, but basically, you should test a directory 
>> > structure that has the files
>> >
>> > 	dir.c
>> > 	dir/test
>> >
>> > in it, and the "dir" directory should always sort _after_ "dir.c".
>> >
>> > And yes, having the index entry with a '/' at the end would handle
>> > that automatically.
>> 
>> You completely lost me here.  I guess I'll be able to pick this up
>> only after investing considerably more time into the data structures.

[Basic explanation about git sort order and trees sorting as tree/ in
order to be in the right sort order for a prefix]

Ok, I could not have figured this out on my own.  Are there any design
documents or does one just have to pester the list?

> So the basic issue is that not only does git obviously think that only 
> content matters, but it describes it with a single SHA1. 
>
> That's not an issue at all for a single file, but if you want to describe 
> *multiple* files with a single SHA1 (which git obviously very much wants 
> to do), the way you generate the SHA1 matters a lot.
>
> In particular, the order.
>
> So git is very very strict about the ordering of tree structures. A tree 
> structure is not just a random list of
>
> 	<ASCII mode> + <space> + <filename> + <NUL> + <SHA1>

Ok.

> So git filenames are very much a "stream of bytes", not anything
> else. And they need to sort 100% reliably, always the same way, and
> never with any localized meaning.

There is some utf-8/Unicode trouble to be expected in connection with
that eventually: some, but not all operating and/or file systems
canonicalize file names, replacing accented letters by a combining
accent and the letter.  But that's beside the point.
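
A small Python sketch of that canonicalization issue, using the standard
unicodedata module. The file name is invented for the example, and NFC
and NFD stand in for "precomposed letter" and "letter plus combining
accent":

```python
import unicodedata

name = "caf\u00e9.txt"  # hypothetical file name with an accented letter

# Precomposed form: the accented letter is a single code point.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
# Decomposed form: plain letter followed by a combining accent.
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

# Same name to a human, two different byte streams to git.
print(nfc)
print(nfd)
print(nfc == nfd)
```

Since git treats names as a stream of bytes, the two forms would be two
different tree entries.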

> And, partly because it seemed most natural, and partly for
> historical reasons, the way git sorts filenames is by sorting by
> *pathname*. So if you have three files named
>
> 	a.c
> 	a/c
> 	abc
>
> then they sort in that exact order, and no other! They sort as a
> "memcmp" in the full pathname, and that's really nice when you see
> whole collections of files, and you know the list is globally
> sorted.

It is amusing that my description of git having no external concept of
directories except as an expedience for representing slashes in
filenames was much closer to the mark than I would have expected.

> So that "global pathname sorting" has nice properties, and it seems 
> "obvious", but it means that because git actually *encodes* those three 
> files hierarchically as two different trees (because there's a 
> subdirectory there), the tree objects themselves sort a bit oddly. The 
> tree objects themselves will look like
>
>  top-level tree:
> 	100644 a.c -> blob1
> 	040000 a   -> tree2
> 	100644 abc -> blob3
>
>  sub-tree:
> 	100644 c    -> blob2
>
> and notice how the *tree* is not sorted alphabetically at all. It has a 
> subtly different sort, where the entry "a" sorts *after* the entry "a.c", 
> because we know that it's a tree entry, and thus will (in the *global* 
> order) sort as if it had a "/" at the end!
>
> Traditionally, when we have the index, the index sorting has been very 
> simple: you just sort the names as memcmp() would sort them. But note how 
> that changes, if "a" is an empty directory. Now the index needs to sort as
>
> 	file a.c
> 	dir  a
> 	file abc
>
> because when we create the tree entry, it needs to be sorted the same way 
> all tree entries are always sorted - as if "a" had a slash at the end!

Here is the layout as I would scheme it:

tree1:
     0?0000 .   -> dir1
     100644 a.c -> blob1
     040000 a   -> tree2
     100644 abc -> blob3

sub-tree:
     0?0000 .    -> dir2
     100644 c    -> blob2

Remember that a tree evaporates when it is empty, and if we don't want
to mess with that (which appears like a good idea to me), the "don't
delete this" indication belongs in the subtree where its natural name
is ".".  Since the dir entries are _leaves_ in the tree, there is no
necessity for sorting them specially.  They will usually appear first,
but people do all sorts of things, so filenames starting with "!"
might still come before them.

So the sorted flat file list for the above would be
.    [dir]
a.c  [file]
a/   [tree]
a/.  [dir]
a/c  [file]
abc  [file]

Note that a tree is basically just a string arrangement tool which
gets only incidentally mapped to directories when checking out.

So I am quite unhappy that 040000 is already taken by it.  I can't
even say, "ok, let . look like an empty tree" because there should not
be something like an empty tree!  I find the correlation empty->gone
very important.

> [ Yeah, yeah, we could make a special case and just say "the empty
> tree sorts differently", but that actually results in huge problems
> when doing a "diff" between two trees: our diff machinery very much
> depends on the fact that the index and the trees always sort the
> same way, and if we sorted the "a" entry (when it is an empty
> directory) differently from the "a" entry (when it has entries in
> it), that would just be insane and cause no end of trouble for
> comparing two trees - one with an empty directory and one with
> content added to that directory.

It appears to me like our ideas are still out of sync: a directory
under my scheme is _not_ at all an empty tree, rather it is an entry
_inside_ of a tree, making the tree non-empty (which means that git
will not be tempted to delete the corresponding real-world directory
_until_ one deletes the directory entry keeping the tree alive).

>   So the sorting is doubly important: it's what makes "one content"
>   always have the same SHA1, but it is also much easier and
>   efficient to compare directories when we know they are sorted the
>   same way. ]
>
> It's *probably* just a few lines of code, and it actually would
> result in some nice changes ("git ls-files" would show a '/' at the
> end of an empty directory entry, for example), so this is not a big
> deal, but it's an example of how subtly different a directory is
> from a file when it comes to git.

Linus, a directory is simply non-existent inside of git.  Trees are an
indexing mechanism solely determined by their content.  That is not a
subtle difference.  Git _uses_ directories when exporting in order to
simulate a flat namespace.  But it is internally oblivious to their
existence.  And that is a perfectly elegant and reasonable approach
and I like it very much and don't want to mess with it at all.

But I also want to have directories represented within git, because
not doing so leads to awkward problems.  And the proper way as I see
it is _not_ to mess with trees and stick them with "stay when empty"
flags or similar.  This messes up the whole elegance of git's flat
name space.  The proper way is to create a distinct object that
represents a physical directory.  We don't need to represent the
contents of it: those are already tracked in the flat namespace fine,
with trees serving as an implementation detail.

All we need to represent is ".".

So git-ls-files on
.    [dir]
a.c  [file]
a/   [tree]
a/.  [dir]
a/c  [file]
abc  [file]

should likely list

.
a.c
a/.
a/c
abc

If one wants to see the _tree_ because of its SHA1, it may also be
listed.  The SHA1 of a _directory_ like a/., in contrast, is
uninteresting: it will be the same for every directory.

Whether the _tree_ is listed as "a" or "a/" is probably a matter of
taste.  Personally, I think "a/" is better for bringing across the
notion that it is a structuring device not really related to the
physical _directory_ a which is _identical_ (meaning inode-identical,
which is what counts in the physical world) to "a/." even though it is
another name of it.

And using "a/" puts it closer to its natural sort order.

I'd write up a philosophy paper about git's relation between trees,
files, directories if that were not utterly preposterous.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21 17:38                           ` David Kastrup
@ 2007-07-21 17:52                             ` Simon 'corecode' Schubert
  2007-07-21 18:08                               ` David Kastrup
  2007-07-21 23:50                             ` Linus Torvalds
  2007-07-22  4:00                             ` Brian Gernhardt
  2 siblings, 1 reply; 156+ messages in thread
From: Simon 'corecode' Schubert @ 2007-07-21 17:52 UTC (permalink / raw
  To: David Kastrup; +Cc: Linus Torvalds, git

David Kastrup wrote:
> But I also want to have directories represented within git, because
> not doing so leads to awkward problems.  And the proper way as I see
> it is _not_ to mess with trees and stick them with "stay when empty"
> flags or similar.  This messes up the whole elegance of git's flat
> name space.  The proper way is to create a distinct object that
> represents a physical directory.  We don't need to represent the
> contents of it: those are already tracked in the flat namespace fine,
> with trees serving as an implementation detail.
> 
> All we need to represent is ".".

What I still don't get is:  How do you carry this information about "this directory should not be removed" from one checkout to the next commit?  When creating a .gitignore, this file exists in the workdir.  Of course you add some data to the index to stage it.  But how does this work with your "." "file"?  You can't put that in the filesystem.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21 17:52                             ` Simon 'corecode' Schubert
@ 2007-07-21 18:08                               ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-21 18:08 UTC (permalink / raw
  To: Simon 'corecode' Schubert; +Cc: Linus Torvalds, git

Simon 'corecode' Schubert <corecode@fs.ei.tum.de> writes:

> David Kastrup wrote:
>> But I also want to have directories represented within git, because
>> not doing so leads to awkward problems.  And the proper way as I see
>> it is _not_ to mess with trees and stick them with "stay when empty"
>> flags or similar.  This messes up the whole elegance of git's flat
>> name space.  The proper way is to create a distinct object that
>> represents a physical directory.  We don't need to represent the
>> contents of it: those are already tracked in the flat namespace fine,
>> with trees serving as an implementation detail.
>>
>> All we need to represent is ".".
>
> What I still don't get is: How do you carry this information about
> "this directory should not be removed" from one checkout to the next
> commit?

I don't.  The only information in the file system is whether a
directory exists or not.  "Should not be removed" is not a property that
is tracked.

> When creating a .gitignore, this file exists in the workdir.  Of
> course you add some data to the index to stage it.  But how does
> this work with your "." "file"?  You can't put that in the
> filesystem.

Either the directory is in the file system or it is not.  Like with
every other file.  And either git tracks the directory, in which case
it will notice its addition (when doing git-add) and removal (when
doing git-rm or git-commit -a) or git doesn't track the directory.

When git tracks the directory (a matter of gitignore settings for
implicit tracking, and git-add for explicit tracking), and considers
it existent, it will not touch it.  If it tracks it but considers it
removed in a particular commit, it will attempt to remove it.

    Fine print: actually, things are more involved here: git does not
    actually attempt to remove directories at the time it deletes them
    from the tree (this is sort of pointless since the sort order
    means that there might still be files it needs to take out from
    the physical directory).  Instead, like before, git attempts to
    remove a physical directory whenever the corresponding tree in git
    becomes empty, and it is a prerequisite to delete a possibly
    tracked directory from it.

After it has attempted to remove it, it will leave it alone since it
is now no longer tracking it.  If you add and remove a contained file,
it will again try to remove the directory.  If you add _both_
directory and a contained file, just removing the contained file will
not make git attempt to delete the directory.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21 17:38                           ` David Kastrup
  2007-07-21 17:52                             ` Simon 'corecode' Schubert
@ 2007-07-21 23:50                             ` Linus Torvalds
  2007-07-22  0:18                               ` David Kastrup
  2007-07-22  0:34                               ` David Kastrup
  2007-07-22  4:00                             ` Brian Gernhardt
  2 siblings, 2 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-21 23:50 UTC (permalink / raw
  To: David Kastrup; +Cc: git



On Sat, 21 Jul 2007, David Kastrup wrote:
> 
> tree1:
>      0?0000 .   -> dir1
>      100644 a.c -> blob1
>      040000 a   -> tree2
>      100644 abc -> blob3

No. Totally broken. That "." entry not only doesn't buy you anything, it 
is *impossible*. You cannot make an object point to itself. Not possible.

Tell me how to calculate the SHA1 for the result. Also, tell me what the 
*point* is. There is none.

> Linus, a directory is simply non-existent inside of git. 

You need to learn git first.

A directory doesn't exist IN THE INDEX (until my patches). But you need to 
learn about the object database and the SHA1's. That's the real meat of 
git, and it sure as hell knows about directories.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21 23:50                             ` Linus Torvalds
@ 2007-07-22  0:18                               ` David Kastrup
  2007-07-22  0:37                                 ` Linus Torvalds
  2007-07-22  1:16                                 ` Jakub Narebski
  2007-07-22  0:34                               ` David Kastrup
  1 sibling, 2 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-22  0:18 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 21 Jul 2007, David Kastrup wrote:
>> 
>> tree1:
>>      0?0000 .   -> dir1
>>      100644 a.c -> blob1
>>      040000 a   -> tree2
>>      100644 abc -> blob3
>
> No. Totally broken. That "." entry not only doesn't buy you
> anything, it is *impossible*. You cannot make an object point to
> itself. Not possible.

It does not point to itself.  The name "." points to an entry of type
"dir", no content is involved.  trees in the repository have content,
and _only_ content.  directories in the repository imply existence,
and _only_ existence.

> Tell me how to calculate the SHA1 for the result.

Since "." has no content (as long as we don't decide to track any file
permissions at one point of time), _all_ entries "." will have the
same SHA1.

> Also, tell me what the *point* is. There is none.

The point is to have a reflection of the physical existence of a
directory.  Not just as a manner of accommodating slashes in a flat
filespace, allowing certain slash-related operations to be carried out
efficiently.

>> Linus, a directory is simply non-existent inside of git.
>
> You need to learn git first.
>
> A directory doesn't exist IN THE INDEX (until my patches). But you
> need to learn about the object database and the SHA1's. That's the
> real meat of git, and it sure as hell knows about directories.

I have written up a complete explanation about the underlying concept
in a separate thread, maybe it would make sense reading that before
investing too much time meddling over details that don't fit the large
picture.  The point is that the object database and the SHA1 values
track _trees_, not _directories_.  And a _tree_ is just a hashing
mechanism in the repository for files.  Its existence is solely
dependent on the existence of its contents.  The only synchronization
with directories is that when a tree becomes empty, git attempts to do
an rmdir on the corresponding directory.  And of course, if git needs
to check out a file, it creates the necessary parent directories.

Now since the physical _contents_ of a directory are already tracked
in _trees_ by git, the only missing part is the _existence_ of the
directory itself: a directory must exist as long as there is a tree
(and thus content) connected with it, but the reverse does not hold:
without a tree, the directory can still exist.  Which we can represent
by a repository entry named "." without content (the content is
already catered for by the _tree_).  This must _not_ be represented by
a _tree_ node since there is no content, and a tree without content by
_definition_ does not exist.

I must be really bad at explaining things, or I am losing a fight
against preconceptions fixed beyond my imagination.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21 23:50                             ` Linus Torvalds
  2007-07-22  0:18                               ` David Kastrup
@ 2007-07-22  0:34                               ` David Kastrup
  1 sibling, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-22  0:34 UTC
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 21 Jul 2007, David Kastrup wrote:
>
>> Linus, a directory is simply non-existent inside of git.
>
> You need to learn git first.
>
> A directory doesn't exist IN THE INDEX (until my patches). But you
> need to learn about the object database and the SHA1's. That's the
> real meat of git, and it sure as hell knows about directories.

To put it in another way: what would happen if trees were removed from
git's repository completely?  Instead we would just stipulate that git
should only track files, not trees, and that it would remove an
outside directory when removing the last file from the repository that
can't be accommodated without such a directory.

Now the effect would be that git would become quite inefficient.  But
it would not change its behavior in any other way.  Because it knows
_zilch_ about directories.  It knows about the hierarchy of the
_contents_, but the directories, the physical entities in the work
tree?  It deduces a convenient point of time to try deleting them
(when a tree collapses), and it deduces that they are there as long as
it is tracking their content, but no information about a _directory_
other than its _contents_ ever enter the repository or index.  About
its _existence_, git only keeps circumstantial evidence.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  0:18                               ` David Kastrup
@ 2007-07-22  0:37                                 ` Linus Torvalds
  2007-07-22  1:05                                   ` David Kastrup
  2007-07-22  1:16                                 ` Jakub Narebski
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-22  0:37 UTC
  To: David Kastrup; +Cc: git



On Sun, 22 Jul 2007, David Kastrup wrote:
> 
> I must be really bad at explaining things, or I am losing a fight
> against preconceptions fixed beyond my imagination.

I really don't see the point. But hey, code talks.

		Linus


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  0:37                                 ` Linus Torvalds
@ 2007-07-22  1:05                                   ` David Kastrup
  2007-07-22  1:41                                     ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-22  1:05 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>> 
>> I must be really bad at explaining things, or I am losing a fight
>> against preconceptions fixed beyond my imagination.
>
> I really don't see the point. But hey, code talks.

Yes, I am working on that.  It would have been nice if IS_DIR was not
already taken by trees, but one can't have everything.  So I need to
decide how to represent the node, and it would appear that I need to
angle for "file" after all, since it is really much closer to a file
or symlink than to a tree or project.  Hm, perhaps a symlink might be
more expedient.  Make it have an empty reference, and it is unique.
And there will be fewer places in the code manipulating symlinks than
files.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  0:18                               ` David Kastrup
  2007-07-22  0:37                                 ` Linus Torvalds
@ 2007-07-22  1:16                                 ` Jakub Narebski
  2007-07-22  1:39                                   ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: Jakub Narebski @ 2007-07-22  1:16 UTC
  To: git

David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
>> On Sat, 21 Jul 2007, David Kastrup wrote:

>>> Linus, a directory is simply non-existent inside of git.
>>
>> You need to learn git first.
>>
>> A directory doesn't exist IN THE INDEX (until my patches). But you
>> need to learn about the object database and the SHA1's. That's the
>> real meat of git, and it sure as hell knows about directories.
> 
> I have written up a complete explanation about the underlying concept
> in a separate thread, maybe it would make sense reading that before
> investing too much time meddling over details that don't fit the large
> picture.  The point is that the object database and the SHA1 values
> track _trees_, not _directories_.  And a _tree_ is just a hashing
> mechanism in the repository for files.  Its existence is solely
> dependent on the existence of its contents.  The only synchronization
> with directories is that when a tree becomes empty, git attempts to do
> an rmdir on the corresponding directory.  And of course, if git needs
> to check out a file, it creates the necessary parent directories.
> 
> Now since the physical _contents_ of a directory are already tracked
> in _trees_ by git, the only missing part is the _existence_ of the
> directory itself: a directory must exist as long as there is a tree
> (and thus content) connected with it, but the reverse does not hold:
> without a tree, the directory can still exist.  Which we can represent
> by a repository entry named "." without content (the content is
> already catered for by the _tree_).  This must _not_ be represented by
> a _tree_ node since there is no content, and a tree without content by
> _definition_ does not exist.
> 
> I must be really bad at explaining things, or I am losing a fight
> against preconceptions fixed beyond my imagination.

I don't understand you, or you don't understand git. A "tree" object
in the object database (in the repository) represents a directory in the
working area. There was never any problem with having empty trees
in the object database, or having links to an empty directory in the
superdir. We don't have to change anything about the object database.

Git's problems with empty directories stem from the fact that the
index didn't have directories. The index is a flattened version of the
root tree, and before subproject support it contained _only_ info
about blobs (file contents). At least till Linus's patch...
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
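
As a small illustration of the claim that the object database has no
trouble with empty trees, the following Python sketch computes the name
of a tree with no entries, assuming the usual "<type> <size>\0" object
header. The function name is made up for the example.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over the object header plus body, as git names its objects."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# A tree with zero entries has an empty body but still a well-defined name.
empty_tree = git_object_id("tree", b"")
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The empty tree is an ordinary object with a fixed name; whether the index
and the working tree can express it is a separate question.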


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  1:16                                 ` Jakub Narebski
@ 2007-07-22  1:39                                   ` David Kastrup
  2007-07-22 12:06                                     ` Jakub Narebski
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-22  1:39 UTC
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> David Kastrup wrote:
>
>> I must be really bad at explaining things, or I am losing a fight
>> against preconceptions fixed beyond my imagination.
>
> I don't understand you, or you don't understand git. A "tree" object
> in the object database (in the repository) represents a directory in
> the working area. There was never any problem with having empty trees
> in the object database, or having links to an empty directory in the
> superdir. We don't have to change anything about the object database.

I disagree here.  The object database _can_ represent an _empty_
directory that has been added explicitly, because up to now no
operations existed that actually left an empty tree.  But it can't
distinguish a _non_-empty directory that has been added explicitly
from non-empty directory that has not been added explicitly.

To wit: after the sequence

mkdir a
touch a/b
git-add a
git-commit -m x
git-rm a/b
git-commit -m x

I expect git to retain an empty directory a.  But the _tree_ now can't
be different from the tree in the situation

mkdir a
touch a/b
git-add a/b
git-commit -m x
git-rm a/b
git-commit -m x

because after the first commit, the trees have identical contents, and
so there is nothing at the _identical_ second step (git-rm followed by
git-commit) that could cause different behavior.
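
The argument can be sketched in Python with a toy model of the index and
of tree construction. This is not git's implementation: the add() and
write_tree() helpers are hypothetical stand-ins that only assume a flat
path-to-blob index and the standard tree serialization.

```python
import hashlib

def object_id(kind: bytes, body: bytes) -> str:
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def write_tree(index: dict) -> str:
    """Build nested trees from a flat {path: blob_id} index; return the root id."""
    files, subdirs = {}, {}
    for path, blob in index.items():
        head, _, rest = path.partition("/")
        if rest:
            subdirs.setdefault(head, {})[rest] = blob
        else:
            files[head] = blob
    entries = [(name, b"100644", blob) for name, blob in files.items()]
    entries += [(name, b"40000", write_tree(sub)) for name, sub in subdirs.items()]
    # Entries sort by name bytes; subtree names compare as if they ended in "/".
    entries.sort(key=lambda e: (e[0] + "/" if e[1] == b"40000" else e[0]).encode())
    body = b"".join(mode + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
                    for name, mode, oid in entries)
    return object_id(b"tree", body)

def add(index: dict, worktree: dict, path: str) -> None:
    """Toy 'git add': a directory argument expands to the files beneath it."""
    for file_path, content in worktree.items():
        if file_path == path or file_path.startswith(path + "/"):
            index[file_path] = object_id(b"blob", content)

worktree = {"a/b": b""}           # mkdir a; touch a/b

index_dir, index_file = {}, {}
add(index_dir, worktree, "a")     # git-add a
add(index_file, worktree, "a/b")  # git-add a/b

print(write_tree(index_dir) == write_tree(index_file))
```

In this model both sequences produce the same index and the same root
tree id, so nothing in the committed state records how "a" was added.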

But in the second case, git must _not_ retain a.  So we need to record
the information that in the first case, a was added explicitly.  And
this can't be done with the current repository layout.  It doesn't buy
us anything that we _have_ a representation available for an _empty_
tree added explicitly.  We need this "added explicitly" information
for _every_ tree, not just empty ones.

And a perfectly consistent way is to make those trees with an
explicitly added directory _non-empty_, by virtue of putting a file
"." in them.  This file, of course, exists in every physical
directory, but we may or may not decide to let it be tracked by git,
using the gitignore mechanism on the pattern ".".  Perfectly
expedient.

> Git's problems with empty directories stem from the fact that the
> index didn't have directories.

That basically implies that no information about directories could be
tracked in the repository.  And yes, we need appropriate information
in the index.  Again, the information whether a directory was added
explicitly.

> The index is a flattened version of the root tree, and before
> subproject support it contained _only_ info about blobs (file contents).

And the repository is a versioned and hierarchically hashed version of
the index, but its trees contain _no_ information that is not already
inherently represented by the files alone.  Permitting empty trees
would change that fundamental property, and it would not buy us the
ability to actually track directories: see above.  So it is not worth
the trouble to assign any meaningful concept to persisting empty trees
rather than make them a case for git-fsck.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  1:05                                   ` David Kastrup
@ 2007-07-22  1:41                                     ` Linus Torvalds
  2007-07-22  2:39                                       ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-22  1:41 UTC
  To: David Kastrup; +Cc: git



On Sun, 22 Jul 2007, David Kastrup wrote:
>  Make it have an empty reference, and it is unique.

I *really* don't see the point.

And you seem to have ignored totally my treatise on "content" and how the
stuff git tracks must be stuff that is visible and detectable in the 
trees. And if I understand you correctly, you also wouldn't be backwards 
compatible. 

IOW, there's a lot of "why's" at all levels.

I don't see the *point*. What's the problem you're trying to solve?

		Linus


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  1:41                                     ` Linus Torvalds
@ 2007-07-22  2:39                                       ` David Kastrup
  2007-07-22  3:43                                         ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-22  2:39 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>>  Make it have an empty reference, and it is unique.
>
> I *really* don't see the point.
>
> And you seem to have ignored totally my treatise on "content" and how
> the stuff git tracks must be stuff that is visible and detectable in
> the trees.

Oh please.  Just because you refuse to read a point-to-point reply
does not mean it has not been made.

"." _is_ visible and detectable in every tree.  But that does not mean
it is automatically tracked by git unless it gets added explicitly, or
implicitly (as long as the gitignore mechanism does not kick in) by
adding a higher level directory.

If a file does not get added explicitly or implicitly, it does not end
up in the repository and git behaves like it knows nothing about it.

And that's just the way it is going to be with directories.  Nothing
more, nothing less, nothing new.

> And if I understand you correctly, you also wouldn't be backwards
> compatible.

Define backwards compatible.  Anyway, you are the repository wizard:
here are the semantics I need supported for backwards compatibility:

I need an entry type in the index and in the repository with the
following features:

a) if part of a tree, the tree is not considered empty.  Should be
   easy.
b) it has the name ".".  This is not absolutely necessary, but it
   means that the gitignore mechanism can be used for dealing with it,
   and that's intuitive and has exactly the expressive power required
   for the job.  Now the gitignore mechanism is isolated very locally
   in dir.c: whether one makes the actual representation in the
   repository based on an attribute like "filemode" rather than on a
   separate entry does not actually complicate the code all too much.
   There is, however, some level of complication since the consulted
   .gitignore file for ignoring "." must, of course, be the .gitignore
   file situated _in_ the directory.  So making "." sit _in_ the tree
   rather than _on_ the tree simplifies the code considerably.  It is
   a small amount of code, nevertheless, so it is not a major
   strategic decision.

   One conceivable implementation would be indeed similar to what the
   "filemode" thing does: let us keep open the option to track, at one
   time, permissions.  The current format has, as far as I understand,
   all zeros in the permissions field of trees (I have not checked,
   though).  Now if we stipulate that this is the kind of directory
   permissions we will in all eternity _not_ support outside of git,
   we are all set with regard to backwards compatibility: a tree with
   permissions all zero will behave as previously: it will get removed
   when it becomes empty (taking the corresponding work tree directory
   with it, if possible).  And that's it.  But a tree with nonzero
   permissions (whether they correspond to outward permissions or are
   just a placeholder) will _not_ evaporate when becoming empty.  It
   will be possible to explicitly or implicitly delete it: that will
   just set its permissions all to zero so that it has the chance to
   evaporate next time it becomes empty.
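
The unchecked assumption above about the permission bits of trees can be
examined with a few lines of Python. The sketch assumes the conventional
modes recorded in tree entries (040000 for a subtree, 100644 and 100755
for regular files, 120000 for a symlink) and uses only the standard stat
module; the labels are illustrative.

```python
import stat

modes = {
    "subtree": 0o040000,
    "file": 0o100644,
    "executable": 0o100755,
    "symlink": 0o120000,
}

for label, mode in modes.items():
    kind = stat.S_IFMT(mode)    # file-type bits
    perms = stat.S_IMODE(mode)  # permission bits
    print(f"{label:10} type={kind:06o} perms={perms:04o} is_dir={stat.S_ISDIR(mode)}")
```

Under these assumptions the permission bits of a subtree entry are all
zero, which is the property the permissions-based representation would
rely on.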

> IOW, there's a lot of "why's" at all levels.
>
> I don't see the *point*. What's the problem you're trying to solve?

rm -rf ./*
git-commit -m "all empty" -a
unzip /tmp/something-with-empty-dirs.zip
git-add .
git-commit -m "something-with-empty-dirs"
git-checkout HEAD~1
# Now I don't want empty directories and their parents lying around.
git-checkout master
# Now the state after unzip should be restored faithfully
rm -rf ./*
unzip /tmp/something-else-with-empty-dirs
git-commit -a -m "something-else"
# Now I want to have the state of something-else registered faithfully
# even if it contains top-level files and directories not present in
# something-with-empty-dirs, because supposedly . is being tracked,
# not just every file element in it.

Actually, oops.  This last criterion is not met when .'s relation to
the tree is such that it is only considered _part_ of tree.

Looks like it might be prudent to focus on the permissions-coupled
representation.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  2:39                                       ` David Kastrup
@ 2007-07-22  3:43                                         ` Linus Torvalds
  2007-07-22  4:28                                           ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-22  3:43 UTC
  To: David Kastrup; +Cc: git



On Sun, 22 Jul 2007, David Kastrup wrote:
>
> "." _is_ visible and detectable in every tree.

I'm going to add you to my "clueless" filter, because it's not worth my 
time to answer you any more.

I told you. Several times. That "." is pointless exactly because it's in 
_every_ tree, and as such is no longer "content". It's not something that 
the user can care about, because it has no meaning. There's no point in 
tracking it, because even if we do *not* track it, it's there, and we 
cannot do anything about it.

That was the whole difference between "." and ".gitignore", and I 
explicitly pointed out that that was the difference (and the _only_ one), 
and why it mattered.

And you didn't listen. And now you claim that I don't read your emails. I 
do. They just don't make any sense.

Consider this discussion ended. I simply don't care any more.

		Linus


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21 17:38                           ` David Kastrup
  2007-07-21 17:52                             ` Simon 'corecode' Schubert
  2007-07-21 23:50                             ` Linus Torvalds
@ 2007-07-22  4:00                             ` Brian Gernhardt
  2 siblings, 0 replies; 156+ messages in thread
From: Brian Gernhardt @ 2007-07-22  4:00 UTC
  To: David Kastrup; +Cc: Linus Torvalds, git


On Jul 21, 2007, at 1:38 PM, David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So git filenames are very much a "stream of bytes", not anything
>> else. And they need to sort 100% reliably, always the same way, and
>> never with any localized meaning.
>
> There is some utf-8/Unicode trouble to be expected in connection with
> that eventually: some, but not all operating and/or file systems
> canonicalize file names, replacing accented letters by a combining
> accent and the letter.  But that's beside the point.

This issue exists today.  OS X does a number of things to filenames,  
one of which is normalizing all UTF.  The resulting error is wholly  
non-intuitive, but easy to solve.  Git thinks both that the file  
exists under the name it expects and that the file is being ignored  
as the name OS X uses.  The solution is to put the OS X normalized  
form into .git/info/exclude.  Any other solution involves
platform-dependent hackery and inclusion of Unicode libraries.  I pursued
this for a short while some months ago, but was convinced to leave it be.

~~ Brian


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  3:43                                         ` Linus Torvalds
@ 2007-07-22  4:28                                           ` David Kastrup
  2007-07-22  6:38                                             ` david
                                                               ` (3 more replies)
  0 siblings, 4 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-22  4:28 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>>
>> "." _is_ visible and detectable in every tree.
>
> I'm going to add you to my "clueless" filter, because it's not worth
> my time to answer you any more.

Too bad I can't do the same.

> I told you. Several times. That "." is pointless exactly because
> it's in _every_ tree, and as such is no longer "content".

"." is in every _non-empty_ directory tree.  But we are talking about
permitting _empty_ trees in the repository.  And for an empty tree in
the repository, "." may or may not be in the corresponding work
directory tree, depending on whether the directory exists or not.  So
when we are talking about a repository tree _becoming_ empty, we need
the information whether or not we should remove it upon
becoming empty.  _That_ is the information content of "." being or not
being considered part of the trackable material.  And the information
is no longer available at the time the repository tree becomes empty
_unless_ we already store it there when the tree is still populated.

> It's not something that the user can care about, because it has no
> meaning. There's no point in tracking it, because even if we do
> *not* track it, it's there, and we cannot do anything about it.

Ok, here we go _again_.  Test case 1:

mkdir a
touch a/b
git-add a/b
git-commit -m x
git-rm a/b
git-commit -m x

Now we want to have the directory a _removed_.

Test case 2:

mkdir a
touch a/b
git-add a
git-commit -m x
git-rm a/b
git-commit -m x

Now we want to have the directory a _retained_.

After the first commit in _both_ test cases, the only file in the
trees / and /a is a/b.  The working directory state is _identical_ at
this point, and we do identical commands afterwards.

The end result is not identical, so there must be some information
different in the repository after the first commit.  This information
_can't_ be encoded in a remaining empty tree, because both the trees /
and /a are _non_-empty yet.

So we _must_ encode the evaporate-or-not-when-empty information
_otherwise_ into the repository.  And we do that by _not_ having
/a/. in the set of tracked files in test case 1, and by _having_ it in
the set of tracked files in test case 2.

> That was the whole difference between "." and ".gitignore", and I
> explicitly pointed out that that was the difference (and the _only_
> one), and why it mattered.

You are underestimating the power of ".gitignore": while it is true
that its _physical_ presence will reliably keep git from removing the
directory, its physical presence is not _actually_ required.

It is sufficient that git _believes_ in its continuing physical
existence.  And if we tell it "it is still there" whenever it takes a
look, then git will keep the record of .gitignore in its tree, and
consequently won't remove the tree and not try deleting the directory.
However, once we explicitly tell it "remove the record of .gitignore
from the repository", it will do so, and in the course of doing so
remove the directory in the work directory together with the tree in
the repository.

From a user interface and logical standpoint, adding or not adding "."
to the tracked content is a perfectly consistent and convenient way of
having the directory kept around or not.

From the viewpoint of the internal data structures, I'll likely go
with tampering with (pseudo-)permissions.

> And you didn't listen. And now you claim that I don't read your
> emails. I do. They just don't make any sense.
>
> Consider this discussion ended. I simply don't care any more.

It is painfully clear that I could invest a few weeks of time in
coding better than in explaining stuff.  And I guess that's what I'll
have to do.  And afterwards it will be your job to wrack your head
about why something does all the right things for the wrong reasons
and come up with a different explanation how and why the code works.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  4:28                                           ` David Kastrup
@ 2007-07-22  6:38                                             ` david
  2007-07-22  9:08                                               ` David Kastrup
  2007-07-22 17:28                                             ` Linus Torvalds
                                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 156+ messages in thread
From: david @ 2007-07-22  6:38 UTC
  To: David Kastrup; +Cc: Linus Torvalds, git

On Sun, 22 Jul 2007, David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> On Sun, 22 Jul 2007, David Kastrup wrote:
>>>
>>> "." _is_ visible and detectable in every tree.
>>
>> I'm going to add you to my "clueless" filter, because it's not worth
>> my time to answer you any more.
>
> Too bad I can't do the same.
>
>> I told you. Several times. That "." is pointless exactly because
>> it's in _every_ tree, and as such is no longer "content".
>
> "." is in every _non-empty_ directory tree.  But we are talking about
> permitting _empty_ trees in the repository.  And for an empty tree in
> the repository, "." may or may not be in the corresponding work
> directory tree, depending on whether the directory exists or not.  So
> when we are talking about a repository tree _becoming_ empty, we need
> the information whether or not we should remove it upon
> becoming empty.  _That_ is the information content of "." being or not
> being considered part of the trackable material.  And the information
> is no longer available at the time the repository tree becomes empty
> _unless_ we already store it there when the tree is still populated.

David, the point where you and Linus are talking past each other is that
Linus is assuming that you only want to track some specific directories,
and for that, tracking "." doesn't work because it's in every directory.

You apparently consider every directory equal, and therefore the fact that
"." exists in every directory doesn't bother you, because you want to track
every directory.

What you are not hearing is that while Linus and the other git developers
can see reasons to track directories sometimes, they definitely don't
agree that you want to track directories all the time.

Sometimes the fact that a directory exists is significant; most of the
time it's not. And the difference between what is and what isn't
significant isn't a per-repository or per-project thing, it's a
per-directory thing.

In one repository you will have some directories that only exist because
files are in them, and you may have some directories that exist because
you explicitly want them to exist.

Both types have the "." file in them (or appear to; some OSes/filesystems
don't actually have a "." on disk, they add it when needed when reporting
to userspace), so git has no way to tell which ones you explicitly want
tracked.

Creating .gitignore in the directories that you want tracked lets the
other directories not be tracked.

David Lang

>> It's not something that the user can care about, because it has no
>> meaning. There's no point in tracking it, because even if we do
>> *not* track it, it's there, and we cannot do anything about it.
>
> Ok, here we go _again_.  Test case 1:
>
> mkdir a
> touch a/b
> git-add a/b
> git-commit -m x
> git-rm a/b
> git-commit -m x
>
> Now we want to have the directory a _removed_.
>
> Test case 2:
>
> mkdir a
> touch a/b
> git-add a
> git-commit -m x
> git-rm a/b
> git-commit -m x
>
> Now we want to have the directory a _retained_.
>
> After the first commit in _both_ test cases, the only file in the
> trees / and /a is a/b.  The working directory state is _identical_ at
> this point, and we do identical commands afterwards.
>
> The end result is not identical, so there must be some information
> different in the repository after the first commit.  This information
> _can't_ be encoded in a remaining empty tree, because both the trees /
> and /a are _non_-empty yet.
>
> So we _must_ encode the evaporate-or-not-when-empty information
> _otherwise_ into the repository.  And we do that by _not_ having
> /a/. in the set of tracked files in test case 1, and by _having_ it in
> the set of tracked files in test case 2.
>
>> That was the whole difference between "." and ".gitignore", and I
>> explicitly pointed out that that was the difference (and the _only_
>> one), and why it mattered.
>
> You are underestimating the power of ".gitignore": while it is true
> that its _physical_ presence will reliably keep git from removing the
> directory, its physical presence is not _actually_ required.
>
> It is sufficient that git _believes_ in its continuing physical
> existence.  And if we tell it "it is still there" whenever it takes a
> look, then git will keep the record of .gitignore in its tree, and
> consequently won't remove the tree and not try deleting the directory.
> However, once we explicitly tell it "remove the record of .gitignore
> from the repository", it will do so, and in the course of doing so
> remove the directory in the work directory together with the tree in
> the repository.
>
> From a user interface and logical standpoint, adding or not adding "."
> to the tracked content is a perfectly consistent and convenient way of
> having the directory kept around or not.
>
> From the viewpoint of the internal data structures, I'll likely go
> with tampering with (pseudo-)permissions.
>
>> And you didn't listen. And now you claim that I don't read your
>> emails. I do. They just don't make any sense.
>>
>> Consider this discussion ended. I simply don't care any more.
>
> It is painfully clear that I could invest a few weeks of time in
> coding better than in explaining stuff.  And I guess that's what I'll
> have to do.  And afterwards it will be your job to wrack your head
> about why something does all the right things for the wrong reasons
> and come up with a different explanation how and why the code works.
>
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  6:38                                             ` david
@ 2007-07-22  9:08                                               ` David Kastrup
  2007-07-22 17:30                                                 ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-22  9:08 UTC (permalink / raw
  To: david; +Cc: Linus Torvalds, git

david@lang.hm writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>>> I told you. Several times. That "." is pointless exactly because
>>> it's in _every_ tree, and as such is no longer "content".
>>
>> "." is in every _non-empty_ directory tree.  But we are talking
>> about permitting _empty_ trees in the repository.  And for an empty
>> tree in the repository, "." may or may not be in the corresponding
>> work directory tree, depending on whether the directory exists or
>> not.  So when we are talking about a repository tree _becoming_
>> empty, we need the information whether or not we should
>> remove it upon becoming empty.  _That_ is the information content
>> of "." being or not being considered part of the trackable
>> material.  And the information is no longer available at the time
>> the repository tree becomes empty _unless_ we already store it
>> there when the tree is still populated.
>
> David, the point where you and Linus are talking past each other is
> that Linus is assuming that you only want to track some specific
> directories, and for that, tracking "." doesn't work because it's in
> every directory.
>
> You apparently consider every directory equal, and therefore the fact
> that "." exists in every directory doesn't bother you because you
> want to track every directory.

Sigh.  No, I don't want to track every directory.  I want to have
every directory _trackable_.  Whether it is _tracked_ depends on
whether you _add_ it to the index.  And that depends, among other
things, on the gitignore patterns, and those can be specified on a
per-directory, per-project, per-user preference.

> What you are not hearing is that while Linus and the other git
> developers can see reasons to track directories sometimes, they
> definitely don't agree that you want to track directories all the
> time.

And that is why one can use per-directory, per-project and per-user
settings to turn the tracking off, _and_ one can decide at what level
one adds information to the index.  If you always make it a habit to
only ever use git-add -f and git-rm -f on _files_ and never on
directories, you won't _ever_ see a difference on whether directories
are tracked, and the contents of .gitignore won't make a difference,
either.

But if you use git-add and git-rm on directories, then for the
specified directory and its children, .gitignore gets consulted.

> Sometimes the fact that a directory exists is significant; most of
> the time it's not. And the difference between what is and what isn't
> significant isn't a per-repository or per-project thing, it's a
> per-directory thing.

Which is why one can control it per-directory using either the
.gitignore mechanism _or_ by including the directory level in question
in the git-add and git-rm commands or not.

> In one repository you will have some directories that only exist
> because files are in them, and you may have some directories that
> exist because you explicitly want them to exist.
>
> Both types have the "." file in them (or appear to; some
> OSes/filesystems don't actually have a "." on disk, they add it when
> needed when reporting to userspace), so git has no way to tell which
> ones you explicitly want tracked.

Like with any other file, git _has_ a way to tell.  If I don't git-add
or git-rm the directory or one of its parents to the index, I don't
want to have it tracked.  And if I add the directory or one of its
parents to the index recursively, but it is covered by .gitignore, I
don't want to have it tracked.

It is a pity that you have seemingly not read on, because there
follows a simple example:

>> Ok, here we go _again_.  Test case 1:
>>
>> mkdir a
>> touch a/b
>> git-add a/b
>> git-commit -m x
>> git-rm a/b
>> git-commit -m x
>>
>> Now we want to have the directory a _removed_.
>>
>> Test case 2:
>>
>> mkdir a
>> touch a/b
>> git-add a
>> git-commit -m x
>> git-rm a/b
>> git-commit -m x
>>
>> Now we want to have the directory a _retained_.
>>
>> After the first commit in _both_ test cases, the only file in the
>> trees / and /a is a/b.  The working directory state is _identical_ at
>> this point, and we do identical commands afterwards.
>>
>> The end result is not identical, so there must be some information
>> different in the repository after the first commit.  This information
>> _can't_ be encoded in a remaining empty tree, because both the trees /
>> and /a are _non_-empty yet.
>>
>> So we _must_ encode the evaporate-or-not-when-empty information
>> _otherwise_ into the repository.  And we do that by _not_ having
>> /a/. in the set of tracked files in test case 1, and by _having_ it in
>> the set of tracked files in test case 2.
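
The two test cases can be played through with a toy model. This is only
a sketch in Python, not git code: the class ToyRepo, its "sticky" set
and the method names are made up for illustration, and the model
assumes that "git-add a" records the directory itself in addition to
the files below it, which is what the proposal asks for.

```python
import posixpath


def parents(path):
    """Yield every parent directory of a slash-separated path."""
    parent = posixpath.dirname(path)
    while parent:
        yield parent
        parent = posixpath.dirname(parent)


class ToyRepo:
    """Toy model: tracked files plus explicitly added directories."""

    def __init__(self):
        self.files = set()   # tracked file paths, e.g. "a/b"
        self.sticky = set()  # directories added explicitly, e.g. "a"

    def add_file(self, path):
        # Models "git-add a/b": only the file is recorded.
        self.files.add(path)

    def add_dir(self, path, names):
        # Models "git-add a": the directory itself is recorded
        # together with the files found below it.
        self.sticky.add(path)
        for name in names:
            self.files.add(posixpath.join(path, name))

    def rm_file(self, path):
        # Models "git-rm a/b".
        self.files.discard(path)

    def dirs(self):
        """Directories that would exist in a fresh checkout."""
        result = set()
        for path in self.files:
            result.update(parents(path))
        for directory in self.sticky:
            result.add(directory)
            result.update(parents(directory))
        return result


case1 = ToyRepo()
case1.add_file("a/b")
case2 = ToyRepo()
case2.add_dir("a", ["b"])

# After the first commit both hold the same files and directories.
same_files = case1.files == case2.files

case1.rm_file("a/b")
case2.rm_file("a/b")
print(same_files, sorted(case1.dirs()), sorted(case2.dirs()))
```

In the model, the only difference between the two cases after the first
commit is the extra record in "sticky", which is exactly the piece of
information the argument says has to be stored somewhere.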

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  1:39                                   ` David Kastrup
@ 2007-07-22 12:06                                     ` Jakub Narebski
  2007-07-22 13:53                                       ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Jakub Narebski @ 2007-07-22 12:06 UTC (permalink / raw
  To: David Kastrup; +Cc: git, Linus Torvalds

On Sun, 22 July 2007, David Kastrup wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>> David Kastrup wrote:
>>
>>> I must be really bad at explaining things, or I am losing a fight
>>> against preconceptions fixed beyond my imagination.

Or you are wrong...

>> I don't understand you, or you don't understand git. "Tree" object
>> in object database (in repository) represents a directory in the
>> working area. There was never any problem with having empty trees in
>> object database, or having links to empty directory in the superdir.
>> We don't have to change anything about object database.
> 
> I disagree here.  The object database _can_ represent an _empty_
> directory that has been added explicitly, because up to now no
> operations existed that actually left an empty tree.  But it can't
> distinguish a _non_-empty directory that has been added explicitly
> from non-empty directory that has not been added explicitly.

True. I forgot about that.

Although I'd rather say that we want to distinguish between an
automatically cleaned up directory (a directory which will be deleted
if all files in it are deleted, and will be untracked if all tracked
files in it are deleted), and a "sticky" directory, which is
explicitly tracked and has to be explicitly deleted.

Whether it was added explicitly or not is orthogonal to that.

IMHO it would be best to first provide plumbing infrastructure (as was
the case with submodule support, for example), then add an option to
git-update-index to change the "stickiness"/"autoremoval" status of a
directory (of a tree), and _last_ think about how to change the
porcelain (git-add and git-rm).

[...]
> But in the second case, git must _not_ retain a.  So we need to record
> the information that in the first case, a was added explicitly.  And
> this can't be done with the current repository layout.  It doesn't buy
> us anything that we _have_ a representation available for an _empty_
> tree added explicitly.  We need this "added explicitly" information
> for _every_ tree, not just empty ones.
> 
> And a perfectly consistent way is to make those trees with an
> explicitly added directory _non-empty_, by virtue of putting a file
> "." in them.  This file, of course, exists in every physical
> directory, but we may or may not decide to let it be tracked by git,
> using the gitignore mechanism on the pattern ".".  Perfectly
> expedient.

Here we disagree. I think putting "." in a tree as marker of having it 
not be automatically deleted when empty, as opposed to marking tree 
using filemode in the parent, is not a good idea.

The only advantage of the "." idea is that it can use the gitignore
mechanism (both in-tree .gitignore, tracked or not, and the
info/exclude file). But I also think that the fact that the gitignore
mechanism is recursive is more of a disadvantage than an advantage.

First, it is _not_ consistent. Working directory trees _always_ have '.'
in them, while trees may or may not have it, depending on whether they
are "sticky" or "autoremoved".

Second, the "easy implementation" is anything but easy. "git add ." as
a way to mark a directory as "sticky" is not backward compatible:
currently it means adding _all contents_ of the current directory.
Implementation is tricky: as we have seen, trying to unlink '.' or
create '.' can unfortunately succeed on [some Sun OS, and UFS
filesystem] (which follows POSIX stupidly to the letter), f**king
up the filesystem. The alternative proposal of adding a "magic mode" to
mark a directory as "do not remove when empty" is largely tested; it is
very similar to the subproject support.

Third, it is contrary to the git philosophy of tracking contents.
"Stickiness" is an attribute; whether a directory is explicitly
tracked or not does not change the contents of the directory. Compare
with a 'blob', which contains only the contents of a file: not a
filename, not a pathname, not [a subset of] the filemode.

Fourth, it is very artificial. What would you put for the filemode of '.'?
040000 (i.e. directory)? What would you put for the sha1? The sha1 of an
empty directory? Of an empty blob? 0{40} (which is a bad idea because
git-diff-tree uses 0{40} to represent 'nonexistence')?

>> The problems git has with empty directories stem from the fact
>> that the index didn't have directories.
> 
> That basically implies that no information about directories could be
> tracked in the repository.  And yes, we need appropriate information
> in the index.  Again, the information whether a directory was added
> explicitly.

Whether a directory is automatically managed by git (automatically
removed or untracked). But we need a directory entry in the index for
git-diff, for example to recognize whether or not there is an empty
directory, or whether a directory is automanaged or not.
 
>> Index is flattened version of root tree, and before subproject
>> support it contained _only_ info about blobs (file contents).
> 
> And the repository is a versioned and hierarchically hashed version of
> the index, but its trees contain _no_ information that is not already
> inherently represented by the files alone. [...]

The above sentence is nonsensical. The index is a helper for the
repository, and can be derived from the repository. Not vice versa.

Trees do contain information which is not inherently represented by
the blobs.
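
To make the last point concrete, here is a small sketch in Python. It
relies on the usual git object layout (an object name is the SHA-1 of
"<type> <size>\0" followed by the body, and a tree entry is
"<mode> <name>\0" followed by the 20 raw bytes of the entry's object
name). The helper names git_object_id and tree_entry are made up for
illustration.

```python
import hashlib


def git_object_id(kind, body):
    """Object name: SHA-1 over '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode, name, object_id_hex):
    """One tree entry: '<mode> <name>\\0' plus the raw 20-byte object name."""
    return mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(object_id_hex)


# The blob knows nothing about names or modes: it is only the content.
empty_blob = git_object_id("blob", b"")

# Three trees referring to the very same blob.
plain = git_object_id("tree", tree_entry("100644", "b", empty_blob))
executable = git_object_id("tree", tree_entry("100755", "b", empty_blob))
renamed = git_object_id("tree", tree_entry("100644", "c", empty_blob))

print(empty_blob)
print(plain != executable, plain != renamed)
```

The file name and the mode live only in the tree entry, so the three
trees get different names although the blob is identical.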
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22 12:06                                     ` Jakub Narebski
@ 2007-07-22 13:53                                       ` David Kastrup
  2007-07-22 20:26                                         ` Jakub Narebski
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-22 13:53 UTC (permalink / raw
  To: Jakub Narebski; +Cc: git


Jakub, this mail is too long already, and it does not make sense to
tack a changed proposal onto its end, since readers will be exhausted
by the time they get there.  So I'll instead post a followup to the
"big picture" mail, where I outline a modified approach which is
presumably easier to understand and completely backwards-compatible,
incorporating your feedback.

There is probably little sense in wasting your time on a detailed
response: feel free to point out where you don't see me making
sense.  I have no problem with people coming to different conclusions
than I do, but I would prefer it if it is not because they consider
me a raving lunatic, but because they have different opinions
regarding the details.

"I can follow you, but I disagree with your conclusion" is perfectly
fine for now since I am going to propose something else, anyway.

Thanks for the feedback.  It gave me some good ideas.

Jakub Narebski <jnareb@gmail.com> writes:

> On Sun, 22 July 2007, David Kastrup wrote:
>> Jakub Narebski <jnareb@gmail.com> writes:
>>> David Kastrup wrote:
>>>
>>>> I must be really bad at explaining things, or I am losing a fight
>>>> against preconceptions fixed beyond my imagination.
>
> Or you are wrong...

Well, there is little reason for you to take my word on it, but I
happen to have a history of designing and implementing systems where I
have been responsible for every single byte, bootloader, firmware,
applications, target compiler, assembler, whatever.  I have been
exposed to Unix and working with it several years before Linux even
existed.  I also have a track record of being not exactly stupid.

So I pretty much can rule out that I am wrong on the factual side.

But where I may be wrong is in estimating how obvious the design
can appear to others, and how useful and maintainable for others it
may be in the long run.  Linus says "code talks", but that's actually
not half the story.  If my code says that it works and the evidence is
there, but nobody is able to understand _why_ it works, it has no
place in a project where I am not permanently around.

If smart people don't get what I am talking about, it does not matter
that the patch is surprisingly well-contained: it will be a
maintenance nightmare because people will never figure out why
something stopped working after some particular change.

>> I disagree here.  The object database _can_ represent an _empty_
>> directory that has been added explicitly, because up to now no
>> operations existed that actually left an empty tree.  But it can't
>> distinguish a _non_-empty directory that has been added explicitly
>> from non-empty directory that has not been added explicitly.
>
> True. I forgot about that.

Thanks.  It is almost a revelation that anybody can agree on any point
with me at the moment.

> IMHO it would be best to first provide plumbing infrastructure (as
> e.g.  it was the case of submodule support), then add option to
> git-update-index to change the "stickiness"/"autoremoval" status of
> a directory (of a tree), and _last_ think about how to change the
> porcelain (git-add and git-rm).

Sure.  It does no harm to think about reducing the amount of breaking
porcelain, though.

> [...]
>
>> And a perfectly consistent way is to make those trees with an
>> explicitly added directory _non-empty_, by virtue of putting a file
>> "." in them.  This file, of course, exists in every physical
>> directory, but we may or may not decide to let it be tracked by
>> git, using the gitignore mechanism on the pattern ".".  Perfectly
>> expedient.
>
> Here we disagree. I think putting "." in a tree as marker of having
> it not be automatically deleted when empty, as opposed to marking
> tree using filemode in the parent, is not a good idea.

Well, "not a good idea" is a far step forward from "stupid idiot
babbling nonsense", so we may make progress towards actually being
able to _weigh_ different options.  I can actually associate with "not
a good idea", not least because nobody else seems to get the idea, and
that makes it infeasible for maintenance.

So I'll address some points and then propose a different way of
implementing what will in the end amount to rather similar semantics,
but with a different view of looking at those semantics, one that
corresponds well with the implementation.

> The only advantage of the "." idea is that it can use the gitignore
> mechanism (both in-tree .gitignore, tracked or not, and the
> info/exclude file). But I also think that the fact that the gitignore
> mechanism is recursive is more of a disadvantage than an advantage.
>
> First, it is _not_ consistent. Working directory trees _always_ have
> '.' in them, while trees may or may not have it, depending on
> whether they are "sticky" or "autoremoved".

Let me point out again that this inconsistency is already present in
the difference of tracked and untracked _files_: they are always in
the working directory, while trees have or not have them, depending on
whether they are "registered" or "not".

There is no inconsistency involved here, but it seems to make people
_very_ uncomfortable to factor out the "stays around even if empty"
functionality and call it "dir/." from the "can hold content"
functionality which is in effect called "dir/", and basically
associate tracked physical existence just with the former.

The recursiveness of the gitignore mechanism has the advantage that
when maintaining a large repository with actual or logical
subprojects, one does not need to pick a single policy for all
subprojects.  I think that is quite important.  It could possibly be
achieved with some other method of having per-subproject
configuration, but I see little wrong in using what is there and
documented already.

> Second, the "easy implementation" is anything but easy. "git add ."
> as a way to mark a directory as "sticky" is not backward compatible:
> currently it means adding _all contents_ of the current directory.
> Implementation is tricky: as we have seen, trying to unlink '.' or
> create '.' can unfortunately succeed on [some Sun OS, and UFS
> filesystem] (which follows POSIX stupidly to the letter), f**king up
> the filesystem.

I was not suggesting actually leaving any such calls in place: after
all, they would presumably lead to error messages.  But I agree that
this could lead to nasty surprises when somebody with a legacy version
of git worked with a repository containing "." as explicit entries of
some file type.

> The alternative proposal of adding "magic mode" to mark directory as
> "not remove when empty" is largely tested; it is very similar to the
> subproject support.

Good.  Because it is what I converged to last night.

> Third, it is contrary to the git philosophy of tracking contents.
> "Stickiness" is an attribute; whether a directory is explicitly
> tracked or not does not change the contents of the directory. Compare
> with a 'blob', which contains only the contents of a file: not a
> filename, not a pathname, not [a subset of] the filemode.
>
> Fourth, it is very artificial. What would you put for the filemode
> of '.'?  040000 (i.e. directory)?

Taken already.  By something very artificial, namely a tree...  Yes,
this was a wart in my proposal.

> What would you put for sha1?  Sha1 of an empty directory?

Some fixed value.  Everywhere the same.  Not really relevant.

>> That basically implies that no information about directories could
>> be tracked in the repository.  And yes, we need appropriate
>> information in the index.  Again, the information whether a
>> directory was added explicitly.
>
> Whether a directory is automatically managed by git (automatically
> removed or untracked). But we need a directory entry in the index for
> git-diff, for example to recognize whether or not there is an empty
> directory, or whether a directory is automanaged or not.

One conclusion that I have come to (and I think I am in agreement with
Linus here) is that the information "empty or not" is actually useless
separately: when I add files below a directory to the repository, the
directory _can't_ be empty.  And git has no way of knowing whether it
is non-empty because I wanted the directory to be there, or whether it
is non-empty because I could not have checked in the files into the
tree below it otherwise.

>> And the repository is a versioned and hierarchically hashed version
>> of the index, but its trees contain _no_ information that is not
>> already inherently represented by the files alone. [...]
>
> The above sentence is nonsensical. The index is a helper for the
> repository, and can be derived from the repository. Not vice versa.
>
> Trees do contain information which is not inherently represented by
> the blobs.

Could you give examples for such information?  As long as we are not
talking about _history_, I am at a loss at what else you mean.  File
names and permissions?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  4:28                                           ` David Kastrup
  2007-07-22  6:38                                             ` david
@ 2007-07-22 17:28                                             ` Linus Torvalds
  2007-07-22 17:33                                             ` Linus Torvalds
       [not found]                                             ` <alpine.LFD.0.999.0707221031050.3607@woody.linux-foundation.org>
  3 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-22 17:28 UTC (permalink / raw
  To: David Kastrup; +Cc: git



On Sun, 22 Jul 2007, David Kastrup wrote:
> 
> > I told you. Several times. That "." is pointless exactly because
> > it's in _every_ tree, and as such is no longer "content".
> 
> "." is in every _non-empty_ directory tree.

You're pointless.

We have no problems at all with non-empty trees. We know exactly what they 
are. We keep track of them fine, and we do not need a totally pointless 
"." entry for them.

>  But we are talking about
> permitting _empty_ trees in the repository.

And WE ALREADY DO.

The empty tree looks like this: "". It has a SHA1 of 
4b825dc642cb6eb9a060e54bf8d69288fbee4904. It works today, and in fact, git 
uses it already. 

Try this:

	git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

in the git repository. What do you think that is?
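
That hash can also be reproduced without git. A minimal sketch in
Python, relying only on the usual git object naming rule (SHA-1 over
"<type> <size>\0" followed by the object body); the helper name
git_object_id is made up for illustration:

```python
import hashlib


def git_object_id(kind, body):
    """Object name: SHA-1 over '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# A tree with no entries has an empty body, so its name is a constant.
empty_tree = git_object_id("tree", b"")
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the body is empty, every empty tree in every repository has
this same name.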

Your "." is *pointless*.

And it's _worse_ than pointless: it's not "content". It doesn't add any 
information. It's not something you can match up  against the working tree 
meaningfully, exactly because *every* working tree has it. As such, it's 
total non-information.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  9:08                                               ` David Kastrup
@ 2007-07-22 17:30                                                 ` Linus Torvalds
  2007-07-22 17:59                                                   ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-07-22 17:30 UTC (permalink / raw
  To: David Kastrup; +Cc: david, git



On Sun, 22 Jul 2007, David Kastrup wrote:
>
> Sigh.  No, I don't want to track every directory.  I want to have
> every directory _trackable_.

And they already are. 

Your point is pointless. You don't understand the git data structures, and 
you are trying to do something that makes no sense.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  4:28                                           ` David Kastrup
  2007-07-22  6:38                                             ` david
  2007-07-22 17:28                                             ` Linus Torvalds
@ 2007-07-22 17:33                                             ` Linus Torvalds
       [not found]                                             ` <alpine.LFD.0.999.0707221031050.3607@woody.linux-foundation.org>
  3 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-22 17:33 UTC (permalink / raw
  To: David Kastrup; +Cc: git



On Sun, 22 Jul 2007, David Kastrup wrote:
>
> So when we are talking about a repository tree _becoming_ empty, we 
> need the information whether or not we should remove it upon
> becoming empty.

You don't seem to realize - although I've told you now about a million 
times - that what you are talking about is:

 - technically exactly the same as ".gitignore", which for some 
   unfathomable reason you cannot seem to accept.

 - except your use of "." is 100% INFERIOR exactly because the "." entry 
   has no meaning in the target filesystem, so it means that the bit of 
   information is no longer something that is trackable in the working 
   tree.

Quite frankly, Junio would be a total idiot to take any patches that do 
what you want to do. Happily, he is anything but.

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22 17:30                                                 ` Linus Torvalds
@ 2007-07-22 17:59                                                   ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-22 17:59 UTC (permalink / raw
  To: Linus Torvalds; +Cc: david, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>>
>> Sigh.  No, I don't want to track every directory.  I want to have
>> every directory _trackable_.
>
> And they already are.

Their contents are.

> Your point is pointless. You don't understand the git data
> structures, and you are trying to do something that makes no sense.

That makes no sense to you and apparently quite a few other people,
after a lot of explaining.  That does not mean that it wouldn't work,
but it does mean that it is going nowhere: it is irrelevant whether I
consider the concept easy to understand and explain when nobody else
does: that makes it unmaintainable.

Fortunately, a few other participants, notably Junio and Jakub, have
focused a bit more on technical details rather than my sanity in their
somewhat more nuanced feedback, and thus I have (in a separate thread)
made a new proposal that addresses a few technical shortcomings and
that no longer requires splitting tree-ness/directory-ness into
separate concepts and records, something which I considered elegant
and others gibberish.

It boils down to encoding the "don't-evaporate-when-empty" or "I told
you to keep track of it" property in the directory access permissions:
if those are zero, git does not track the corresponding directory and
will attempt a remove-on-empty.  If they are non-zero (probably 755 as
long as git stores only a sanitized version of the actual state
there), this means that git has been told to track the directory and
will not attempt to delete it until it is told to stop tracking it
again.
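
As a rough sketch of that encoding (Python, purely illustrative: the
function names are made up, and the assumption is that a directory
entry carries its mode as an integer like 0o040000 or 0o040755):

```python
import stat


def is_directory_entry(mode):
    """True if the file-type bits of the mode say 'directory'."""
    return stat.S_IFMT(mode) == stat.S_IFDIR


def keeps_empty_directory(mode):
    """Proposed rule: zero permission bits mean remove-on-empty,
    non-zero permission bits mean the directory is tracked and kept."""
    return is_directory_entry(mode) and stat.S_IMODE(mode) != 0


for mode in (0o040000, 0o040755):
    print(oct(mode), keeps_empty_directory(mode))
```

With this rule, existing trees (whose directory entries have no
permission bits set) would keep today's remove-on-empty behaviour.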

The proposal of allowing "." "!." as a gitignore pattern to specify
the tracking/non-tracking indicator does still stand, but its
semantics are now so much decoupled from that of
"don't-evaporate-when-empty" that the code would not actually overlap
with that of the tracking, and so discussing it is orthogonal to the
actual proposal and can be postponed, and an implementation
proffered separately once the rest is in place.

So do both of us a favor and skip the rest of the mail queue with
"Empty directories..." in its title.

Actually, the code (and later comments for it) you produced matches
the areas of work and what I think needs to be done much more closely
now than with my original proposal.

So while the discussion with you has not really been much of a help
except to show without reasonable doubt that my original approach
would have been unmaintainable by other persons, the code _is_ very
helpful.

Thanks,

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
       [not found]                                             ` <alpine.LFD.0.999.0707221031050.3607@woody.linux-foundation.org>
@ 2007-07-22 18:58                                               ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-22 18:58 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>>
>> So when we are talking about a repository tree _becoming_ empty, we 
>> need the information whether or not we should remove it upon
>> becoming empty.
>
> You don't seem to realize - although I've told you now about a million 
> times - that what you are talking about is:
>
>  - technically exactly the same as ".gitignore", which for some 
>    unfathomable reason you cannot seem to accept.

Linus?  Do both of us a favor and forget about the "." proposal.
Since I already dropped it, we can save time if you rant about the
proposal I have replaced it with and call me an idiot for a different
reason.

> Quite frankly, Junio would be a total idiot to take any patches that do 
> what you want to do. Happily, he is anything but.

And he does not come across as one.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22 13:53                                       ` David Kastrup
@ 2007-07-22 20:26                                         ` Jakub Narebski
  2007-07-22 22:57                                           ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Jakub Narebski @ 2007-07-22 20:26 UTC (permalink / raw
  To: David Kastrup; +Cc: git

David Kastrup wrote:
 
> "I can follow you, but I disagree with your conclusion" is perfectly
> fine for now since I am going to propose something else, anyway.
> 
> Thanks for the feedback.  It gave me some good ideas.

You are welcome.
 
> Jakub Narebski <jnareb@gmail.com> writes: 
>> On Sun, 22 July 2007, David Kastrup wrote:
>>> Jakub Narebski <jnareb@gmail.com> writes:
>>>> David Kastrup wrote:
>>>>
>>>>> I must be really bad at explaining things, or I am losing a fight
>>>>> against preconceptions fixed beyond my imagination.
>>
>> Or you are wrong...
> 
> Well, there is little reason for you to take my word on it, but I
> happen to have a history of designing and implementing systems where I
> have been responsible for every single byte, bootloader, firmware,
> applications, target compiler, assembler, whatever.  I have been
> exposed to Unix and working with it several years before Linux even
> existed.  I also have a track record of being not exactly stupid.
> 
> So I pretty much can rule out that I am wrong on the factual side.

Big words.

First, there is the little matter of area of competence.
You might be a systems master, but your ideas about snapshot-based 
distributed revision control systems can be wrong because DSCMs are 
outside the area you know most about.

Second, even if you are a master at given topic, you can still be wrong.

Mind you, I was not saying you are wrong. I was saying you could be.


[...] 
>> The only advantage of the "." idea is that it can use the gitignore
>> mechanism (both in-tree .gitignore, tracked or not, and the
>> info/exclude file). But I also think that the fact that the gitignore
>> mechanism is recursive is more of a disadvantage than an advantage.
[...]
> The recursiveness of the gitignore mechanism has the advantage that
> when maintaining a large repository with actual or logical
> subprojects, one does not need to pick a single policy for all
> subprojects.  I think that is quite important.  It could possibly be
> achieved with some other method of having per-subproject
> configuration, but I see little wrong in using what is there and
> documented already.

I think it would be best implemented by repository config, e.g. 
core.dirManagement or something like that, which could be set to
 1. "autoremove" or something like that, which gives old behavior
    of untracking directory if it doesn't have any tracked files
    in it, and removing directory if it doesn't have any files
    in it.
 2. "noremove" or something like that, which changes the behaviour
    to _never_ untrack directory automatically. This can be done
    without any changes to 'tree' object nor index. It could be useful
    for git-svn repositories.
 3. "marked" or something like that, for which you have to explicitly
    mark directories which are not to be removed when empty.
 4. "recursive" or something like that, which would automatically mark
    as "sticky" all subdirectories added in a "sticky" repository.
    OR directory is not removed when empty if it is marked as such,
    or one of its parents is marked as such.
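
As a rough illustration of the four settings above, here is a minimal
Python sketch of how such a policy could decide whether a directory that
has just become empty gets removed.  The names DirPolicy,
should_remove_when_empty and the "marked" set are hypothetical, and
core.dirManagement is only the proposal from this mail, not an existing
git option.  For "recursive" the sketch assumes the second reading (a
mark on the directory or on any of its parents keeps it).

```python
from enum import Enum
from pathlib import PurePosixPath


class DirPolicy(Enum):
    AUTOREMOVE = "autoremove"  # old behaviour: empty directories vanish
    NOREMOVE = "noremove"      # never untrack or remove automatically
    MARKED = "marked"          # keep only explicitly marked directories
    RECURSIVE = "recursive"    # keep marked directories and all below them


def should_remove_when_empty(path, policy, marked=frozenset()):
    """Return True if an empty directory at `path` would be removed."""
    p = PurePosixPath(path)
    if policy is DirPolicy.AUTOREMOVE:
        return True
    if policy is DirPolicy.NOREMOVE:
        return False
    if policy is DirPolicy.MARKED:
        return str(p) not in marked
    # RECURSIVE: a mark on the directory itself or on any parent keeps it.
    candidates = [p, *p.parents]
    return not any(str(c) in marked for c in candidates)
```

With "a" marked, a/b would be kept under "recursive" but removed under
"marked".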
 
>> Second, the "easy implementation" is anything but easy. "git add ."
>> as a way to mark a directory as "sticky" is not backward compatible:
>> currently it means to add _all contents_ of the current directory.
>> Implementation is tricky: as we have seen trying to unlink '.' or
>> create '.' can unfortunately succeed on [some Sun OS, and UFS
>> filesystem] (which follows POSIX stupidly to the letter) f**king up
>> the filesystem.
> 
> I was not suggesting actually leaving any such calls in place: after
> all, they would presumably lead to error messages.  But I agree that
> this could lead to nasty surprises when somebody with a legacy version
> of git worked with a repository containing "." as explicit entries of
> some file type.

The "magic mode" solution _should_ work also with older git, I think.
 

>> Fourth, it is very artificial. What would you put for filemode for '.'?
>> 040000 (i.e. directory)?
[...]
>> What would you put for sha1?  Sha1 of an empty directory?
> 
> Some fixed value.  Everywhere the same.  Not really relevant.

Relevant because it has to work with legacy git on strange operating
systems. Because git has to fsck it (and adding special-casing of this
"some fixed value" to git-fsck is a bad, bad idea).

Note that the sha1 cannot be the sha1 of the tree. In the working area '.'
is a self link. You cannot create a self link in a git repository object.

[...]
>>> And the repository is a versioned and hierarchically hashed version
>>> of the index, but its trees contain _no_ information that is not
>>> already inherently represented by the files alone. [...]
[...]
>> Trees do contain information which is not inherently present by the 
>> blobs.
> 
> Could you give examples for such information?  As long as we are not
> talking about _history_, I am at a loss at what else you mean.  File
> names and permissions?

File names and permissions. And they bind blobs and trees together.
Trees do not contain any info about history.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
       [not found]                   ` <alpine.LFD.0.999.0707181710271.27353@woody.linux-foundation.org>
@ 2007-07-22 21:08                     ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-22 21:08 UTC (permalink / raw
  To: git


Well, coming back to this posting in order to focus on some points
that were at a level more relevant to the implementation.  And I'll go
through the questions assuming my permissions-based proposal.

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 19 Jul 2007, David Kastrup wrote:
>> 
>> Well, kudos.  Together with the analysis from Junio, this seems like a
>> good start.  Would you have any recommendations about what stuff one
>> should really read in order to get up to scratch about git internals?
>
> Well, you do need to understand the index. That's where all the new 
> subtlety happens.
>
> The data structures themselves are trivial, and we've supported
> empty trees (at the top level) from the beginning, so that part is
> not anything new.
>
> However, now having a new entry type in the index (S_IFDIR) means
> that anything that interacts with the index needs to think
> twice. But a lot of that is just testing what happens, and so the
> first thing to do is to have a test-suite.

Yes.

> There's also the question about how to show an empty tree in a
> diff.

Well, there are two possibilities involved here, a more and a less
chatty one.  Assuming that we want to do as little work as possible,
the transition between a tracked and a non-tracked directory will be
given in one of the following manners:

Either:
a) xxx: old mode 000000
   xxx: new mode 040755

when a directory gets tracked and

   xxx: old mode 040755
   xxx: new mode 000000

when it gets untracked again.

or
b)
   xxx: new directory mode 040755

when a directory gets tracked and

   xxx: deleted directory mode 040755

when it gets untracked again.  Note that "new" does not mean that git
did not previously have files that absolutely required a directory to
be placed in.  It just means that it has now actively gained
knowledge about the directory.

In a similar vein, "deleted" means that git is just deleting its
knowledge about the directory, _scheduling_ it for a single deletion
attempt at the earliest (and actually also latest) opportunity: when
git happens to know about no more files that require keeping the
directory around.  So perhaps the following would be more readable:

   xxx: tracking directory mode 040755

   xxx: forgetting directory mode 040755

Now in order to cut down on the verbiage, it might be an option to
transmit those strings only when something happens that can't be
deduced from other data.  Because _if_ it can be deduced from other
data (like a directory being present when files in it are), then at
least the working copies are identical as long as both persons don't
start deleting files from the repository.  If they do so, when a
directory becomes empty, the other side needs to know whether the
directory is being tracked or not if it still wants to maintain the
same state in the working tree.  But if we really want to have not
just the working tree but also the repositories in SHA1-lockstep, we
can't delay transmitting this information.
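
A minimal Python sketch of the "tracking"/"forgetting" wording suggested
above, purely to make the proposed output concrete.  The function name
and its arguments are hypothetical; nothing like this exists in git's
diff code, and 040755 is simply the mode value used in this proposal.

```python
def describe_directory_change(path, was_tracked, is_tracked, mode=0o040755):
    """Return the proposed diff line for a directory, or None if unchanged."""
    if was_tracked == is_tracked:
        return None  # nothing to report: tracking state did not change
    verb = "tracking" if is_tracked else "forgetting"
    # Six octal digits with leading zero, matching the examples above.
    return f"{path}: {verb} directory mode {mode:06o}"
```

Returning None for the unchanged case corresponds to the idea of only
emitting a line when the tracking state itself changes.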

> We've never had that: the only time we had empty trees was when we
> compared a totally empty "root" tree against another tree, and then
> it was obvious.  But what if the empty tree is a subdirectory of
> another tree - how do you express that in a diff? Do you care? Right
> now, since we always recurse into the tree (and then not find
> anything), empty trees will simply not show up _at_all_ in any
> diffs.

One would still recurse.

> And what about usability issues elsewhere? With my patch, doing something 
> like a
>
> 	git add directory/
>
> still won't do anything, because the behaviour of "git add" has always 
> been to recurse into directories.

This will remain the same, but the directory itself will be added if
and only if the corresponding preference variable is set, regardless
of whether the directory is empty.

> So to add a new empty directory, you'd have to do
>
> 	git update-index --add directory
>
> and that's not exactly user-friendly.

Presumably one could, if one really wanted an explicit way, have
git add --directory directory
in analogy to the --directory option of the ls command.  But I think
that in most cases one would not want to treat one directory different
from the whole tree, so the implicit behavior regulated by a
project-wide preference should be sufficient in general.

> So do you add a "-n" flag to "git add" to tell it to not recurse? Or
> do you always recurse, but then if you notice that the end result is
> empty, you add it as a directory?

I always recurse (unless there is a --directory option and I have some
strange desire to actually use it).  I add it as a directory,
regardless of whether it is empty or not, if my preference setting (or
gitignore or whatever) is set to tracking directories.
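
The behaviour described here (always recurse, and additionally record
each directory when the preference is set) could be sketched in Python
roughly as follows.  entries_for_add and the track_dirs flag are
hypothetical names, the (mode, path) pairs only imitate index entries,
and symlinks are not treated specially.

```python
import os


def entries_for_add(root, track_dirs):
    """List (mode, path) pairs that adding `root` recursively would record."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        rel_dir = os.path.relpath(dirpath, root)
        # Record the directory itself, empty or not, if tracking is on.
        if track_dirs and rel_dir != ".":
            entries.append(("040755", rel_dir.replace(os.sep, "/")))
        for name in sorted(filenames):
            rel = os.path.normpath(os.path.join(rel_dir, name))
            full = os.path.join(dirpath, name)
            # Only the owner's execute bit is reflected in the file mode.
            mode = "100755" if os.stat(full).st_mode & 0o100 else "100644"
            entries.append((mode, rel.replace(os.sep, "/")))
    return sorted(entries, key=lambda entry: entry[1])
```

For a tree containing fileA, empty/ and sub/fileB this yields four
entries with track_dirs set and two without.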

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18  6:53     ` Junio C Hamano
       [not found]       ` <867ioyqhgc.fsf@lola.quinscape.zz>
  2007-07-20  8:29       ` Johan Herland
@ 2007-07-22 21:35       ` David Kastrup
  2 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-22 21:35 UTC (permalink / raw
  To: git


Coming full circle...

Junio C Hamano <gitster@pobox.com> writes:

> The right approach to take probably would be to allow entries of
> mode 040000 in the index.  Traditionally, we allowed only 100644
> (blobs as regular files) and 120000 (blobs as symlinks).  We
> recently added 160000 (commit from outer space, aka subproject).
>
> And we do that for all directories, not just empty ones.  So if
> you have fileA, empty/, sub/fileB tracked, your index would
> probably have these four entries, immediately after read-tree
> of an existing tree object:
>
> 	100644 15db6f1f27ef7a... 0	fileA
> 	040000 4b825dc642cb6e... 0	empty
> 	040000 e125e11d3b63e3... 0	sub
> 	100644 52054201c2a872... 0	sub/fileB

This would be very much what I am proposing now, except that instead
of 040000 we would have 040755 usually, so that when the index makes
it into the repository where 040000 already has a meaning (a
disappear-when-empty tree) we get the right information.  Also note
that the above comes about when doing
git-add *
but not when doing
git-add fileA empty sub/fileB (in the latter case, the entry for sub
                               would be missing)

> If you add sub/fileC, with "update-index" (and "add"), you
> invalidate the SHA-1 object name you stored for "sub" (because
> there is no point recomputing the tree object until you know you
> need a subtree for "sub" part, which does not happen until the
> next "write-tree"), and end up with something like:
>
> 	100644 15db6f1f27ef7a... 0	fileA
> 	040000 4b825dc642cb6e... 0	empty
> 	040000 00000000000000... 0	sub
> 	100644 52054201c2a872... 0	sub/fileB
> 	100644 705bf16c546f32... 0	sub/fileC
>
> These "missing" SHA-1 would need to be recomputed on-demand.

Ah, ok.  Does it even make sense to compute the SHA-1 values in the
index in advance?  What would they be useful for?

> We have had necessary infrastructure to do this "keeping
> untouched tree object names in the index" for quite some time,
> but it is not a part of the index proper (it is stored in an
> extension section in the index file, to keep the index
> compatible with older versions of git).

What is the application for which this is being used?

> Having made it sound so easy, here are the issues I would expect
> to be nontrivial (but probably not rocket surgery either).
>
>  * unpack-trees, which is the workhorse for twoway merge (aka
>    "switching branches") and threeway merge, has a convoluted
>    logic to avoid D/F conflicts; it can probably be cleaned up
>    once we do the above conversion so that the index starts
>    saying "Hey, I have a directory here" more explicitly.  The
>    end result would probably be code that is easier to follow.

I am afraid that this is unlikely to happen, and that is because
directory tracking remains optional at a fundamental level as long as
we want to support the current behavior as an option.  However, one
could conceivably add 040000 entries (rather than 040755) for
directories that have not been passed into tracking but are required
by git, if this simplifies matters.  But it sounds like something that
might complicate working with several different git versions on the
same index.

>  * status, update-index --refresh, and diff-files care about
>    the information cached in the index from the last time
>    lstat(2) is run on each entry.  What we should store there
>    for "tree" entries is very unclear to me, but probably we
>    should teach them to ignore the stat-matching logic for
>    these entries.

At the current point in time, git tracks just the u+x bit for normal
files, and for directories, there is really nothing worth tracking as
long as no attempt is made to restore more mode bits.  Modification
times are probably a bit too risky to pay attention to.

>  * diff-index walks the index and a tree in parallel but does
>    not currently expect to see a tree object in the index.  It
>    needs to be taught to ignore these "tree" entries.

Or do something sensible when comparing.  Understood.

>  * merge-recursive and merge-index walk the index, coming up
>    with the merge results one path at a time.  They also need to
>    be taught to ignore these "tree" entries.

Same here.

>  * diff-index and "read-tree -m" should be taught to take
>    advantage of the "tree" entries in the index.  For example,
>    if diff-index finds the "tree" entry in the index and the
>    subtree found from the tree object exactly match, it does not
>    even have to descend into the tree, which would be a huge
>    performance win (because you do not have to open the subtree
>    and its subtrees from the tree side; you already have read
>    everything on the index side, and still have to skip the
>    entries in the directory).  "read-tree -m" also should be
>    able to optimize two identical subtrees in the 2 or 3 trees
>    involved.
>
>    Even if we follow the "lazy invalidate" strategy to maintain
>    the "tree" entries in the normal codepath, we could have a
>    special operation that says "now update all the tree entries
>    by recomputing the tree object names as needed".  Perhaps we
>    might want to initiate such an operation before "read-tree
>    -m" automatically.

Over my head, but it would appear that it can safely be left for later.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22 20:26                                         ` Jakub Narebski
@ 2007-07-22 22:57                                           ` David Kastrup
  2007-07-23  6:05                                             ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-22 22:57 UTC (permalink / raw
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> David Kastrup wrote:
>
>> So I pretty much can rule out that I am wrong on the factual side.
>
> Big words.

Sure.  It is not relevant, however.

> First, there is the little matter of area of competence.
> You might be a systems master, but your ideas about snapshot-based
> distributed revision control systems can be wrong because DSCMs are
> outside the area you know most about.

Slicing the concept of directory and tree into two separate things and
thinking separately about them and their relation in working tree and
repository is not exactly concerned with the internals.  It obviously
was too artificial a concept to be understandable, and likely a worse
idea than necessary (whether one wants to call it too smart or too
stupid for its own good may be a matter of taste).

Anyway, it would be more productive if we managed to focus on the
technical aspects again.  I accept that my previous proposal was not
fit for inclusion.

> Second, even if you are a master of a given topic, you can still be
> wrong.
>
> Mind you, I was not saying you are wrong. I was saying you could be.

We can leave that open since no code is going to come of the first
proposal.

> [...]
>> The recursiveness of the gitignore mechanism has the advantage that
>> when maintaining a large repository with actual or logical
>> subprojects, one does not need to pick a single policy for all
>> subprojects.
>
> I think it would be best implemented by repository config, e.g. 
> core.dirManagement or something like that, which could be set to
>  1. "autoremove" or something like that, which gives old behavior
>     of untracking directory if it doesn't have any tracked files
>     in it, and removing directory if it doesn't have any files
>     in it.

That's actually not _tracking_ a directory at all, but rather
maintaining an independent directory in the parallel repository
universe.  No information specific to directories passes the index.

>  2. "noremove" or something like that, which changes the behaviour
>     to _never_ untrack directory automatically. This can be done
>     without any changes to 'tree' object nor index. It could be useful
>     for git-svn repositories.

I don't see how this could occur.  Automatic _untracking_ would happen
when one untracks (aka removes) a parent directory.  But one would not
do this while keeping the child.

>  3. "marked" or something like that, for which you have to explicitly
>     mark directories which are not to be removed when empty.

Equivalent to 1 in my scheme.

>  4. "recursive" or something like that, which would automatically mark
>     as "sticky" all subdirectories added in a "sticky" repository.

If they are covered by the add and not just implied by children.  That is,
git-add a/b
will not make "a" sticky while
git-add a
will make a/b sticky.

>     OR directory is not removed when empty if it is marked as such,
>     or one of its parents is marked as such.

I'd not throw too much inheritance into the equation, or things become
intractable too easily.
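
The rule stated above (a directory becomes sticky only if the add covers
it, not merely because a child was added) can be written down as a short
Python sketch.  sticky_after_add is a hypothetical helper, and
`directories` stands for the directory paths present in the working
tree.

```python
def sticky_after_add(added, directories):
    """Directories made sticky by adding `added` under the rule above."""
    top = added.rstrip("/")
    prefix = top + "/"
    # The added directory itself and everything below it are covered;
    # directories above it are only implied and therefore stay unsticky.
    return {d for d in directories if d == top or d.startswith(prefix)}
```

With directories a, a/b and a/b/c, adding a/b leaves "a" out of the
result, while adding "a" covers all three.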

> The "magic mode" solution _should_ work also with older git, I
> think.

I think so, too, for the repository.  But of course what happens in
the index with old code when new data types get added is a case for
review, testing and praying.

>>> Fourth, it is very artificial. What would you put for filemode for '.'?
>>> 040000 (i.e. directory)?
> [...]
>>> What would you put for sha1?  Sha1 of an empty directory?
>> 
>> Some fixed value.  Everywhere the same.  Not really relevant.
>
> Relevant because it has to work with legacy git on strange operating
> systems. Because git has to fsck it (and adding special-casing of this
> "some fixed value" to git-fsck is a bad, bad idea).

I did not mean "arbitrary value", but the value would be computed in a
standard way from the node, and since the node would be the same
everywhere, the hash would be too.

> Note that the sha1 cannot be the sha1 of the tree. In the working area
> '.' is a self link. You cannot create a self link in a git repository
> object.

Certainly.  And the idea was to have "." be isolated from the contents
of the tree, basically treating it as a sibling of the other entries.
Which is, in a way, how "." shared one namespace in Unix with what
amounts to _children_ of the corresponding tree.

So that was some inspiration here, probably too much so.

> [...]
>>>> And the repository is a versioned and hierarchically hashed version
>>>> of the index, but its trees contain _no_ information that is not
>>>> already inherently represented by the files alone. [...]
> [...]
>>> Trees do contain information which is not inherently present by the 
>>> blobs.
>> 
>> Could you give examples for such information?  As long as we are not
>> talking about _history_, I am at a loss at what else you mean.  File
>> names and permissions?
>
> File names and permissions. And they bind blobs and trees together.

Trees bind blobs and trees together?  Anyway, I consider the names and
permissions properties of the files and their identity.  Stripping out
the blobs from under them does not actually add any information: the
trees still don't contain any information that would have necessitated
looking at directories rather than just files, their names,
permissions and content in the work space.

But you are right in that the tree can't be replaced by the blobs.  It
actually needs the files (namely their full names and permissions) to
reconstruct it.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22 22:57                                           ` David Kastrup
@ 2007-07-23  6:05                                             ` David Kastrup
  2007-07-23  7:45                                               ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-23  6:05 UTC (permalink / raw
  To: Jakub Narebski; +Cc: git

David Kastrup <dak@gnu.org> writes:

> Jakub Narebski <jnareb@gmail.com> writes:
>
>> I think it would be best implemented by repository config, e.g.

I got sidetracked here: the gitignore stuff has in the dirmod scheme
actually no code or concept overlap with the actual scheme, so it can
be considered a distraction for now and its implementation and
discussion tabled.  It has one disadvantage: in order to get
_recursive_ behavior in one tree, one needs to use the "." pattern in
the .gitignore file of the respective directory, and having a
.gitignore file in that directory sort of defeats the idea of not
having .gitignore directories around...  Of course, a single
.gitignore file is better than ones one has to distribute through the
tree.

>> core.dirManagement or something like that, which could be set to
>>  1. "autoremove" or something like that, which gives old behavior
>>     of untracking directory if it doesn't have any tracked files
>>     in it, and removing directory if it doesn't have any files
>>     in it.
>
> That's actually not _tracking_ a directory at all, but rather
> maintaining an independent directory in the parallel repository
> universe.  No information specific to directories passes the index.

Note: that was merely a comment on semantics, not on the matter.

>>  2. "noremove" or something like that, which changes the behaviour
>>     to _never_ untrack directory automatically. This can be done
>>     without any changes to 'tree' object nor index. It could be useful
>>     for git-svn repositories.
>
> I don't see how this could occur.  Automatic _untracking_ would happen
> when one untracks (aka removes) a parent directory.  But one would not
> do this while keeping the child.

Correction: if there was a --directory option and one used it for
git-rm (or no -r was given, so just one directory level was affected),
one _could_ untrack stuff on the git side accidentally.  And for
something like git-svn, this might be a bad idea.  So there is
conceivably a market for an option that never untracks a non-empty
tree.

>>  3. "marked" or something like that, for which you have to explicitly
>>     mark directories which are not to be removed when empty.
>
> Equivalent to 1 in my scheme.

At least if scheme 1 does not forbid some _explicit_ way of saying
"track this and I really mean it".

>>  4. "recursive" or something like that, which would automatically mark
>>     as "sticky" all subdirectories added in a "sticky" repository.
>
> If they are covered by the add and not just implied by children.  That is,
> git-add a/b
> will not make "a" sticky while
> git-add a
> will make a/b sticky.

Addition: I was thinking so much of my implementation and its
semantics that I did not consider one possibility that you might mean
here:

When adding a/b, always also add a (and the whole hierarchy above it)
automatically as sticky.  Namely disallow unsticky directories in the
repository at all.  That would mean that

  git-add a/b;git-commit -m x;git-rm a/b;git-commit -m x

might not be a noop if a was not in the repository previously: it
would cause a to stay around sticky until removed.  With all other
schemes, however, it would cause a to be removed "on behalf of the
user" even if the user intended it to stay around.

Indeed, this scheme might by far be the easiest to understand.  Having
no autoremoval at all in levels higher than the deleted level is
something that people might easily understand: delayed removal just
does not happen anymore, and git never deletes a directory unless told
to.
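
The difference between this scheme and the others can be seen by
simulating the add/commit/rm/commit sequence above.  The following
Python sketch is only a model of the described semantics: run_sequence
and the ancestors_sticky flag are hypothetical names, and a/b is taken
to be a file b inside directory a.

```python
def run_sequence(ancestors_sticky):
    """Simulate `git-add a/b; git-commit; git-rm a/b; git-commit`.

    Returns the set of directories left in the working tree afterwards.
    """
    files = set()
    sticky = set()

    def ancestors(path):
        parts = path.split("/")[:-1]
        return {"/".join(parts[: i + 1]) for i in range(len(parts))}

    # git-add a/b: under the new scheme every directory above the file
    # is recorded as sticky at this point.
    files.add("a/b")
    if ancestors_sticky:
        sticky |= ancestors("a/b")

    # git-rm a/b
    files.discard("a/b")

    # A directory survives if a file still needs it or if it is sticky.
    needed = set().union(*(ancestors(f) for f in files))
    return needed | sticky
```

With ancestors_sticky set, "a" is still there after the sequence;
without it, the sequence is a no-op and "a" is gone.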

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23  6:05                                             ` David Kastrup
@ 2007-07-23  7:45                                               ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-23  7:45 UTC (permalink / raw
  To: git

David Kastrup <dak@gnu.org> writes:

> Addition: I was thinking so much of my implementation and its
> semantics that I did not consider one possibility that you might mean
> here:
>
> When adding a/b, always also add a (and the whole hierarchy above it)
> automatically as sticky.  Namely disallow unsticky directories in the
> repository at all.  That would mean that
>
>   git-add a/b;git-commit -m x;git-rm a/b;git-commit -m x
>
> might not be a noop if a was not in the repository previously: it
> would cause a to stay around sticky until removed.  With all other
> schemes, however, it would cause a to be removed "on behalf of the
> user" even if the user intended it to stay around.
>
> Indeed, this scheme might by far be the easiest to understand.
> Having no autoremoval at all in levels higher than the deleted level
> is something that people might easily understand: delayed removal
> just does not happen anymore, and git never deletes a directory
> unless told to.

And of course, it would be a nuisance for people managing a
patch-based workflow.  But those can actually easily set the
repository preferences differently, and even
find -type d -empty -delete
is not too hard to do.  So it would even be feasible as default.
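
For reference, a Python sketch of the same cleanup.  prune_empty_dirs is
a hypothetical helper; unlike the find one-liner as written, which would
also descend into .git and delete empty directories there, this version
deliberately skips anything under .git (an assumption added here, not
something the mail asks for).

```python
import os


def prune_empty_dirs(root):
    """Remove empty directories below `root`, deepest first; return them."""
    removed = []
    # Bottom-up, so a directory is visited after its children are gone.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        rel = os.path.relpath(dirpath, root)
        if rel == "." or rel.split(os.sep)[0] == ".git":
            continue  # keep the top level and git's own directories
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(rel.replace(os.sep, "/"))
    return removed
```

Because the walk is bottom-up, a chain like a/b/c with nothing in it is
removed completely in one pass.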

But I think that in practice, the "track only what has been added
recursively" approach is a good default.  And since patches without
dir information never add anything recursively, it would mostly keep
the directories clean.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-19 12:46                                   ` David Kastrup
@ 2007-07-23 20:18                                     ` Nix
  2007-07-23 20:49                                       ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Nix @ 2007-07-23 20:18 UTC (permalink / raw
  To: David Kastrup; +Cc: git

On 19 Jul 2007, David Kastrup stated:
> Tomash Brechko <tomash.brechko@gmail.com> writes:
>> Please consider this: I myself use Git to track my own local
>> projects, and for this usage your proposal has no value for me,
>> i.e. as a _Source_ Code Management system Git is rather complete.
>> But I also track /etc and ~/ in Git, and for this I'd love to have
>> directories, permissions, ownership, other attributes, to be
>> tracked.  I have Perl script wrapping Git that allows me to filter
>> tracked paths by full regexps instead of Git's file globs, and also
>> to filter out too big files assuming that they are binary anyway.
>
> Look, git _tracks_ contents.  Your permissions management needs to be
> told explicitly when and how things change.  So you end up with git
> _tracking_ material and your permissions/directory management needing
> the level of manual handholding Subversion demands.

Actually, if we had a post-checkout hook, we could use a pre-commit hook
to keep track of directory existence, permissions, et seq, and a post-
checkout hook to restore them.

(But we don't, at least not yet. Adding one is probably quite easy.)

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 20:18                                     ` Nix
@ 2007-07-23 20:49                                       ` David Kastrup
  2007-07-23 21:49                                         ` Nix
  0 siblings, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-23 20:49 UTC (permalink / raw
  To: Nix; +Cc: git

Nix <nix@esperi.org.uk> writes:

> On 19 Jul 2007, David Kastrup stated:
>> Tomash Brechko <tomash.brechko@gmail.com> writes:
>>> Please consider this: I myself use Git to track my own local
>>> projects, and for this usage your proposal has no value for me,
>>> i.e. as a _Source_ Code Management system Git is rather complete.
>>> But I also track /etc and ~/ in Git, and for this I'd love to have
>>> directories, permissions, ownership, other attributes, to be
>>> tracked.  I have Perl script wrapping Git that allows me to filter
>>> tracked paths by full regexps instead of Git's file globs, and also
>>> to filter out too big files assuming that they are binary anyway.
>>
>> Look, git _tracks_ contents.  Your permissions management needs to
>> be told explicitly when and how things change.  So you end up with
>> git _tracking_ material and your permissions/directory management
>> needing the level of manual handholding Subversion demands.
>
> Actually, if we had a post-checkout hook, we could use a pre-commit
> hook to keep track of directory existence, permissions, et seq, and
> a post-checkout hook to restore them.

Actually, tracking permissions would be cheap: one just needs to
replace the permission-munging macros in git with identity.  Ownership
-- well, that's harder.
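
What "permission-munging" versus "identity" amounts to can be shown in a
few lines of Python.  This is only a model of the behaviour, not git's
C macros: git_recorded_mode mirrors the fact that only the owner's
execute bit of a regular file is kept, and identity_mode is a
hypothetical stand-in for the replacement suggested above.

```python
import stat


def git_recorded_mode(st_mode):
    """Mode recorded for a regular file: only the owner's x bit survives."""
    return 0o100755 if st_mode & stat.S_IXUSR else 0o100644


def identity_mode(st_mode):
    """The suggested alternative: keep the permission bits as they are."""
    return stat.S_IFREG | stat.S_IMODE(st_mode)
```

A file with mode 0600 is recorded as 100644 by the first function and as
100600 by the second.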

But my sentiment remains: git _tracks_ stuff: it notices when things
move around and follows them.  Statically snapshotting permissions
creates a layer that is considerably less flexible.  The information gets
detached.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 20:49                                       ` David Kastrup
@ 2007-07-23 21:49                                         ` Nix
  2007-07-23 22:05                                           ` Nix
  2007-07-23 22:16                                           ` David Kastrup
  0 siblings, 2 replies; 156+ messages in thread
From: Nix @ 2007-07-23 21:49 UTC (permalink / raw
  To: David Kastrup; +Cc: git

On 23 Jul 2007, David Kastrup uttered the following:
> Nix <nix@esperi.org.uk> writes:
>> Actually, if we had a post-checkout hook, we could use a pre-commit
>> hook to keep track of directory existence, permissions, et seq, and
>> a post-checkout hook to restore them.
>
> Actually, tracking permissions would be cheap: one just needs to
> replace the permission-munging macros in git with identity.  Ownership
> -- well, that's harder.
>
> But my sentiment remains: git _tracks_ stuff: it notices when things
> move around and follows them.  Statically snapshotting permissions
> creates a layer that is considerably less flexible.  The information gets
> detached.

Not if you record it in a file which is checked in in the same commit
that is tracked, it isn't (that's what the pre-commit hook is for). It's
true that git won't natively have any knowledge of that data, but Linus
has fairly effectively shown that it shouldn't have any such knowledge
and doesn't need it.

(You might want to give git-diff knowledge of it, just so it can skip
it unless a new flag is given. Give the file a nice format, and bingo,
readable permission/ownership diffs!)

(I'd recommend storing the names of user/group file owners as well as
the uids, so you can --- given suitable permissions --- chown to the
right username in preference to uid if that user exists at checkout
time.)
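
A minimal Python sketch of the kind of work the two hooks would do,
assuming a plain-text manifest with one "mode uid user gid group path"
line per entry.  The manifest format and the names record_metadata and
restore_modes are made up for this example; it is Unix-only (pwd, grp),
skips symlinks, and restores only the mode bits since chown needs
privileges.

```python
import grp
import os
import pwd
import stat


def _name(lookup, ident):
    """Resolve a uid/gid to a name, falling back to the number."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def record_metadata(root):
    """Return manifest lines: mode, uid, user, gid, group, path."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISLNK(st.st_mode):
                continue  # symlinks are left alone in this sketch
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            lines.append(
                f"{stat.S_IMODE(st.st_mode):04o} "
                f"{st.st_uid} {_name(pwd.getpwuid, st.st_uid)} "
                f"{st.st_gid} {_name(grp.getgrgid, st.st_gid)} {rel}"
            )
    # Sort by path so the manifest diffs cleanly between commits.
    return sorted(lines, key=lambda line: line.split(" ", 5)[5])


def restore_modes(root, lines):
    """Reapply the recorded permission bits (ownership is not touched)."""
    for line in lines:
        mode, _uid, _user, _gid, _group, rel = line.split(" ", 5)
        os.chmod(os.path.join(root, rel), int(mode, 8))
```

The commit-side hook would write the result of record_metadata to a
tracked file, and the checkout-side hook would feed that file to
restore_modes.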


Doing this *efficiently* is another matter: probably a pair of hooks are
needed, run on pre-checkout and post-checkout: they can communicate so
as only to fiddle permissions on things which are newly appeared or
whose permissions have changed.

Obviously because the permissions, ownerships et al aren't recorded in
the index this will slow committing down, but given that
git-update-index will already have sucked the entire tree's inodes into
the page cache anyway, I don't think a second pass over the working tree
snarfing permissions would slow it down much.


As I need this anyway (I'm backing up a filesystem via git, yes, I'm
insane but I need version control and it's horrifically redundant so
packing it will save heaps of space), I guess I'd better get off my
rear and write the code.

(The recent commit-as-a-builtin's introduction of a run_hook() function
will be pretty damn useful: good timing, I guess.)

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 21:49                                         ` Nix
@ 2007-07-23 22:05                                           ` Nix
  2007-07-23 22:52                                             ` Jakub Narebski
  2007-07-23 22:16                                           ` David Kastrup
  1 sibling, 1 reply; 156+ messages in thread
From: Nix @ 2007-07-23 22:05 UTC (permalink / raw
  To: David Kastrup; +Cc: git

On 23 Jul 2007, nix@esperi.org.uk outgrape:
> (I'd recommend storing the names of user/group file owners as well as
> the uids, so you can --- given suitable permissions --- chown to the
> right username in preference to uid if that user exists at checkout
> time.)

Suddenly this gets more complex. git-merge-file(1) has to understand the
contents of this file, so as not to consider merges conflicting unless
two files actually have different permissions (i.e. doing a line by line
diff, and combining the two such that at most one file with a given name
exists in the result), and so as not to consider lines with differing
ownerships conflicting unless we're running under a uid in which we can
change ownerships at all. (I'd like to track ownership but it's looking
like a bit of a nest of snakes.)
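
The record-by-record merge described above can be sketched roughly as
follows, assuming the metadata file has already been parsed into
dictionaries mapping path to a permission string. The function name
merge_metadata and the dictionary representation are assumptions made
for illustration; ownership handling is left out.

```python
def merge_metadata(base, ours, theirs):
    """Three-way merge of {path: mode} maps.

    Each path is merged independently of its neighbours, so two sides
    touching adjacent records never conflict. Returns (merged, conflicts),
    where conflicts lists paths both sides changed in different ways.
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree (or both deleted the path)
        elif o == b:
            result = t          # only their side changed this record
        elif t == b:
            result = o          # only our side changed this record
        else:
            conflicts.append(path)
            result = o          # keep ours provisionally; caller must resolve
        if result is not None:
            merged[path] = result
    return merged, conflicts
```

Because every path appears at most once in the result, the "at most one
record with a given name" requirement holds by construction. Something
like this could in principle be wired in as a custom merge driver via
the gitattributes "merge" attribute.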

And the problem is that while git has a lot of strategies for merging
*trees*, its file merge system is totally unpluggable: it just falls
back to xdiff's merging system. I guess I'll have to add that feature :)

(How does this cope with binary files, I wonder? I seem to recall
something about that flying past back before the volume of the git list
overwhelmed me...)

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 21:49                                         ` Nix
  2007-07-23 22:05                                           ` Nix
@ 2007-07-23 22:16                                           ` David Kastrup
  2007-07-23 22:31                                             ` Linus Torvalds
  1 sibling, 1 reply; 156+ messages in thread
From: David Kastrup @ 2007-07-23 22:16 UTC (permalink / raw
  To: Nix; +Cc: git

Nix <nix@esperi.org.uk> writes:

> On 23 Jul 2007, David Kastrup uttered the following:
>> Nix <nix@esperi.org.uk> writes:
>>> Actually, if we had a post-checkout hook, we could use a pre-commit
>>> hook to keep track of directory existence, permissions, et seq, and
>>> a post-checkout hook to restore them.
>>
>> Actually, tracking permissions would be cheap: one just needs to
>> replace the permission-munging macros in git with identity.  Ownership
>> -- well, that's harder.
>>
>> But my sentiment remains: git _tracks_ stuff: it notices when things
>> move around and follows them.  Statically snapshotting permissions
>> creates a layer that is quite less flexible.  The information gets
>> detached.
>
> Not if you record it in a file which is checked in in the same
> commit that is tracked, it isn't (that's what the pre-commit hook is
> for).

I have my doubts that anybody but git actually has a clue what to
snapshot when, and where to place it: don't forget that index
manipulation and committing are done at different times, and you need
not even commit all of the index.

> It's true that git won't natively have any knowledge of that data,
> but Linus has fairly effectively shown that it shouldn't have any
> such knowledge and doesn't need it.

Last time I looked, git tracked the executable bit.  For kernel
development, this is pretty much what it takes, and with collaborative
work, tracking anything but the owner permissions is going to lead to
annoying and verbose merge behavior quite a lot.  And of the owner
permissions, r and w complicate proper handling when unset.

But being able to specify other masks for applications other than
multi-site collaborative development would likely not hurt.

> Doing this *efficiently* is another matter: probably a pair of hooks
> are needed, run on pre-checkout and post-checkout: they can
> communicate so as only to fiddle permissions on things which are
> newly appeared or whose permissions have changed.
>
> Obviously because the permissions, ownerships et al aren't recorded
> in the index this will slow committing down,

It will also detach the time where the file contents and the
permissions get recorded.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 22:16                                           ` David Kastrup
@ 2007-07-23 22:31                                             ` Linus Torvalds
  2007-07-23 23:32                                               ` Nix
       [not found]                                               ` <86ps2ithyl.fsf@lola.quinscape.zz>
  0 siblings, 2 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-23 22:31 UTC (permalink / raw
  To: David Kastrup; +Cc: Nix, git



On Tue, 24 Jul 2007, David Kastrup wrote:

> Nix <nix@esperi.org.uk> writes:
> 
> > It's true that git won't natively have any knowledge of that data,
> > but Linus has fairly effectively shown that it shouldn't have any
> > such knowledge and doesn't need it.
> 
> Last time I looked, git tracked the executable bit.

Actually, originally it tracked the whole mode word.

It was a total disaster. People who had different umasks etc got mode 
clashes all the time, and you ended up having silly and unnecessary 
conflicts.

The same would be true (to an even higher degree) if we tracked owner and 
group information etc.

So practically speaking, you want to track the *minimal* possible state, 
not the maximal one. 

This is one of those "in theory" vs "in practice" things. In *theory*, it 
would be nice for an SCM to track everything that is known about a file. 
In *practice*, that sucks.
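
As an illustration of "minimal state", here is a sketch of collapsing a
regular file's mode word so that only the owner-execute bit survives.
To the best of my knowledge this matches the 100644/100755 modes git
records for regular files, but the function name canonical_file_mode is
made up for this example and the code is not taken from git.

```python
import stat


def canonical_file_mode(st_mode):
    """Collapse a regular file's mode to 0o100755 or 0o100644.

    Everything except the owner-execute bit is discarded, so two users
    with different umasks produce the same recorded mode.
    """
    if not stat.S_ISREG(st_mode):
        raise ValueError("only regular files are handled in this sketch")
    perms = 0o755 if st_mode & stat.S_IXUSR else 0o644
    return stat.S_IFREG | perms
```

With this normalisation a file checked out under umask 002 (mode 0664)
and the same file under umask 022 (mode 0644) record identically, which
removes the mode clashes described above.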

So this does mean that if you want to explicitly track certain things 
(ownership and more complete file permissions, or ACL's, or "resource 
forks", or any number of other things that a file *could* have on various 
systems), you end up having to track them in something else than git, or
you end up having to track them as a separate "metadata file".

One such metadata file is, for example, the ".gitattributes" file. It 
*could* be used to contain things like path-based rules for ownership, 
not just things like whether to check out with CRLF etc.

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 22:05                                           ` Nix
@ 2007-07-23 22:52                                             ` Jakub Narebski
  2007-07-25 22:43                                               ` Nix
  0 siblings, 1 reply; 156+ messages in thread
From: Jakub Narebski @ 2007-07-23 22:52 UTC (permalink / raw
  To: git

Nix wrote:

> And the problem is that while git has a lot of strategies for merging
> *trees*, its file merge system is totally unpluggable: it just falls
> back to xdiff's merging system. I guess I'll have to add that feature :)

Not true. You can add a custom diff driver for files using the
gitattributes system.
 
> (How does this cope with binary files, I wonder? I seem to recall
> something about that flying past back before the volume of the git list
> overwhelmed me...)

xdiff has binary diff, and git has some kind of "ascii-armored" binary diff
output. As to how to merge binary files: I suspect that they always
conflict, unless the merge is trivial.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 22:31                                             ` Linus Torvalds
@ 2007-07-23 23:32                                               ` Nix
  2007-07-23 23:57                                                 ` Linus Torvalds
       [not found]                                               ` <86ps2ithyl.fsf@lola.quinscape.zz>
  1 sibling, 1 reply; 156+ messages in thread
From: Nix @ 2007-07-23 23:32 UTC (permalink / raw
  To: Linus Torvalds; +Cc: David Kastrup, git

On 23 Jul 2007, Linus Torvalds spake thusly:
> So practically speaking, you want to track the *minimal* possible state, 
> not the maximal one. 

I think it depends on your use case. For source code and indeed anything
with heavy merges, this is true: but I'm increasingly using git as a
sort of `merged historical tar' to store images of entire random
filesystem trees across time, and gaining the benefit of the packer's
lovely space-efficiency as well (doing this with svn would be a lost
cause, twice the space usage before you even think about the
repository). And in that case, preserving everything you can makes
sense.

(Perhaps what I should be doing is tarring the directory tree up and
storing the *tarball* in git. I'll try that and see what it does to pack
sizes. These are version-controlled backups of my mother's magnum opus
in progress so you can understand that I don't want to destroy them
accidentally: I'd never hear the end of it! ;) )

> So this does mean that if you want to explicitly track certain things 
> (ownership and more complete file permissions, or ACL's, or "resource 
> forks", or any number of other things that a file *could* have on various 
> systems), you end up having to track them in something else than git, or
> you end up having to track them as a separate "metadata file".

Yes indeed: that's why I proposed doing this using a couple of new hooks
driving entirely optional permissions-preservation stuff. Most use cases
really won't want to track this, so this sort of stuff shouldn't impose
upon the git core or upon anyone who doesn't want it. (However, the
ability to have alternative file merging strategies *may* be useful
elsewhere, perhaps.)

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 23:32                                               ` Nix
@ 2007-07-23 23:57                                                 ` Linus Torvalds
  0 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-07-23 23:57 UTC (permalink / raw
  To: Nix; +Cc: David Kastrup, git



On Tue, 24 Jul 2007, Nix wrote:
>
> On 23 Jul 2007, Linus Torvalds spake thusly:
> > So practically speaking, you want to track the *minimal* possible state, 
> > not the maximal one. 
> 
> I think it depends on your use case. For source code and indeed anything
> with heavy merges, this is true

Yes, very obviously. Git is targeted towards source code and working in a 
distributed manner across a very wide variety of users and setups, while 
something that would be more targeted towards a special scenario and much 
stricter usage would find that the "minimum" set is much bigger, and might 
well include ACL's and user information.

> but I'm increasingly using git as a sort of `merged historical tar' to 
> store images of entire random filesystem trees across time, and gaining 
> the benefit of the packer's lovely space-efficiency as well (doing this 
> with svn would be a lost cause, twice the space usage before you even 
> think about the repository). And in that case, preserving everything you 
> can makes sense.

On the other hand, almost all the space-efficiency comes from things that 
delta well, and change quickly. That includes the file data itself (and 
very much the tree contents), but it doesn't necessarily include things 
like permissions and user information - mainly because that doesn't 
actually delta at all (not because it can't, but because it hardly ever 
changes, and when it does change, it often changes all over the map).

To make an example of your "tar" situation: if you want to be space- 
efficient in a tar-like setting, you should *not* make user information be 
something that is per-file at all! Why? Because in 99% of all tar-files, 
there is a single user name.

So even your usage *may* actually be much better off using git as a "data 
backend", and using something totally different for "user/group" 
information. Yes, you'd have to make a "shim layer" on top of git to hide 
the fact that the user information is handled separately, but that 
shouldn't be that hard per se.
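
One way such a shim layer might store ownership compactly is as a
single default owner plus a short list of exceptions. The sketch below
assumes ownership is available as a path-to-owner dictionary; the name
compact_ownership is hypothetical.

```python
from collections import Counter


def compact_ownership(owners):
    """Split {path: owner} into (default_owner, overrides).

    The most common owner becomes the default and only paths that differ
    from it are listed, so a tree with a single owner costs one record.
    """
    if not owners:
        return None, {}
    default, _count = Counter(owners.values()).most_common(1)[0]
    overrides = {path: owner for path, owner in owners.items()
                 if owner != default}
    return default, overrides
```

For the common case of one user owning everything, the overrides
dictionary is empty and the whole ownership record is a single name.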

> (Perhaps what I should be doing is tarring the directory tree up and
> storing the *tarball* in git. I'll try that and see what it does to pack
> sizes. These are version-controlled backups of my mother's magnum opus
> in progress so you can understand that I don't want to destroy them
> accidentally: I'd never hear the end of it! ;) )

You don't want to do this. 

There's a few reasons, but the two big ones are:

 - the git delta logic is strictly a "single delta base" thing.

   Yes, git would be able to find the deltas between two tar-files (as
   long as you don't compress them), and express one tar-file in terms of 
   the other, and it would probably save a fair amount of disk.

   But it would not be able to do _nearly_ as well as it can if you store 
   individual files, and let git just find the best delta per-file (and 
   not just "one delta base for the whole tar-ball")

 - git is very much optimized for "many small files". Yes, you can check 
   in large files, and it works fine, but quite frankly, all the design 
   and heavy optimizations have been about having trees with tens of 
   thousands of files, but the files individually reasonably small.

   A lot of the speed advantages of git come from efficiently pruning away 
   whole sub-directory structures, for example, and not even touching the 
   data at all!

   So if you track just one file that changes in every version, all the 
   things that make git fly are basically disabled, and you won't take 
   full advantage of what git does.

> Yes indeed: that's why I proposed doing this using a couple of new hooks
> driving entirely optional permissions-preservation stuff. Most use cases
> really won't want to track this, so this sort of stuff shouldn't impose
> upon the git core or upon anyone who doesn't want it. (However, the
> ability to have alternative file merging strategies *may* be useful
> elsewhere, perhaps.)

The ".gitattributes" file really could be used for some of that. Using it 
to track ownership and full permissions would not be impossible, and it 
could have interesting semantics (especially as .gitattributes is path
pattern based - so you could literally do a "user" attribute, and say that 
everything in a particular subdirectory is owned by a particular user).

That wouldn't be UNIX-like semantics, of course, but it can be very useful 
for certain things. 

Taking an example of something totally independent of git, look at how 
"udev" handles permissions, for example. In situations like that, static 
user information is useless, and it actually ends up setting up modes and 
ownership based on name-based patterns rather than having each file have a 
permission/user (because individual files appear and disappear, the 
name-based patterns are the things that matter).

So if you *just* want to track a regular filesystem layout, that's not the 
right thing, but "udev" does show an example of a totally different way of 
describing ownership and permissions, and one which wouldn't actually be 
at all foreign to git.
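
A rough sketch of what name-based ownership rules could look like,
assuming an ordered rule list where later matches override earlier
ones. The "user" and "mode" attribute names and the attributes_for
helper are hypothetical, and fnmatchcase is only a simplification of
the real gitattributes pattern rules.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list: (pattern, attributes). Later rules win.
RULES = [
    ("*", {"user": "root", "mode": "0644"}),
    ("home/alice/*", {"user": "alice"}),
    ("*.sh", {"mode": "0755"}),
]


def attributes_for(path, rules):
    """Return the attributes for path by applying every matching rule in order."""
    attrs = {}
    for pattern, values in rules:
        # fnmatchcase lets "*" match "/" too, unlike real gitattributes patterns.
        if fnmatchcase(path, pattern):
            attrs.update(values)
    return attrs
```

Here attributes_for("home/alice/bin/run.sh", RULES) picks up the user
from the directory rule and the mode from the suffix rule, without any
per-file record existing for that path.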

		Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
       [not found]                                               ` <86ps2ithyl.fsf@lola.quinscape.zz>
@ 2007-07-24  6:56                                                 ` Nix
  0 siblings, 0 replies; 156+ messages in thread
From: Nix @ 2007-07-24  6:56 UTC (permalink / raw
  To: David Kastrup; +Cc: Linus Torvalds, git

On 24 Jul 2007, David Kastrup spake thusly:
> But merging will become nicer if the permissions actually stay
> associated with the file rather than the file name.  Even in things
> like /etc backups, blobs not infrequently relocate from one place to
> another when the system gets updated.

Even without that we'd need to merge without context, i.e. with totally
independent lines, for such a file. So it's not the standard git file
merge.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-23 22:52                                             ` Jakub Narebski
@ 2007-07-25 22:43                                               ` Nix
  0 siblings, 0 replies; 156+ messages in thread
From: Nix @ 2007-07-25 22:43 UTC (permalink / raw
  To: Jakub Narebski; +Cc: git

On 23 Jul 2007, Jakub Narebski spake thusly:

> Nix wrote:
>
>> And the problem is that while git has a lot of strategies for merging
>> *trees*, its file merge system is totally unpluggable: it just falls
>> back to xdiff's merging system. I guess I'll have to add that feature :)
>
> Not true. You can add a custom diff driver for files using the
> gitattributes system.

Oo. Excellent, I didn't notice that. Thank you.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-18  0:13 Empty directories David Kastrup
                   ` (2 preceding siblings ...)
  2007-07-18  2:23 ` Junio C Hamano
@ 2007-07-26 23:33 ` Robin Rosenberg
  2007-07-27  5:22   ` David Kastrup
  3 siblings, 1 reply; 156+ messages in thread
From: Robin Rosenberg @ 2007-07-26 23:33 UTC (permalink / raw
  To: David Kastrup; +Cc: git


(
	I don't know which mail is the best to reply to and I probably missed 
	something in the thread, so bear with me if I'm repeating anything.
)

David. Reconsider "tracking" all directories and what that would give, 
compared to explicitly tracking specific ones and the required magic entries.

Say we have a config setting that tells git never to remove empty trees. Linus's
patches could be a start for representing trees in the index. As an 
optimization the index could prune trees from the index if they contain 
things as long as the index *effectively* remembers all trees.

Using the patches again we could add empty directories to the index and remove 
them. No directory would be removed automatically, except maybe by a merge.

We would probably have only a few empty directories and new unexpected ones
would only pop up when we remove all blobs from one. Git status could tell us
about them so we will not forget them. It could even tell us about "new" empty
directories, which is probably the most important thing you'd want to know. 

Forgetting to untrack an empty directory would not be a big deal.

Whether to retain empty trees or not should be a repository policy, but an all 
or nothing setting.

-- robin

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty directories...
  2007-07-26 23:33 ` Robin Rosenberg
@ 2007-07-27  5:22   ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-27  5:22 UTC (permalink / raw
  To: Robin Rosenberg; +Cc: git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> (
> 	I don't know which mail is the best to reply to and I probably missed 
> 	something in the thread, so bear with me if I'm repeating anything.
> )
>
> David. Reconsider "tracking" all directories and what that would
> give, compared to explicitly tracking specific ones and the required
> magic entries.

It would be quite a nuisance for a patch-based workflow, since patches
don't talk about the creation and deletion of directories.

The "track only when entered approach" has the advantage that
directories that were only created to accommodate patches will be
removed again when becoming empty.

Of course, doing "git-add top-level" once will level the difference.

> Say we have a config setting that tells git never to remove empty
> trees.

Why wouldn't I have tree/zap removed when doing git-rm tree?

> Linus's patches could be a start for representing trees in the
> index. As an optimization the index could prune trees from the index
> if they contain things as long as the index *effectively* remembers
> all trees.

But it doesn't.  If you do git-add tree, optimizing the dir entry away
since tree/zap exists, then subsequently do git-rm tree/zap, of course
there is nothing to do except remove tree/zap, and the tree is gone.

One can't start tracking trees explicitly only when they become empty,
because one can't know whether to track them then.

> Using the patches again we could add empty directories to the index
> and remove them. No directory would be removed automatically, except
> maybe by a merge.

I currently have the problem that

rm -rf *
unzip some-archive
git-add some-archive
git-commit -a -m whatever
git-checkout something else

leaves empty directory skeletons lying around.

> We would probably have only a few empty directories and new
> unexpected ones would only pop up when we remove all blobs from
> one. Git status could tell us about them so we will not forget
> them.

I don't want a source management system to tell me whenever it is
going to annoy me.

> It could even tell us about "new" empty directories, which is
> probably the most important thing you'd want to know.
>
> Forgetting to untrack an empty directory would not be a big deal.
>
> Whether to retain empty trees or not should be a repository policy,
> but an all or nothing setting.

With that approach, the workflow

"Apply a patch creating something/hello"
"Undo the patch creating something/hello"

will leave something lying around.  For somebody managing hundreds of
directories, that would be a nuisance.

I don't say that a "track all parents automatically" approach would
not have its merits: it would likely prevent some mistakes and be
easily understandable to most users.  But for managing a patch
workflow, it would appear to get in the way.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [RFC PATCH] Re: Empty directories...
  2007-07-21  5:08                     ` Linus Torvalds
  2007-07-21  5:28                       ` David Kastrup
@ 2007-07-28  8:44                       ` David Kastrup
  1 sibling, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-07-28  8:44 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Why? Because my preliminary patches sort the index entries wrong. A 
> directory should always sort *as*if* it had the '/' at the end.
>
> See base_name_compare() for details.
>
> And we've never done that for the index, because the index has never had 
> this issue (since it never contained directories). So sit down and compare 
> base_name_compare (for tree entries) with cache_name_compare() (for index 
> entries), and see how the latter doesn't care about the type of names.
>
> This was actually something that I hit already with subproject support, 
> and one of my very first patches even had some (aborted) code to start 
> sorting subprojects in the index the way we sort directories.
>
> And I *should* have done it that way, but I never did. It now makes the 
> S_ISDIR handling harder, because directories really do have to be sorted 
> as if they had the '/' at the end, or "git-fsck" will complain about bad 
> sorting.
>
> Sad, sad, sad. It effectively means that S_IFGITLINK is *not* quite the 
> same as S_IFDIR, because they sort differently. Duh.
>
> Of course, it seldom matters, but basically, you should test a directory 
> structure that has the files
>
> 	dir.c
> 	dir/test
>
> in it, and the "dir" directory should always sort _after_ "dir.c".
>
> And yes, having the index entry with a '/' at the end would handle that 
> automatically.

Personally, I am not much in favor of using different names in index
and repository.
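
For readers unfamiliar with the sorting issue, the rule quoted above
can be approximated by comparing names byte-wise with a '/' appended to
directories. This is only a sketch of the ordering, not a port of
base_name_compare(); the name tree_sort_key is invented here.

```python
def tree_sort_key(name, is_dir):
    """Sort key that treats a directory as if its name ended in '/'."""
    return (name + "/").encode() if is_dir else name.encode()


# Entries as (name, is_dir) pairs, using the example from the quoted mail.
entries = [("dir", True), ("dir.c", False)]

# Plain name order, ignoring the entry type.
plain_order = [name for name, _ in sorted(entries, key=lambda e: e[0].encode())]

# Tree order, where the directory sorts as "dir/".
tree_order = [name for name, _ in sorted(entries, key=lambda e: tree_sort_key(*e))]
```

In plain order "dir" comes first because it is a prefix of "dir.c"; in
tree order "dir.c" comes first because '.' (0x2e) sorts before '/'
(0x2f).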

> As it is, with the "mode" difference, it instead needs to fix up 
> "cache_name_compare()". Admittedly, that would actually be a cleanup 
> (since it would now match base_name_compare() in logic, and could actually 
> use that to do the name comparison!), but it's a damn painful cleanup 
> because we don't even pass in the mode to "cache_name_compare()", since we 
> never needed it.
>
> Gaah.
>
> cache_name_compare itself isn't used in that many places,

dir.c and read-cache.c

> but it's used by "index_name_pos()/cache_name_pos()", which *is*
> used in many places.

cache_name_pos:
builtin-apply.c
builtin-blame.c
builtin-checkout-index.c
builtin-ls-files.c
builtin-mv.c
builtin-read-tree.c
builtin-rm.c
builtin-update-index.c
diff.c
diff-lib.c
dir.c
merge-index.c
sha1_name.c
unpack-trees.c
wt-status.c

index_name_pos:
read-cache.c

> And again, that one doesn't even have the mode, so it cannot pass it
> down.

> So it probably *is* easier to add the '/' at the end of the name
> instead, to make directories sort the right way in the index. I'd
> still suggest you *also* make the mode be S_IFDIR, though (and
> preferably make git-fsck actually verify that the mode and the last
> character of the name matches!).

Actually, pretty much all of the above files are likely to get touched
by directory support one way or another anyway.  One really should aim
for the cleanest solution in the long run, and this for me more or
less means that it makes no sense to have different names in index and
repository.  Putting that slash in always would probably simplify some
logic in the repository as well, but I don't really like something as
marker-like as "/" in the data structures.  Putting a slash there
would involve a three-phase plan:

a) make fsck and the other code deal gracefully with either slash or
   no slash.
Wait until everybody uses this code.

b) make the code actually _put_ slashes there.
Wait until everybody has used this code.

c) deal with it for all eternity, oops: since rewriting the
   cryptographic history of existing repositories is pretty much out
   as far as I understand (which might be insufficient), one has to
   navigate around slash/noslash all the time when accessing
   repositories, including the sorting.  The index, however, can at
   one point of time phase out the slash-specific sorting.  There is
   no such thing as prehistoric indexes we would need to mind.

I guess that looks like not being worth the pain.  Double the code or
no money back.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* empty directories
@ 2007-08-21 17:14 Josh England
  2007-08-21 17:40 ` Sean
                   ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: Josh England @ 2007-08-21 17:14 UTC (permalink / raw
  To: git

Hi,

Git doesn't seem to allow me to add an empty directory to the index, or
even nested empty directories.  Is there any way to do this?  What is
the reasoning?  I've got a use case where having empty directories in my
git repository would be *very* valuable.  Any information and help is
greatly appreciated.

-JE

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-21 17:14 empty directories Josh England
@ 2007-08-21 17:40 ` Sean
  2007-08-22 21:25   ` Josh England
  2007-08-22  0:06 ` Jakub Narebski
  2007-08-22  4:31 ` Salikh Zakirov
  2 siblings, 1 reply; 156+ messages in thread
From: Sean @ 2007-08-21 17:40 UTC (permalink / raw
  To: Josh England; +Cc: git

On Tue, 21 Aug 2007 11:14:21 -0600
"Josh England" <jjengla@sandia.gov> wrote:

> Git doesn't seem to allow me to add an empty directory to the index, or
> even nested empty directories.  Is there any way to do this?  What is
> the reasoning?  I've got a use case where having empty directories in my
> git repository would be *very* valuable.  Any information and help is
> greatly appreciated.

Hi Josh,

Git doesn't track empty directories.  There is a brief note about it in
the FAQ:

 http://git.or.cz/gitwiki/GitFaq#head-1fbd4a018d45259c197b169e87dafce2a3c6b5f9

Sean

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-21 17:14 empty directories Josh England
  2007-08-21 17:40 ` Sean
@ 2007-08-22  0:06 ` Jakub Narebski
  2007-08-22  4:31 ` Salikh Zakirov
  2 siblings, 0 replies; 156+ messages in thread
From: Jakub Narebski @ 2007-08-22  0:06 UTC (permalink / raw
  To: git

Josh England wrote:


> Git doesn't seem to allow me to add an empty directory to the index, or
> even nested empty directories.  Is there any way to do this?  What is
> the reasoning?  I've got a use case where having empty directories in my
> git repository would be *very* valuable.  Any information and help is
> greatly appreciated.

Git does not track empty directories [yet], but you can use the empty
.gitignore file trick to mark "empty" directories to be added.
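
A small sketch of automating that trick, assuming you simply want an
empty .gitignore dropped into every directory that currently has no
entries. The helper name add_placeholders is made up for this example.

```python
import os


def add_placeholders(root, placeholder=".gitignore"):
    """Create an empty placeholder file in every empty directory under root.

    Returns the sorted list of files created. The .git directory itself
    is skipped.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")   # do not descend into repository metadata
        if not dirnames and not filenames:
            target = os.path.join(dirpath, placeholder)
            open(target, "w").close()
            created.append(target)
    return sorted(created)
```

After running it, the placeholder files can be added like any other
file, and the directories are then recreated on checkout. Running it a
second time creates nothing, since those directories are no longer
empty.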

There was some discussion about this on git mailing list (see archives),
and this issue is most probably mentioned on GitFaq page in git wiki.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-21 17:14 empty directories Josh England
  2007-08-21 17:40 ` Sean
  2007-08-22  0:06 ` Jakub Narebski
@ 2007-08-22  4:31 ` Salikh Zakirov
  2007-08-22 18:46   ` Linus Torvalds
  2 siblings, 1 reply; 156+ messages in thread
From: Salikh Zakirov @ 2007-08-22  4:31 UTC (permalink / raw
  To: git

Josh England wrote:
> Git doesn't seem to allow me to add an empty directory to the index, or
> even nested empty directories.  Is there any way to do this?  What is
> the reasoning?  I've got a use case where having empty directories in my
> git repository would be *very* valuable.  Any information and help is
> greatly appreciated.

While the other replies provided a historical background of how exactly
git handles directories and why it wasn't storing empty directories,
there is no fundamental reason for empty directories not being stored,
it's just nobody got to implement it.

Linus Torvalds posted an untested patch in a recent discussion and requested
that anyone interested in this functionality continued development and testing.

Design discussion: http://lists-archives.org/git/624494-empty-directories.html
Patch: http://marc.info/?l=git&m=118480075313827&w=2

Johannes Schindelin also posted an alternative implementation, which emulates
empty dirs by adding empty .gitignore placeholder to the index.
http://marc.info/?l=git&m=118484785410247&w=2

You could also read the long discussion of the subtle semantic issues that storing empty
directories introduces in the mail thread accessible from above links.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-22  4:31 ` Salikh Zakirov
@ 2007-08-22 18:46   ` Linus Torvalds
  2007-08-22 19:12     ` David Kastrup
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2007-08-22 18:46 UTC (permalink / raw
  To: Salikh Zakirov; +Cc: git



On Wed, 22 Aug 2007, Salikh Zakirov wrote:
> 
> Linus Torvalds posted an untested patch in a recent discussion and requested
> that anyone interested in this functionality continued development and testing.

That untested patch was seriously broken - it didn't do the sorting of 
empty directories right. So it would need a lot of other work.

So I'm firmly back in the "just add a '.gitignore' file to the directory" 
camp.

Or you can fake it out entirely by making it an empty subproject, which 
also gives you an empty directory.

			Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-22 18:46   ` Linus Torvalds
@ 2007-08-22 19:12     ` David Kastrup
  0 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-08-22 19:12 UTC (permalink / raw
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 22 Aug 2007, Salikh Zakirov wrote:
>> 
>> Linus Torvalds posted an untested patch in a recent discussion and
>> requested that anyone interested in this functionality continued
>> development and testing.
>
> That untested patch was seriously broken - it didn't do the sorting
> of empty directories right.

Well, it depends on where one wants to see directories sorted in the
index: the index sort order does not necessarily need to be the same
as the repository sort order: merge conflict detection could benefit
from sorting the directory "early" in the index.  Of course, this
would mean that one needed to stash away directories temporarily while
processing the index until the corresponding tree in the repository
comes up.

> So it would need a lot of other work.

With either choice of sort order, yes.  One place or the other.

-- 
David Kastrup
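
To make the sort-order point concrete, here is a small sketch. It assumes
(as far as I understand git's ordering rules) that a flat list of paths is
compared byte-wise, while inside a tree object a directory entry compares
as if its name ended in '/'. Whether this is exactly the breakage referred
to above is my assumption; the helper names index_key and tree_key are
made up for the illustration.

```python
def index_key(path):
    """Sort key for a flat list of paths: plain byte-wise comparison."""
    return path.encode()


def tree_key(name, is_dir):
    """Sort key inside a tree: a directory compares as if it ended in '/'."""
    return (name + "/").encode() if is_dir else name.encode()


# One directory and two files whose names share the prefix "dir".
entries = [("dir", True), ("dir-x", False), ("dir.c", False)]

flat_order = sorted((name for name, _ in entries), key=index_key)
tree_order = [name for name, _ in sorted(entries, key=lambda e: tree_key(*e))]

print(flat_order)  # ['dir', 'dir-x', 'dir.c']
print(tree_order)  # ['dir-x', 'dir.c', 'dir']
```

The directory moves from first to last because '/' (0x2f) sorts after
both '-' (0x2d) and '.' (0x2e). A bare directory entry therefore has two
plausible positions, which is the "one place or the other" choice.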

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-21 17:40 ` Sean
@ 2007-08-22 21:25   ` Josh England
  2007-08-22 23:25     ` Linus Torvalds
  2007-08-22 23:40     ` Jakub Narebski
  0 siblings, 2 replies; 156+ messages in thread
From: Josh England @ 2007-08-22 21:25 UTC (permalink / raw
  To: git

On Wed, 22 Aug 2007, Linus Torvalds wrote:
> On Wed, 22 Aug 2007, Salikh Zakirov wrote:
> > 
> > Linus Torvalds posted an untested patch in a recent discussion and requested
> > that anyone interested in this functionality continued development and testing.
> 
> That untested patch was seriously broken - it didn't do the sorting of 
> empty directories right. So it would need a lot of other work.
> 
> So I'm firmly back in the "just add a '.gitignore' file to the directory" 
> camp.

Whoa.  I just spent much of the morning reading the history of this
thread. My eyes are still bleeding, but I think I'm now informed
enough to be dangerous.

Without actually sticking my head in the honey pot surrounded by giant
bears, I just want to relate a revision control scenario that I've been
wanting to solve for several years. I deploy/maintain many linux
clusters that each have a single system image to boot all nodes on the
machines. My desire is to shove an *entire* image into a git
repository, and simply have it do the right thing.  Doing so and using
clones/branches/merges to maintain these images would be extremely
useful.  I've attempted this concept with several SCMs using various
workarounds for each but have abandoned each attempt mainly due to
performance issues.  Git shows the best performance by far (to the
point of actually being usable) for this purpose.

Forget about special files as those are almost certainly a lost cause.
I'm willing to use .gitignore in empty directories until a better
solution presents itself.  The main need is for file
ownership/permission, which has been touched on before.  When I clone
an image, I really want an *identical* clone, in every way.  It seems
as though git had this functionality but scrapped it due to issues with
umask and merge type problems?  So the question is:  would there be any
way to bring this functionality back as a non-default configurable
option?  For those of us who need the functionality, we'd be more than
willing to live with some of the side-effects.

The alternatives (involving wrappers and strict policy) just haven't
been idiot-proof enough to be truly viable.  It almost has to be a
built-in capability.  It looks like Nax is doing something close to
this.  Is there anyone else trying to use git in a similar way?

-JE

PS:  I know this falls outside of git's intended use, but it's the
closest thing to something that could work.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-22 21:25   ` Josh England
@ 2007-08-22 23:25     ` Linus Torvalds
  2007-08-22 23:55       ` David Kastrup
                         ` (2 more replies)
  2007-08-22 23:40     ` Jakub Narebski
  1 sibling, 3 replies; 156+ messages in thread
From: Linus Torvalds @ 2007-08-22 23:25 UTC (permalink / raw
  To: Josh England; +Cc: git



On Wed, 22 Aug 2007, Josh England wrote:
>
> The main need is for file ownership/permission, which has been touched 
> on before.  When I clone an image, I really want an *identical* clone, 
> in every way.  It seems as though git had this functionality but 
> scrapped it due to issues with umask and merge type problems?

Well, git had all permission bits, but never ownership. And yes, using 
more than the one user-x-bit ended up being totally unusable for source 
code, because of different people having different umask, so we 
effectively dropped the permission bits too (although the data format was 
retained, so we could re-introduce them with some flag that says "honor 
all permission bits, not just the x bit").

But the ownership thing we've never even tried to support, since it was so 
obviously not something that was appropriate for a distributed project. So 
if you want an identical clone with ownership and (full) permissions, you 
really do need to have some alternate way to fill in the blanks.

I've argued that ".gitattributes" may be an acceptable alternate, 
especially since ownership is often something that is less than "per 
file", and more often "has certain patterns".

> So the question is:  would there be any way to bring this functionality 
> back as a non-default configurable option?  For those of us who need the 
> functionality, we'd be more than willing to live with some of the 
> side-effects.

Full permissions might be easy enough to resurrect, but since it's still 
pointless without ownership, that really isn't even relevant.

But if .gitattributes would work, you probably could introduce both full 
permissions and ownership rules there. We read git attributes for *other* 
reasons when checking files out _anyway_, ie we need the CRLF attribute 
stuff, so adding ownership attributes would not be at all odd.

		Linus
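
The umask problem described above can be shown with a little arithmetic.
This is a sketch for illustration only: the function names are invented
here, and the second function simply mirrors the behaviour described in
this message (only the user x bit of a regular file is honored), giving
the two file modes git records for regular files, 100644 and 100755.

```python
import stat


def mode_after_umask(requested, umask):
    """Permission bits a newly created file ends up with under a umask."""
    return requested & ~umask


def git_file_mode(st_mode):
    """Collapse a regular file's mode to 100644 or 100755.

    Only the user-execute bit survives; every other permission bit is
    replaced by a fixed default.
    """
    return 0o100755 if st_mode & stat.S_IXUSR else 0o100644


# Two developers create the same file with different umasks.
alice = mode_after_umask(0o666, 0o022)
bob = mode_after_umask(0o666, 0o002)
print(oct(alice), oct(bob))  # 0o644 0o664

# Tracking all bits would show a spurious change between the two;
# keeping only the x bit makes both checkouts look identical.
print(oct(git_file_mode(alice)), oct(git_file_mode(bob)))  # 0o100644 0o100644
```

A flag to "honor all permission bits" would have to bypass the second
function, which is why it only makes sense as a per-project choice.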

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-22 21:25   ` Josh England
  2007-08-22 23:25     ` Linus Torvalds
@ 2007-08-22 23:40     ` Jakub Narebski
  1 sibling, 0 replies; 156+ messages in thread
From: Jakub Narebski @ 2007-08-22 23:40 UTC (permalink / raw
  To: git

[Cc: Josh England <jjengla@sandia.gov>, git@vger.kernel.org]

Josh England wrote:

> [...]  The main need is for file
> ownership/permission, which has been touched on before.  When I clone
> an image, I really want an *identical* clone, in every way.  It seems
> as though git had this functionality but scrapped it due to issues with
> umask and merge type problems?  So the question is:  would there be any
> way to bring this functionality back as a non-default configurable
> option?  For those of us who need the functionality, we'd be more than
> willing to live with some of the side-effects.
> 
> The alternatives (involving wrappers and strict policy) just haven't
> been idiot-proof enough to be truly viable.  It almost has to be a
> built-in capability.  It looks like Nax is doing something close to
> this.  Is there anyone else using trying to use git in a similar way?

Check out (via e.g. the http://git.or.cz/gitwiki/InterfacesFrontendsAndTools
wiki page) IsiSetup, which is a tool to manage configuration files (including
permissions) that uses git as its engine, and metastore, which is meant as a
tool to use in an appropriate hook for storing/restoring permissions etc.

And as Linus told you, if you have time to work on it, you can try to
make .gitattributes work for this...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
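
For a rough idea of what a store/restore step run from a hook could look
like, here is a minimal sketch. It is not how metastore or IsiSetup
actually work: the function names and the in-memory manifest are
assumptions made for illustration, and only permission bits are handled,
since restoring ownership generally needs root.

```python
import os
import stat


def save_modes(root):
    """Record the permission bits of every file and directory under root.

    Returns a dict mapping '/'-separated relative paths to integer modes.
    """
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Leave git's own metadata alone.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # a symlink has no meaningful mode of its own
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = stat.S_IMODE(os.lstat(full).st_mode)
    return manifest


def restore_modes(root, manifest):
    """Re-apply recorded permission bits to paths that still exist."""
    for rel, mode in manifest.items():
        full = os.path.join(root, *rel.split("/"))
        if os.path.lexists(full) and not os.path.islink(full):
            os.chmod(full, mode)
```

The manifest would have to be written to a tracked file before commit
and read back after checkout; which hooks to use, and in what format to
store it, is left open here.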

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-22 23:25     ` Linus Torvalds
@ 2007-08-22 23:55       ` David Kastrup
  2007-08-23 15:24       ` Josh England
  2007-08-24 17:10       ` Jason Garber
  2 siblings, 0 replies; 156+ messages in thread
From: David Kastrup @ 2007-08-22 23:55 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Josh England, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Full permissions might be easy enough to resurrect, but since it's
> still pointless without ownership, that really isn't even relevant.

I'd not call it entirely pointless without ownership: under most
systems, only root can do chown, so for example a private backup of a
home directory usually has unique ownership (and nothing but the
normal ownership could be restored by a user, anyway).

However, once the user is member of more than a single group and
actually makes _use_ of that, we are getting on thin ice.  But at
least different group ownership is usually much better contained (and
thus reconstructible manually in the case of an emergency) than the
permissions are.

Since tracking permissions would be a per-project decision (nothing
else makes any sense), it should be workable to amend the tree records
themselves by adding ownership and ACL and whatever else optionally
right there in-place if one figures out a good syntax for it.

One still needs to come up with a good and flexible way to implement
policies: what kind of permissions/ownership data will be let into the
repository from workdir/pushing, and what won't?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: empty directories
  2007-08-22 23:25     ` Linus Torvalds
  2007-08-22 23:55       ` David Kastrup
@ 2007-08-23 15:24       ` Josh England
  2007-08-24 17:10       ` Jason Garber
  2 siblings, 0 replies; 156+ messages in thread
From: Josh England @ 2007-08-23 15:24 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

On Wed, 2007-08-22 at 16:25 -0700, Linus Torvalds wrote:
> But if .gitattributes would work, you probably could introduce both full 
> permissions and ownership rules there. We read git attributes for *other* 
> reasons when checking files out _anyway_, ie we need the CRLF attribute 
> stuff, so adding ownership attributes would not be at all odd.

OK, this looks like it has the desired effect.  commits/pulls/etc catch
and update the execute bit.  I'll try to find how .gitattributes hooks
in.  Any pointers/tips are appreciated.

-JE

^ permalink raw reply	[flat|nested] 156+ messages in thread

* RE: empty directories
  2007-08-22 23:25     ` Linus Torvalds
  2007-08-22 23:55       ` David Kastrup
  2007-08-23 15:24       ` Josh England
@ 2007-08-24 17:10       ` Jason Garber
  2 siblings, 0 replies; 156+ messages in thread
From: Jason Garber @ 2007-08-24 17:10 UTC (permalink / raw
  To: git

> But if .gitattributes would work, you probably could introduce both full
> permissions and ownership rules there. We read git attributes for *other*
> reasons when checking files out _anyway_, ie we need the CRLF attribute
> stuff, so adding ownership attributes would not be at all odd.
>
> 		Linus

And as a side-note, it would be quite trivial to write a script to
initially populate a .gitattributes file cleanly (and regen when
needed).

~ JasonG
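
A sketch of such a script follows. Note the assumptions: git itself does
not define "owner", "group" or "mode" attributes, so these names are
purely hypothetical, as is the function name attribute_lines. The output
only follows the general ".gitattributes" shape of a path pattern
followed by attr=value pairs.

```python
import os
import stat


def attribute_lines(root):
    """Build one attributes-style line per regular file under root.

    The owner/group/mode attribute names are hypothetical; numeric ids
    are used so the sketch does not depend on the local user database.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not describe git's own metadata.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            perms = stat.S_IMODE(st.st_mode)
            lines.append(
                f"/{rel} owner={st.st_uid} group={st.st_gid} mode={perms:04o}"
            )
    return sorted(lines)
```

Paths containing whitespace would need extra care, and a real version
would probably want to collapse files with identical settings into
patterns rather than emit one line per file.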

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Empty Directories
@ 2008-08-15 21:20 Trans
  2008-08-15 21:42 ` Marcus Griep
                   ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: Trans @ 2008-08-15 21:20 UTC (permalink / raw
  To: Git Mailing List

New to git...

Is it true there is no way to track empty directories?

Thanks,
T.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty Directories
  2008-08-15 21:20 Empty Directories Trans
@ 2008-08-15 21:42 ` Marcus Griep
  2008-08-15 21:46 ` Miklos Vajna
  2008-08-16  1:53 ` Paul Franz
  2 siblings, 0 replies; 156+ messages in thread
From: Marcus Griep @ 2008-08-15 21:42 UTC (permalink / raw
  To: Trans; +Cc: Git Mailing List

Directly, no.  An easy workaround is to drop an empty file
(such as a .gitignore) into the directory and track that.
Then the directory will come along for the ride.

Trans wrote:
> New to git...
> 
> Is it true there is no way to track empty directories?
> 
> Thanks,
> T.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Marcus Griep
GPG Key ID: 0x5E968152
——
http://www.boohaunt.net
את.ψο´

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty Directories
  2008-08-15 21:20 Empty Directories Trans
  2008-08-15 21:42 ` Marcus Griep
@ 2008-08-15 21:46 ` Miklos Vajna
  2008-08-16  1:53 ` Paul Franz
  2 siblings, 0 replies; 156+ messages in thread
From: Miklos Vajna @ 2008-08-15 21:46 UTC (permalink / raw
  To: Trans; +Cc: Git Mailing List

On Fri, Aug 15, 2008 at 05:20:01PM -0400, Trans <transfire@gmail.com> wrote:
> Is it true there is no way to track empty directories?

See the FAQ entry:

http://git.or.cz/gitwiki/GitFaq#head-1fbd4a018d45259c197b169e87dafce2a3c6b5f9

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Empty Directories
  2008-08-15 21:20 Empty Directories Trans
  2008-08-15 21:42 ` Marcus Griep
  2008-08-15 21:46 ` Miklos Vajna
@ 2008-08-16  1:53 ` Paul Franz
  2 siblings, 0 replies; 156+ messages in thread
From: Paul Franz @ 2008-08-16  1:53 UTC (permalink / raw
  To: Trans; +Cc: Git Mailing List

T.,
    It is interesting that you posted the same thing to the Mercurial and 
git mailing lists. Did you also post it to the Monotone and other distributed 
version control system mailing lists?

Paul Franz

Trans wrote:
> New to git...
>
> Is it true there is no way to track empty directories?
>
> Thanks,
> T.

-- 

-------------------------------------------

There are seven sins in the world.
     Wealth without work.
     Pleasure without conscience.
     Knowledge without character.
     Commerce without morality.
     Science without humanity.
     Worship without sacrifice.
     Politics without principle.

   -- Mohandas Gandhi

-------------------------------------------

^ permalink raw reply	[flat|nested] 156+ messages in thread
