git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Nieder <jrnieder@gmail.com>
To: Thorsten Glaser <tg@mirbsd.de>
Cc: Michael J Gruber <git@drmicha.warpmail.net>,
	Richard Hartmann <richih.mailinglist@gmail.com>,
	Git List <git@vger.kernel.org>
Subject: Re: Tracking file metadata in git -- fix metastore or enhance git?
Date: Fri, 8 Apr 2011 14:45:48 -0500
Message-ID: <20110408194548.GA26094@elie>
In-Reply-To: <Pine.BSM.4.64L.1104081903550.22999@herc.mirbsd.org>

Thorsten Glaser wrote:
> Jonathan Nieder dixit:

>> I think the most native-looking way to store metadata associated with
>> paths is .gitattributes.  It also has the nice feature of allowing a
>> single attribute to apply to multiple files.
>
> Eh, no. Think of extended attributes like, say, NTFS Resource Forks.
> They’re just different “lines” into the “plane” a file can be, if
> you excuse the metaphor. (All parallel, of course.)

Do you mean no, it doesn't have that feature? ;-)

Each git commit (try it with "git cat-file commit HEAD") looks like so:

	tree <tree name>
	parent <commit name for first parent>
	parent <commit name for second parent>
	...
	author <author identity and time of authorship>
	committer <committer identity and time committed>
	encoding <encoding of log message (optional)>

	<free-form change description>
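To make the layout above concrete, here is a minimal Python sketch that splits the text printed by "git cat-file commit" into its header fields and the free-form description.  It assumes the object has already been decoded as UTF-8 text, and it treats a header line starting with a space as a continuation of the previous field (git uses that for multi-line headers such as signatures).  The names parse_commit, parents and SAMPLE are made up for illustration, and the object names in SAMPLE are fake.

```python
from __future__ import annotations

# Illustrative commit text; the 40-digit object names are fake.
SAMPLE = (
    f"tree {'1' * 40}\n"
    f"parent {'2' * 40}\n"
    f"parent {'3' * 40}\n"
    "author A U Thor <author@example.com> 1302291948 -0500\n"
    "committer C O Mitter <committer@example.com> 1302291948 -0500\n"
    "\n"
    "Merge branch 'topic'\n"
)


def parse_commit(raw: str) -> tuple[list[tuple[str, str]], str]:
    """Split raw commit text into (list of (key, value) headers, message)."""
    # The first blank line separates the header fields from the description.
    header_text, _, message = raw.partition("\n\n")
    headers: list[tuple[str, str]] = []
    for line in header_text.split("\n"):
        if line.startswith(" ") and headers:
            # Continuation line: append it to the previous field's value.
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers, message


def parents(headers: list[tuple[str, str]]) -> list[str]:
    """A list, not a dict, because "parent" may repeat."""
    return [value for key, value in headers if key == "parent"]
```

Note that every new field added here would be parsed by every command that reads commits, which is the cost weighed below.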

Where could one sneak in some per-path metadata?

 - as new header fields after "encoding" (teaching git fsck, git commit
   --amend, and so on about them)?  That can work, but it would slow down
   operations that are not interested in this metadata.  It is best not
   to have O(number of paths) header fields.

 - in the change description?  Yes, that can work, too, and it doesn't
   even require changing the commit format.

 - a new header field pointing to another object?  That is possible as
   a last resort.
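The second option could look something like this Python sketch, which keeps per-path metadata as trailer-style lines at the end of the change description.  The "Path-Metadata:" key and the "<value> TAB <path>" layout are hypothetical, not a git convention; git treats them as ordinary message text.  The sketch assumes values contain no tab or newline and paths contain no newline.

```python
from __future__ import annotations

# Hypothetical trailer key; git attaches no meaning to it.
KEY = "Path-Metadata: "


def append_path_metadata(message: str, meta: dict[str, str]) -> str:
    """Append one "Path-Metadata: <value>\t<path>" line per path, sorted by path."""
    if not meta:
        return message
    for path, value in meta.items():
        if "\n" in path or "\n" in value or "\t" in value:
            raise ValueError(f"cannot encode metadata for {path!r}")
    lines = [f"{KEY}{value}\t{path}" for path, value in sorted(meta.items())]
    # Keep the human-written text first, then a blank line, then the trailers.
    return message.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"


def extract_path_metadata(message: str) -> dict[str, str]:
    """Collect the trailer lines back into a {path: value} mapping."""
    meta: dict[str, str] = {}
    for line in message.split("\n"):
        if line.startswith(KEY):
            # Split at the first tab only, so a tab inside the path survives.
            value, _, path = line[len(KEY):].partition("\t")
            meta[path] = value
    return meta
```

The description grows with the number of annotated paths, but only tools that look for these lines have to interpret them.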

Anyway, filenames and their content are not what commits are about;
commits are just nodes in a revision graph, each pointing to a tree
that records the tracked files.

Okay, so what about the trees?

	<mode> SP <filename> NUL <object name>
	...
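In the raw object (what "git cat-file tree <tree>" prints, as opposed to the pretty-printed "git cat-file -p"), the object name after the NUL is binary: 20 bytes in a SHA-1 repository.  The Python sketch below parses and rebuilds that format and computes an object name the way git does, as the SHA-1 of "<type> <size>", a NUL, and the content.  parse_tree, build_tree and object_name are illustrative names, and build_tree does not sort entries into the order git requires.

```python
from __future__ import annotations

import hashlib


def parse_tree(data: bytes, hash_len: int = 20) -> list[tuple[str, bytes, str]]:
    """Parse raw tree content into (mode, filename, hex object name) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)        # modes never contain spaces
        nul = data.index(b"\0", space + 1)   # filenames never contain NUL
        oid = data[nul + 1 : nul + 1 + hash_len]
        if len(oid) != hash_len:
            raise ValueError("truncated tree entry")
        entries.append((data[pos:space].decode("ascii"), data[space + 1 : nul], oid.hex()))
        pos = nul + 1 + hash_len
    return entries


def build_tree(entries: list[tuple[str, bytes, str]]) -> bytes:
    """Inverse of parse_tree; the caller must supply entries in git's order."""
    return b"".join(
        mode.encode("ascii") + b" " + name + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in entries
    )


def object_name(kind: str, content: bytes) -> str:
    """SHA-1 over the header "<kind> <size>", a NUL byte, then the content."""
    header = f"{kind} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

A sidecar entry such as foo.c.ntfs-resource-fork is just another "100644" entry here, which is why an old client can read it; adding it still changes the tree's name and therefore the names of the commits that point to it.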

Where can we sneak something in?

 - use a currently invalid <mode>?  No, tracking metadata is probably
   not worth breaking old git clients.
 - use an invalid object name?  No (for the same reason).
 - use a special filename?  Then old git clients will treat the file
   as a regular file, so they still get access to the data.

So you see, using ordinary files (whether called .gitattributes or
foo.c.ntfs-resource-fork) to track this extra data makes a lot of
sense.
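As a sketch of the "ordinary files" approach, the Python below records each file's permission bits and modification time in a plain-text manifest that could be committed next to the files, for instance under a hypothetical name like .metadata.  The "<octal mode> <mtime> <path>" line format is an assumption chosen so diffs stay readable; it is not metastore's format.  Ownership and extended attributes are left out to keep it short.

```python
from __future__ import annotations

import os
import stat


def record_metadata(root: str, paths: list[str]) -> str:
    """Return one "<octal mode> <mtime> <path>" line per path, sorted by path."""
    lines = []
    for rel in sorted(paths):
        st = os.lstat(os.path.join(root, rel))
        # S_IMODE keeps only permission bits (plus setuid/setgid/sticky).
        lines.append(f"{stat.S_IMODE(st.st_mode):04o} {int(st.st_mtime)} {rel}\n")
    return "".join(lines)


def parse_metadata(text: str) -> dict[str, tuple[int, int]]:
    """Map each path back to (mode, mtime); the path is the rest of the line."""
    result: dict[str, tuple[int, int]] = {}
    for line in text.split("\n"):
        if line:
            mode, mtime, rel = line.split(" ", 2)
            result[rel] = (int(mode, 8), int(mtime))
    return result
```

Because the manifest is an ordinary file, old clients simply see one more text file, and "git diff" shows metadata changes line by line.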

Now Michael mentioned an alternative, which is to store this
information in separate objects.  That way, you could push your
history without the extra metadata, you could edit the metadata
without changing the commit names of the history, separately
garbage-collect metadata you're not interested in, etc.  If that is
your goal, then "git notes" is exactly the right solution.
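With notes, the same kind of manifest text could be attached to a commit without changing its name.  A minimal Python sketch that builds and runs the commands, assuming a notes ref called "metadata" (a made-up name, which git stores as refs/notes/metadata):

```python
from __future__ import annotations

import subprocess

NOTES_REF = "metadata"  # made-up ref name; git expands it to refs/notes/metadata


def notes_add_argv(commit: str, text: str, notes_ref: str = NOTES_REF) -> list[str]:
    # "add -f" creates the note, or overwrites an existing one on this commit.
    return ["git", "notes", f"--ref={notes_ref}", "add", "-f", "-m", text, commit]


def notes_show_argv(commit: str, notes_ref: str = NOTES_REF) -> list[str]:
    return ["git", "notes", f"--ref={notes_ref}", "show", commit]


def attach_metadata(repo: str, commit: str, text: str) -> None:
    """Attach text as a note; the commit object itself is left untouched."""
    subprocess.run(notes_add_argv(commit, text), cwd=repo, check=True)


def read_metadata(repo: str, commit: str) -> str:
    # git may tidy whitespace in the stored note (e.g. trailing blank lines).
    result = subprocess.run(
        notes_show_argv(commit), cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout
```

Notes refs are not pushed by the default refspecs, so sharing them takes an explicit "git push origin refs/notes/metadata"; that is also what lets you push the history without the metadata.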

> They are just
> another facet of each file.

Sure, like the atime, the inode number, the uid of the user who wrote
it, and the model number of the disk used to store it.

Oh, you mean they're _relevant_ facets?  Yes, that's believable,
though I suspect that will only sometimes be the case.  So the
operator should say "yes, I'm interested in tracking this extra
information".  To summarize the above, some ways this could work
behind the scenes:

 * dotfiles with metadata;

 * a Makefile to install files with metadata (i.e., the "source"
   consists of plain files, while the "build product" has the
   specified metadata);

 * something else.  Hopefully the above explains the relevant
   constraints so you can surprise us.
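The Makefile variant could call an install step like the Python sketch below: the source tree holds plain files plus a manifest, and only the installed copies get the recorded mode and mtime.  The manifest uses a hypothetical "<octal mode> <mtime> <path>" line format, and install_with_metadata is an illustrative name.  Ownership is left out because changing it usually needs root.

```python
from __future__ import annotations

import os
import shutil


def install_with_metadata(src_root: str, dest_root: str, manifest: str) -> list[str]:
    """Copy each listed file from src_root to dest_root, then apply mode and mtime."""
    installed = []
    for line in manifest.split("\n"):
        if not line:
            continue
        mode, mtime, rel = line.split(" ", 2)
        src = os.path.join(src_root, rel)
        dest = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        if os.path.lexists(dest):
            os.remove(dest)  # a previous install may have left it read-only
        shutil.copyfile(src, dest)  # copies contents only, no metadata
        os.chmod(dest, int(mode, 8))
        os.utime(dest, (int(mtime), int(mtime)))  # (atime, mtime); set last
        installed.append(rel)
    return installed
```

The tracked files stay plain, so every git client handles them the same way; the metadata only exists in the build product.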

Hope that helps.
Jonathan

Thread overview: 19+ messages
2011-04-07 19:16 Tracking file metadata in git -- fix metastore or enhance git? Richard Hartmann
2011-04-07 19:27 ` Thorsten Glaser
2011-04-08  0:29   ` Richard Hartmann
2011-04-08 10:01     ` Michael J Gruber
2011-04-08 18:59       ` Jonathan Nieder
2011-04-08 19:05         ` Thorsten Glaser
2011-04-08 19:45           ` Jonathan Nieder [this message]
2011-04-08 19:58             ` Thorsten Glaser
2011-04-08 21:23               ` Richard Hartmann
2011-04-09  8:11                 ` Chris Webb
2011-04-09  9:09                   ` Richard Hartmann
2011-04-10  0:15                     ` Jonathan Nieder
2011-04-10  1:03                       ` Junio C Hamano
2011-04-10  1:31                         ` Richard Hartmann
2011-04-11  0:12                           ` Richard Hartmann
2011-04-18  0:21 ` Richard Hartmann
2011-04-18  0:45   ` Jonathan Nieder
2011-12-14  4:54     ` johnnyutahh
2011-12-20  0:55       ` Richard Hartmann
