From: Thomas Gummerer <t.gummerer@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, trast@student.ethz.ch, mhagger@alum.mit.edu,
pclouds@gmail.com, robin.rosenberg@dewire.com
Subject: Re: [PATCH/RFC v3 04/13] Add documentation of the index-v5 file format
Date: Fri, 10 Aug 2012 01:10:39 +0200
Message-ID: <20120809231039.GC5127@tommy-fedora.scientificnet.net>
In-Reply-To: <7vobmjn0wv.fsf@alter.siamese.dyndns.org>
On 08/09, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
>
> > +GIT index format
> > +================
> > +
> > +== The git index file format
> > +
> > + The git index file (.git/index) documents the status of the files
> > + in the git staging area.
> > +
> > + The staging area is used for preparing commits, merging, etc.
>
> The above two are not about "index file format". It is an
> explanation of what the index is.
>
> > + All binary numbers are in network byte order. Version 5 is described
> > + here.
>
> I had to read between these two lines something like
>
> "The index file consists of various sections; the sections
> appear in the following order in the file."
>
> to make sense of the document.
Thanks, I'll add that.
> > + - A 20-byte header consisting of
> > +
> > + sig (32-bits): Signature:
> > + The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
> > +
> > + vnr (32-bits): Version number:
> > + The current supported versions are 2, 3, 4 and 5.
> > +
> > + ndir (32-bits): number of directories in the index.
> > +
> > + nfile (32-bits): number of file entries in the index.
> > +
> > + fblockoffset (32-bits): offset to the file block, relative to the
> > + beginning of the file.
>
> Ok.
>
> > + - Offset to the extensions.
> >
> > + nextensions (32-bits): number of extensions.
> > +
> > + extoffset (32-bits): offset to the extension. (Possibly none, as
> > + many as indicated in the 4-byte number of extensions)
>
> OK.
>
> > + headercrc (32-bits): crc checksum for the header and extension
> > + offsets
>
> This may have to have the same " - <section title>" at the same
> level as "A 20-byte header" and "Offset to the ext"; as it stands,
> it looks as if it is part of "Offset to the ext" which consists of
> 12 bytes.
Thanks, I'll try to write it down more clearly.
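To make the layout discussed so far concrete, here is a rough sketch of how a reader might parse the header, the extension offsets and the header crc. It is not taken from the patch: the function name `parse_v5_header` is made up for illustration, and it assumes the crc is a plain zlib-style CRC-32 over all bytes preceding the crc field, which the text above does not spell out.

```python
import struct
import zlib

# sig, vnr, ndir, nfile, fblockoffset: five 32-bit fields, network byte order.
HEADER = struct.Struct(">4sIIII")


def parse_v5_header(buf):
    """Parse the 20-byte header, the extension offsets and the header crc.

    Hypothetical helper, for illustration only.
    """
    sig, vnr, ndir, nfile, fblockoffset = HEADER.unpack_from(buf, 0)
    if sig != b"DIRC":
        raise ValueError("bad signature")
    pos = HEADER.size  # 20 bytes

    (nextensions,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    extoffsets = list(struct.unpack_from(">%dI" % nextensions, buf, pos))
    pos += 4 * nextensions

    (headercrc,) = struct.unpack_from(">I", buf, pos)
    # Assumption: zlib CRC-32 over everything before the crc field.
    if zlib.crc32(buf[:pos]) & 0xFFFFFFFF != headercrc:
        raise ValueError("header crc mismatch")

    return {
        "vnr": vnr,
        "ndir": ndir,
        "nfile": nfile,
        "fblockoffset": fblockoffset,
        "extoffsets": extoffsets,
    }
```

With no extensions, the header, the extension count and the crc together take 28 bytes.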
> > + - diroffsets (ndir * directory offsets): A directory offset for each
> > + of the ndir directories in the index, sorted by pathname (of the
> > + directory it's pointing to) (see below). The diroffsets are relative
> > + to the beginning of the direntries block. [1]
>
> "ndir * diroffsets" confused me. I think you meant to say that this
> "diroffsets" section consists of ndir entries of something and that
> each of that something is a directory offset. It is unclear how "a
> directory offset" is represented, except that it is "relative to the
> beginning of direntry block" (and it is unclear what and where the
> direntry block is from the information given up to this point) and
> the reader can guess it is in "network byte order" (assuming it is a
> binary number). Perhaps
>
> diroffsets (ndir entries of "directory offset"): A 4-byte
> offset relative to the beginning of the "direntries block"
> (see below) for each of the ...
>
> and drop the last sentence?
>
> Other tables may want to be adjusted in a similar fashion.
Yes, that's what I meant to say. Thanks.
> > +== Directory offsets (diroffsets)
> > +
> > + diroffset (32-bits): offset to the directory relative to the beginning
> > + of the index file. There are ndir + 1 offsets in the diroffset table,
> > + the last is pointing to the end of the last direntry. With this last
> > + entry, we can replace the strlen when reading each filename, by
> > + calculating its length with the offsets.
>
> The mention of "strlen" looks very out of place. The reader may be
> able to guess that you want to say that the nth "string" is between
> diroffset[n] and diroffset[n+1], and these "string"s are densely
> packed so strlen(diroffset[n]) and diroffset[n+1]-diroffset[n] are
> either the same thing (or with a fixed difference, if each "string"
> is accompanied by some fixed-length data), but it is unclear what
> these "strings" represent, especially because the name of the table
> implies that you are talking about directories but strlen talks
> about filename.
Hrm, maybe it's better like this:
+ diroffset (32-bits): offset to the directory relative to the beginning
+ of the index file. There are ndir + 1 offsets in the diroffset table,
+ the last is pointing to the end of the last direntry. With this last
+ entry, we can avoid calling strlen when reading the directory name,
+ by calculating its length as diroffset[n+1]-diroffset[n]-61. 61 is
+ the combined size of the directory data that follows each directory
+ name, the crc sum and the NUL byte.
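The calculation can be sketched as follows. This is only an illustration: the names `DIR_FIXED_SIZE` and `dir_name_lengths` are made up, and the constant 61 is taken directly from the wording above.

```python
# Fixed bytes per directory entry besides the name itself:
# directory data + crc sum + NUL byte (61 per the description above).
DIR_FIXED_SIZE = 61


def dir_name_lengths(diroffsets):
    """Return the name length of each directory.

    diroffsets holds ndir + 1 offsets; the last one points to the end
    of the last direntry, so every entry has a successor to subtract from.
    """
    return [
        nxt - cur - DIR_FIXED_SIZE
        for cur, nxt in zip(diroffsets, diroffsets[1:])
    ]
```

Since only differences between neighbouring offsets are used, the result is the same whether the offsets are relative to the beginning of the file or to the beginning of the direntries block.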
> > +== Design explanations
> > + ...
> > +[3] The data of the cache-tree extension and the resolve undo
> > + extension is now part of the index itself, but if other extensions
> > + come up in the future, there is no need to change the index, they
> > + can simply be added at the end.
>
> Interesting. When we added extensions, we said that there is no
> need to change the index to add new features, they can simply be
> added at the end. Perhaps the file offset table can be added as an
> extension to v2 to give us the same bisectability, allowing us a
> single entry in-place replacementability, without defining an
> entirely different format?
Only part of this is true. v2 would allow us to add the file offset
table as an extension, but the problem is the design of the sha-1 over
the whole file at the end. That would only allow single-entry
replacements if we then re-read the whole file and recalculate the
sha-1 at the end. Partial reading could likewise only be implemented
by reading the whole file first to check the sha-1, which defeats its
purpose.
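A toy sketch of the problem, not git code: the helpers `seal`, `verify` and `replace_in_place` are made up, and the "index" here is just an arbitrary byte string followed by a 20-byte sha-1 of everything before it, as in v2.

```python
import hashlib


def seal(body):
    """Append a sha-1 over the whole body, like the v2 trailer."""
    return body + hashlib.sha1(body).digest()


def verify(data):
    """Check the trailer; this has to hash every byte of the body."""
    body, trailer = data[:-20], data[-20:]
    return hashlib.sha1(body).digest() == trailer


def replace_in_place(data, offset, new):
    """Overwrite len(new) bytes at offset, leaving the trailer untouched."""
    return data[:offset] + new + data[offset + len(new):]
```

After `replace_in_place` the trailer no longer matches, and the only way to fix it is to hash the complete body again, no matter how small the change was.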