git@vger.kernel.org list mirror (unofficial, one of many)
From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>
Subject: Re: [PATCH 13/22] documentation: add documentation of the index-v5 file format
Date: Thu, 11 Jul 2013 13:39:10 +0200
Message-ID: <87mwptcom9.fsf@gmail.com>
In-Reply-To: <CACsJy8ALSBPq1+TP_YxJ=ecUwpKRY-i2O=+q8qMjtXbjShg3mA@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Sun, Jul 7, 2013 at 3:11 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +== File entry (fileentries)
>> +
>> +  File entries are sorted in ascending order on the name field, after the
>> +  respective offset given by the directory entries. All file names are
>> +  prefix compressed, meaning the file name is relative to the directory.
>> +
>> +  filename (variable length, nul terminated). The exact encoding is
>> +    undefined, but the filename cannot contain a NUL byte (iow, the same
>> +    encoding as a UNIX pathname).
>> +
>> +  flags (16-bits): 'flags' field split into (high to low bits)
>> +
>> +    assumevalid (1-bit): assume-valid flag
>> +
>> +    intenttoadd (1-bit): intent-to-add flag, used by "git add -N".
>> +      Extended flag in index v3.
>> +
>> +    stage (2-bit): stage of the file during merge
>> +
>> +    skipworktree (1-bit): skip-worktree flag, used by sparse checkout.
>> +      Extended flag in index v3.
>> +
>> +    smudged (1-bit): indicates if the file is racily smudged.
>> +
>> +    10-bit unused, must be zero [6]
>> +
>> +  mode (16-bits): file mode, split into (high to low bits)
>> +
>> +    objtype (4-bits): object type
>> +      valid values in binary are 1000 (regular file), 1010 (symbolic
>> +      link) and 1110 (gitlink)
>> +
>> +    3-bit unused
>> +
>> +    permission (9-bits): unix permission. Only 0755 and 0644 are valid
>> +      for regular files. Symbolic links and gitlinks have value 0 in
>> +      this field.
>> +
>> +  mtimes (32-bits): mtime seconds, the last time a file's data changed
>> +    this is stat(2) data
>> +
>> +  mtimens (32-bits): mtime nanosecond fractions
>> +    this is stat(2) data
>> +
>> +  file size (32-bits): The on-disk size, truncated to 32-bit.
>> +    this is stat(2) data
>> +
>> +  statcrc (32-bits): crc32 checksum over ctime seconds, ctime
>> +    nanoseconds, ino, dev, uid, gid (All stat(2) data
>> +    except mtime and file size). If the statcrc is 0 it will
>> +    be ignored. [7]
>> +
>> +  objhash (160-bits): SHA-1 for the represented object
>> +
>> +  entrycrc (32-bits): crc32 checksum for the file entry. The crc code
>> +    includes the offset to the offset to the file, relative to the
>> +    beginning of the file.
>
> A question about the possibility of updating the index file directly.
> If git updates a few fields of an entry (but not entrycrc yet) and
> crashes, the entry becomes corrupt because its entrycrc no longer
> matches the content. What do we do? Do we need to save a copy of the
> entry somewhere in the index file (maybe in the conflict data section),
> so that the reader can recover the index? Losing the index because of
> bugs is a big deal in my opinion. pre-v5 never faces this because we
> keep the original copy until the end.
>
> Maybe entrycrc should not cover stat fields and statcrc. It would make
> refreshing safer. If the above happens during refresh, only statcrc is
> corrupt and we can just refresh the entry. entrycrc still says the
> other fields are good (and they are).
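
To make the suggestion above concrete, here is a minimal sketch of an
entry whose entrycrc skips the stat-derived fields. Everything in it is
an assumption for illustration: the helper names pack_entry and
identity_is_intact, the byte layout, and the decision to leave the
file offset out of the checksum are not taken from the patch series.

```python
import struct
import zlib

# Hypothetical layout, for illustration only: the stat-derived fields
# (mtimes, mtimens, file size, statcrc) are packed as four big-endian
# 32-bit integers. The remaining fields form the entry's "identity".
STAT_FORMAT = ">IIII"


def pack_entry(name, flags, mode, objhash, mtimes, mtimens, size, statcrc):
    """Return (identity bytes, stat bytes, entrycrc over identity only)."""
    identity = name + b"\0" + struct.pack(">HH", flags, mode) + objhash
    stat = struct.pack(STAT_FORMAT, mtimes, mtimens, size, statcrc)
    entrycrc = zlib.crc32(identity) & 0xFFFFFFFF
    return identity, stat, entrycrc


def identity_is_intact(identity, entrycrc):
    """True when name, flags, mode and objhash still match entrycrc."""
    return (zlib.crc32(identity) & 0xFFFFFFFF) == entrycrc
```

With this split, a torn write to the stat bytes leaves
identity_is_intact() true, so a reader could treat the entry as merely
stale and refresh it instead of rejecting the index.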

The original idea was to change the lock-file for partial writing to
make it work for this case.  The exact structure of the file still has
to be defined, but generally it would be done in the following steps:

  1. Write the changed entry to the lock-file
  2. Change the entry in the index
  3. If we succeed, delete the lock-file (commit the transaction)

If git crashes and leaves the index corrupted, we can recover the
changed entry from the lock-file, write it to the index file, and then
delete the lock-file.
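
The three steps and the recovery path can be sketched roughly as
follows. Since the structure of the lock-file was still undefined at
this point, the record format here (offset, length, payload, crc32) and
all function names are hypothetical, and the sketch ignores the
lock-file's usual role in mutual exclusion.

```python
import os
import struct
import zlib

# Hypothetical journal record: offset (u64), payload length (u32),
# payload bytes, then a crc32 over everything before it.
HEADER = ">QI"


def write_journal(lock_path, offset, payload):
    """Step 1: record the intended change in the lock-file."""
    record = struct.pack(HEADER, offset, len(payload)) + payload
    record += struct.pack(">I", zlib.crc32(record) & 0xFFFFFFFF)
    with open(lock_path, "wb") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())


def apply_to_index(index_path, offset, payload):
    """Step 2: overwrite the entry in place in the index file."""
    with open(index_path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def update_entry(index_path, lock_path, offset, payload):
    """Run all three steps; removing the lock-file is the commit."""
    write_journal(lock_path, offset, payload)
    apply_to_index(index_path, offset, payload)
    os.remove(lock_path)  # Step 3


def recover(index_path, lock_path):
    """Replay a leftover lock-file; return True if a change was replayed."""
    if not os.path.exists(lock_path):
        return False
    with open(lock_path, "rb") as f:
        record = f.read()
    header_size = struct.calcsize(HEADER)
    body, stored = record[:-4], record[-4:]
    valid = (len(record) >= header_size + 4
             and struct.unpack(">I", stored)[0] == zlib.crc32(body) & 0xFFFFFFFF)
    if valid:
        offset, length = struct.unpack(HEADER, body[:header_size])
        payload = body[header_size:]
        valid = len(payload) == length
    if valid:
        apply_to_index(index_path, offset, payload)
    # A torn journal means step 2 never started, so it is safe to drop.
    os.remove(lock_path)
    return valid
```

In this sketch a reader would call recover() before trusting the
index. Replaying is idempotent, so a crash during recovery itself can
be handled by running it again.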
