git@vger.kernel.org list mirror (unofficial, one of many)
4213087075c8dcafa1f73aaa912c005b3394f8cd blob 11253 bytes (raw)

GIT index format
================

== The git index

   The git index file (.git/index) documents the status of the files
     in the git staging area.

   The staging area is used for preparing commits, merging, etc.

== The git index file format

   All binary numbers are in network byte order. Version 5 is described
     here. The index file consists of the following sections, which
     appear in this order:

   - header: the description of the index format, including its signature,
     version and various other fields that are used internally.

   - diroffsets (ndir entries of "directory offset"): A 4-byte offset
       relative to the beginning of the "direntries block" (see below)
       for each of the ndir directories in the index, sorted by pathname
       (of the directory it's pointing to). [1]

   - direntries (ndir entries of "directory entry"): A directory entry
       for each of the ndir directories in the index, sorted by pathname
       (see below). [2]

   - fileoffsets (nfile entries of "file offset"): A 4-byte offset
       relative to the beginning of the fileentries block (see below)
       for each of the nfile files in the index. [1]

   - fileentries (nfile entries of "file entry"): A file entry for
       each of the nfile files in the index (see below).

   - crdata: Entries holding the data of conflicted and resolved
       conflicts (see below).

   - Extensions (currently none; see below)

     Extensions are identified by signature. Optional extensions can
     be ignored if GIT does not understand them.

     GIT supports an arbitrary number of extensions, but currently none
     is implemented. [3]

     extsig (32-bits): extension signature. If the first byte is 'A'..'Z'
     the extension is optional and can be ignored.

     extsize (32-bits): size of the extension, excluding the header
       (extsig, extsize, extchecksum).

     extchecksum (32-bits): crc32 checksum of the extension signature
       and size.

    - Extension data.
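The extension header above could be checked along these lines. This is only a sketch in C under stated assumptions, not git's implementation: get_be32, crc32_ieee and parse_ext_header are hypothetical helpers, and "crc32" is assumed to be the common CRC-32 with the reflected polynomial 0xEDB88320 (the variant zlib's crc32() computes).

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value stored in network byte order. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Assumed to be the
 * "crc32" of the text; it matches zlib's crc32() and can be chained. */
static uint32_t crc32_ieee(uint32_t crc, const unsigned char *buf, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

struct ext_header {
	uint32_t sig;   /* extsig */
	uint32_t size;  /* extsize: data size, excluding this 12-byte header */
	int optional;   /* first signature byte is 'A'..'Z' */
	int crc_ok;     /* extchecksum == crc32 over extsig and extsize */
};

/* Decode the extsig/extsize/extchecksum header that starts at p. */
static struct ext_header parse_ext_header(const unsigned char *p)
{
	struct ext_header h;

	h.sig = get_be32(p);
	h.size = get_be32(p + 4);
	h.optional = p[0] >= 'A' && p[0] <= 'Z';
	h.crc_ok = crc32_ieee(0, p, 8) == get_be32(p + 8);
	return h;
}
```

A reader that does not recognise h.sig could skip the h.size bytes of extension data when h.optional is set; an unknown extension that is not optional cannot be ignored.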

== Header
   sig (32-bits): Signature:
     The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

   vnr (32-bits): Version number:
     The currently supported versions are 2, 3, 4 and 5.

   ndir (32-bits): number of directories in the index.

   nfile (32-bits): number of file entries in the index.

   fblockoffset (32-bits): offset to the file block, relative to the
     beginning of the file.

   - Offset to the extensions.

     nextensions (32-bits): number of extensions.

     extoffset (32-bits): offset to an extension. There are nextensions
       of these fields, one per extension (possibly none).

   headercrc (32-bits): crc checksum including the header and the
     offsets to the extensions.
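A minimal sketch of reading the header fields in order. The names are hypothetical, only version 5 is handled, and the sketch assumes headercrc is the same CRC-32 computed over every header byte that precedes it, since the text does not say exactly which bytes are covered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed CRC-32 variant (reflected polynomial 0xEDB88320, as in zlib). */
static uint32_t crc32_ieee(uint32_t crc, const unsigned char *buf, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

struct v5_header {
	uint32_t version, ndir, nfile, fblockoffset, nextensions;
	size_t size;  /* header size in bytes, headercrc included */
};

/* Parse the fixed header fields; returns 0 on success, -1 otherwise. */
static int parse_v5_header(const unsigned char *buf, size_t len,
			   struct v5_header *h)
{
	if (len < 28 || memcmp(buf, "DIRC", 4) != 0)
		return -1;
	h->version = get_be32(buf + 4);
	if (h->version != 5)
		return -1;  /* only the v5 layout is handled here */
	h->ndir = get_be32(buf + 8);
	h->nfile = get_be32(buf + 12);
	h->fblockoffset = get_be32(buf + 16);
	h->nextensions = get_be32(buf + 20);
	if (h->nextensions > (len - 28) / 4)
		return -1;  /* extoffset table would run past the buffer */
	h->size = 24 + 4 * (size_t)h->nextensions + 4;
	/* Assumption: headercrc covers every header byte that precedes it. */
	if (crc32_ieee(0, buf, h->size - 4) != get_be32(buf + h->size - 4))
		return -1;
	return 0;
}

/* The i-th extoffset (0 <= i < nextensions). */
static uint32_t ext_offset(const unsigned char *buf, uint32_t i)
{
	return get_be32(buf + 24 + 4 * (size_t)i);
}
```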


== Directory offsets (diroffsets)

  diroffset (32-bits): offset to the directory relative to the beginning
    of the index file. There are ndir + 1 offsets in the diroffset table;
    the last one points to the end of the last direntry. With this extra
    entry, the strlen otherwise needed when reading a directory name can
    be replaced by computing the length as
    diroffset[n+1] - diroffset[n] - 61. 61 is the size of the directory
    data that follows each directory name, plus the crc sum and the NUL
    byte.

  This part is needed for making the directory entries bisectable and
    thus allowing a binary search.
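The ndir + 1 offset table is what makes a lookup by path possible without scanning. The sketch below (hypothetical names) derives each name's length from two adjacent offsets, using the constant 61 given above, and bisects on the names. It assumes the lexicographic order is plain byte-wise (memcmp) order; `base` stands for whatever position the diroffsets are relative to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes that follow each directory name, NUL included (61 per the text). */
#define DIR_TRAILER_SIZE 61

/* Length of directory n's name, computed from the ndir + 1 offsets. */
static uint32_t dir_name_len(const uint32_t *offsets, size_t n)
{
	return offsets[n + 1] - offsets[n] - DIR_TRAILER_SIZE;
}

/* Byte-wise comparison of two counted strings. */
static int cmp_counted(const char *a, size_t alen, const char *b, size_t blen)
{
	int c = memcmp(a, b, alen < blen ? alen : blen);

	if (c)
		return c;
	return (alen > blen) - (alen < blen);
}

/* Binary search for path ("" for the root, otherwise with a trailing
 * slash). base is what the offsets are relative to; offsets[] holds the
 * ndir + 1 diroffset values in host byte order. Returns the index or -1. */
static long find_dir(const char *base, const uint32_t *offsets, size_t ndir,
		     const char *path)
{
	size_t lo = 0, hi = ndir, plen = strlen(path);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = cmp_counted(base + offsets[mid],
				    dir_name_len(offsets, mid), path, plen);

		if (c == 0)
			return (long)mid;
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```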

== Directory entry (direntries)

  Directory entries are sorted in lexicographic order by the name
    of their path starting with the root.

  pathname (variable length, nul terminated): relative to top level
    directory (without the leading slash). '/' is used as path
    separator. A string of length 0 ('') indicates the root directory.
    The special path components "." and ".." (without quotes) are
    disallowed. The path also includes a trailing slash. [9]

  foffset (32-bits): offset to the lexicographically first file in
    the file offsets (fileoffsets), relative to the beginning of
    the fileoffset block.

  cr (32-bits): offset to conflicted/resolved data at the end of the
    index. 0 if there is no such data. [4]

  ncr (32-bits): number of conflicted/resolved data entries at the
    end of the index if the offset is non-zero. If cr is 0, ncr is
    also 0.

  nsubtrees (32-bits): number of subtrees this tree has in the index.

  nfiles (32-bits): number of files in the directory that are in
    the index.

  nentries (32-bits): number of entries in the index that are covered
    by the tree this entry represents (-1 if the entry is invalid).
    This number includes all the files in this tree, recursively.

  objname (160-bits): object name for the object that would result
    from writing this span of index as a tree. This is only valid
    if nentries is valid, meaning the cache-tree is valid.

  flags (16-bits): 'flags' field split into (high to low bits) (For
    D/F conflicts)

    stage (2-bits): stage of the directory during merge

    14-bit unused

  dircrc (32-bits): crc32 checksum for each directory entry.

  The nentries and objname fields (24 bytes: a 4-byte number of entries
    plus a 160-bit object name) hold the cache tree data. An entry can be
    in an invalidated state, which is represented by -1 in the nentries
    field.
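Decoding one directory entry then amounts to skipping the NUL-terminated pathname and reading the fixed fields in the order listed above. This is a sketch with a hypothetical struct dir_entry; it does not verify dircrc, whose exact coverage is not spelled out here.

```c
#include <stdint.h>
#include <string.h>

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

struct dir_entry {
	const char *pathname;  /* points into the index buffer */
	uint32_t foffset, cr, ncr, nsubtrees, nfiles;
	uint32_t nentries;     /* 0xFFFFFFFF (-1): cache tree invalid */
	unsigned char objname[20];
	unsigned stage;        /* top 2 bits of the 16-bit flags */
	uint32_t dircrc;
};

/* Decode the fields of one directory entry in the order listed above. */
static void parse_dir_entry(const unsigned char *p, struct dir_entry *de)
{
	de->pathname = (const char *)p;
	p += strlen((const char *)p) + 1;  /* skip the name and its NUL */
	de->foffset = get_be32(p);
	de->cr = get_be32(p + 4);
	de->ncr = get_be32(p + 8);
	de->nsubtrees = get_be32(p + 12);
	de->nfiles = get_be32(p + 16);
	de->nentries = get_be32(p + 20);
	memcpy(de->objname, p + 24, 20);
	de->stage = p[44] >> 6;            /* high byte of flags, top 2 bits */
	de->dircrc = get_be32(p + 46);
}

/* objname is only meaningful when the cache tree entry is valid. */
static int cache_tree_valid(const struct dir_entry *de)
{
	return de->nentries != 0xFFFFFFFFu;
}
```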

  The entries are written out in the top-down, depth-first order. The
    first entry represents the root level of the repository, followed by
    the first subtree - let's call it A - of the root level, followed by
    the first subtree of A, ... There is no prefix compression for
    directories.

== File offsets (fileoffsets)

  fileoffset (32-bits): offset to the file relative to the beginning of
    the fileentries block.

  This part is needed for making the file entries bisectable and
    thus allowing a binary search. There are nfile + 1 offsets in the
    fileoffset table, the last is pointing to the end of the last
    fileentry. With this last entry, we can replace the strlen when
    reading each filename, by calculating its length with the offsets.

== File entry (fileentries)

  File entries are sorted in ascending order by the name field, starting
  at the offset given by the respective directory entry. All file names
  are prefix compressed, meaning each file name is relative to its
  directory.
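Because the files of one directory are contiguous and sorted, a lookup inside a directory can bisect on the fileoffsets table. In this sketch (hypothetical names), `first` is the table index of the directory's first file; since foffset is described as an offset into the fileoffsets block, it is assumed to be foffset / 4. Names are compared byte-wise with strcmp, which is also an assumption about the sort order.

```c
#include <stdint.h>
#include <string.h>

/* Binary search among the nfiles entries of one directory, whose first
 * entry has index `first` in the fileoffsets table. fileoffs[] holds that
 * table in host byte order; block points at the fileentries block. Since
 * names are prefix compressed, `name` is relative to the directory.
 * Returns the fileoffsets index of the match, or -1. */
static long find_file(const char *block, const uint32_t *fileoffs,
		      uint32_t first, uint32_t nfiles, const char *name)
{
	uint32_t lo = first, hi = first + nfiles;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		/* strcmp compares as unsigned char, i.e. byte-wise */
		int c = strcmp(block + fileoffs[mid], name);

		if (c == 0)
			return (long)mid;
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```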

  filename (variable length, nul terminated). The exact encoding is
    undefined, but the filename cannot contain a NUL byte (in other
    words, the same encoding as a UNIX pathname).

  flags (16-bits): 'flags' field split into (high to low bits)

    assumevalid (1-bit): assume-valid flag

    intenttoadd (1-bit): intent-to-add flag, used by "git add -N".
      Extended flag in index v3.

    stage (2-bit): stage of the file during merge

    skipworktree (1-bit): skip-worktree flag, used by sparse checkout.
      Extended flag in index v3.

    smudged (1-bit): indicates if the file is racily smudged.

    10-bit unused, must be zero [6]
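The flag bits, listed from high to low above, map to the masks below. This is a sketch with hypothetical names, taking the field after conversion from network byte order; rejecting a non-zero reserved bit is this sketch's reading of "must be zero".

```c
#include <stdint.h>

struct file_flags {
	unsigned assumevalid, intenttoadd, stage, skipworktree, smudged;
};

/* Decode the 16-bit flags (host byte order), high to low bits. Returns -1
 * if any of the 10 low, must-be-zero bits is set. */
static int decode_file_flags(uint16_t f, struct file_flags *out)
{
	if (f & 0x03FFu)
		return -1;
	out->assumevalid  = (f >> 15) & 1u;
	out->intenttoadd  = (f >> 14) & 1u;
	out->stage        = (f >> 12) & 3u;
	out->skipworktree = (f >> 11) & 1u;
	out->smudged      = (f >> 10) & 1u;
	return 0;
}
```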

  mode (16-bits): file mode, split into (high to low bits)

    objtype (4-bits): object type
      valid values in binary are 1000 (regular file), 1010 (symbolic
      link) and 1110 (gitlink)

    3-bit unused

    permission (9-bits): unix permission. Only 0755 and 0644 are valid
      for regular files. Symbolic links and gitlinks have value 0 in
      this field.
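A validity check for the mode field could look like this (the function name is hypothetical). The allowed combinations are exactly the low 16 bits of git's familiar octal modes 100644, 100755, 120000 and 160000, so those literals can be passed directly.

```c
#include <stdint.h>

/* Returns 1 if the 16-bit mode is one of the combinations allowed above. */
static int mode_is_valid(uint16_t m)
{
	unsigned objtype = (m >> 12) & 0xFu;  /* top 4 bits */
	unsigned perm = m & 0777u;            /* low 9 bits */

	if (m & 0x0E00u)                      /* the 3 unused bits */
		return 0;
	switch (objtype) {
	case 010:  /* binary 1000: regular file */
		return perm == 0755 || perm == 0644;
	case 012:  /* binary 1010: symbolic link */
	case 016:  /* binary 1110: gitlink */
		return perm == 0;
	default:
		return 0;
	}
}
```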

  mtimes (32-bits): mtime seconds, the last time a file's data changed.
    This is stat(2) data.

  mtimens (32-bits): mtime nanosecond fractions.
    This is stat(2) data.

  file size (32-bits): The on-disk size, truncated to 32 bits.
    This is stat(2) data.

  statcrc (32-bits): crc32 checksum over ctime seconds, ctime
    nanoseconds, ino, dev, uid, gid (All stat(2) data
    except mtime and file size). If the statcrc is 0 it will
    be ignored. [7]
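The text lists the fields covered by statcrc but not how they are serialized. The sketch below therefore assumes each value is truncated to 32 bits and written in network byte order, in the listed order, before taking the CRC-32 (the same assumed zlib-style variant); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed CRC-32 variant (reflected polynomial 0xEDB88320, as in zlib). */
static uint32_t crc32_ieee(uint32_t crc, const unsigned char *buf, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Hypothetical statcrc: serialization order and width are assumptions. */
static uint32_t stat_crc(uint32_t ctime_sec, uint32_t ctime_nsec,
			 uint32_t ino, uint32_t dev, uint32_t uid, uint32_t gid)
{
	const uint32_t v[6] = { ctime_sec, ctime_nsec, ino, dev, uid, gid };
	unsigned char buf[24];

	for (int i = 0; i < 6; i++)
		put_be32(buf + 4 * i, v[i]);
	return crc32_ieee(0, buf, sizeof(buf));
}

/* A stored statcrc of 0 is ignored, so it never signals a change. */
static int statcrc_matches(uint32_t stored, uint32_t current)
{
	return stored == 0 || stored == current;
}
```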

  objhash (160-bits): SHA-1 for the represented object

  entrycrc (32-bits): crc32 checksum for the file entry. The checksum
    also covers the offset to the file's offset (its fileoffsets entry),
    relative to the beginning of the file.

== Conflict data

  A conflict is represented in the index as a set of higher stage entries.
  These entries are stored at the end of the index. When a conflict is
  resolved (e.g. with "git add path"), a bit is flipped to indicate that
  the conflict is resolved, but the entries are kept so that conflicts
  can be recreated (e.g. with "git checkout -m") in case users want to
  redo a conflict resolution from scratch.

  The first part of a conflict (usually stage 1) will be stored both in
  the entries part of the index and in the conflict part. All other parts
  will only be stored in the conflict part.

  filename (variable length, nul terminated): filename of the entry,
    relative to its containing directory.

  nfileconflicts (32-bits): number of conflicts for the file [8]

  flags (nfileconflicts entries of "flags") (16-bits): 'flags' field
    split into:

    conflicted (1-bit): conflicted state (conflicted/resolved) (1 if
      conflicted)

    stage (2-bits): stage during merge.

    13-bit unused

  entry_mode (nfileconflicts entries of "entry mode") (16-bits):
    octal numbers, entry mode of each entry in the different stages
    (as many as given by the preceding nfileconflicts field).

  objectnames (nfileconflicts entries of "object name") (160-bits):
    object names of the different stages.

  conflictcrc (32-bits): crc32 checksum over conflict data.
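Resolving a conflict, or recreating it, only touches the conflicted bit of each flags value; the stage and the stored modes and object names stay in place. A sketch with hypothetical names, operating on a flags value already converted to host byte order:

```c
#include <stdint.h>

#define CONFLICTED_BIT 0x8000u  /* highest bit of the 16-bit flags */

/* Stage during merge: the 2 bits below the conflicted bit. */
static unsigned conflict_stage(uint16_t flags)
{
	return (flags >> 13) & 3u;
}

static int is_conflicted(uint16_t flags)
{
	return (flags & CONFLICTED_BIT) != 0;
}

/* Mark resolved (conflicted == 0) or conflicted again, e.g. when a
 * resolution is redone with "git checkout -m"; other bits are kept. */
static uint16_t set_conflicted(uint16_t flags, int conflicted)
{
	return conflicted ? (uint16_t)(flags | CONFLICTED_BIT)
			  : (uint16_t)(flags & ~CONFLICTED_BIT);
}
```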

== Design explanations

[1] The directory and file offsets are included in the index format
    to make the index bisectable for binary searches. Updating a
    single entry and partial reading will benefit from this.

[2] The directories are saved in their own block, to be able to
    quickly search for a directory in the index. They include an
    offset to the (lexically) first file in the directory.

[3] The data of the cache-tree extension and the resolve undo
    extension is now part of the index itself. If other extensions
    come up in the future, there is no need to change the index
    format; they can simply be added at the end.

[4] To avoid rewrites of the whole index when there are conflicts or
    conflicts are being resolved, conflicted data will be stored at
    the end of the index. To mark the conflict resolved, just a bit
    has to be flipped. The data will still be there, if a user wants
    to redo the conflict resolution.

[5] Since only 4 modes are effectively allowed in git but 32 bits are
    used to store them, having a two bit flag for the mode is enough
    and saves 4 bytes per entry.

[6] The length of the file name was dropped, since each file name is
    nul terminated anyway.

[7] Since all stat data (except mtime and ctime) is only used for
    checking whether a file has changed, a checksum of the data is
    enough. In addition, Thomas Rast suggested that ctime could be
    ditched completely (core.trustctime=false) and thus included in
    the checksum. This would save 24 bytes per index entry, which
    would be about 4 MB on the Webkit index.
    (Thanks to Michael Haggerty for the suggestion.)

[8] Since there can be more than one stage #1 entry, it is necessary
    to know how many conflict data entries there are.

[9] As Michael Haggerty pointed out on the mailing list, storing the
    trailing slash will simplify a few operations.