git@vger.kernel.org mailing list mirror (one of many)
* Exact format of tree objets
@ 2013-06-11 16:25 Chico Sokol
  2013-06-11 18:26 ` Ilari Liusvaara
  2013-06-11 18:38 ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Chico Sokol @ 2013-06-11 16:25 UTC
  To: git

Is there any official documentation of the tree object format? Are tree
objects encoded specially in some way? How can I parse the inflated
contents of a tree object?

We suspect there is some kind of special format or encoding,
because the command "git cat-file -p <sha>" shows me the expected
output, something like:

100644 blob 2beae51a0e14b3167fd7e81119972caef95779f4    .gitignore
100644 blob 7c817960e954f0278a6eee8d58611f61445167e8    LICENSE.txt
100644 blob 30e849cba985d74bfd29696f6dee5a40abaacb03    README
...


But "git cat-file tree <sha>" generates strange output, which
suggests some kind of encoding problem. Something like:

100644 .gitignore+��▒����,��Wy�100644
LICENSE.txt|�y`�T�'�n��XaaDQg�100644 README0�I˩��K�)


Thanks,

--
Chico Sokol

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Exact format of tree objets
From: Ilari Liusvaara @ 2013-06-11 18:26 UTC
  To: Chico Sokol; +Cc: git

On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
> Is there any official documentation of the tree object format? Are tree
> objects encoded specially in some way? How can I parse the inflated
> contents of a tree object?

A tree object consists of entries, each a concatenation of:
- Octal mode (using ASCII digits 0-7).
- Single SPACE (0x20)
- Filename
- Single NUL (0x00)
- 20-byte binary SHA-1 of referenced object.
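A minimal Python sketch of reading these entries back out, assuming `data` holds only the inflated tree body (the bytes "git cat-file tree <sha>" prints, without the "tree <size>\0" header). The name `parse_tree` is hypothetical:

```python
from typing import List, Tuple


def parse_tree(data: bytes) -> List[Tuple[str, bytes, str]]:
    """Split a raw tree body into (mode, filename, hex SHA-1) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)       # the mode ends at the first SPACE
        nul = data.index(b"\0", space + 1)  # the filename ends at the first NUL
        mode = data[pos:space].decode("ascii")
        name = data[space + 1:nul]          # kept as raw bytes
        sha = data[nul + 1:nul + 21]        # 20-byte binary SHA-1, not hex
        if len(sha) != 20:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, sha.hex()))
        pos = nul + 21
    return entries


raw = b"100644 README\0" + bytes.fromhex("30e849cba985d74bfd29696f6dee5a40abaacb03")
print(parse_tree(raw))
# [('100644', b'README', '30e849cba985d74bfd29696f6dee5a40abaacb03')]
```

Because the SHA-1 is read by length rather than by delimiter, 0x00 or 0x20 bytes inside it (part of the garbage visible in the raw output in the first message) do not confuse the parser.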

At least the following octal modes are known:
40000: Directory (tree).
100644: Regular file (blob).
100755: Executable file (blob).
120000: Symbolic link (blob).
160000: Submodule (commit).
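The mode list can be turned into a lookup table. A Python sketch; `KNOWN_MODES` and `describe_mode` are hypothetical names, and because the list says "at least", unrecognized modes are reported rather than rejected:

```python
KNOWN_MODES = {
    "40000": ("tree", "directory"),
    "100644": ("blob", "regular file"),
    "100755": ("blob", "executable file"),
    "120000": ("blob", "symbolic link"),
    "160000": ("commit", "submodule"),
}


def describe_mode(mode: str) -> str:
    """Return 'mode type (meaning)', padding the mode to six digits for display."""
    obj_type, meaning = KNOWN_MODES.get(mode, ("unknown", "unrecognized mode"))
    return f"{mode.zfill(6)} {obj_type} ({meaning})"


print(describe_mode("40000"))   # 040000 tree (directory)
print(describe_mode("100755"))  # 100755 blob (executable file)
```

Inside the raw object the directory mode is stored as 40000; the six-digit 040000 is only how "git cat-file -p" displays it.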

The entries are always sorted in (bytewise) lexicographical order,
except that directories sort as if there were an implied '/' at the end.

For example:
! < 0 < 9 < a < a- < a- (directory) < a (directory) < a0 < ab < b < z.
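One way to reproduce this ordering is a sort key that appends '/' to directory names before a plain bytewise comparison. A minimal Python sketch (`tree_sort_key` is a hypothetical helper), checked against the example above:

```python
def tree_sort_key(name: bytes, is_dir: bool) -> bytes:
    # Directories compare as if their name ended with '/'.
    return name + b"/" if is_dir else name


# (name, is_directory) pairs from the example, deliberately shuffled
example = [
    (b"b", False), (b"a", True), (b"a0", False), (b"!", False),
    (b"a-", True), (b"z", False), (b"a", False), (b"9", False),
    (b"ab", False), (b"a-", False), (b"0", False),
]
ordered = sorted(example, key=lambda e: tree_sort_key(*e))
print([n.decode() + ("/" if d else "") for n, d in ordered])
# ['!', '0', '9', 'a', 'a-', 'a-/', 'a/', 'a0', 'ab', 'b', 'z']
```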


The reason for sorting directories specially is that if one recurses
into each directory upon reaching it and uses '/' as the path
separator, the full pathnames come out in bytewise lexicographical order.
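A small Python sketch of that property, using a hypothetical in-memory tree where a dict is a directory and None is a file: walking entries in tree order and joining names with '/' yields pathnames that are already bytewise sorted.

```python
def tree_sort_key(name: bytes, is_dir: bool) -> bytes:
    return name + b"/" if is_dir else name


def walk(tree: dict, prefix: bytes = b""):
    """Yield full paths, recursing into each directory as it is reached."""
    items = sorted(tree.items(),
                   key=lambda kv: tree_sort_key(kv[0], isinstance(kv[1], dict)))
    for name, child in items:
        if isinstance(child, dict):
            yield from walk(child, prefix + name + b"/")
        else:
            yield prefix + name


tree = {b"a0": None, b"a": {b"x": None, b"y-z": None}, b"a-": None, b"b": {b"c": None}}
paths = list(walk(tree))
print(paths)  # [b'a-', b'a/x', b'a/y-z', b'a0', b'b/c']
assert paths == sorted(paths)
```

Sorting directory "a" by its bare name instead would place it before "a-", and the walk would emit "a/x" ahead of "a-", breaking the order.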

-Ilari


* Re: Exact format of tree objets
From: Junio C Hamano @ 2013-06-11 18:38 UTC
  To: Chico Sokol; +Cc: git

Chico Sokol <chico.sokol@gmail.com> writes:

> Is there any official documentation of the tree object format? Are tree
> objects encoded specially in some way? How can I parse the inflated
> contents of a tree object?
>
> We suspect there is some kind of special format or encoding,
> because the command "git cat-file -p <sha>" shows me ...
> But "git cat-file tree <sha>" generates ...

"cat-file -p" is meant to produce human-readable output.  The latter gives
the exact byte contents read_sha1_file() sees, which is a binary
format.  Essentially, it is a sequence of:

 - mode of the entry encoded in octal, without any leading '0' pad,
   followed by a single SPACE;
 - pathname component of the entry, terminated with NUL;
 - 20-byte SHA-1 object name.

sorted in a particular order.
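Going the other way, a Python sketch that serializes entries into this binary form and computes the tree's object name. It assumes the entries are already sorted in tree order, and relies on the SHA-1 being taken over a "tree <size>\0" header followed by these bytes (the header is not part of what "git cat-file tree" prints). `serialize_tree` and `tree_object_id` are hypothetical names:

```python
import hashlib


def serialize_tree(entries) -> bytes:
    """entries: (mode as int, name as bytes, hex SHA-1) tuples, already in tree order."""
    out = bytearray()
    for mode, name, sha_hex in entries:
        out += format(mode, "o").encode("ascii")  # octal, no '0' pad: 40000, not 040000
        out += b" " + name + b"\0"                 # SPACE, pathname component, NUL
        out += bytes.fromhex(sha_hex)              # 20-byte binary object name
    return bytes(out)


def tree_object_id(body: bytes) -> str:
    # The object name is the SHA-1 of "tree <size>\0" followed by the body.
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


print(tree_object_id(serialize_tree([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The printed value is Git's well-known empty tree object name, which makes a convenient sanity check.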


* Re: Exact format of tree objets
From: Jakub Narebski @ 2013-06-12 14:06 UTC
  To: git

Junio C Hamano <gitster <at> pobox.com> writes:
> Chico Sokol <chico.sokol <at> gmail.com> writes:
> 
> > Is there any official documentation of the tree object format? Are tree
> > objects encoded specially in some way? How can I parse the inflated
> > contents of a tree object?
> >
> > We suspect there is some kind of special format or encoding,
> > because the command "git cat-file -p <sha>" shows me ...
> > But "git cat-file tree <sha>" generates ...
> 
> "cat-file -p" is meant to be human-readable form.  The latter gives
> the exact byte contents read_sha1_file() sees, which is a binary
> format.  Essentially, it is a sequence of:
> 
>  - mode of the entry encoded in octal, without any leading '0' pad,
>    followed by a single SPACE;
>  - pathname component of the entry, terminated with NUL;
>  - 20-byte SHA-1 object name.

I have always wondered why this is the only object format where the
SHA-1 is stored in 20-byte binary form rather than as a 40-character
hexadecimal string...
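For comparison, the two representations of the same object name in Python. Whatever the historical reason, converting between them is all a parser needs, and the binary form is 20 bytes shorter per tree entry:

```python
hex_name = "2beae51a0e14b3167fd7e81119972caef95779f4"  # the .gitignore blob from the first message
raw_name = bytes.fromhex(hex_name)  # the form stored inside a tree object

print(len(hex_name), len(raw_name))  # 40 20
print(raw_name.hex() == hex_name)    # True
```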

-- 
Jakub Narębski


* Re: Exact format of tree objets
From: Chico Sokol @ 2013-06-18 13:53 UTC
  To: Jakub Narebski; +Cc: git

Thanks!

By the way, where can I find this kind of specification? I couldn't
find the spec of tree objects here:
https://github.com/git/git/tree/master/Documentation


--
Chico Sokol



* Re: Exact format of tree objets
From: Chico Sokol @ 2013-06-18 15:15 UTC
  To: Ilari Liusvaara; +Cc: git

What is the encoding of the filename?


--
Chico Sokol



* Re: Exact format of tree objets
From: Thomas Rast @ 2013-06-18 17:47 UTC
  To: Chico Sokol; +Cc: Ilari Liusvaara, git

Chico Sokol <chico.sokol@gmail.com> writes:

> What is the encoding of the filename?

Git just treats a filename as a bunch of bytes that form a POSIX filename
(i.e., it may not contain '/' or '\0').  So depending on your point of
view, it's either "no encoding" or "whatever you put into it".
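In practice that means a parser should keep names as bytes and decode them only for display. A Python sketch under that assumption; UTF-8 with `surrogateescape` is an arbitrary choice here, not something Git records, and `check_tree_name` and `display_name` are hypothetical helpers:

```python
def check_tree_name(name: bytes) -> None:
    """Reject names that break the rule above: no '/' and no NUL."""
    if b"/" in name or b"\0" in name:
        raise ValueError(f"invalid tree entry name: {name!r}")


def display_name(name: bytes) -> str:
    # Undecodable bytes become lone surrogates, so encoding back is lossless.
    return name.decode("utf-8", errors="surrogateescape")


latin1_name = "caf\xe9".encode("latin-1")  # b'caf\xe9', not valid UTF-8
shown = display_name(latin1_name)
assert shown.encode("utf-8", errors="surrogateescape") == latin1_name
```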

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


Thread overview: 7+ messages
2013-06-11 16:25 Exact format of tree objets Chico Sokol
2013-06-11 18:26 ` Ilari Liusvaara
2013-06-18 15:15   ` Chico Sokol
2013-06-18 17:47     ` Thomas Rast
2013-06-11 18:38 ` Junio C Hamano
2013-06-12 14:06   ` Jakub Narebski
2013-06-18 13:53     ` Chico Sokol

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
