* Exact format of tree objets
@ 2013-06-11 16:25 Chico Sokol
2013-06-11 18:26 ` Ilari Liusvaara
2013-06-11 18:38 ` Junio C Hamano
0 siblings, 2 replies; 7+ messages in thread
From: Chico Sokol @ 2013-06-11 16:25 UTC (permalink / raw)
To: git
Is there any official documentation of the tree object format? Are tree
objects encoded in some special way? How can I parse the inflated
contents of a tree object?
We suspect that there is some kind of special format or
encoding, because the command "git cat-file -p <sha>" shows me the
expected output, something like:
100644 blob 2beae51a0e14b3167fd7e81119972caef95779f4 .gitignore
100644 blob 7c817960e954f0278a6eee8d58611f61445167e8 LICENSE.txt
100644 blob 30e849cba985d74bfd29696f6dee5a40abaacb03 README
...
Meanwhile, "git cat-file tree <sha>" generates strange output, which
suggests some kind of encoding problem. Something like:
100644 .gitignore+��▒����,��Wy�100644
LICENSE.txt|�y`�T�'�n��XaaDQg�100644 README0�I˩��K�)
Thanks,
--
Chico Sokol
* Re: Exact format of tree objets
From: Ilari Liusvaara @ 2013-06-11 18:26 UTC (permalink / raw)
To: Chico Sokol; +Cc: git
On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
> Is there any official documentation of the tree object format? Are tree
> objects encoded specially in some way? How can I parse the inflated
> contents of a tree object?
A tree object consists of entries, each a concatenation of:
- Octal mode (using ASCII digits 0-7).
- Single SPACE (0x20)
- Filename
- Single NUL (0x00)
- 20-byte binary SHA-1 of referenced object.
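The entry layout above can be sketched as a small Python parser. This is a minimal illustration, not Git's own code: the function name parse_tree and the hand-built sample bytes are hypothetical, and the input is assumed to be the raw tree contents (what "git cat-file tree <sha>" prints), with no object header in front.

```python
def parse_tree(data: bytes):
    """Yield (mode, name, sha1) for each entry in raw tree-object bytes."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)           # end of the ASCII octal mode
        nul = data.index(b"\x00", space + 1)    # end of the filename
        mode = int(data[pos:space], 8)          # mode as an integer
        name = data[space + 1:nul]              # filename, kept as raw bytes
        sha1 = data[nul + 1:nul + 21]           # 20-byte binary SHA-1
        if len(sha1) != 20:
            raise ValueError("truncated tree entry")
        yield mode, name, sha1
        pos = nul + 21


# Hypothetical two-entry tree built by hand; the SHA-1 bytes are dummies.
raw = (b"100644 README\x00" + bytes(range(20)) +
       b"40000 src\x00" + bytes(range(20, 40)))
entries = list(parse_tree(raw))
```

The filename must be located by searching for the NUL, because it has no length prefix; the SHA-1 must be sliced by length, because its 20 bytes may themselves contain NUL or SPACE.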
At least the following octal modes are known:
40000: Directory (tree).
100644: Regular file (blob).
100755: Executable file (blob).
120000: Symbolic link (blob).
160000: Submodule (commit).
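The mode list above could be captured as a lookup table. A minimal sketch: the names MODE_INFO and describe_mode are made up for illustration, and any mode not in the list is simply reported as unknown, since the list is explicitly not exhaustive.

```python
# Modes from the list above, mapped to (object type, description).
MODE_INFO = {
    0o040000: ("tree", "directory"),
    0o100644: ("blob", "regular file"),
    0o100755: ("blob", "executable file"),
    0o120000: ("blob", "symbolic link"),
    0o160000: ("commit", "submodule"),
}


def describe_mode(mode_field: bytes):
    """Map the ASCII octal mode field of an entry to (object type, description)."""
    mode = int(mode_field, 8)  # the field is stored without leading zeros
    return MODE_INFO.get(mode, ("unknown", "unrecognized mode"))


dir_info = describe_mode(b"40000")
exe_info = describe_mode(b"100755")
```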
The entries are always sorted in (bytewise) lexicographical order,
except that directories sort as if there were an implied '/' at the end.
So e.g.:
! < 0 < 9 < a < a- < a- (directory) < a (directory) < a0 < ab < b < z.
The idea of sorting directories specially is that if one recurses
upon hitting a directory and uses '/' as path separator, then the
full filenames are in bytewise lexicographical order.
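One way to express this ordering is a sort key that appends '/' to directory names. This is only a sketch of the rule described above, not the comparison routine Git itself uses; tree_sort_key is a hypothetical name, and the sample list reuses the names from the example (a real tree would not hold a file and a directory with the same name).

```python
DIR_MODE = 0o40000


def tree_sort_key(entry):
    """Sort key for a (name, mode) pair: directories compare as name + b'/'."""
    name, mode = entry
    return name + b"/" if mode == DIR_MODE else name


FILE = 0o100644
expected = [
    (b"!", FILE), (b"0", FILE), (b"9", FILE), (b"a", FILE),
    (b"a-", FILE), (b"a-", DIR_MODE), (b"a", DIR_MODE),
    (b"a0", FILE), (b"ab", FILE), (b"b", FILE), (b"z", FILE),
]
# Start from the reverse order and sort; bytes compare bytewise in Python.
result = sorted(reversed(expected), key=tree_sort_key)
```

Directory "a" lands after "a-" because '/' (0x2f) sorts after '-' (0x2d), and before "a0" because '/' sorts before '0' (0x30).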
-Ilari
* Re: Exact format of tree objets
From: Chico Sokol @ 2013-06-18 15:15 UTC (permalink / raw)
To: Ilari Liusvaara; +Cc: git
What is the encoding of the filename?
--
Chico Sokol
* Re: Exact format of tree objets
From: Thomas Rast @ 2013-06-18 17:47 UTC (permalink / raw)
To: Chico Sokol; +Cc: Ilari Liusvaara, git
Chico Sokol <chico.sokol@gmail.com> writes:
> What is the encoding of the filename?
Git just treats the filename as a bunch of bytes that form a POSIX
filename (i.e., it may not contain '/' or '\0'). So depending on your
point of view, it's either "no encoding" or "whatever you put into it".
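In practice this means a parser should keep names as bytes and decode only for display. A minimal Python sketch, assuming UTF-8 is the display encoding you want; the helper names and the sample Latin-1 name are hypothetical.

```python
def is_valid_entry_name(name: bytes) -> bool:
    """Check the only constraint stated above: no '/' and no NUL byte."""
    return b"/" not in name and b"\x00" not in name


def display_name(name: bytes) -> str:
    """Decode for display; undecodable bytes survive as surrogate escapes."""
    return name.decode("utf-8", errors="surrogateescape")


raw_name = b"caf\xe9.txt"          # Latin-1 bytes, not valid UTF-8
shown = display_name(raw_name)
# Encoding with the same error handler restores the original bytes exactly.
restored = shown.encode("utf-8", errors="surrogateescape")
```

The surrogateescape handler is one possible choice here; it keeps the conversion lossless, so the original bytes can still be used to look the entry up again.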
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: Exact format of tree objets
From: Junio C Hamano @ 2013-06-11 18:38 UTC (permalink / raw)
To: Chico Sokol; +Cc: git
Chico Sokol <chico.sokol@gmail.com> writes:
> Is there any official documentation of the tree object format? Are tree
> objects encoded specially in some way? How can I parse the inflated
> contents of a tree object?
>
> We're suspecting that there is some kind of special format or
> encoding, because the command "git cat-file -p <sha>" show me ...
> While "git cat-file tree <sha>" generate ...
"cat-file -p" is meant to produce a human-readable form. The latter gives
the exact byte contents read_sha1_file() sees, which is a binary
format. Essentially, it is a sequence of:
- mode of the entry encoded in octal, without any leading '0' pad,
  followed by a single space;
- pathname component of the entry, terminated with NUL;
- 20-byte SHA-1 object name.
sorted in a particular order.
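If you start from a loose object file instead of "git cat-file" output, the inflated data carries a short header in front of these bytes. The sketch below relies on details not stated in this thread, so treat them as assumptions: the object is stored loose (zlib-deflated, not in a pack), and the inflated data is "<type> <size>" followed by NUL and then the contents, with the object name being the SHA-1 of that whole inflated buffer. The function name split_loose_object is hypothetical.

```python
import hashlib
import zlib


def split_loose_object(compressed: bytes):
    """Inflate a loose object and return (type, contents, hex object name)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")   # split at the first NUL only
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match contents")
    return kind.decode("ascii"), body, hashlib.sha1(raw).hexdigest()


# Hypothetical one-entry tree, deflated the way a loose object would be.
body = b"100644 README\x00" + bytes(20)
stored = zlib.compress(b"tree %d\x00" % len(body) + body)
kind, contents, name = split_loose_object(stored)
```

Here contents is the binary sequence described above, ready to be split into entries.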
* Re: Exact format of tree objets
From: Jakub Narebski @ 2013-06-12 14:06 UTC (permalink / raw)
To: git
Junio C Hamano <gitster@pobox.com> writes:
> Chico Sokol <chico.sokol@gmail.com> writes:
>
> > Is there any official documentation of the tree object format? Are tree
> > objects encoded specially in some way? How can I parse the inflated
> > contents of a tree object?
> >
> > We're suspecting that there is some kind of special format or
> > encoding, because the command "git cat-file -p <sha>" show me ...
> > While "git cat-file tree <sha>" generate ...
>
> "cat-file -p" is meant to be human-readable form. The latter gives
> the exact byte contents read_sha1_file() sees, which is a binary
> format. Essentially, it is a sequence of:
>
> - mode of the entry encoded in octal, without any leading '0' pad;
> - pathname component of the entry, terminated with NUL;
> - 20-byte SHA-1 object name.
I always wondered why this is the sole object format where the SHA-1 is in
20-byte binary format and not in 40-character hexadecimal string format...
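Converting between the two forms is a one-liner in most languages. A small Python sketch that renders one parsed entry roughly the way "git cat-file -p" shows it; format_entry is a hypothetical helper, and the zero-padded mode and TAB before the name are my understanding of that output, not something stated in this thread.

```python
def format_entry(mode: int, name: bytes, sha1: bytes) -> str:
    """Render a tree entry as '<mode> <type> <40-char hex>' TAB '<name>'."""
    kind = {0o040000: "tree", 0o160000: "commit"}.get(mode, "blob")
    shown = name.decode("utf-8", errors="replace")   # display only
    return "%06o %s %s\t%s" % (mode, kind, sha1.hex(), shown)


# 40 hex characters <-> 20 raw bytes, using a SHA-1 quoted earlier in the thread.
hex_name = "2beae51a0e14b3167fd7e81119972caef95779f4"
binary_name = bytes.fromhex(hex_name)
line = format_entry(0o100644, b".gitignore", binary_name)
```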
--
Jakub Narębski
* Re: Exact format of tree objets
From: Chico Sokol @ 2013-06-18 13:53 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Thanks!
By the way, where can I find this kind of specification? I couldn't
find the spec of tree objects here:
https://github.com/git/git/tree/master/Documentation
--
Chico Sokol