git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] pack-format: document missing fields
@ 2020-04-09 19:11 Alba Mendez via GitGitGadget
  2020-04-09 21:31 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Alba Mendez via GitGitGadget @ 2020-04-09 19:11 UTC (permalink / raw)
  To: git; +Cc: Alba Mendez, Alba Mendez

From: Alba Mendez <me@alba.sh>

Document missing fields in the description of the delta data, added at
011b648 (pack-format.txt: more details on pack file format, 2018-05-11).

Also, the description of object entries at the main section is a bit
vague. There's an equivalent but more detailed description in the index
section; since they are redundant, put that one in place of the first.

I have also expanded tabs to prevent alignment issues and rephrased a
bit to make things clearer.

Signed-off-by: Alba Mendez <me@alba.sh>
---
    pack-format: document missing fields
    
    Hello! These are some fixes & improvements for the pack-format
    documentation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-605%2Fmildsunrise%2Fpatch-1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-605/mildsunrise/patch-1-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/605

 Documentation/technical/pack-format.txt | 75 ++++++++++---------------
 1 file changed, 31 insertions(+), 44 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index d3a142c6520..b78e03137f4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -9,7 +9,7 @@ Git pack format
          The signature is: {'P', 'A', 'C', 'K'}
 
      4-byte version number (network byte order):
-	 Git currently accepts version number 2 or 3 but
+         Git currently accepts version number 2 or 3 but
          generates version 2 only.
 
      4-byte number of objects contained in the pack (network byte order)
@@ -20,19 +20,34 @@ Git pack format
    - The header is followed by number of object entries, each of
      which looks like this:
 
-     (undeltified representation)
-     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
-     compressed data
-
-     (deltified representation)
-     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
-     20-byte base object name if OBJ_REF_DELTA or a negative relative
-	 offset from the delta object's position in the pack if this
-	 is an OBJ_OFS_DELTA object
-     compressed delta data
+     packed object header:
+        1-byte size extension bit (MSB)
+            type (next 3-bit)
+            size0 (lower 4-bit)
+        n-byte sizeN (as long as MSB is set, each 7-bit)
+            size0..sizeN form 4+7+7+..+7 bit integer, size0
+            is the least significant part, and sizeN is the
+            most significant part.
+     packed object data:
+        If it is not DELTA, then deflated bytes (the
+            size above is the size before compression).
+        If it is REF_DELTA, then
+            20-byte base object name SHA-1.
+            deflated delta data (the size above is the
+                size of this data before compression).
+        If it is OFS_DELTA, then
+            n-byte offset (see below) interpreted as a
+                negative offset from the type-byte of
+                the header of the ofs-delta entry.
+            deflated delta data (the size above is the
+                size of this data before compression).
 
-     Observation: length of each object is encoded in a variable
-     length format and is not constrained to 32-bit or anything.
+     offset encoding:
+        n bytes with MSB set in all but the last one.
+        The offset is then the number constructed by
+        concatenating the lower 7 bit of each byte, and
+        for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
+        to the result.
 
   - The trailer records 20-byte SHA-1 checksum of all of the above.
 
@@ -67,7 +82,9 @@ Ref-delta can also refer to an object outside the pack (i.e. the
 so-called "thin pack"). When stored on disk however, the pack should
 be self contained to avoid cyclic dependency.
 
-The delta data is a sequence of instructions to reconstruct an object
+The deflated delta data begins with two n-byte sizes: the size of
+the base object data, and the size of the reconstructed object data.
+What follows is a sequence of instructions to reconstruct the object
 from the base object. If the base object is deltified, it must be
 converted to canonical form first. Each instruction appends more and
 more data to the target object until it's complete. There are two
@@ -186,36 +203,6 @@ trailer	  | | packfile checksum              |
                   |
 Pack file entry: <+
 
-     packed object header:
-	1-byte size extension bit (MSB)
-	       type (next 3 bit)
-	       size0 (lower 4-bit)
-        n-byte sizeN (as long as MSB is set, each 7-bit)
-		size0..sizeN form 4+7+7+..+7 bit integer, size0
-		is the least significant part, and sizeN is the
-		most significant part.
-     packed object data:
-        If it is not DELTA, then deflated bytes (the size above
-		is the size before compression).
-	If it is REF_DELTA, then
-	  20-byte base object name SHA-1 (the size above is the
-		size of the delta data that follows).
-          delta data, deflated.
-	If it is OFS_DELTA, then
-	  n-byte offset (see below) interpreted as a negative
-		offset from the type-byte of the header of the
-		ofs-delta entry (the size above is the size of
-		the delta data that follows).
-	  delta data, deflated.
-
-     offset encoding:
-	  n bytes with MSB set in all but the last one.
-	  The offset is then the number constructed by
-	  concatenating the lower 7 bit of each byte, and
-	  for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
-	  to the result.
-
-
 
 == Version 2 pack-*.idx files support packs larger than 4 GiB, and
    have some other reorganizations.  They have the format:

base-commit: 9fadedd637b312089337d73c3ed8447e9f0aa775
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] pack-format: document missing fields
  2020-04-09 19:11 [PATCH] pack-format: document missing fields Alba Mendez via GitGitGadget
@ 2020-04-09 21:31 ` Junio C Hamano
  2020-04-10  1:19   ` Danh Doan
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2020-04-09 21:31 UTC (permalink / raw)
  To: Alba Mendez via GitGitGadget; +Cc: git, Alba Mendez

"Alba Mendez via GitGitGadget" <gitgitgadget@gmail.com> writes:

>       4-byte version number (network byte order):
> -	 Git currently accepts version number 2 or 3 but
> +         Git currently accepts version number 2 or 3 but
>           generates version 2 only.

Hmph, what is this hunk about?

> -     (undeltified representation)
> -     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
> -     compressed data
> -
> -     (deltified representation)
> -     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
> -     20-byte base object name if OBJ_REF_DELTA or a negative relative
> -	 offset from the delta object's position in the pack if this
> -	 is an OBJ_OFS_DELTA object
> -     compressed delta data


> +     packed object header:
> +        1-byte size extension bit (MSB)
> +            type (next 3-bit)
> +            size0 (lower 4-bit)
> +        n-byte sizeN (as long as MSB is set, each 7-bit)
> +            size0..sizeN form 4+7+7+..+7 bit integer, size0
> +            is the least significant part, and sizeN is the
> +            most significant part.
> +     packed object data:
> +        If it is not DELTA, then deflated bytes (the
> +            size above is the size before compression).

Correct.

> +        If it is REF_DELTA, then
> +            20-byte base object name SHA-1.
> +            deflated delta data (the size above is the
> +                size of this data before compression).

Correct.

> +        If it is OFS_DELTA, then
> +            n-byte offset (see below) interpreted as a
> +                negative offset from the type-byte of
> +                the header of the ofs-delta entry.

Correct, and "see below" here is very important.

> +            deflated delta data (the size above is the
> +                size of this data before compression).

> +     offset encoding:
> +        n bytes with MSB set in all but the last one.
> +        The offset is then the number constructed by
> +        concatenating the lower 7 bit of each byte, and
> +        for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
> +        to the result.

Correct.  Shouldn't we have a similar explanation for the object
length varint above (in other words, how 4+7+7+..+7 bits are
combined to form an integer)?

> @@ -67,7 +82,9 @@ Ref-delta can also refer to an object outside the pack (i.e. the
>  so-called "thin pack"). When stored on disk however, the pack should
>  be self contained to avoid cyclic dependency.
>  
> -The delta data is a sequence of instructions to reconstruct an object
> +The deflated delta data begins with two n-byte sizes: the size of
> +the base object data, and the size of the reconstructed object data.

Similarly, not just "n-byte size", but clarify that this is a pair
of "varint" and specify which kind of "varint" it is (IIRC, this
encoding did not use the "save-one-bit" trick the offset encoding
uses).

> @@ -186,36 +203,6 @@ trailer	  | | packfile checksum              |
>                    |
>  Pack file entry: <+
>  
> -     packed object header:
> -	1-byte size extension bit (MSB)
> -	       type (next 3 bit)
> -	       size0 (lower 4-bit)
> -        n-byte sizeN (as long as MSB is set, each 7-bit)
> -		size0..sizeN form 4+7+7+..+7 bit integer, size0
> -		is the least significant part, and sizeN is the
> -		most significant part.
> -     packed object data:
> -        If it is not DELTA, then deflated bytes (the size above
> -		is the size before compression).
> -	If it is REF_DELTA, then
> -	  20-byte base object name SHA-1 (the size above is the
> -		size of the delta data that follows).
> -          delta data, deflated.
> -	If it is OFS_DELTA, then
> -	  n-byte offset (see below) interpreted as a negative
> -		offset from the type-byte of the header of the
> -		ofs-delta entry (the size above is the size of
> -		the delta data that follows).
> -	  delta data, deflated.
> -
> -     offset encoding:
> -	  n bytes with MSB set in all but the last one.
> -	  The offset is then the number constructed by
> -	  concatenating the lower 7 bit of each byte, and
> -	  for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
> -	  to the result.
> -
> -

Hmph, I kept saying "Correct" above, but it seems that the
correctness come from the original.  So, why is this patch so big?

As far as I can tell, the only thing it improved (content-wise) was
to say that the delta data has two varints that lets reader to
validate the length of the original contents that the delta applies
to, and to allocate the buffer to hold the result before applying
the delta.  Puzzled.

Thanks.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] pack-format: document missing fields
  2020-04-09 21:31 ` Junio C Hamano
@ 2020-04-10  1:19   ` Danh Doan
  0 siblings, 0 replies; 3+ messages in thread
From: Danh Doan @ 2020-04-10  1:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alba Mendez via GitGitGadget, git, Alba Mendez

On 2020-04-09 14:31:32-0700, Junio C Hamano <gitster@pobox.com> wrote:
> "Alba Mendez via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> >       4-byte version number (network byte order):
> > -	 Git currently accepts version number 2 or 3 but
> > +         Git currently accepts version number 2 or 3 but
> >           generates version 2 only.
> 
> Hmph, what is this hunk about?

The original line is "<TAB><SPACE>Git", she changed to 9 <SPACE>
to match with other lines.
I guess she set tabstop=4, this paragraph will be mis-aligned on that.

> > -     (undeltified representation)
> > -     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
> > -     compressed data
> > -
> > -     (deltified representation)
> > -     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
> > -     20-byte base object name if OBJ_REF_DELTA or a negative relative
> > -	 offset from the delta object's position in the pack if this
> > -	 is an OBJ_OFS_DELTA object

Alba, if you gonna change the above <TAB> to 8 <SPACE> above, you may
want to change here, too.
This paragraph was written with tabstop=4 in mind.
And it's in the context of below hunk.

I'm not sure if changing <TAB> to 8 <SPACE> was expected in this
file because <TAB> and <SPACE> is also mixed in "Pack file entry"
section.

-- 
Danh

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-10  1:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-09 19:11 [PATCH] pack-format: document missing fields Alba Mendez via GitGitGadget
2020-04-09 21:31 ` Junio C Hamano
2020-04-10  1:19   ` Danh Doan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).