From: Duy Nguyen <pclouds@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git <git@vger.kernel.org>, "Junio C Hamano" <gitster@pobox.com>,
	"Jeff King" <peff@peff.net>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>
Subject: Re: [PATCH] pack-format.txt: more details on pack file format
Date: Tue, 8 May 2018 20:22:09 +0200
Message-ID: <CACsJy8A10qixJ7YtJKJejp1t49aBZFDn7CjSTiaq8GeVHuxiOQ@mail.gmail.com>
In-Reply-To: <CAGZ79kZiwX-QFnkTfRHby38GYBDwj-0Dyv3_PWPXtnWr+112CA@mail.gmail.com>

On Tue, May 8, 2018 at 7:23 PM, Stefan Beller <sbeller@google.com> wrote:
>>  While at there, I also add some text about this obscure delta format.
>>  We occasionally have questions about this on the mailing list if I
>>  remember correctly.
>
> Let me see if I can understand it, as I am not well versed in the
> delta format, so ideally I would understand it from the patch here?

Well yes. I don't expect my first version to be that easy to
understand. This is where you come in to help ;-)

>> +Valid object types are:
>> +
>> +- OBJ_COMMIT (1)
>> +- OBJ_TREE (2)
>> +- OBJ_BLOB (3)
>> +- OBJ_TAG (4)
>> +- OBJ_OFS_DELTA (6)
>> +- OBJ_REF_DELTA (7)
>> +
>> +Type 5 is reserved for future expansion.
>
> and type 0 as well, but that is not spelled out?

Type 0 is invalid. I think in some encodings it's not even possible to
encode zero. Anyway, yes, it should be spelled out.
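
To make the list concrete, here is a rough Python sketch of a type
lookup that rejects 0 and 5. The function names are made up for
illustration. The bit positions used in type_from_header_byte() (type
in bits 6-4 of the first entry header byte) come from the pack entry
header layout, which is not described in this message, so treat that
part as an assumption of the sketch.

```python
# Object type numbers as listed in the patch; 0 and 5 are not valid types.
OBJ_TYPES = {
    1: "OBJ_COMMIT",
    2: "OBJ_TREE",
    3: "OBJ_BLOB",
    4: "OBJ_TAG",
    6: "OBJ_OFS_DELTA",
    7: "OBJ_REF_DELTA",
}


def object_type_name(type_id):
    """Return the type name, or raise ValueError for 0, 5 or anything else unknown."""
    if type_id not in OBJ_TYPES:
        raise ValueError(f"invalid or reserved object type: {type_id}")
    return OBJ_TYPES[type_id]


def type_from_header_byte(byte):
    """Extract the 3-bit type field (assumed to sit in bits 6-4) from a header byte."""
    return (byte >> 4) & 0x07
```

With this, object_type_name(type_from_header_byte(0x95)) gives
"OBJ_COMMIT", while type ids 0 and 5 raise ValueError.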

>
>> +Deltified representation
>
> Does this refer to OFS delta as well as REF deltas?

Yes. Both OFS and REF deltas have the same "body" which is what this
part is about. The differences between OFS and REF deltas are not
described (in fact I don't think we describe what OFS and REF deltas
are at all).

>> is a sequence of one byte command optionally
>> +followed by more data for the command. The following commands are
>> +recognized:
>
> So a Deltified representation of an object is a 6 or 7 in the 3 bit type
> and then the length. Then a command is shown how to construct
> the object based on other objects. Can there be more commands?
>
>> +- If bit 7 is set, the remaining bits in the command byte specifies
>> +  how to extract copy offset and size to copy. The following must be
>> +  evaluated in this exact order:
>
> So there are 2 modes, and the high bit indicates which mode is used.
> You start describing the more complicated mode first,
> maybe give names to both of them? "direct copy" (below) and
> "compressed copy with offset" ?

I've started reworking this part because even I find the text hard to
follow. So let's get the background first.

We have a source object somewhere (its name comes from the ofs/ref
delta's header), so basically we have its whole content. The delta
tells us how to use that source object to create a new (target)
object.

The delta is actually a sequence of variable-length instructions, and
there are two kinds. One copies from the source object. The other
copies from the delta itself (i.e. this is new data in the target that
is not available anywhere in the source object to copy from). The most
significant bit (bit 7) of the first byte determines which kind of
instruction it is.
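
As a rough illustration of the two instruction kinds, the Python
sketch below walks a delta body and splits it into instructions. It is
not git's code. The name split_instructions() is made up, and it
assumes "body" holds only the instruction stream (anything that
precedes the instructions in a real delta has already been stripped)
and is not truncated.

```python
def split_instructions(body):
    """Yield (kind, payload) for each instruction in a delta instruction stream."""
    pos = 0
    while pos < len(body):
        cmd = body[pos]
        pos += 1
        if cmd & 0x80:
            # Copy from source: one operand byte follows per set bit in bits 0-6.
            n_operands = bin(cmd & 0x7F).count("1")
            # Payload is the raw instruction, command byte included.
            yield "copy", bytes(body[pos - 1:pos + n_operands])
            pos += n_operands
        elif cmd:
            # Copy from the delta itself: cmd is the number of literal bytes.
            yield "insert", bytes(body[pos:pos + cmd])
            pos += cmd
        else:
            raise ValueError("command byte 0 is reserved")
```

For example, the seven bytes 0x91, 255, 1, 3, 'a', 'b', 'c' split into
one copy instruction (three bytes) and one insert of "abc".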


>> +  - If bit 0 is set, the following byte contains bits 0-7 of the copy
>> +    offset (this also resets all other bits in the copy offset to
>> +    zero).
>> +  - If bit 1 is set, the following byte contains bits 8-15 of the copy
>> +    offset.
>> +  - If bit 2 is set, the following byte contains bits 16-23 of the
>> +    copy offset.
>> +  - If bit 3 is set, the following byte contains bits 24-31 of the
>> +    copy offset.
>
> I assume these bits are exclusive, i.e. if bit 3 is set, bits 0-2 are not
> allowed to be set. What happens if they are set, do we care?
>
> If bit 3 is set, then the following byte contains 24-31 of the copy offset,
> where is the rest? Do I wait for another command byte with
> bits 2,1,0 to learn about the body offsets, or do they follow the
> following byte? Something like:
>
>   "If bit 3 is set, then the next 4 bytes are the copy offset,
>   in network byte order"

My first attempt at "translating to English" is like reconstructing C
from assembly: it's horrible.

The instruction looks like this:

        bit      0        1        2        3       4      5      6
  +----------+--------+--------+--------+--------+------+------+------+
  | 1xxxxxxx | offset | offset | offset | offset | size | size | size |
  +----------+--------+--------+--------+--------+------+------+------+

Here you can see it in its full form; each box represents a byte. The
first byte has bit 7 set, as mentioned. The offset (where to copy from
in the source object) takes 4 bytes and the size (how many bytes to
copy) takes 3. Both offset and size are stored least significant byte
first.

The "xxxxxxx" part lets us shrink this down. If the offset fits in 16
bits, there's no reason to waste the last two bytes describing zero.
Each 'x' marks whether the corresponding byte is present; the bit
number is in the first row. So if you have offset 255 and size 1, the
instruction is three bytes: 10010001b, 255, 1. The octets in "bit
columns" 1, 2, 3, 5 and 6 are missing because the corresponding bits
in the first byte are not set.
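
The layout above can be sketched as a small decoder. This is an
illustrative Python version, not the real implementation;
decode_copy() is a made-up name. It starts offset and size at zero
before reading any operand bytes, which is a choice made for this
sketch, and it returns the raw size field without applying any special
meaning to zero.

```python
def decode_copy(instr):
    """Decode one copy instruction; return (offset, size, bytes_consumed)."""
    cmd = instr[0]
    if not cmd & 0x80:
        raise ValueError("not a copy instruction: bit 7 is clear")
    pos = 1
    offset = 0
    size = 0
    for i in range(4):
        # Bits 0-3 say which of the four offset bytes are present.
        if cmd & (1 << i):
            offset |= instr[pos] << (8 * i)
            pos += 1
    for i in range(3):
        # Bits 4-6 say which of the three size bytes are present.
        if cmd & (1 << (4 + i)):
            size |= instr[pos] << (8 * i)
            pos += 1
    return offset, size, pos
```

Feeding it the example from the text, decode_copy(bytes([0b10010001,
255, 1])), returns (255, 1, 3): offset 255, size 1, three bytes
consumed.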

>> +  - If bit 4 is set, the following byte contains bits 0-7 of the copy
>> +    size (this also resets all other bits in the copy size to zero).
>> +  - If bit 5 is set, the following byte contains bits 8-15 of the copy
>> +    size.
>> +  - If bit 6 is set, the following byte contains bits 16-23 of the
>> +    copy size.
>
> bits 4-7 seem to be another group of mutually exclusive bits.
> The same question as above:
> If bit 6 is set, where are bits 0-15 of the copy size?

I think this is a corner case in this format. I think Nico meant to
specify consecutive bytes: if size is 2 bytes then you have to specify
_both_ of them even if the first byte could be zero and omitted.

The implementation detail is, if bit 6 is set but bit 4 is not, then
the size value is pretty much random. It's only when bit 4 is set that
we first clear out "size" and start adding bits to it.

>
>> +
>> +  Copy size zero means 0x10000 bytes.
>
> This is an interesting caveat. So we can only copy 1-0x10000 bytes,
> and cannot express to copy 0 bytes?

Yes. There's no point in copying nothing, and it saves space to not
specify "size" at all. I think this is meant for copying a very large
part of the source: you just keep copying a series of 0x10000-byte
chunks.
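
Here is how that could look on the encoding side, as a hedged Python
sketch with made-up names (encode_copy, encode_long_copy). It caps
each instruction at 0x10000 bytes and writes a full chunk as "no size
bytes at all". It also drops every zero byte, not just trailing ones,
which assumes the decoder starts offset and size from zero; an encoder
that wants to stay clear of the corner case discussed earlier would
emit all the lower bytes instead.

```python
CHUNK = 0x10000  # largest copy one instruction expresses in this sketch


def encode_copy(offset, size):
    """Encode one copy instruction for 1 <= size <= 0x10000."""
    if not 1 <= size <= CHUNK:
        raise ValueError("size must be in 1..0x10000")
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("offset must fit in 32 bits")
    if size == CHUNK:
        size = 0  # size zero means 0x10000, so no size bytes are written
    cmd = 0x80
    operands = bytearray()
    for i in range(4):
        byte = (offset >> (8 * i)) & 0xFF
        if byte:
            cmd |= 1 << i
            operands.append(byte)
    for i in range(3):
        byte = (size >> (8 * i)) & 0xFF
        if byte:
            cmd |= 1 << (4 + i)
            operands.append(byte)
    return bytes([cmd]) + bytes(operands)


def encode_long_copy(offset, length):
    """Cover [offset, offset + length) with a series of copy instructions."""
    out = bytearray()
    while length > 0:
        n = min(length, CHUNK)
        out += encode_copy(offset, n)
        offset += n
        length -= n
    return bytes(out)
```

A copy of exactly 0x10000 bytes from offset 0 comes out as the single
byte 0x80, and encode_copy(255, 1) reproduces the three-byte example
10010001b, 255, 1.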

>
>> The data from source object at
>> +  the given copy offset is copied back to the destination buffer.
>> +
>> +- If bit 7 is not set, it is the copy size in bytes. The following
>> +  bytes are copied to destination buffer
>> +- Command byte zero is reserved for future expansion.
>
> Thanks,
> Stefan



-- 
Duy
