git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Docs: possible incorrect diagram of Delta Copy instruction
@ 2020-09-11 12:30 Bouke Versteegh
  2020-09-11 14:33 ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Bouke Versteegh @ 2020-09-11 12:30 UTC (permalink / raw)
  To: git

Dear maintainers of git documentation,

While working on an implementation of git in dart, I've noticed a
possible error in the documentation. I hope I'm using the correct
channel to report this issue.

On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object)
a diagram is shown, that explains the format of a Copy instruction
inside a Deltified pack entry:

+----------+---------+---------+---------+---------+-------+-------+-------+
| 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
+----------+---------+---------+---------+---------+-------+-------+-------+

The documentation specifies that diagrams follow the RFC950
(https://www.ietf.org/rfc/rfc1950.txt) format.
That means that the left bit is MSB, and the right bit is LSB, so the
OpCode is MSB (1xxxxxxx), which is correct and matches other sources.
It also would mean that offset 1-4 should be read from bit 7, 6, 5 and
4 (i.e. 0x40, 0x20, 0x10, 0x08)

However, looking at the git source-code and other documentation (see
[1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and
the SIZE  bits are stored right after MSB (opcode).

https://github.com/git/git/blob/54e85e7af1ac9e9a92888060d6811ae767fea1bc/patch-delta.c#L49-L55

  PARSE_CP_PARAM(0x01, cp_off, 0);
  PARSE_CP_PARAM(0x02, cp_off, 8);
  PARSE_CP_PARAM(0x04, cp_off, 16);
  PARSE_CP_PARAM(0x08, cp_off, 24);
  PARSE_CP_PARAM(0x10, cp_size, 0);
  PARSE_CP_PARAM(0x20, cp_size, 8);
  PARSE_CP_PARAM(0x40, cp_size, 16);

From this source code, I would conclude that the diagram should look
like this instead:

+----------+---------+---------+---------+---------+-------+-------+-------+
| 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 |
+----------+---------+---------+---------+---------+-------+-------+-------+

Please let me know if I misunderstood the diagram instead.

Kind regards, Bouke

[1] https://shop.jcoglan.com/building-git/
[2] https://codewords.recurse.com/issues/three/unpacking-git-packfiles#applying-deltas

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Docs: possible incorrect diagram of Delta Copy instruction
  2020-09-11 12:30 Docs: possible incorrect diagram of Delta Copy instruction Bouke Versteegh
@ 2020-09-11 14:33 ` Jeff King
  2020-09-12 11:21   ` Bouke Versteegh
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2020-09-11 14:33 UTC (permalink / raw)
  To: Bouke Versteegh; +Cc: git

On Fri, Sep 11, 2020 at 02:30:20PM +0200, Bouke Versteegh wrote:

> While working on an implementation of git in dart, I've noticed a
> possible error in the documentation. I hope I'm using the correct
> channel to report this issue.

This is definitely the right place.

> On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object)
> a diagram is shown, that explains the format of a Copy instruction
> inside a Deltified pack entry:
> 
> +----------+---------+---------+---------+---------+-------+-------+-------+
> | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
> +----------+---------+---------+---------+---------+-------+-------+-------+
> 
> The documentation specifies that diagrams follow the RFC950
> (https://www.ietf.org/rfc/rfc1950.txt) format.

I think you mean rfc1951 here, but yeah, it is clear in that document
that bytes are in MSB order. And also that each of those boxes is a
single byte.

> That means that the left bit is MSB, and the right bit is LSB, so the
> OpCode is MSB (1xxxxxxx), which is correct and matches other sources.

Right, agreed.

> It also would mean that offset 1-4 should be read from bit 7, 6, 5 and
> 4 (i.e. 0x40, 0x20, 0x10, 0x08)

I don't think this follows, though. "offset1" is a separate byte. We
have a sequence of bytes here, some of which may be present. Their
presence is determined by the bits "xxxxxxx", but the diagram does not
say anything about the order.

The paragraphs below do say:

  This is the instruction format to copy a byte range from the source
  object. It encodes the offset to copy from and the number of bytes to
  copy. Offset and size are in little-endian order.

So "offset1" is the lowest byte of the offset value. We still don't know
which bit in the instruction byte tells us whether it's there yet,
though. The next paragraph says:

  All offset and size bytes are optional. This is to reduce the
  instruction size when encoding small offsets or sizes. The first seven
  bits in the first octet determines which of the next seven octets is
  present. If bit zero is set, offset1 is present. If bit one is set
  offset2 is present and so on.

So bit zero of the "xxxxxxx" (the LSB) is offset1, and we have to check
that first. Which I think is what happens in:

>   PARSE_CP_PARAM(0x01, cp_off, 0);

That's reading the low bit of the command byte, and then reading the
_next_ byte from data, and shifting it not at all (because it's the low
byte of the offset). That macro looks like this:

  #define PARSE_CP_PARAM(bit, var, shift) do { \
                          if (cmd & (bit)) { \
                                  if (data >= top) \
                                          goto bad_length; \
                                  var |= ((unsigned) *data++ << (shift)); \
                          } } while (0)

> However, looking at the git source-code and other documentation (see
> [1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and
> the SIZE  bits are stored right after MSB (opcode).

I think both the code and diagram are right. It's just that the order of
the follow-on bytes is not part of that diagram, but rather the
explanation below it.

> From this source code, I would conclude that the diagram should look
> like this instead:
> 
> +----------+---------+---------+---------+---------+-------+-------+-------+
> | 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 |
> +----------+---------+---------+---------+---------+-------+-------+-------+

That would definitely be wrong. If there's an offset1 byte, it
immediately follows the command byte.

-Peff

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Docs: possible incorrect diagram of Delta Copy instruction
  2020-09-11 14:33 ` Jeff King
@ 2020-09-12 11:21   ` Bouke Versteegh
  0 siblings, 0 replies; 3+ messages in thread
From: Bouke Versteegh @ 2020-09-12 11:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi Jeff,

Thank you for your elaborate answer. So it turns out I misinterpreted
the diagram to represent the 8 opcode BITS rather than the whole
sequence of BYTES.
In hindsight there were some hints that I've overlooked, like the
first block containing 1xxxxxxx, which shows already its a byte, not a
bit, and the explanation below.

Thanks again for taking the time to explain!

Kind regards, Bouke

On Fri, 11 Sep 2020 at 16:33, Jeff King <peff@peff.net> wrote:
>
> On Fri, Sep 11, 2020 at 02:30:20PM +0200, Bouke Versteegh wrote:
>
> > While working on an implementation of git in dart, I've noticed a
> > possible error in the documentation. I hope I'm using the correct
> > channel to report this issue.
>
> This is definitely the right place.
>
> > On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object)
> > a diagram is shown, that explains the format of a Copy instruction
> > inside a Deltified pack entry:
> >
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> > | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> >
> > The documentation specifies that diagrams follow the RFC950
> > (https://www.ietf.org/rfc/rfc1950.txt) format.
>
> I think you mean rfc1951 here, but yeah, it is clear in that document
> that bytes are in MSB order. And also that each of those boxes is a
> single byte.
>
> > That means that the left bit is MSB, and the right bit is LSB, so the
> > OpCode is MSB (1xxxxxxx), which is correct and matches other sources.
>
> Right, agreed.
>
> > It also would mean that offset 1-4 should be read from bit 7, 6, 5 and
> > 4 (i.e. 0x40, 0x20, 0x10, 0x08)
>
> I don't think this follows, though. "offset1" is a separate byte. We
> have a sequence of bytes here, some of which may be present. Their
> presence is determined by the bits "xxxxxxx", but the diagram does not
> say anything about the order.
>
> The paragraphs below do say:
>
>   This is the instruction format to copy a byte range from the source
>   object. It encodes the offset to copy from and the number of bytes to
>   copy. Offset and size are in little-endian order.
>
> So "offset1" is the lowest byte of the offset value. We still don't know
> which bit in the instruction byte tells us whether it's there yet,
> though. The next paragraph says:
>
>   All offset and size bytes are optional. This is to reduce the
>   instruction size when encoding small offsets or sizes. The first seven
>   bits in the first octet determines which of the next seven octets is
>   present. If bit zero is set, offset1 is present. If bit one is set
>   offset2 is present and so on.
>
> So bit zero of the "xxxxxxx" (the LSB) is offset1, and we have to check
> that first. Which I think is what happens in:
>
> >   PARSE_CP_PARAM(0x01, cp_off, 0);
>
> That's reading the low bit of the command byte, and then reading the
> _next_ byte from data, and shifting it not at all (because it's the low
> byte of the offset). That macro looks like this:
>
>   #define PARSE_CP_PARAM(bit, var, shift) do { \
>                           if (cmd & (bit)) { \
>                                   if (data >= top) \
>                                           goto bad_length; \
>                                   var |= ((unsigned) *data++ << (shift)); \
>                           } } while (0)
>
> > However, looking at the git source-code and other documentation (see
> > [1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and
> > the SIZE  bits are stored right after MSB (opcode).
>
> I think both the code and diagram are right. It's just that the order of
> the follow-on bytes is not part of that diagram, but rather the
> explanation below it.
>
> > From this source code, I would conclude that the diagram should look
> > like this instead:
> >
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> > | 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 |
> > +----------+---------+---------+---------+---------+-------+-------+-------+
>
> That would definitely be wrong. If there's an offset1 byte, it
> immediately follows the command byte.
>
> -Peff

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-09-12 11:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-11 12:30 Docs: possible incorrect diagram of Delta Copy instruction Bouke Versteegh
2020-09-11 14:33 ` Jeff King
2020-09-12 11:21   ` Bouke Versteegh

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).