git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Bouke Versteegh <info@boukeversteegh.nl>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: Docs: possible incorrect diagram of Delta Copy instruction
Date: Sat, 12 Sep 2020 13:21:57 +0200	[thread overview]
Message-ID: <CAFag017mV=gE3GZ2Cn6gp9=kDF+SH1hm_Xb_UiV0MHsftDr3tw@mail.gmail.com> (raw)
In-Reply-To: <20200911143321.GA2374950@coredump.intra.peff.net>

Hi Jeff,

Thank you for your elaborate answer. So it turns out I misinterpreted
the diagram to represent the 8 opcode BITS rather than the whole
sequence of BYTES.
In hindsight there were some hints that I've overlooked, like the
first block containing 1xxxxxxx, which shows already its a byte, not a
bit, and the explanation below.

Thanks again for taking the time to explain!

Kind regards, Bouke

On Fri, 11 Sep 2020 at 16:33, Jeff King <peff@peff.net> wrote:
>
> On Fri, Sep 11, 2020 at 02:30:20PM +0200, Bouke Versteegh wrote:
>
> > While working on an implementation of git in dart, I've noticed a
> > possible error in the documentation. I hope I'm using the correct
> > channel to report this issue.
>
> This is definitely the right place.
>
> > On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object)
> > a diagram is shown, that explains the format of a Copy instruction
> > inside a Deltified pack entry:
> >
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> > | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> >
> > The documentation specifies that diagrams follow the RFC950
> > (https://www.ietf.org/rfc/rfc1950.txt) format.
>
> I think you mean rfc1951 here, but yeah, it is clear in that document
> that bytes are in MSB order. And also that each of those boxes is a
> single byte.
>
> > That means that the left bit is MSB, and the right bit is LSB, so the
> > OpCode is MSB (1xxxxxxx), which is correct and matches other sources.
>
> Right, agreed.
>
> > It also would mean that offset 1-4 should be read from bit 7, 6, 5 and
> > 4 (i.e. 0x40, 0x20, 0x10, 0x08)
>
> I don't think this follows, though. "offset1" is a separate byte. We
> have a sequence of bytes here, some of which may be present. Their
> presence is determined by the bits "xxxxxxx", but the diagram does not
> say anything about the order.
>
> The paragraphs below do say:
>
>   This is the instruction format to copy a byte range from the source
>   object. It encodes the offset to copy from and the number of bytes to
>   copy. Offset and size are in little-endian order.
>
> So "offset1" is the lowest byte of the offset value. We still don't know
> which bit in the instruction byte tells us whether it's there yet,
> though. The next paragraph says:
>
>   All offset and size bytes are optional. This is to reduce the
>   instruction size when encoding small offsets or sizes. The first seven
>   bits in the first octet determines which of the next seven octets is
>   present. If bit zero is set, offset1 is present. If bit one is set
>   offset2 is present and so on.
>
> So bit zero of the "xxxxxxx" (the LSB) is offset1, and we have to check
> that first. Which I think is what happens in:
>
> >   PARSE_CP_PARAM(0x01, cp_off, 0);
>
> That's reading the low bit of the command byte, and then reading the
> _next_ byte from data, and shifting it not at all (because it's the low
> byte of the offset). That macro looks like this:
>
>   #define PARSE_CP_PARAM(bit, var, shift) do { \
>                           if (cmd & (bit)) { \
>                                   if (data >= top) \
>                                           goto bad_length; \
>                                   var |= ((unsigned) *data++ << (shift)); \
>                           } } while (0)
>
> > However, looking at the git source-code and other documentation (see
> > [1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and
> > the SIZE  bits are stored right after MSB (opcode).
>
> I think both the code and diagram are right. It's just that the order of
> the follow-on bytes is not part of that diagram, but rather the
> explanation below it.
>
> > From this source code, I would conclude that the diagram should look
> > like this instead:
> >
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> > | 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 |
> > +----------+---------+---------+---------+---------+-------+-------+-------+
>
> That would definitely be wrong. If there's an offset1 byte, it
> immediately follows the command byte.
>
> -Peff

      reply	other threads:[~2020-09-12 11:23 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-11 12:30 Docs: possible incorrect diagram of Delta Copy instruction Bouke Versteegh
2020-09-11 14:33 ` Jeff King
2020-09-12 11:21   ` Bouke Versteegh [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFag017mV=gE3GZ2Cn6gp9=kDF+SH1hm_Xb_UiV0MHsftDr3tw@mail.gmail.com' \
    --to=info@boukeversteegh.nl \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).