git@vger.kernel.org list mirror (unofficial, one of many)
* Question about the pack OBJ_OFS_DELTA format
From: Erik Fastermann @ 2020-01-10  8:26 UTC
  To: git

Hi all,

I'm trying to implement the Git pack format. Parsing the index file and
unpacking undeltified objects already works. However, I'm unable to get
the offset when the type is OBJ_OFS_DELTA.

The very much work-in-progress Go code and data can be found here:
https://github.com/erikfastermann/notes/tree/wip/git

From the docs (https://git-scm.com/docs/pack-format) I assume that
the offset is a variable-length integer, like the size. However, my
calculation leads to an illogical result.
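For reference, the size varint is part of the per-object header: the first byte carries a continuation bit (MSB), a 3-bit object type and the low 4 bits of the size, and each following byte adds 7 more size bits, least-significant group first. A minimal Go sketch of that header decoder, assuming the object's bytes are already in memory; the name parseObjectHeader is hypothetical and not taken from the linked repository:

```go
package main

import (
	"errors"
	"fmt"
)

// parseObjectHeader decodes the header at the start of a packed object
// (hypothetical helper). Byte 0: MSB = "more bytes follow", bits 6-4 =
// object type, bits 3-0 = low 4 bits of the size. Each following byte
// contributes 7 more size bits, least-significant group first.
// It returns the type, the size and the number of header bytes consumed.
func parseObjectHeader(buf []byte) (typ int, size uint64, n int, err error) {
	if len(buf) == 0 {
		return 0, 0, 0, errors.New("empty buffer")
	}
	c := buf[0]
	typ = int(c>>4) & 7
	size = uint64(c & 0x0f)
	shift := uint(4)
	n = 1
	for c&0x80 != 0 {
		if n >= len(buf) {
			return 0, 0, 0, errors.New("truncated object header")
		}
		c = buf[n]
		n++
		size |= uint64(c&0x7f) << shift
		shift += 7
	}
	return typ, size, n, nil
}

func main() {
	// The four bytes dumped later in this message: ee 01 8c 63.
	typ, size, n, err := parseObjectHeader([]byte{0xee, 0x01, 0x8c, 0x63})
	fmt.Println(typ, size, n, err) // 6 30 2 <nil>
}
```

On these bytes the header decodes to type 6 (OBJ_OFS_DELTA) and size 30 after consuming two bytes, consistent with the verify-pack output; the offset varint starts right after.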

When trying the hash 8c40ff4767d973b672fe5aa431cb8ba0593dd26a with:

git verify-pack -v pack.pack

I get: offset: 138650 size: 30.

The base object 9b25d441c1bf358af01edc4eeba65870581a5ac1 (shown by
verify-pack) has the offset 136887, which I think means the encoded offset
should be 138650 - 136887 = 1763.

Using the command:

dd skip=138650 count=4 if=pack.pack bs=1 status=none | hexdump -C

I get: ee 01 8c 63

The first two bytes, the type and the size are correctly computed.

So the next varint should be the offset.

8c: 10001100 --- 63: 01100011

-> 1100011_0001100

-> 12684 ???

The result is the same when calculating it manually and with my program.

I probably have some crucial misunderstanding about the format, so a
clarification would be nice.

Thank you for your help.

Erik

* Re: Question about the pack OBJ_OFS_DELTA format
From: Jeff King @ 2020-01-10  9:57 UTC
  To: Erik Fastermann; +Cc: git

On Fri, Jan 10, 2020 at 09:26:27AM +0100, Erik Fastermann wrote:

> I get: ee 01 8c 63
> 
> The first two bytes, the type and the size are correctly computed.
> 
> So the next varint should be the offset.
> 
> 8c: 10001100 --- 63: 01100011
> 
> -> 1100011_0001100
> 
> -> 12684 ???
> 
> The result is the same when calculating it manually and with my program.

The pack-format.txt file says:

       offset encoding:
            n bytes with MSB set in all but the last one.
            The offset is then the number constructed by
            concatenating the lower 7 bit of each byte, and
            for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
            to the result.

but I think it is missing two bits of information:

  - the bytes are in most-significant to least-significant order, which
    IIRC is the opposite of the size varint

  - each 7-bit byte sneaks in some extra data by implicitly adding "1"
    to all but the last byte

So the low seven bits of "8c" are 12. Adding one and multiplying by 2^7
gives you 1664. The low seven bits of "63" are 99, with no addition or
multiplication because it's the last byte.

The result is 1763, which is what you expected.
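The decoding described above can be sketched in Go. This is a minimal, hypothetical sketch (parseOfsDeltaOffset and leVarint are illustrative names, not Git's), assuming the bytes following the object header are in memory. It also reads the same bytes in size-varint order to show where 12684 came from, and subtracts the decoded value from the delta's own offset to recover the base:

```go
package main

import (
	"errors"
	"fmt"
)

// parseOfsDeltaOffset decodes the offset that follows an OBJ_OFS_DELTA
// header. Bytes are most-significant group first, and one is added to the
// running value before each further 7-bit group is shifted in.
// It returns the decoded offset and the number of bytes consumed.
func parseOfsDeltaOffset(buf []byte) (uint64, int, error) {
	if len(buf) == 0 {
		return 0, 0, errors.New("empty buffer")
	}
	c := buf[0]
	off := uint64(c & 0x7f)
	n := 1
	for c&0x80 != 0 {
		if n >= len(buf) {
			return 0, 0, errors.New("truncated offset")
		}
		if off >= 1<<56 { // (off+1)<<7 would no longer fit in 64 bits
			return 0, 0, errors.New("offset overflow")
		}
		c = buf[n]
		n++
		off = (off+1)<<7 | uint64(c&0x7f)
	}
	return off, n, nil
}

// leVarint reads bytes the way the size varint is read (least-significant
// group first, nothing added). On 8c 63 it reproduces the 12684 above.
func leVarint(buf []byte) uint64 {
	var v uint64
	var shift uint
	for _, c := range buf {
		v |= uint64(c&0x7f) << shift
		shift += 7
		if c&0x80 == 0 {
			break
		}
	}
	return v
}

func main() {
	const deltaObjOffset = 138650 // offset of the delta, per verify-pack
	rel, n, err := parseOfsDeltaOffset([]byte{0x8c, 0x63})
	if err != nil {
		panic(err)
	}
	fmt.Println(rel, n, deltaObjOffset-rel)   // 1763 2 136887
	fmt.Println(leVarint([]byte{0x8c, 0x63})) // 12684
}
```

The base offset comes out as 136887, the base's offset from verify-pack. The overflow guard is this sketch's own choice; a full reader would presumably also reject a decoded value larger than the delta's own offset, since the base has to appear earlier in the pack.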

It does seem like the documentation could be a lot better. I had to dig
into the source (packfile.c:get_delta_base is pretty clear, but if
you're trying to do a non-GPL clean-room implementation, then obviously
don't look at it).

-Peff

* Re: Question about the pack OBJ_OFS_DELTA format
From: Erik Fastermann @ 2020-01-10 13:56 UTC
  To: Jeff King; +Cc: git

Thanks mate. That helped me a lot.

Erik

* Re: Question about the pack OBJ_OFS_DELTA format
From: Junio C Hamano @ 2020-01-10 19:41 UTC
  To: Jeff King; +Cc: Erik Fastermann, git

Jeff King <peff@peff.net> writes:

> The pack-format.txt file says:
>
>        offset encoding:
>             n bytes with MSB set in all but the last one.
>             The offset is then the number constructed by
>             concatenating the lower 7 bit of each byte, and
>             for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
>             to the result.
>
> but I think it is missing two bits of information:
>
>   - the bytes are in most-significant to least-significant order, which
>     IIRC is the opposite of the size varint
>
>   - each 7-bit byte sneaks in some extra data by implicitly adding "1"
>     to all but the last byte

Isn't the latter mentioned in the paragraph you quoted?

* Re: Question about the pack OBJ_OFS_DELTA format
From: Jeff King @ 2020-01-11  9:56 UTC
  To: Junio C Hamano; +Cc: Erik Fastermann, git

On Fri, Jan 10, 2020 at 11:41:08AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The pack-format.txt file says:
> >
> >        offset encoding:
> >             n bytes with MSB set in all but the last one.
> >             The offset is then the number constructed by
> >             concatenating the lower 7 bit of each byte, and
> >             for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
> >             to the result.
> >
> > but I think it is missing two bits of information:
> >
> >   - the bytes are in most-significant to least-significant order, which
> >     IIRC is the opposite of the size varint
> >
> >   - each 7-bit byte sneaks in some extra data by implicitly adding "1"
> >     to all but the last byte
> 
> Isn't the latter mentioned in the paragraph you quoted?

Hmm, yeah. I admit I had trouble parsing exactly what that part was
trying to say, and thought it was trying to talk about how you'd shift
the individual bytes. But reading more carefully, it does say "adding",
so yeah, it accounts for the extra.

It's a little confusing, I think, because in code you'd just keep adding
one before each shift, rather than trying to add in the extra values at
the end.
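To illustrate that the documented formula and the add-one-before-shifting loop agree, here is a hypothetical Go sketch of both (offsetPerDocs and offsetIncremental are illustrative names). Expanding v = (v + 1) * 2^7 + b over n bytes yields the concatenated 7-bit groups plus 2^7 + 2^14 + ... + 2^(7*(n-1)), which is exactly the correction the pack-format text describes:

```go
package main

import "fmt"

// offsetPerDocs follows the pack-format.txt wording literally: concatenate
// the low 7 bits of each byte (first byte most significant), then for
// n >= 2 add 2^7 + 2^14 + ... + 2^(7*(n-1)).
func offsetPerDocs(buf []byte) uint64 {
	var concat, extra uint64
	for i, c := range buf {
		concat = concat<<7 | uint64(c&0x7f)
		if i > 0 {
			extra += 1 << (7 * uint(i))
		}
		if c&0x80 == 0 {
			break
		}
	}
	return concat + extra
}

// offsetIncremental adds one to the running value before every shift,
// folding the same extra terms in as the loop goes.
func offsetIncremental(buf []byte) uint64 {
	if len(buf) == 0 {
		return 0
	}
	v := uint64(buf[0] & 0x7f)
	for i := 1; i < len(buf) && buf[i-1]&0x80 != 0; i++ {
		v = (v+1)<<7 | uint64(buf[i]&0x7f)
	}
	return v
}

func main() {
	b := []byte{0x8c, 0x63}
	fmt.Println(offsetPerDocs(b), offsetIncremental(b)) // 1763 1763
}
```

The incremental form needs no separate accumulator for the extra terms, which is presumably why code tends to be written that way.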

-Peff
