git@vger.kernel.org mailing list mirror (one of many)
* How DELTA objects values work and are calculated
From: Farhan Khan @ 2019-01-05  2:48 UTC
  To: git

Hi all,

I'm having trouble understanding how OBJ_OFS_DELTA and OBJ_REF_DELTA
(deltas) work in git. Where does git calculate the sha1 hash values
when doing "git index-pack" in builtin/index-pack.c? I think my lack
of understanding of the code is compounded by the fact that I do not
understand what the two object types are.

From tracing the code starting from index-pack, the hashes of all
non-delta objects are calculated in index-pack.c:1131 (parse_pack_objects).
However, when that function ends, the hash values of the delta objects
are still all zeros.

My questions are:
A) How do Delta objects work?
B) Where and how are the sha1 values calculated?

I have read Documentation/technical/pack-format.txt, but am still not clear.

Thank you!
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE


* Re: How DELTA objects values work and are calculated
From: Duy Nguyen @ 2019-01-05  4:46 UTC
  To: Farhan Khan; +Cc: Git Mailing List

On Sat, Jan 5, 2019 at 9:49 AM Farhan Khan <khanzf@gmail.com> wrote:
>
> Hi all,
>
> I'm having trouble understanding how OBJ_OFS_DELTA and OBJ_REF_DELTA
> (deltas) work in git. Where does git calculate the sha1 hash values
> when doing "git index-pack" in builtin/index-pack.c? I think my lack
> of understanding of the code is compounded by the fact that I do not
> understand what the two object types are.
>
> From tracing the code starting from index-pack, the hashes of all
> non-delta objects are calculated in index-pack.c:1131 (parse_pack_objects).
> However, when that function ends, the hash values of the delta objects
> are still all zeros.

Delta objects depend on other objects (which may themselves be deltas).
To calculate a delta object's sha1, we may first need to recursively
calculate the sha1 values of its base objects. That is why we do it in
a separate phase: the calculation is more involved than for non-delta
objects.
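To make that dependency concrete, here is a minimal sketch of the straightforward recursive approach. It uses a hypothetical toy model (struct toy_entry and resolve_recursive() are made-up names, not index-pack code) in which a "delta" simply appends a suffix to its base's content; real git deltas are binary copy/insert instructions. Only the shape matters: a delta cannot be resolved until its base is, and the base may itself be a delta.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DATA 128

/* Hypothetical toy entry, NOT git's struct object_entry. base == -1 marks a
 * non-delta object; otherwise, for illustration only, the "delta" appends
 * payload to the resolved content of entry `base`. */
struct toy_entry {
    int base;              /* index of the base entry, or -1 */
    const char *payload;   /* full content, or the suffix for a "delta" */
    char data[MAX_DATA];   /* resolved content */
    int resolved;
};

/* Resolve entry i, recursively resolving its base chain first. */
static const char *resolve_recursive(struct toy_entry *e, int i)
{
    struct toy_entry *obj = &e[i];

    assert(obj->base != i);  /* an entry cannot be its own base */
    if (obj->resolved)
        return obj->data;
    if (obj->base < 0) {
        snprintf(obj->data, sizeof(obj->data), "%s", obj->payload);
    } else {
        const char *base = resolve_recursive(e, obj->base);
        snprintf(obj->data, sizeof(obj->data), "%s%s", base, obj->payload);
    }
    obj->resolved = 1;
    return obj->data;
}
```

With entry 2 a plain object "a", entry 1 a "delta" on 2 appending "b", and entry 0 a "delta" on 1 appending "c", resolving entry 0 first resolves 2, then 1, and yields "abc". The recursion depth grows with the length of the delta chain.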

> My questions are:
> A) How do Delta objects work?

A delta object consists of a reference to the base object (either a
sha1 value for OBJ_REF_DELTA, or the offset to where the base object is
for OBJ_OFS_DELTA) and a "delta" to be applied on top of that base
(it's basically a binary diff).
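The delta itself is described in pack-format.txt: it starts with two size varints (the expected base size, then the size of the resulting object; 7 bits per byte, least significant group first), followed by instructions that either copy a range out of the base object or insert literal bytes. Below is a minimal standalone sketch of applying such a delta. apply_delta() and read_delta_size() are hypothetical names; this follows the format but is not git's patch_delta() code, and it assumes base and delta are already fully in memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Read one delta size: 7 bits per byte, least significant group first;
 * a set MSB means another byte follows. Returns 0 on success. */
static int read_delta_size(const unsigned char **p, const unsigned char *end,
                           size_t *out)
{
    size_t size = 0;
    unsigned shift = 0;
    unsigned char c;

    do {
        if (*p >= end || shift >= 8 * sizeof(size_t))
            return -1;
        c = *(*p)++;
        size |= (size_t)(c & 0x7f) << shift;
        shift += 7;
    } while (c & 0x80);
    *out = size;
    return 0;
}

/* Apply delta to base. Returns a malloc'd result (caller frees) and sets
 * *out_len, or returns NULL if the delta is malformed or does not match. */
static unsigned char *apply_delta(const unsigned char *base, size_t base_len,
                                  const unsigned char *delta, size_t delta_len,
                                  size_t *out_len)
{
    const unsigned char *p = delta, *end = delta + delta_len;
    size_t src_size, dst_size, pos = 0;
    unsigned char *out;

    if (read_delta_size(&p, end, &src_size) || src_size != base_len ||
        read_delta_size(&p, end, &dst_size))
        return NULL;
    out = malloc(dst_size ? dst_size : 1);
    if (!out)
        return NULL;

    while (p < end) {
        unsigned char cmd = *p++;

        if (cmd & 0x80) {
            /* Copy from base: bits 0-3 flag which offset bytes follow,
             * bits 4-6 flag which size bytes follow (little-endian). */
            size_t off = 0, len = 0;
            int i;

            for (i = 0; i < 4; i++)
                if (cmd & (1u << i)) {
                    if (p >= end)
                        goto fail;
                    off |= (size_t)*p++ << (8 * i);
                }
            for (i = 0; i < 3; i++)
                if (cmd & (0x10u << i)) {
                    if (p >= end)
                        goto fail;
                    len |= (size_t)*p++ << (8 * i);
                }
            if (len == 0)
                len = 0x10000;  /* a size of 0 encodes 64 KiB */
            if (off > base_len || len > base_len - off || len > dst_size - pos)
                goto fail;
            memcpy(out + pos, base + off, len);
            pos += len;
        } else if (cmd) {
            /* Insert: the next cmd bytes are copied literally. */
            if ((size_t)(end - p) < cmd || cmd > dst_size - pos)
                goto fail;
            memcpy(out + pos, p, cmd);
            p += cmd;
            pos += cmd;
        } else {
            goto fail;  /* opcode 0 is reserved */
        }
    }
    if (pos != dst_size)
        goto fail;
    *out_len = dst_size;
    return out;
fail:
    free(out);
    return NULL;
}
```

For example, against the base "hello world", the delta bytes 0x0b 0x09 0x90 0x06 0x03 'g' 'i' 't' mean "base is 11 bytes, result is 9 bytes, copy 6 bytes from offset 0, insert 3 literal bytes", which produces "hello git".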

> B) Where and how are the sha1 values calculated?

Start at threaded_second_pass() in index-pack.c: it goes through all
delta objects and tries to calculate their sha1 values. Eventually
you'll hit resolve_delta(), where the delta is actually applied to the
base object in the patch_delta() call, and the sha1 value is calculated
in the hash_object_file() call that follows.
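Note that the object name is not the hash of the raw content alone: git hashes a header "<type> <size>\0" followed by the content. The sketch below reproduces that computation standalone, bundling a small unoptimized SHA-1 so it needs no library. git_object_hash() and the sha1_* helpers are hypothetical names for illustration, not git's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Minimal, unoptimized SHA-1 (FIPS 180-4), for illustration only. */
struct sha1_ctx {
    uint32_t h[5];          /* running hash state */
    uint64_t len;           /* total bytes fed so far */
    unsigned char buf[64];  /* partial block */
    size_t used;            /* bytes currently in buf */
};

#define ROL(x, n) ((uint32_t)(((x) << (n)) | ((x) >> (32 - (n)))))

/* Process one 64-byte block. */
static void sha1_block(struct sha1_ctx *ctx, const unsigned char *blk)
{
    uint32_t w[80], a, b, c, d, e, f, k, t;
    int i;

    for (i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
               (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
    for (; i < 80; i++)
        w[i] = ROL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    a = ctx->h[0]; b = ctx->h[1]; c = ctx->h[2]; d = ctx->h[3]; e = ctx->h[4];
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = ROL(a, 5) + f + e + k + w[i];
        e = d; d = c; c = ROL(b, 30); b = a; a = t;
    }
    ctx->h[0] += a; ctx->h[1] += b; ctx->h[2] += c; ctx->h[3] += d; ctx->h[4] += e;
}

static void sha1_init(struct sha1_ctx *ctx)
{
    ctx->h[0] = 0x67452301; ctx->h[1] = 0xEFCDAB89; ctx->h[2] = 0x98BADCFE;
    ctx->h[3] = 0x10325476; ctx->h[4] = 0xC3D2E1F0;
    ctx->len = 0;
    ctx->used = 0;
}

static void sha1_update(struct sha1_ctx *ctx, const void *data, size_t n)
{
    const unsigned char *p = data;

    ctx->len += n;
    while (n--) {
        ctx->buf[ctx->used++] = *p++;
        if (ctx->used == 64) {
            sha1_block(ctx, ctx->buf);
            ctx->used = 0;
        }
    }
}

/* Pad (0x80, zeros, 64-bit big-endian bit length) and emit the digest. */
static void sha1_final(struct sha1_ctx *ctx, unsigned char out[20])
{
    uint64_t bits = ctx->len * 8;
    unsigned char pad = 0x80, zero = 0, lenbuf[8];
    int i;

    sha1_update(ctx, &pad, 1);
    while (ctx->used != 56)
        sha1_update(ctx, &zero, 1);
    for (i = 0; i < 8; i++)
        lenbuf[i] = (unsigned char)(bits >> (56 - 8 * i));
    sha1_update(ctx, lenbuf, 8);
    for (i = 0; i < 20; i++)
        out[i] = (unsigned char)(ctx->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Object name: SHA-1 over "<type> <size>\0" followed by the content.
 * Writes 40 hex digits plus a NUL into hex. */
static void git_object_hash(const char *type, const void *data, size_t len,
                            char hex[41])
{
    struct sha1_ctx ctx;
    unsigned char md[20];
    char hdr[64];
    /* snprintf's return value excludes the NUL; + 1 hashes it too. */
    int hdrlen = snprintf(hdr, sizeof(hdr), "%s %zu", type, len) + 1;
    int i;

    sha1_init(&ctx);
    sha1_update(&ctx, hdr, (size_t)hdrlen);
    sha1_update(&ctx, data, len);
    sha1_final(&ctx, md);
    for (i = 0; i < 20; i++)
        sprintf(hex + 2 * i, "%02x", md[i]);
}
```

As a sanity check, hashing an empty blob this way should give e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known name of the empty blob.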

-- 
Duy


* Re: How DELTA objects values work and are calculated
From: Farhan Khan @ 2019-01-05 22:32 UTC
  To: Duy Nguyen; +Cc: Git Mailing List



Hi Duy,

Thanks for explaining the Delta objects.

What does an OBJ_REF_DELTA object itself consist of? Do you have to
uncompress it to parse its values? How do you get its size?

I read through resolve_deltas(), which leads to threaded_second_pass(),
where you suggested I start, but I do not understand what is happening
at a high level and get confused while reading the code.

From threaded_second_pass(), execution goes into a for-loop that runs
resolve_base(), which runs find_unresolved_deltas(). Is this finding
the unresolved deltas of the current object (the current OBJ_REF_DELTA
we are going through)? This then runs find_unresolved_deltas() and
shortly afterwards find_unresolved_deltas_1(). It seems that
find_unresolved_deltas_1() is applying deltas, but I am not certain.

I do not understand what is happening in any of these functions. There
are some comments in builtin/index-pack.c:883-904.

Overall, I do not understand this entire process, what values to capture 
along the way, and how they are consumed. Please provide some guidance 
on how this process works.

Thank you!
Farhan


* Re: How DELTA objects values work and are calculated
From: Duy Nguyen @ 2019-01-06  2:32 UTC
  To: Farhan Khan; +Cc: Git Mailing List

On Sun, Jan 6, 2019 at 5:32 AM Farhan Khan <khanzf@gmail.com> wrote:
> Hi Duy,
>
> Thanks for explaining the Delta objects.
>
> What does an OBJ_REF_DELTA object itself consist of?

From pack-format.txt:

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     20-byte base object name if OBJ_REF_DELTA or a negative relative
       offset from the delta object's position in the pack if this
       is an OBJ_OFS_DELTA object
     compressed delta data
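The header and the OBJ_OFS_DELTA offset are both variable-length integers, but they are encoded differently. Per pack-format.txt, the first header byte carries the 3-bit type and the low 4 bits of the length, and each following byte (while the MSB is set) adds 7 more length bits; that length is the uncompressed size of the entry's data (for a delta, of the delta data itself). The OFS_DELTA offset is stored most significant group first, with 2^7 + 2^14 + ... added for multi-byte encodings, and is subtracted from the delta entry's own position to locate the base. A minimal parsing sketch, assuming the bytes are already in memory (parse_entry_header() and parse_ofs_delta_offset() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Object type codes as listed in pack-format.txt (5 is reserved). */
enum { OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4,
       OBJ_OFS_DELTA = 6, OBJ_REF_DELTA = 7 };

/* Parse an entry's "n-byte type and length" header.
 * Returns the number of bytes consumed, or 0 if the input is truncated. */
static size_t parse_entry_header(const unsigned char *p, size_t avail,
                                 int *type, uint64_t *size)
{
    size_t used = 0;
    unsigned shift = 4;
    unsigned char c;

    if (avail == 0)
        return 0;
    c = p[used++];
    *type = (c >> 4) & 7;   /* bits 4-6: object type */
    *size = c & 0x0f;       /* bits 0-3: low 4 bits of the length */
    while (c & 0x80) {      /* MSB set: another 7 length bits follow */
        if (used >= avail || shift > 57)
            return 0;
        c = p[used++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return used;
}

/* Parse the OBJ_OFS_DELTA base offset: 7-bit groups, most significant first,
 * with 2^7 + 2^14 + ... added for multi-byte encodings (that is what the
 * "+ 1" before each shift does). The base entry starts at
 * (this entry's offset - *offset). Returns bytes consumed, or 0 on error. */
static size_t parse_ofs_delta_offset(const unsigned char *p, size_t avail,
                                     uint64_t *offset)
{
    size_t used = 0;
    unsigned char c;
    uint64_t ofs;

    if (avail == 0)
        return 0;
    c = p[used++];
    ofs = c & 0x7f;
    while (c & 0x80) {
        if (used >= avail)
            return 0;
        c = p[used++];
        ofs = ((ofs + 1) << 7) | (c & 0x7f);
    }
    *offset = ofs;
    return used;
}
```

For example, the header bytes 0xB4 0x06 decode to type 3 (blob) with length 4 + (6 << 4) = 100, and the offset bytes 0x81 0x00 decode to ((1 + 1) << 7) | 0 = 256.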


> Do you have to uncompress it to parse its values?

The delta part is zlib-compressed, so yes. The base object name (or the
OBJ_OFS_DELTA offset) is not.

> How do you get its size?

Uncompress the delta until the end. The zlib stream has its own
"end-of-stream" marker, so the decompressor knows when to stop.

> I read through resolve_deltas(), which leads to threaded_second_pass(),
> where you suggested I start, but I do not understand what is happening
> at a high level and get confused while reading the code.
>
> From threaded_second_pass(), execution goes into a for-loop that runs
> resolve_base(), which runs find_unresolved_deltas(). Is this finding
> the unresolved deltas of the current object (the current OBJ_REF_DELTA
> we are going through)? This then runs find_unresolved_deltas() and
> shortly afterwards find_unresolved_deltas_1(). It seems that
> find_unresolved_deltas_1() is applying deltas, but I am not certain.

Ah, I forgot how "fun" these functions were :) The obvious way to
resolve a delta object is to resolve (recursively) its base object
first, then apply the delta on top and be done. However, that implies
recursion, and it is also not very cache friendly. So what
find_unresolved_deltas_1() does is the reverse. It starts at an already
resolved base object (e.g. a non-delta one), then applies the deltas of
all delta objects that immediately depend on it, then continues to
resolve the delta objects depending on these children, and so on. The
find_*_delta_children() functions find these deltas, then
find_unresolved_deltas_1() calls resolve_delta() to do the real
work.

- the delta type (OBJ_REF_.. or OBJ_OFS_...) is already known at this
point; I believe we know it from the first pass
- the delta is uncompressed here, with get_data_from_pack()
- the base object is obtained via get_base_data(), which is recursive,
but since we go backwards from parent to child, base->data should
already be valid and get_base_data() becomes a no-op
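Here is a minimal sketch of that parent-to-child order, using a hypothetical toy model in which a "delta" just appends a suffix to its base (struct toy_entry and resolve_from_bases() are made-up names, not index-pack's code, and real deltas are binary copy/insert instructions). It seeds a stack with the non-delta objects, then repeatedly takes an already resolved object and resolves the deltas that use it as their base. There is no recursion, and every base's data is already at hand when its children are processed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OBJS 16
#define MAX_DATA 128

/* Hypothetical toy entry: base == -1 marks a non-delta object; otherwise,
 * for illustration only, the "delta" appends payload to its base's data. */
struct toy_entry {
    int base;
    const char *payload;
    char data[MAX_DATA];
    int resolved;
};

/* Resolve all n entries starting from the non-delta ones, without
 * recursion. Records the resolution order in order[] and returns the
 * number of entries resolved. */
static int resolve_from_bases(struct toy_entry *e, int n, int order[])
{
    int stack[MAX_OBJS], top = 0, count = 0, i;

    assert(n <= MAX_OBJS);
    for (i = 0; i < n; i++) {
        if (e[i].base < 0) {
            snprintf(e[i].data, MAX_DATA, "%s", e[i].payload);
            e[i].resolved = 1;
            order[count++] = i;
            stack[top++] = i;
        }
    }
    while (top > 0) {
        int parent = stack[--top];

        /* Find the "children" of parent: deltas whose base it is. */
        for (i = 0; i < n; i++) {
            if (e[i].base == parent && !e[i].resolved) {
                snprintf(e[i].data, MAX_DATA, "%s%s",
                         e[parent].data, e[i].payload);
                e[i].resolved = 1;
                order[count++] = i;
                stack[top++] = i;  /* its own children come later */
            }
        }
    }
    return count;
}
```

Each entry is pushed at most once, so the stack never holds more than n items. The linear scan for children is only for brevity; looking children up by base (for instance in an array sorted by base) would avoid the quadratic cost.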

> I do not understand what is happening in any of these functions. There
> are some comments in builtin/index-pack.c:883-904.
>
> Overall, I do not understand this entire process, what values to capture
> along the way, and how they are consumed. Please provide some guidance
> on how this process works.

An easier way to understand this is to actually run it through a
debugger (in single-thread mode). Create a small repo with a handful of
deltas. Use "git verify-pack -v" to see which objects are deltas and
where they are; then you have something to double-check while you step
through the code.

-- 
Duy

