* How DELTA objects values work and are calculated
From: Farhan Khan @ 2019-01-05  2:48 UTC
To: git

Hi all,

I'm having trouble understanding how OBJ_REF_DELTA and OBJ_OFS_DELTA
(deltas) work in git. Where does git calculate the sha1 hash values
when doing "git index-pack" in builtin/index-pack.c? I think my lack
of understanding of the code is compounded by the fact that I do not
understand what the two object types are.

From tracing the code starting from index-pack, all non-delta object
type hashes are calculated in index-pack.c:1131 (parse_pack_objects).
However, when the function ends, the delta objects' hash values are set
to all 0's.

My questions are:
A) How do Delta objects work?
B) Where and how are the sha1 values calculated?

I have read Documentation/technical/pack-format.txt, but am still not clear.

Thank you!
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE
* Re: How DELTA objects values work and are calculated
From: Duy Nguyen @ 2019-01-05  4:46 UTC
To: Farhan Khan; +Cc: Git Mailing List

On Sat, Jan 5, 2019 at 9:49 AM Farhan Khan <khanzf@gmail.com> wrote:
>
> Hi all,
>
> I'm having trouble understanding how OBJ_REF_DELTA and OBJ_OFS_DELTA
> (deltas) work in git. Where does git calculate the sha1 hash values
> when doing "git index-pack" in builtin/index-pack.c? I think my lack
> of understanding of the code is compounded by the fact that I do not
> understand what the two object types are.
>
> From tracing the code starting from index-pack, all non-delta object
> type hashes are calculated in index-pack.c:1131 (parse_pack_objects).
> However, when the function ends, the delta objects' hash values are set
> to all 0's.

Delta objects depend on other objects, which may themselves be deltas.
To calculate a delta object's sha1 value we may need to recursively
calculate the sha1 values of its base objects. We do this in a separate
phase because the calculation is more complicated than for non-delta
objects.

> My questions are:
> A) How do Delta objects work?

A delta object consists of a reference to the base object (either a
sha1 value, or the offset to where the object is in the pack) and a
"delta" to be applied on top of it (basically a binary diff).

> B) Where and how are the sha1 values calculated?

Start at threaded_second_pass() in index-pack.c; we go through all
delta objects there and try to calculate their sha1 values. Eventually
you'll hit resolve_delta(), where the delta is actually applied to the
base object in the patch_delta() call, and the sha1 value is calculated
in the hash_object_file() call that follows.

>
> I have read Documentation/technical/pack-format.txt, but am still not clear.
>
> Thank you!
> --
> Farhan Khan
> PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE

--
Duy
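
To make the "apply the delta, then hash" step above concrete, here is a
minimal Python sketch. It is not git's C code: read_varint(),
apply_delta() and object_id() are hypothetical names chosen for
illustration, and the sketch assumes the delta has already been inflated
and that the repository uses SHA-1. The copy/insert opcodes follow the
deltified representation described in pack-format.txt.

```python
import hashlib


def read_varint(buf, pos):
    """Read a little-endian base-128 size; return (value, new_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base, delta):
    """Apply an already-inflated delta to base and return the new content."""
    src_size, pos = read_varint(delta, 0)
    dst_size, pos = read_varint(delta, pos)
    if src_size != len(base):
        raise ValueError("delta was made against a different base")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy instruction: low 4 bits say which offset bytes follow,
            # the next 3 bits say which size bytes follow.
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            if offset + size > len(base):
                raise ValueError("copy instruction runs past the base")
            out += base[offset:offset + size]
        elif cmd:
            # Insert instruction: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != dst_size:
        raise ValueError("result size does not match the delta header")
    return bytes(out)


def object_id(obj_type, data):
    """Hash '<type> <size>\\0<content>', as git does for loose objects."""
    header = b"%s %d\x00" % (obj_type, len(data))
    return hashlib.sha1(header + data).hexdigest()


# Hand-built delta: copy 6 bytes from offset 0 of the base, insert b"git\n".
base = b"hello world\n"
delta = bytes([0x0C, 0x0A, 0x90, 0x06, 0x04]) + b"git\n"
result = apply_delta(base, delta)
print(result, object_id(b"blob", result))
```

The object type passed to object_id() is the type of the fully resolved
base; a delta has no type of its own once it has been applied.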
* Re: How DELTA objects values work and are calculated
From: Farhan Khan @ 2019-01-05 22:32 UTC
To: Duy Nguyen; +Cc: Git Mailing List

Hi Duy,

Thanks for explaining the Delta objects.

What does an OBJ_REF_DELTA object itself consist of? Do you have to
uncompress it to parse its values? How do you get its size?

I read through resolve_deltas(), which leads to threaded_second_pass(),
where you suggested to start, but I do not understand what is happening
at a high level and get confused while reading the code.

From threaded_second_pass, execution goes into a for-loop that runs
resolve_base(), which runs find_unresolved_deltas(). Is this
finding the unresolved deltas of the current object (the current
OBJ_REF_DELTA we are going through)? This then runs
find_unresolved_deltas() and shortly afterwards
find_unresolved_deltas_1(). It seems that find_unresolved_deltas_1() is
applying deltas, but I am not certain.

I do not understand what is happening in any of these functions. There
are some comments on builtin/index-pack.c:883-904.

Overall, I do not understand this entire process, what values to capture
along the way, and how they are consumed. Please provide some guidance
on how this process works.

Thank you!
Farhan
* Re: How DELTA objects values work and are calculated
From: Duy Nguyen @ 2019-01-06  2:32 UTC
To: Farhan Khan; +Cc: Git Mailing List

On Sun, Jan 6, 2019 at 5:32 AM Farhan Khan <khanzf@gmail.com> wrote:
> Hi Duy,
>
> Thanks for explaining the Delta objects.
>
> What does an OBJ_REF_DELTA object itself consist of?

from pack-format.txt

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     20-byte base object name if OBJ_REF_DELTA or a negative relative
     offset from the delta object's position in the pack if this
     is an OBJ_OFS_DELTA object
     compressed delta data

> Do you have to uncompress it to parse its values?

The delta part is compressed, so yes. The "base object name" is not.

> How do you get its size?

Uncompress the delta until the end of the stream. A zlib stream has
some sort of "end-of-stream" marker, so zlib knows when to stop.

> I read through resolve_deltas(), which leads to threaded_second_pass(),
> where you suggested to start, but I do not understand what is happening
> at a high level and get confused while reading the code.
>
> From threaded_second_pass, execution goes into a for-loop that runs
> resolve_base(), which runs find_unresolved_deltas(). Is this
> finding the unresolved deltas of the current object (the current
> OBJ_REF_DELTA we are going through)? This then runs
> find_unresolved_deltas() and shortly afterwards
> find_unresolved_deltas_1(). It seems that find_unresolved_deltas_1() is
> applying deltas, but I am not certain.

Ah I forgot how "fun" these functions were :)

The obvious way to resolve a delta object is to resolve (recursively)
its base object first, then apply the delta on top, and you are done.
However, that implies recursion and is also not really cache friendly.

So what find_unresolved_deltas_1() does is work backward. It starts at
an already resolved (e.g. non-delta) base object, then applies the
deltas of all delta objects that immediately depend on it, then
continues to resolve the delta objects depending on these children...

The find_*_delta_children() functions find these deltas, then
find_unresolved_deltas_1() calls resolve_delta() to do the real work:

 - the delta type (OBJ_REF_.. or OBJ_OFS_...) is already known at this
   point. I believe we know it from the first pass

 - the delta is uncompressed here, with get_data_from_pack()

 - the base object is obtained via get_base_data(), which is
   recursive, but since we go backwards from parent to child,
   base->data should already be valid and get_base_data() becomes a
   no-op

> I do not understand what is happening in any of these functions. There
> are some comments on builtin/index-pack.c:883-904.
>
> Overall, I do not understand this entire process, what values to capture
> along the way, and how they are consumed. Please provide some guidance
> on how this process works.

An easier way to understand this is to actually run it through a
debugger (in single thread mode). Create a small repo with a handful of
deltas. Use "git verify-pack -v" to see which objects are deltas and
where... then you have something to double check against while you step
through the code.

>
> Thank you!
> Farhan

--
Duy
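
The entry layout quoted from pack-format.txt above, and the point about
zlib knowing where its stream ends, can be sketched in Python as follows.
This is an illustration only, not how index-pack is written: parse_entry()
and the dict it returns are hypothetical, and the sketch assumes 20-byte
(SHA-1) base object names and that the whole pack is already in memory.

```python
import zlib

OBJ_OFS_DELTA, OBJ_REF_DELTA = 6, 7


def parse_entry(pack, pos):
    """Parse the pack entry starting at pos and inflate its data."""
    start = pos
    byte = pack[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7      # 3-bit type
    size = byte & 0x0F                # low 4 bits of the inflated size
    shift = 4
    while byte & 0x80:                # MSB set: another size byte follows
        byte = pack[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7

    base = None
    if obj_type == OBJ_REF_DELTA:
        base = pack[pos:pos + 20].hex()   # uncompressed base object name
        pos += 20
    elif obj_type == OBJ_OFS_DELTA:
        byte = pack[pos]
        pos += 1
        offset = byte & 0x7F
        while byte & 0x80:
            byte = pack[pos]
            pos += 1
            offset = ((offset + 1) << 7) | (byte & 0x7F)
        base = start - offset             # absolute position of the base

    # zlib stops at its own end-of-stream marker; whatever it did not
    # consume belongs to the next entry (or the pack trailer).
    inflater = zlib.decompressobj()
    data = inflater.decompress(pack[pos:])
    if not inflater.eof:
        raise ValueError("truncated zlib stream")
    if len(data) != size:
        raise ValueError("inflated size does not match the entry header")
    end = len(pack) - len(inflater.unused_data)
    return {"type": obj_type, "size": size, "base": base,
            "data": data, "end": end}


# A hand-built OBJ_REF_DELTA entry followed by unrelated trailing bytes.
delta = bytes([0x0C, 0x0A, 0x90, 0x06, 0x04]) + b"git\n"
base_name = bytes.fromhex("3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
raw = bytes([0x79]) + base_name + zlib.compress(delta) + b"trailing"
entry = parse_entry(raw, 0)
print(entry["type"], entry["size"], entry["base"], entry["end"])
```

Note that the size in the entry header is the inflated size of the delta
data, not the size of the object the delta eventually produces; that
second size is stored inside the delta itself.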