git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Farhan Khan <khanzf@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: How DELTA objects values work and are calculated
Date: Sun, 6 Jan 2019 09:32:50 +0700	[thread overview]
Message-ID: <CACsJy8A9=sKjoM+3ZoA0-PnufC4MNvP9MPoM9r6FN3sxAcZvbA@mail.gmail.com> (raw)
In-Reply-To: <581076b0-95c5-9af9-dec5-5a9bccfe2634@gmail.com>

On Sun, Jan 6, 2019 at 5:32 AM Farhan Khan <khanzf@gmail.com> wrote:
> Hi Duy,
>
> Thanks for explaining the Delta objects.
>
> What does a OBJ_REF_DELTA object itself consist of?

from pack-format.txt

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     20-byte base object name if OBJ_REF_DELTA or a negative relative
offset from the delta object's position in the pack if this
is an OBJ_OFS_DELTA object
     compressed delta data


> Do you have to uncompress it to parse its values?

The delta part is compressed, so yes. The "base object name" is not.

> How do you get its size?

Uncompress until the end the delta until the end. zlib stream has some
sort of "end-of-stream" marker so it knows when to stop.

> I read through resolve deltas which leads to threaded_second_pass, where
> you suggested to start, but I do not understand what is happening at a
> high level and get confused while reading the code.
>
>  From threaded_second_pass, execution goes into a for-loop that runs
> resolve_base(), which runs runs find_unresolved_deltas(). Is this
> finding the unresolved deltas of the current object (The current
> OBJ_REF_DELTA we are going through)? This then runs
> find_unresolved_deltas() and shortly afterwards
> find_unresolved_deltas_1(). It seems that find_unresolved_deltas_1() is
> applying deltas, but I am not certain.

Ah I forgot how "fun" these functions were :) The obvious way to
resolve an delta object is to resolve (recursively) its base object
first, then you apply delta on top and are done. However that implies
recursion, and also not really cache friendly. So what
find_unresolve_deltas_1() does is backward. It starts at a (already
resolved, e.g. non-delta) base object, then applies deltas for all
delta objects that immediately depend on it, then continue to resolve
delta objects depending on these children... The
find_*_delta_children() functions find these deltas, then
find_unresolve_deltas_1() will call resolve_delta() to do the real
work

- the delta type (OBJ_REF_.. or OBJ_OFS_...) is already known at this
point. I believe we know from the first pass
- the delta is uncompressed here, with get_data_from_pack()
- the base object is obtained via get_base_data(), which is recursive,
but since we go backwards from parent to child, base->data should be
already valid and get_base_data() becomes no-op

> I do not understand what is happening in any of these functions. There
> are some comments on builtin/index-pack.c:883-904
>
> Overall, I do not understand this entire process, what values to capture
> along the way, and how they are consumed. Please provide some guidance
> on how this process works.

An easier way to understand this is actually run it through a debugger
(in single thread mode). Create a small repo with a handful of deltas.
Use "git verify-pack -v" to see what object is delta and where... then
you have something to double check while you step through the code.

>
> Thank you!
> Farhan
-- 
Duy

      reply	other threads:[~2019-01-06  2:33 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-05  2:48 How DELTA objects values work and are calculated Farhan Khan
2019-01-05  4:46 ` Duy Nguyen
2019-01-05 22:32   ` Farhan Khan
2019-01-06  2:32     ` Duy Nguyen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8A9=sKjoM+3ZoA0-PnufC4MNvP9MPoM9r6FN3sxAcZvbA@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=khanzf@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).