git@vger.kernel.org mailing list mirror (one of many)
* Git clone question
From: Sankar P @ 2020-04-07 18:22 UTC
  To: git

Hi

I am trying to understand how git clone works.

From the few git videos that I have watched, and from using git, I
understand that git stores the difference between each version as an
object, with the SHA of the diff used to address the object.

However, what is not clear to me is how `git clone` then works. If a
repository has a thousand commits, do we download all thousand objects
to the client system and then apply them one on top of the other? I am
sure that cannot be the case, because `git clone` completes so fast,
and I doubt my disks are that fast.

However, when I do a `git clone`, I can see the history all the way
back to the first commit.

I can also skip part of the history and clone only the last N
commits. So my question is: how does `git clone` work under the hood,
and how is it so fast? Does the git server store the expanded git tree
(with all the git patches applied), so that we just transfer it when we
do the `git clone`?

Are there any good talks / papers / books on the internals of git?

Thanks.


-- 
Sankar P
http://psankar.blogspot.com


* Re: Git clone question
From: Bryan Turner @ 2020-04-07 20:01 UTC
  To: Sankar P; +Cc: Git Users

On Tue, Apr 7, 2020 at 11:21 AM Sankar P <sankar.curiosity@gmail.com> wrote:
>
> Hi
>
> I am trying to understand how git clone works.
>
> From the few git videos that I have watched, and from using git, I
> understand that git stores the difference between each version as an
> object, with the SHA of the diff used to address the object.

This is not correct. Git stores full objects, not diffs/patches, and
the SHA is of the full object contents, not the changes to those
contents versus the previous contents.
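For illustration, a minimal Python sketch of how a file's blob ID is derived in a default (SHA-1) repository: the hash covers a short "blob <size>" header plus the complete file contents, with no reference to any earlier version. The function name `git_blob_id` is hypothetical; the two IDs shown should match what `git hash-object` prints for the same input.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # A blob's name is the SHA-1 of a small header plus the *entire*
    # file contents; no earlier version of the file is involved.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Changing even one byte of the file yields an unrelated ID, which is why two versions of a file are two independent objects.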

Objects are stored compressed (using libz). In addition to
compression, when objects are packed Git can use a technique called
"delta compression" to allow it to build one object in terms of
another. This is something like a diff/patch, but it's not the delta
that gets hashed; it's the full object (before and after).
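A toy model of delta compression, assuming a simplified list of copy/insert instructions rather than Git's actual binary delta encoding (the helpers `make_delta`, `apply_delta` and `object_id` are hypothetical). What it shows is that the delta is only a storage detail: the object is rebuilt in full from its base, and its ID is computed over that full content.

```python
import hashlib

def make_delta(base: bytes, target: bytes) -> list:
    """Hypothetical, simplified delta: reuse the common prefix and suffix
    of `base` and insert only the bytes that differ. Git's real delta
    format is binary and can copy matching ranges from anywhere in the base."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    # Bound the suffix so it never overlaps the prefix in either buffer.
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    ops = []
    if prefix:
        ops.append(("copy", 0, prefix))
    middle = target[prefix:len(target) - suffix]
    if middle:
        ops.append(("insert", middle))
    if suffix:
        ops.append(("copy", len(base) - suffix, suffix))
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    # Rebuild the full object: copy ranges from the base, insert literals.
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

def object_id(content: bytes) -> str:
    # The ID is always computed over the full, reconstructed blob.
    return hashlib.sha1((b"blob %d\x00" % len(content)) + content).hexdigest()

base = b"hello world\n"
target = b"hello brave world\n"
delta = make_delta(base, target)
print(delta)  # [('copy', 0, 6), ('insert', b'brave '), ('copy', 6, 6)]
rebuilt = apply_delta(base, delta)
print(rebuilt == target, object_id(rebuilt) == object_id(target))  # True True
```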

>
> However, what is not clear to me is how `git clone` then works. If a
> repository has a thousand commits, do we download all thousand objects
> to the client system and then apply them one on top of the other? I am
> sure that cannot be the case, because `git clone` completes so fast,
> and I doubt my disks are that fast.

Between libz compression and delta compression the pack file
containing all of those objects tends to be substantially smaller than
the full set of objects.
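A rough way to see the effect, assuming a made-up history of 1000 commits that each edit one line of a 200-line file. Python's zlib preset-dictionary feature (`zdict`) stands in for Git's delta encoding here, compressing each version against its predecessor; this is a substitute technique, not how pack files are actually built.

```python
import zlib

def make_versions(count: int) -> list:
    # Hypothetical history: a 200-line text file where each commit
    # rewrites one line, cycling through the file.
    lines = [f"line {i}: value {i * i}\n" for i in range(200)]
    versions = []
    for n in range(count):
        i = n % len(lines)
        lines[i] = f"line {i}: edited in commit {n}\n"
        versions.append("".join(lines).encode())
    return versions

def storage_sizes(versions: list) -> tuple:
    raw = sum(len(v) for v in versions)
    # Each version compressed on its own: zlib only, no deltas.
    zlib_only = sum(len(zlib.compress(v, 9)) for v in versions)
    # First version stored whole; every later one compressed with its
    # predecessor as a preset dictionary (a stand-in for a delta base).
    chained = len(zlib.compress(versions[0], 9))
    for prev, cur in zip(versions, versions[1:]):
        comp = zlib.compressobj(9, zdict=prev)
        data = comp.compress(cur) + comp.flush()
        # The full version is still recoverable, given its base.
        decomp = zlib.decompressobj(zdict=prev)
        assert decomp.decompress(data) + decomp.flush() == cur
        chained += len(data)
    return raw, zlib_only, chained

raw, zlib_only, chained = storage_sizes(make_versions(1000))
print(f"raw: {raw} B, zlib only: {zlib_only} B, zlib + previous version: {chained} B")
```

Exact byte counts depend on the zlib build, but the last figure should come out far smaller than the other two, because each stored version only has to encode its one changed line.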

>
> However, when I do a `git clone`, I can see the history all the way
> back to the first commit.

Yes. By default, a clone receives all the objects in the history of
the repository and, as you note below, you can also perform a "shallow
clone" to limit what you get to a specific depth. Newer versions of Git
are also gaining the concept of a "partial clone", where some types of
objects (large files, for example) are not downloaded up front and are
instead downloaded on first access. This isn't yet widely supported,
though.
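A hypothetical sketch of which objects a clone needs, using a tiny in-memory history with flat trees (real trees can nest). A full clone walks every commit reachable from the tip; the `depth` parameter loosely mirrors `git clone --depth N`, and `include_blobs=False` loosely mirrors a partial clone made with `--filter=blob:none`. It models only the reachability walk, not Git's actual negotiation protocol.

```python
from collections import deque

# Hypothetical three-commit history. Each commit points at its parents and
# a tree; each tree lists its blobs. b1 never changes, so it is a single
# object shared by every version.
COMMITS = {
    "c3": {"parents": ["c2"], "tree": "t3"},
    "c2": {"parents": ["c1"], "tree": "t2"},
    "c1": {"parents": [], "tree": "t1"},
}
TREES = {"t3": ["b1", "b3"], "t2": ["b1", "b2"], "t1": ["b1"]}

def objects_to_send(tip, depth=None, include_blobs=True):
    """Return the set of objects reachable from `tip`.

    depth=None          -> full history (a normal clone)
    depth=N             -> only the N most recent generations
    include_blobs=False -> commits and trees only
    """
    wanted = set()
    queue = deque([(tip, 1)])  # breadth-first, so generations come in order
    while queue:
        commit, generation = queue.popleft()
        if commit in wanted or (depth is not None and generation > depth):
            continue
        wanted.add(commit)
        tree = COMMITS[commit]["tree"]
        wanted.add(tree)
        if include_blobs:
            wanted.update(TREES[tree])
        for parent in COMMITS[commit]["parents"]:
            queue.append((parent, generation + 1))
    return wanted

print(sorted(objects_to_send("c3")))
# ['b1', 'b2', 'b3', 'c1', 'c2', 'c3', 't1', 't2', 't3']
print(sorted(objects_to_send("c3", depth=1)))
# ['b1', 'b3', 'c3', 't3']
print(sorted(objects_to_send("c3", include_blobs=False)))
# ['c1', 'c2', 'c3', 't1', 't2', 't3']
```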

>
> I can also skip part of the history and clone only the last N
> commits. So my question is: how does `git clone` work under the hood,
> and how is it so fast? Does the git server store the expanded git tree
> (with all the git patches applied), so that we just transfer it when we
> do the `git clone`?

As noted above, there's no "patching" going on. The server does store
a packfile, though, and depending on how recently the pack was created
it can reuse substantial portions of that pack when creating a pack to
serve the clone.
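A simplified model of that reuse decision, under stated assumptions: the server's existing pack records, per object, either "stored whole" or "stored as a delta against base X", and anything pushed after the last repack is still loose. The names `plan_pack`, `packed_bases` and `loose` are hypothetical. Because a fresh clone starts with no objects, a stored delta can be copied verbatim only if its base is in the same outgoing pack.

```python
def plan_pack(wanted, packed_bases, loose):
    """Decide how each wanted object gets into the pack sent to a new clone.

    packed_bases: object id -> id of its delta base in the server's existing
                  pack, or None if it is stored whole there (hypothetical model).
    loose:        ids of objects written after that pack was created.
    """
    plan = {}
    for oid in wanted:
        if oid in packed_bases:
            base = packed_bases[oid]
            # Copy the stored bytes as-is only if the client can resolve them.
            plan[oid] = "reuse" if base is None or base in wanted else "re-encode"
        elif oid in loose:
            # Not in any pack yet: must be compressed (and maybe deltified) now.
            plan[oid] = "compress"
        else:
            raise KeyError(f"unknown object {oid}")
    return plan

packed_bases = {"a": None, "b": "a", "c": "b"}  # hypothetical existing pack
loose = {"d", "e"}                               # pushed after that pack was made

full = plan_pack({"a", "b", "c", "d", "e"}, packed_bases, loose)
print(dict(sorted(full.items())))
# {'a': 'reuse', 'b': 'reuse', 'c': 'reuse', 'd': 'compress', 'e': 'compress'}

partial = plan_pack({"c", "d"}, packed_bases, loose)
print(dict(sorted(partial.items())))
# {'c': 're-encode', 'd': 'compress'}
```

The older the existing pack, the more objects fall into the "compress" bucket, which matches the point that reuse depends on how recently the pack was created.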

>
> Are there any good talks / papers / books on the internals of git?

Have you checked https://git-scm.com/book/en/v2 ?

>
> Thanks.
>
>
> --
> Sankar P
> http://psankar.blogspot.com


* Re: Git clone question
From: Sankar P @ 2020-04-08  2:27 UTC
  To: Bryan Turner; +Cc: Git Users

> Objects are stored compressed (using libz). In addition to
> compression, when objects are packed Git can use a technique called
> "delta compression" to allow it to build one object in terms of
> another. This is something like a diff/patch, but it's not the delta
> that gets hashed; it's the full object (before and after).

Oh, okay.

>
> Between libz compression and delta compression the pack file
> containing all of those objects tends to be substantially smaller than
> the full set of objects.

Fascinating.

> As noted above, there's no "patching" going on. The server does store
> a packfile, though, and depending on how recently the pack was created
> it can reuse substantial portions of that pack when creating a pack to
> serve the clone.

> Have you checked https://git-scm.com/book/en/v2 ?

Thanks. I will check this out. Chapter 10 (Git Internals) seems
related. I had come across the book earlier, but based on the first
few chapters I thought it was a user guide and never got this far.

If there are any other blog posts, talks, etc. about delta
compression, packfiles and so on, please let me know. Thanks.

-- 
Sankar P
http://psankar.blogspot.com
