git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Philip Oakley <philipoakley@iee.org>
Cc: Ovatta Bianca <ovattabianca@gmail.com>, git@vger.kernel.org
Subject: Re: what is a snapshot?
Date: Fri, 24 Jun 2016 11:36:47 -0400
Message-ID: <20160624153647.GA2448@sigill.intra.peff.net>
In-Reply-To: <F6172B8DA802476C863849DEA02684A7@PhilipOakley>

On Sun, Jun 19, 2016 at 04:20:14PM +0100, Philip Oakley wrote:

> From: "Ovatta Bianca" <ovattabianca@gmail.com>
> > I read in every comparison of git vs other version control systems,
> > that git does not record differences but takes "snapshots"
> > I was wondering what a "snapshot" is ? Does git store at every commit
> > the entire files which have been modified even if only a few bytes
> > were changed out of a large file?
> > 
> A snapshot is like a tar or zip of all your tracked files. This means it is
> easier to determine (compared to lots of diffs) the current content.
> 
> Keeping all the snapshots as separate loose items, when the majority of
> their content is unchanged would be very inefficient, so git then uses, at
> the right time, an efficient (and obviously lossless) compression technique
> to 'zip' all the snapshots together so that the final repository is
> 'packed'. The overall effect is a very efficient storage scheme.
> 
> There are some good explanations on the web, such as the
> https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
> though you may want to scan from the beginning ;-)

I think the delta compression is only half the story.

Each commit is a snapshot in that it points to the sha1 of the root
tree, which points to the sha1 of other trees and blobs. And following
that chain gives you the whole state of the tree, without having to care
about other commits.
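
To make that chain concrete, here is a rough Python sketch. It assumes a
toy in-memory store; the names (store, put, snapshot) and the way ids are
computed are made up for illustration and are not Git's real object
format. The point is only that one commit id is enough to recover every
path and its content:

```python
import hashlib

# Toy in-memory object store: object id -> (type, payload).
# Illustration only; this is NOT Git's on-disk object format.
store = {}

def put(obj_type, payload):
    """Store an object under a SHA-1 derived from its type and payload."""
    data = obj_type.encode() + b"\0" + repr(payload).encode()
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = (obj_type, payload)
    return oid

def snapshot(commit_id):
    """Follow commit -> root tree -> subtrees/blobs; return {path: content}."""
    obj_type, payload = store[commit_id]
    assert obj_type == "commit"
    files = {}

    def walk(tree_id, prefix):
        _, entries = store[tree_id]
        for name, child_id in entries:
            child_type, child_payload = store[child_id]
            if child_type == "tree":
                walk(child_id, prefix + name + "/")
            else:
                files[prefix + name] = child_payload

    walk(payload["tree"], "")
    return files

readme = put("blob", b"hello\n")
main_c = put("blob", b"int main(void) { return 0; }\n")
src = put("tree", (("main.c", main_c),))
root = put("tree", (("README", readme), ("src", src)))
commit = put("commit", {"tree": root, "parent": None})

print(snapshot(commit))
```

Note that snapshot() never looks at the parent field: the full state comes
from the commit's own tree, not from replaying earlier commits.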

And if the next commit changes only a few files, the sha1 for all the
other files will remain unchanged, and git does not need to store them
again. So already, before any explicit compression has happened, we get
de-duplication of identical content from commit to commit, at the file
and tree level.
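
A small sketch of that de-duplication, again in Python. The blob hashing
follows Git's "<type> <size>\0<content>" scheme, but the flat tree
encoding and the helper names (object_id, put, write_tree) are simplified
assumptions for the example, not Git's real tree format:

```python
import hashlib

def object_id(kind, body):
    # Git names an object by the SHA-1 of "<kind> <size>\0" followed by its content.
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

store = {}  # object id -> body; writing identical content twice changes nothing

def put(kind, body):
    oid = object_id(kind, body)
    store[oid] = body
    return oid

def write_tree(files):
    # Simplified flat tree: one "<blob id> <name>" line per file.
    # (Assumption for the sketch; real Git trees use a binary encoding.)
    lines = [f"{put('blob', data)} {name}" for name, data in sorted(files.items())]
    return put("tree", "\n".join(lines).encode())

first = {"README": b"hello\n", "a.c": b"int a;\n", "b.c": b"int b;\n"}
tree1 = write_tree(first)
after_first = len(store)            # 3 blobs + 1 tree

second = {**first, "a.c": b"int a = 1;\n"}   # only one file changes
tree2 = write_tree(second)
after_second = len(store)           # 1 new blob + 1 new tree

print(after_first, after_second)    # prints "4 6"
```

The second snapshot adds only two objects: the changed blob and the tree
that points to it. README and b.c hash to the same ids as before, so they
are not stored again.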

And when a file does change, we store the whole new version as a new
object, and only later delta-compress it (during "git gc", etc.), as you
describe.
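
For a feel of what delta compression buys, here is a hedged Python sketch.
It expresses a new version as "copy this range from the old version" and
"insert these literal bytes" instructions. That is the general idea behind
delta storage, but the function names (make_delta, apply_delta), the use
of difflib, and the tuple encoding are all assumptions of this example and
not how Git actually computes or encodes deltas:

```python
import difflib

def make_delta(base, target):
    """Describe target as copy-from-base / insert-literal instructions."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))        # offset and length in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))    # bytes not found in base
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops

def apply_delta(base, ops):
    """Rebuild the target from the base plus the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

base = b"".join(b"line %d\n" % i for i in range(50))
target = base.replace(b"line 25\n", b"line twenty-five\n")

ops = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")

assert apply_delta(base, ops) == target
print(len(target), literal_bytes)
```

Only the changed bytes are carried as literals; everything else is a
reference into the old version, which is why a few-byte change to a large
file ends up costing little once the objects are packed.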

-Peff

