git@vger.kernel.org mailing list mirror (one of many)
* Fwd: what is a snapshot?
@ 2016-06-19 14:15 ` Ovatta Bianca
  2016-06-19 15:20   ` Philip Oakley
  2016-06-29 19:28   ` Fwd: " Jakub Narębski
  0 siblings, 2 replies; 4+ messages in thread
From: Ovatta Bianca @ 2016-06-19 14:15 UTC
  To: git

I have read in every comparison of git with other version control systems
that git does not record differences but takes "snapshots".
I was wondering what a "snapshot" is. Does git store, at every commit,
the entire content of every modified file, even if only a few bytes
of a large file were changed?

Thank you

* Re: what is a snapshot?
  2016-06-19 14:15 ` Fwd: what is a snapshot? Ovatta Bianca
@ 2016-06-19 15:20   ` Philip Oakley
  2016-06-24 15:36     ` Jeff King
  2016-06-29 19:28   ` Fwd: " Jakub Narębski
  1 sibling, 1 reply; 4+ messages in thread
From: Philip Oakley @ 2016-06-19 15:20 UTC
  To: Ovatta Bianca, git

From: "Ovatta Bianca" <ovattabianca@gmail.com>
>I read in every comparison of git vs other version control systems,
> that git does not record differences but takes "snapshots"
> I was wondering what a "snapshot" is ? Does git store at every commit
> the entire files which have been modified even if only a few bytes
> were changed out of a large file?
>
A snapshot is like a tar or zip archive of all your tracked files. This
means the current content is easier to determine than it would be by
replaying a long series of diffs.

Keeping all the snapshots as separate loose items when most of their
content is unchanged would be very inefficient, so at suitable times
(for example during "git gc") git uses an efficient, lossless compression
technique to 'zip' the snapshots together, so that the final repository
is 'packed'. The overall effect is a very efficient storage scheme.

There are some good explanations on the web, such as the
https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
though you may want to scan from the beginning ;-)

--
 Philip 


* Re: what is a snapshot?
  2016-06-19 15:20   ` Philip Oakley
@ 2016-06-24 15:36     ` Jeff King
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff King @ 2016-06-24 15:36 UTC
  To: Philip Oakley; +Cc: Ovatta Bianca, git

On Sun, Jun 19, 2016 at 04:20:14PM +0100, Philip Oakley wrote:

> From: "Ovatta Bianca" <ovattabianca@gmail.com>
> > I read in every comparison of git vs other version control systems,
> > that git does not record differences but takes "snapshots"
> > I was wondering what a "snapshot" is ? Does git store at every commit
> > the entire files which have been modified even if only a few bytes
> > were changed out of a large file?
> > 
> A snapshot is like a tar or zip of all your tracked files. This means it is
> easier to determine (compared to lots of diffs) the current content.
> 
> Keeping all the snapshots as separate loose items, when the majority of
> their content is unchanged would be very inefficient, so git then uses, at
> the right time, an efficient (and obviously lossless) compression technique
> to 'zip' all the snapshots together so that the final repository is
> 'packed'. The overall effect is a very efficient storage scheme.
> 
> There are some good explanations on the web, such as the
> https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
> though you may want to scan from the beginning ;-)

I think the delta compression is only half the story.

Each commit is a snapshot in that it points to the sha1 of the root
tree, which points to the sha1s of other trees and blobs. And following
that chain gives you the whole state of the tree, without having to care
about other commits.
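The chain from root tree down to blobs can be sketched in a few lines of
Python. This assumes Git's SHA-1 object naming (an object's id is the sha1
of "<type> <size>\0" followed by its body) and models only regular files
(mode 100644) and directories (mode 40000); the helper names object_id and
tree_id and the example snapshot are made up for illustration.

```python
import hashlib

def object_id(kind: str, body: bytes) -> str:
    """Git-style object id: sha1 of b'<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_id(snapshot: dict) -> str:
    """Id of a tree built from {name: bytes | dict}.

    bytes values are files (blobs); dict values are subdirectories (trees).
    """
    entries = []
    for name, value in snapshot.items():
        if isinstance(value, dict):
            mode, oid, sort_key = "40000", tree_id(value), name + "/"
        else:
            mode, oid, sort_key = "100644", object_id("blob", value), name
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary id.
        entries.append((sort_key, f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)))
    # Entries are sorted by name; directory names compare as if ending in "/".
    body = b"".join(entry for _, entry in sorted(entries))
    return object_id("tree", body)

snapshot = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}
edited = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 1; }\n"}}
print(tree_id(snapshot))
print(tree_id(edited))  # differs: a file deep inside changed
```

A commit only has to record the root tree id; everything below it is
reachable by following ids.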

And if the next commit changes only a few files, the sha1 for all the
other files will remain unchanged, and git does not need to store them
again. So already, before any explicit compression has happened, we get
de-duplication of identical content from commit to commit, at the file
and tree level.
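A minimal sketch of that de-duplication, assuming a hypothetical in-memory
object database keyed by Git-style sha1 ids (flat directories only; commit
objects are left out for brevity, and each new commit would add one more
object). The names store, put and put_tree are made up for illustration.

```python
import hashlib

# Hypothetical in-memory object database: object id -> (type, body).
store = {}

def put(kind: str, body: bytes) -> str:
    """Store an object under its Git-style sha1 id; identical content is kept once."""
    oid = hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()
    store.setdefault(oid, (kind, body))
    return oid

def put_tree(files: dict) -> str:
    """Store a flat directory {name: bytes} as one blob per file plus one tree."""
    body = b"".join(
        f"100644 {name}".encode() + b"\0" + bytes.fromhex(put("blob", data))
        for name, data in sorted(files.items())
    )
    return put("tree", body)

first = {"big.bin": b"x" * 1_000_000, "a.txt": b"one\n", "b.txt": b"two\n"}
second = {**first, "a.txt": b"one, edited\n"}  # only a.txt changes

put_tree(first)
before = len(store)         # 3 blobs + 1 tree
put_tree(second)
print(len(store) - before)  # prints 2: the edited blob and the new tree
```

The 1 MB file is stored once no matter how many snapshots refer to it.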

And then when a file does change, we store the whole new version, then
delta compress it during "git gc", etc., as you describe.

-Peff

* Re: Fwd: what is a snapshot?
  2016-06-19 14:15 ` Fwd: what is a snapshot? Ovatta Bianca
  2016-06-19 15:20   ` Philip Oakley
@ 2016-06-29 19:28   ` Jakub Narębski
  1 sibling, 0 replies; 4+ messages in thread
From: Jakub Narębski @ 2016-06-29 19:28 UTC
  To: Ovatta Bianca, git

On 2016-06-19 at 16:15, Ovatta Bianca wrote:

> I read in every comparison of git vs other version control systems,
> that git does not record differences but takes "snapshots"
> I was wondering what a "snapshot" is ? Does git store at every commit
> the entire files which have been modified even if only a few bytes
> were changed out of a large file?

There are two levels to consider: the conceptual level and the actual
storage. On the conceptual level, the object representing a revision
(a commit) refers to the object representing the top directory of the
project (a tree), which is a snapshot of the project state at that
revision.
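To make the commit-to-tree link concrete, here is a sketch of the plain
commit object layout: a "tree" line, optional "parent" lines, "author" and
"committer" lines, a blank line, then the message. It ignores optional
headers such as signatures; the author, timestamp, and the helper names
commit_body and commit_id are made up for illustration.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of the empty tree

def commit_body(tree: str, parents: list, author: str, message: str,
                when: str = "1466345700 +0000") -> bytes:
    """Text of a simple commit object (hypothetical author and timestamp)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author} {when}", f"committer {author} {when}"]
    return ("\n".join(lines) + "\n\n" + message).encode()

def commit_id(*args, **kwargs) -> str:
    """Git-style id of the commit: sha1 of b'commit <size>\\0' + body."""
    body = commit_body(*args, **kwargs)
    return hashlib.sha1(f"commit {len(body)}".encode() + b"\0" + body).hexdigest()

me = "A U Thor <author@example.com>"
c1 = commit_id(EMPTY_TREE, [], me, "first\n")
c2 = commit_id(EMPTY_TREE, [c1], me, "second\n")  # same snapshot, new revision
print(c1)
print(c2)
```

The commit names exactly one tree; that tree id is the whole snapshot.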

On the storage level, Git has two formats of object storage. In the
"loose" format (used for new objects), each object is stored as a separate
file. This is not as wasteful as you might think: first, there is
deduplication, that is, each version of a file is stored only once.
Second, the contents (usually text) are stored compressed.
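A sketch of what a loose object looks like: the zlib-compressed
"<type> <size>\0<content>", filed under objects/ by the first two hex digits
of its sha1 and then the remaining 38. The function name loose_object is
made up, and it only computes the path and bytes rather than writing into a
real repository.

```python
import hashlib
import zlib

def loose_object(kind: str, body: bytes) -> tuple:
    """Return (path relative to .git, file contents) for a loose object."""
    raw = f"{kind} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    # Directory is the first 2 hex digits, file name the remaining 38.
    return f"objects/{oid[:2]}/{oid[2:]}", zlib.compress(raw)

path, data = loose_object("blob", b"hello\n")
print(path)                   # objects/ce/013625030ba8dba906f756967f9e9ca394464a
print(zlib.decompress(data))  # b'blob 6\x00hello\n'
```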

In the "packed" format (nowadays Git automatically repacks from "loose"
to "packed" when it decides this is needed), there is additional
libxdiff-like delta compression. In this format Git does store
differences, though it also ensures that delta chains do not get too long.
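Pack deltas describe a new object as instructions against a base object:
copy a range of bytes from the base, or insert literal new bytes. The
sketch below shows that copy/insert idea using Python's
difflib.SequenceMatcher to find the shared runs. It is not Git's own delta
algorithm or binary encoding; make_delta and apply_delta are hypothetical
names.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as ('copy', offset, size) / ('insert', data) ops against base."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # 'replace' or 'insert': new bytes are stored literally
            ops.append(("insert", target[j1:j2]))
        # 'delete' needs no instruction: those base bytes are simply not copied
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)

base = b"".join(b"line %d: unchanged text\n" % i for i in range(40))
target = base.replace(b"line 20: unchanged", b"line 20: edited", 1)
ops = make_delta(base, target)
literal = sum(len(op[1]) for op in ops if op[0] == "insert")
print(len(target), "bytes rebuilt from", literal, "literal bytes plus copies")
```

Reading an object stored this way means applying every delta in its chain,
which is why a bounded chain length keeps access fast.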

HTH
-- 
Jakub Narębski

