* How would Git chapter look like in "The Architecture of Open Source Applications"?
@ 2011-05-28 12:17 Jakub Narebski
From: Jakub Narebski @ 2011-05-28 12:17 UTC
  To: git

From LWN.net (http://lwn.net/Articles/444981/):

  "The Architecture of Open Source Applications"[1] is a new book with
  chapters on the design of a wide variety of programs, including
  Asterisk, bash, Eclipse, LLVM, *Mercurial*, Sendmail, Telepathy,
  and many more. It's available for purchase or downloadable under
  the terms of the CC Attribution 3.0 license; some readers have
  already taken advantage of that license to make an epub[2] version
  available. Revenue from sales go to Amnesty International.

[1]: http://www.aosabook.org/en/
[2]: http://media.dropdo.com.s3.amazonaws.com/2Wo/Architecture%20of%20Open%20Source%20Applications.epub

Among the covered programs is Mercurial (chapter by Dirkjan Ochtman)...
but unfortunately not Git (they probably thought that one DVCS was enough).

What would such a chapter on Git look like?  The authors of the book
encourage readers (among other things) to write new chapters.

-- 
Jakub Narebski
Poland


* Re: How would Git chapter look like in "The Architecture of Open Source Applications"?
@ 2011-05-30  3:40 ` Jeff King
From: Jeff King @ 2011-05-30  3:40 UTC
  To: Jakub Narebski; +Cc: git

On Sat, May 28, 2011 at 02:17:38PM +0200, Jakub Narebski wrote:

> Among the covered programs is Mercurial (chapter by Dirkjan Ochtman)...
> but unfortunately not Git (they probably thought that one DVCS was enough).
> 
> What would such a chapter on Git look like?  The authors of the book
> encourage readers (among other things) to write new chapters.

I just skimmed the Mercurial chapter, but they do cover a fair bit of
general DVCS architecture. For git, I would guess a good approach would
be to describe the data structures (i.e., content-addressable object
database, DAG of commits, refs storing branches and tags), as everything
else falls out from there. Most of the basic commands can be explained
as "do some simple operation to the history graph or object db" and the
more complex commands are compositions of the simple ones. So the
architecture is really about having a data structure that represents the
problem, exposing it to the user, and then building some niceties around
the basic data structure operations.
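
To make the three data structures concrete, here is a minimal sketch in Python. The object ID rule (SHA-1 over "<type> <size>", a NUL byte, then the content) is the one Git uses for SHA-1 repositories. Everything else is an assumption made for illustration: the `commits` and `refs` dictionaries, the `commit` and `ancestors` helpers, and the simplified commit payload (real commits also record a tree, author and committer).

```python
import hashlib

def git_object_id(obj_type: str, payload: bytes) -> str:
    # Content addressing: the ID is the SHA-1 of "<type> <size>" + NUL + payload.
    header = f"{obj_type} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

# Toy in-memory repository (hypothetical names, not Git's internals).
commits = {}   # commit ID -> list of parent commit IDs (the DAG edges)
refs = {}      # branch or tag name -> commit ID

def commit(message: str, parents: list) -> str:
    # Simplified payload: parent lines plus the message only.
    payload = ("".join(f"parent {p}\n" for p in parents) + "\n" + message).encode()
    oid = git_object_id("commit", payload)
    commits[oid] = list(parents)
    return oid

def ancestors(oid: str) -> set:
    # Walk the DAG from one commit; this is the core of "show me the history".
    seen, stack = set(), [oid]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(commits[cur])
    return seen

root = commit("initial", [])
left = commit("topic work", [root])
right = commit("mainline work", [root])
merge = commit("merge topic", [right, left])
refs["master"] = merge   # a branch is just a name pointing at a commit

print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

In this model a branch update is a one-entry change in `refs`, and a merge is simply a commit with two parents; the more complex commands are compositions of such steps.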

Of course that's just my perspective. Linus might have written something
totally different. :)

-Peff


* Re: How would Git chapter look like in "The Architecture of Open Source Applications"?
@ 2011-05-30 10:30   ` Jakub Narebski
From: Jakub Narebski @ 2011-05-30 10:30 UTC
  To: Jeff King; +Cc: git, Linus Torvalds, Junio C Hamano

On Mon, 30 May 2011, Jeff King <peff@peff.org> wrote:
> On Sat, May 28, 2011 at 02:17:38PM +0200, Jakub Narebski wrote:
> 
> > Among the covered programs is Mercurial (chapter by Dirkjan Ochtman)...
> > but unfortunately not Git (they probably thought that one DVCS was enough).
> > 
> > What would such a chapter on Git look like?  The authors of the book
> > encourage readers (among other things) to write new chapters.
> 
> I just skimmed the Mercurial chapter, but they do cover a fair bit of
> general DVCS architecture. For git, I would guess a good approach would
> be to describe the data structures (i.e., content-addressable object
> database, DAG of commits, refs storing branches and tags), as everything
> else falls out from there. Most of the basic commands can be explained
> as "do some simple operation to the history graph or object db" and the
> more complex commands are compositions of the simple ones. So the
> architecture is really about having a data structure that represents the
> problem, exposing it to the user, and then building some niceties around
> the basic data structure operations.

Git's repository model is already well described in "Pro Git", in the
"Discussion" section of the git(1) manpage, in the "Git concepts" section
of the Git User's Manual, and in gitcore-tutorial(7).

What I am more interested in is the design *goals*, i.e. the reasons for
choosing this architecture over the alternatives.

The chapter on Mercurial, in the '12.2. Data Structures > 12.2.1. Challenges'
subsection, lists the limiting technology factors (quoting [Mac06]):
 * speed: CPU
 * capacity: disk and memory
 * bandwidth: memory, LAN, disk, and WAN
 * disk seek rate

That was for Mercurial.  From what I remember of the KernelTrap articles,
which covered the beginnings of Git development quite well, and from other
sources, the main limiting factor considered for Git was __speed__.

Not disk space.  At first Git had only the 'loose' object format -- do you
remember the "disk space is cheap" comment by Linus?  Admittedly, Git used
zlib compression from the very beginning (which works well for text).  IIRC,
when the repository _model_ was first being drafted, LAN/WAN bandwidth
wasn't a consideration; AFAIK the first transport that Git used was the
now-deprecated rsync:// (the UNIX philosophy of prototyping and developing
with existing, ready-made tools; see [TAOUP], [Ben86]).
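
As an illustration of the 'loose' format, here is a minimal Python sketch of one zlib-compressed file per object, stored under a directory named after the first two hex digits of its ID. It follows the layout used by present-day Git with SHA-1; the details of the 2005 prototypes may have differed. The function names `write_loose` and `read_loose` are made up for this example.

```python
import hashlib
import zlib
from pathlib import Path

def write_loose(objects_dir: Path, obj_type: str, payload: bytes) -> str:
    # The stored bytes are "<type> <size>" + NUL + payload; the ID is their SHA-1.
    store = f"{obj_type} {len(payload)}".encode() + b"\x00" + payload
    oid = hashlib.sha1(store).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Identical content always maps to the same file, so it is written once.
        path.write_bytes(zlib.compress(store))
    return oid

def read_loose(objects_dir: Path, oid: str):
    # One file read plus one decompression: no other object is needed.
    store = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, payload = store.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload length")
    return obj_type, payload
```

Each object is compressed on its own, with no deltas against other versions; that trades disk space for simplicity and speed, which matches the priorities described above.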

I think it was assumed that the operating system would be good enough that
we wouldn't have to worry about seek rates: Git is optimized for the "hot
cache" case.  Note, however, that the adoption of the 'packed' format as an
on-disk format was driven by speed (disk seek rate) as well as by disk
capacity, i.e. reducing repository size.  Well, at least from what I remember.


The Mercurial chapter's '12.2.1. Challenges' subsection continues:

  The paper [i.e. [Mac06]] goes on to review common scenarios or
  criteria for evaluating the performance of such a system at
  the file level:

    * Storage compression: what kind of compression is best suited
      to save the file history on disk? Effectively, what algorithm
      makes the most out of the I/O performance while preventing
      CPU time from becoming a bottleneck?
    * Retrieving arbitrary file revisions: a number of version control
      systems will store a given revision in such a way that a large
      number of older revisions must be read to reconstruct the newer
      one (using deltas). We want to control this to make sure that
      retrieving old revisions is still fast.
    * Adding file revisions: we regularly add new revisions. We don't
      want to rewrite old revisions every time we add a new one, because
      that would become too slow when there are many revisions.
    * Showing file history: we want to be able to review a history of
      all changesets that touched a certain file. This also allows us
      to do annotations (which used to be called `blame` in CVS but was
      renamed to `annotate` in some later systems to remove the negative
      connotation): reviewing the originating changeset for each line
      currently in a file.
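
The second and third criteria above (bounded retrieval cost, append-only writes) can be sketched as follows. This is a toy under stated assumptions, not Mercurial's revlog format and not Git's pack format: `RevStore`, `max_chain`, and the prefix/suffix delta are all invented for illustration.

```python
def make_delta(old: str, new: str):
    # Toy delta: keep the common prefix and suffix, store only the changed middle.
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return (p, s, new[p:len(new) - s])

def apply_delta(old: str, delta) -> str:
    p, s, middle = delta
    return old[:p] + middle + old[len(old) - s:]

class RevStore:
    """Append-only list of revisions: full snapshots or deltas against the previous one."""

    def __init__(self, max_chain: int = 3):
        self.entries = []          # ("full", text) or ("delta", delta); never rewritten
        self.max_chain = max_chain
        self._chain = 0            # number of deltas since the last full snapshot
        self._tip = None           # text of the most recent revision

    def add(self, text: str) -> int:
        if self._tip is None or self._chain >= self.max_chain:
            self.entries.append(("full", text))   # cut the chain with a snapshot
            self._chain = 0
        else:
            self.entries.append(("delta", make_delta(self._tip, text)))
            self._chain += 1
        self._tip = text
        return len(self.entries) - 1

    def get(self, rev: int):
        # Walk back to the nearest snapshot, then replay deltas forward.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        text = self.entries[base][1]
        for i in range(base + 1, rev + 1):
            text = apply_delta(text, self.entries[i][1])
        return text, rev - base + 1   # reconstructed text, entries read
```

With `max_chain=3`, reconstructing any revision reads at most four entries, and adding a revision only appends; the price is an occasional full snapshot.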

From what *I* understand, Linus approached the problem of DVCS design
from a different direction: he is a maintainer rather than an ordinary
developer, and (from what he has said) a filesystem designer at heart, not a
version control developer.  Thus the common scenarios or criteria were
different:

 * Merging and applying patches
 * Showing _subsystem_ history
 * ???

That is what I am interested in.


Some of Git's history, and I think some of the motivations behind its design,
can be found in the "Git Chronicle" slides by Junio from GitTogether.

> Of course that's just my perspective. Linus might have written something
> totally different. :)

Well, only Linus can be the definitive source for the initial *design goals*
(behind the core design of Git)...


References:
~~~~~~~~~~~
[Mac06]: Matt Mackall: "Towards a Better SCM: Revlog and Mercurial".
         2006 Ottawa Linux Symposium, 2006.
         http://selenic.com/mercurial/wiki/index.cgi/Presentations?action=AttachFile&do=get&target=ols-mercurial-paper.pdf
         (see also http://mercurial.selenic.com/wiki/Presentations)
[TAOUP]: Eric Raymond: "The Art of Unix Programming", 2003
         http://www.faqs.org/docs/artu/
         http://www.catb.org/~esr/writings/taoup/
[Ben86]: Jon Bentley: "Programming Pearls", chapter about implementing
         and prototyping UNIX 'spell' program (from Polish translation).
-- 
Jakub Narebski
Poland

