git@vger.kernel.org mailing list mirror (one of many)
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Mike Coleman <tutufan@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: newbie questions about git design and features (some wrt hg)
Date: Tue, 30 Jan 2007 11:55:48 -0500
Message-ID: <20070130165548.GF25950@spearce.org>
In-Reply-To: <3c6c07c20701300820l42cfc8dbsb80393fc1469f667@mail.gmail.com>

Mike Coleman <tutufan@gmail.com> wrote:
> 1.  As of today, is there any real safety concern with either tool's
> repo format?  Is either tool significantly better in this regard?
> (Keith Packard's post hints at a problem here, but doesn't really make
> the case.)

I think the Git format is tighter in terms of compression,
and simpler in terms of understanding and writing code.  I have
personally written the code to read and write the Git repository
format in both C and Java, and in both cases it falls out in just
a few hundred lines of code (assuming you have libz handy to do
the compression/decompression for you).
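
To give a feel for how little code is involved, here is a minimal
Python sketch of encoding and decoding a single loose object.  It
assumes the layout of a "<type> <size>" header, a NUL byte, and
then the content, with the whole thing deflated by zlib.  The
function names are made up for illustration and are not Git's own.

```python
import hashlib
import zlib


def encode_object(obj_type: str, data: bytes) -> tuple[str, bytes]:
    """Return (object id, compressed bytes) for one loose object."""
    # Header is "<type> <size>" followed by a single NUL byte.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    raw = header + data
    # The id is the SHA-1 of the uncompressed header plus content.
    object_id = hashlib.sha1(raw).hexdigest()
    return object_id, zlib.compress(raw)


def decode_object(compressed: bytes) -> tuple[str, bytes]:
    """Return (type, content) from the compressed bytes of one object."""
    raw = zlib.decompress(compressed)
    header, _, data = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(data):
        raise ValueError("object size does not match its header")
    return obj_type, data
```

Feeding the output of encode_object() back into decode_object()
should return the original type and content.  Pack files and deltas
are left out of this sketch entirely.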

The Git format is completely safe with regard to parallel
modification of a repository, which is good for shared repositories
that might have multiple people pushing into it at once.

Git's format is also safe with regard to *any* update.
You literally cannot destroy the repository during an update.
It's impossible.  You'd have to physically destroy the storage device.
(OK, that's overstating it a bit, but it is really hard.)

The point Keith was making was the Git format is "add-only".
Once something has been stored, we NEVER modify it again.  This
bypasses any sort of possible problems that can occur with partial
modifications caused by a process aborting in the middle of a change.
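
The add-only idea can be sketched in a few lines of Python.  This is
an illustration under simplifying assumptions, not Git's actual code:
the object is keyed by the SHA-1 of its content alone (the real
header is omitted), and store_object() and the "tmp_obj_" prefix are
hypothetical names.  New data is written to a temporary file and
only renamed into place once complete, and an object that already
exists is never touched again.

```python
import hashlib
import os
import tempfile
import zlib


def store_object(objects_dir: str, data: bytes) -> str:
    """Store data under its SHA-1 name; existing objects are left alone."""
    object_id = hashlib.sha1(data).hexdigest()
    final_dir = os.path.join(objects_dir, object_id[:2])
    final_path = os.path.join(final_dir, object_id[2:])
    if os.path.exists(final_path):
        # Same name means same content, so there is nothing to write.
        return object_id
    os.makedirs(final_dir, exist_ok=True)
    # Write the complete object to a temporary file first.
    fd, tmp_path = tempfile.mkstemp(dir=objects_dir, prefix="tmp_obj_")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(zlib.compress(data))
            tmp.flush()
            os.fsync(tmp.fileno())
        # The rename publishes the finished file in one step. A racing
        # writer of the same object would carry identical bytes.
        os.replace(tmp_path, final_path)
    except BaseException:
        # An aborted write leaves only a stray temporary file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return object_id
```

If the process dies before the rename, the repository's existing
objects are exactly as they were; at worst a leftover temporary file
needs cleaning up.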

I think hg modifies files as it goes, which could cause some issues
when a writer is aborted.  I'm sure they have thought about the
problem and tried to make it safe, but there isn't anything safer
than just leaving the damn thing alone.  :)
 
> 2.  Does the git packed object format solve the performance problem
> alluded to in posts from a year or two ago?

Yes.  By a huge margin.  Git's *fast*.  Ignore anything from a year
or two ago.
 
> 3.  Someone mentioned that git bisect can work between any two
> commits, not necessarily just one that happens to be an ancestor of
> the other.  This sounds really cool.  Can hg's bisect do this, too?

No clue.
 
> 4.  What is git's index good for?  I find that I like the idea of it,
> but I'm not sure I could justify its presence to someone else, as
> opposed to having it hidden in the way that hg's dircache (?) is.  Can
> anyone think of a good scenario where it's a pretty obvious benefit?

It's a good way to stage the stuff in your next commit.  By that I
mean you edit some code.  Then you look at what differs between the
index and your working directory.  You decide "this hunk is good, it
passed the tests, I want to commit that, so toss it into the index".
Now that hunk isn't different anymore.

When it comes time to commit, all of your already reviewed stuff is
staged in the index.  You just need to issue a commit and supply
the message.  But you can leave modified stuff in the working
directory, even for files that were already updated in the index.
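
As a rough model of that workflow, here is a toy Python sketch.  It
is not how Git implements the index: ToyIndex and its methods are
hypothetical, it stages whole files rather than hunks, and it
ignores deleted files.  It only shows the three states involved: the
last commit, the staged content, and the working directory.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Identify content by its SHA-1 (no object header, for brevity)."""
    return hashlib.sha1(content).hexdigest()


class ToyIndex:
    """Tracks what the next commit will contain, path by path."""

    def __init__(self, head: dict[str, bytes]):
        # Start with the index matching the last commit.
        self.head = {path: blob_id(c) for path, c in head.items()}
        self.staged = dict(self.head)

    def add(self, path: str, content: bytes) -> None:
        """Stage the given content for the next commit."""
        self.staged[path] = blob_id(content)

    def unstaged_changes(self, worktree: dict[str, bytes]) -> list[str]:
        """Paths whose working copy differs from the index."""
        return sorted(
            path for path, content in worktree.items()
            if self.staged.get(path) != blob_id(content)
        )

    def staged_changes(self) -> list[str]:
        """Paths whose staged content differs from the last commit."""
        return sorted(
            path for path, oid in self.staged.items()
            if self.head.get(path) != oid
        )

    def commit(self) -> None:
        """Record exactly what is staged; the worktree is not consulted."""
        self.head = dict(self.staged)
```

Staging one version of a file and then editing it further leaves
that path in both staged_changes() and unstaged_changes(), which is
the "modified stuff left in the working directory" case above.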

This really helps during a merge.  Only the stuff which Git could
not merge for you is seen as different between the index and the
working directory; all of the stuff that Git merged for you is
already staged in the index.  So you can focus on the conflicts,
and stage their resolutions into the index as you go.  This makes
it easier to work through larger merges where more than 1 or 2
files contain conflicts.

> 5.  I think I read that there'd been just one incompatible change over
> time in the git repo format.  What was it?

A LONG time ago, like in the very first version Linus offered out
to the public, we computed the identity of an object using the
SHA-1 hash of the *compressed* data.  This is sensitive to the
compression settings used, and was not the best idea as a result.

It was very quickly changed to compute the identity of the object
using the SHA-1 hash of the raw (user) data, removing any dependence
on the compression routine to always yield the same result for the
same input.
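
The difference between the two schemes is easy to see with Python's
hashlib and zlib modules.  This is only an illustration: the two
function names are hypothetical, and the object header is left out
so that the effect of the compression level stands alone.

```python
import hashlib
import zlib


def id_from_compressed(data: bytes, level: int) -> str:
    """Old scheme: hash the deflated bytes, so the level leaks into the id."""
    return hashlib.sha1(zlib.compress(data, level)).hexdigest()


def id_from_raw(data: bytes) -> str:
    """Current scheme: hash the raw bytes, independent of compression."""
    return hashlib.sha1(data).hexdigest()
```

With the first function, the same content compressed at level 1 and
at level 9 yields two different names, because the zlib stream
differs even though it inflates to identical data.  With the second,
the name depends on the content alone.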

We haven't had a change since then.  We have added some new
compression options which are just that, options.  If you use them,
older Git binaries won't necessarily recognize the repository data,
but these are off by default and can be enabled on a per-repository
basis.  E.g. if you are only using newer Git on a given system you
can enable the newer compression features on all of the repositories
on that system.
 
> 6.  Does either tool use hard links?  This matters to me because I do
> development on a connected machine and a disconnected machine, using a
> usb drive to rsync between.  (Perhaps there'll be some way to transfer
> changes using git or hg instead of rsync, but I haven't figured that
> out yet.)

Git can use hardlinks if you ask it to.  We only use them for the
repository files, not for the user's actual source files.

Git has its own native transport (git-push, git-fetch) which can
move data between two Git repositories via local filesystem access,
SSH, HTTP, FTP, and rsync (latter two are read-only transports).
 
> 7.  I'm a fan of Python, and I'm really a fan of using high-level
> languages with performance-critical parts in a lower-level language,
> so in that regard, I really like hg's implementation.  If someone
> wanted to do it, is a Python clone of git conceivable?  Is there
> something about it that just requires C?

Yes, a Python clone of Git is conceivable.  Indeed, there is a
pure Java clone in progress (jgit) for an Eclipse plugin (egit).
If you wanted to rewrite Git in Python, knock yourself out.
But we've ported all of our Python to C, as it's just faster.
 
> 8.  It feels like hg is not really comfortable with parallel
> development over time on different heads within a single repo.
> Rather, it seems that multiple repos are supposed to be used for this.
> Does this lead to any problems?  For example, is it harder or
> different to merge two heads if they're in different repos than if
> they're in the same repo?

No clue.  I know multiple heads in one Git repository works
*awesome*.  Especially on large repositories (>10k files) as the time
required to start a new branch is only the time needed to update the
files in the working directory which don't have the correct version.
Usually that's a small percentage (<200) of the files and thus it's
very fast to switch to a new branch of development, and switch back.
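
The reason the cost scales with the number of differing files, not
with the size of the repository, can be sketched as follows.  This
assumes each branch tip is summarized as a flat mapping from path to
object id; files_to_update() is a hypothetical helper, and the real
checkout code works on trees and the index rather than plain dicts.

```python
def files_to_update(
    current: dict[str, str], target: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (paths to write, paths to remove) when switching branches."""
    # A path is rewritten only if its object id differs or it is new.
    to_write = sorted(
        path for path, oid in target.items() if current.get(path) != oid
    )
    # Paths that exist only on the current branch are removed.
    to_remove = sorted(path for path in current if path not in target)
    return to_write, to_remove
```

Every path whose id is the same on both branches is skipped without
touching the file on disk, so a 10k-file tree with a couple of
hundred differences costs a couple of hundred writes.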

On a decent UNIX system (and my Mac OS X PowerBook doesn't really
count) flipping branches in git-gui is almost immediate.  You pick
the branch in the menu and *wham* it's switched.  And that's my
PowerBook, which as I said, doesn't quite count as a good UNIX
system...

-- 
Shawn.
