git@vger.kernel.org mailing list mirror (one of many)
From: Jakub Narebski <jnareb@gmail.com>
To: Matt Mackall <mpm@selenic.com>
Cc: mercurial@selenic.com, git@vger.kernel.org,
	Junio C Hamano <junkio@cox.net>
Subject: Re: newbie questions about git design and features (some wrt hg)
Date: Thu, 1 Feb 2007 00:58:42 +0100
Message-ID: <200702010058.43431.jnareb@gmail.com>
In-Reply-To: <20070131222507.GO10108@waste.org>

Matt Mackall wrote:
> On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
>> Theodore Tso wrote:
>> 
>>> On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
>>>> I think hg modifies files as it goes, which could cause some issues
>>>> when a writer is aborted.  I'm sure they have thought about the
>>>> problem and tried to make it safe, but there isn't anything safer
>>>> than just leaving the damn thing alone.  :)
>>> 
>>> To be fair hg modifies files using O_APPEND only.  That isn't quite
>>> as safe as "only creating new files", but it is relatively safe.
>> 
>> From (libc.info):
>> 
>>  -- Macro: int O_APPEND
[...] 
>> I don't quite understand how that would help hg (Mercurial) to have
>> operations like commit, pull/fetch or push atomic, i.e. all or
>> nothing. 
> 
> That's because it's unrelated.
[...]
> Mercurial has write-side locks so there can only ever be one writer at
> a time. There are no locks needed on the read side, so there can be
> any number of readers, even while commits are happening.
> 
>> What happens if operation is interrupted (e.g. lost connection to
>> network during fetch)?
> 
> We keep a simple transaction journal. As Mercurial revlogs are
> append-only, rolling back a transaction just means truncating all
> files in a transaction to their original length.

Thanks a lot for the complete answer. So Mercurial uses write-side locks
to deal with concurrent operations, and a transaction journal to
deal with interrupted operations. I guess that incomplete transactions
are rolled back on the next hg command...
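
To make the journal idea concrete, here is a minimal Python sketch of
rollback by truncation. It is not Mercurial's actual code: the Journal
class, the rollback() function and the journal file format are all
hypothetical, chosen only to illustrate "record the original length,
append, truncate on rollback".

```python
import os
import tempfile


class Journal:
    """Hypothetical append-only transaction journal (not Mercurial's real code)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.seen = set()

    def append(self, path, data):
        # Before the first append to a file, record its original length
        # in the journal and force that record to disk.
        if path not in self.seen:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            with open(self.journal_path, "a") as j:
                j.write(f"{path}\0{size}\n")
                j.flush()
                os.fsync(j.fileno())
            self.seen.add(path)
        # The data file itself is only ever appended to.
        with open(path, "ab") as f:
            f.write(data)

    def commit(self):
        # Removing the journal marks the transaction as complete.
        os.remove(self.journal_path)


def rollback(journal_path):
    """Truncate every journaled file back to its recorded length.

    Returns True if a leftover journal was found and rolled back.
    """
    if not os.path.exists(journal_path):
        return False
    with open(journal_path) as j:
        for line in j:
            path, size = line.rstrip("\n").split("\0")
            if os.path.exists(path):
                with open(path, "r+b") as f:
                    f.truncate(int(size))
    os.remove(journal_path)
    return True
```

In this sketch a leftover journal file means an interrupted transaction,
so the next command would call rollback() before doing anything else.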

I guess (please correct me if I'm wrong) that git uses a "put the reference
after putting the data" scheme, and a write-side lock in the few places
where it is needed.
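
A rough Python sketch of what I mean by that scheme follows. It is a
simplification, not git's actual implementation: the flat objects/
layout, the hashing of raw content, and the names write_object() and
update_ref() are assumptions made for illustration. The point is only
the ordering: the data is written first as a new file, and the reference
is switched afterwards with a lock file and an atomic rename.

```python
import hashlib
import os
import tempfile


def write_object(repo, data):
    """Store data under its SHA-1 name; existing files are never modified."""
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(repo, "objects", name)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Write to a temporary file first, then rename it into place,
        # so a reader never sees a half-written object.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    return name


def update_ref(repo, ref, name):
    """Point ref at an object, using <ref>.lock as the write-side lock."""
    path = os.path.join(repo, ref)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    lock = path + ".lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    with os.fdopen(fd, "w") as f:
        f.write(name + "\n")
    # The atomic rename publishes the new value and releases the lock.
    os.replace(lock, path)
```

If the writer dies between write_object() and update_ref(), the only
leftover is an unreferenced object file, which is the kind of prune-able
crud mentioned below.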
 
>> In git both situations result in some prune-able and fsck-visible crud in
>> repository, but repository stays uncorrupted, and all operations are atomic
>> (all or nothing).
> 
> If a Mercurial transaction is interrupted and not rolled back, the
> result is prune-able and fsck-visible crud. But this doesn't happen
> much in practice.
> 
> The claim that's been made is that a) truncate is unsafe because Linux
> has historically had problems in this area and b) git is safer because
> it doesn't do this sort of thing. 
> 
> My response is a) those problems are overstated and Linux has never
> had difficulty with the sorts of straightforward single writer
> operations Mercurial uses and b) normal git usage involves regular
> rewrites of data with packing operations that makes its exposure to
> filesystem bugs equivalent or greater.

Rewrites in git perhaps are (or should be) regular, but they need not be
frequent. And with the new idea/feature of kept packs, a rewrite need not
cover the full data.

One command which _is_ (a bit) unsafe in git is git-prune. I'm not sure
if it could be made safe. But not running prune only slightly affects
repository size (where I think git is the best of all SCMs), and does not
affect performance.

On the other hand, the hg repository structure (namely the log-like,
append-only changelog / revlog used to store commits) makes it, I think,
hard to have multiple persistent branches.

Sidenote 1: it looks like git is optimized for speed of merge and checkout
(branch switching, or going to given point in history for bisect), and
probably accidentally for multi-branch repos, while Mercurial is optimized
for speed of commit and patch.

Sidenote 2: Mercurial's repository structure might make it use "file-ids"
(perhaps implicitly), with all the disadvantages (different renames
on different branches) of those.

> In either case, both provide strong integrity checks with recursive
> SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
> "back-up"!) so this is largely a non-issue relative to traditional
> systems.

Integrity checks can tell you that a repository is corrupted, but it would
be better if it didn't get corrupted in the first place.

Besides: zlib CRC for Mercurial? I thought that hg didn't compress the
data, and only stored it as delta chains?
-- 
Jakub Narebski
Poland

