From: Jakub Narebski <jnareb@gmail.com>
To: Matt Mackall <mpm@selenic.com>
Cc: mercurial@selenic.com, git@vger.kernel.org,
Junio C Hamano <junkio@cox.net>
Subject: Re: newbie questions about git design and features (some wrt hg)
Date: Thu, 1 Feb 2007 00:58:42 +0100
Message-ID: <200702010058.43431.jnareb@gmail.com>
In-Reply-To: <20070131222507.GO10108@waste.org>
Matt Mackall wrote:
> On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
>> Theodore Tso wrote:
>>
>>> On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
>>>> I think hg modifies files as it goes, which could cause some issues
>>>> when a writer is aborted. I'm sure they have thought about the
>>>> problem and tried to make it safe, but there isn't anything safer
>>>> than just leaving the damn thing alone. :)
>>>
>>> To be fair hg modifies files using O_APPEND only. That isn't quite
>>> as safe as "only creating new files", but it is relatively safe.
>>
>> From (libc.info):
>>
>> -- Macro: int O_APPEND
[...]
>> I don't quite understand how that would help hg (Mercurial) to have
>> operations like commit, pull/fetch or push atomic, i.e. all or
>> nothing.
>
> That's because it's unrelated.
[...]
> Mercurial has write-side locks so there can only ever be one writer at
> a time. There are no locks needed on the read side, so there can be
> any number of readers, even while commits are happening.
>
>> What happens if operation is interrupted (e.g. lost connection to
>> network during fetch)?
>
> We keep a simple transaction journal. As Mercurial revlogs are
> append-only, rolling back a transaction just means truncating all
> files in a transaction to their original length.
Thanks a lot for the complete answer. So Mercurial uses write-side locks
to deal with concurrent operations, and a transaction journal to deal
with interrupted operations. I guess that incomplete transactions
are rolled back on the next hg command...
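
To make the journal idea concrete, here is a minimal sketch in Python. It is
not Mercurial's actual code: the journal format (one "length<TAB>path" line
per file) and the function names begin(), rollback() and commit() are
assumptions made up for illustration. It only shows why append-only files
make rollback a matter of truncation.

```python
import os


def begin(journal_path, files):
    """Record each file's current length before anything is appended.

    Hypothetical journal format: one "length<TAB>path" line per file.
    """
    with open(journal_path, "w") as journal:
        for path in files:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            journal.write(f"{size}\t{path}\n")
        journal.flush()
        os.fsync(journal.fileno())  # journal must hit disk before any append


def rollback(journal_path):
    """Truncate every journaled file back to its recorded length.

    Returns False when there is no journal, i.e. nothing to undo.
    """
    if not os.path.exists(journal_path):
        return False
    with open(journal_path) as journal:
        for line in journal:
            size, path = line.rstrip("\n").split("\t", 1)
            if os.path.exists(path):
                os.truncate(path, int(size))
    os.remove(journal_path)
    return True


def commit(journal_path):
    """A finished transaction just discards its journal."""
    os.remove(journal_path)
```

A later command would call rollback() first; if a journal is still lying
around, the previous writer was interrupted and its partial appends are cut off.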
I guess (please correct me if I'm wrong) that git uses a "put the reference
after putting the data" scheme, plus a write-side lock in the few places
where one is needed.
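
Roughly what I mean by that scheme, as a Python sketch. This is an
illustration only, not git's real on-disk format (git hashes a header plus
the content, compresses objects, and fans them out into subdirectories); the
directory layout and the names write_object() and update_ref() are
assumptions. The point is the ordering: data is written first under a new
name, and only then is the reference switched in a single rename.

```python
import hashlib
import os
import tempfile


def write_object(repo, data):
    """Store data under a content-derived name; existing files are never modified."""
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(repo, "objects", name)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # the object appears whole or not at all
    return name


def update_ref(repo, ref, name):
    """Point ref at name; the lock file doubles as the write-side lock."""
    path = os.path.join(repo, "refs", ref)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    lock = path + ".lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(name + "\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(lock, path)  # readers see either the old or the new value
```

If the writer dies between the two steps, the object is left unreferenced
(prune-able crud), but the reference still points at the old, complete state.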
>> In git both situations result in some prune-able and fsck-visible crud in
>> repository, but repository stays uncorrupted, and all operations are atomic
>> (all or nothing).
>
> If a Mercurial transaction is interrupted and not rolled back, the
> result is prune-able and fsck-visible crud. But this doesn't happen
> much in practice.
>
> The claim that's been made is that a) truncate is unsafe because Linux
> has historically had problems in this area and b) git is safer because
> it doesn't do this sort of thing.
>
> My response is a) those problems are overstated and Linux has never
> had difficulty with the sorts of straightforward single writer
> operations Mercurial uses and b) normal git usage involves regular
> rewrites of data with packing operations that makes its exposure to
> filesystem bugs equivalent or greater.
Rewrites in git perhaps are (or should be) regular, but they need not be frequent.
And with the new kept-packs feature, a repack need not rewrite all the data.
The one command which _is_ (a bit) unsafe in git is git-prune. I'm not sure
if it could be made safe. But not running prune only slightly increases
repository size (where I think git is the best of all SCMs) and does not hurt
performance.
On the other hand, the hg repository structure (namely the log-like, append-only
changelog / revlog used to store commits) makes it, I think, hard to have
multiple persistent branches.
Sidenote 1: it looks like git is optimized for speed of merge and checkout
(branch switching, or going to a given point in history for bisect), and
probably accidentally for multi-branch repos, while Mercurial is optimized
for speed of commit and patch.
Sidenote 2: Mercurial's repository structure might make it use "file-ids"
(perhaps implicitly), with all the disadvantages of those (e.g. different
renames on different branches).
> In either case, both provide strong integrity checks with recursive
> SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
> "back-up"!) so this is largely a non-issue relative to traditional
> systems.
Integrity checks can tell you that a repository is corrupted, but it would
be better if it didn't get corrupted in the first place.
Besides: zlib CRC for Mercurial? I thought that hg didn't compress the
data, and only stored it as delta chains?
--
Jakub Narebski
Poland