From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>,
Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: reducing prune sync()s
Date: Fri, 30 May 2008 08:57:21 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.1.10.0805300844310.3141@woody.linux-foundation.org> (raw)
In-Reply-To: <20080530152527.GF4032@redhat.com>
On Fri, 30 May 2008, Frank Ch. Eigler wrote:
>
> On Thu, May 29, 2008 at 05:27:35PM -0700, Linus Torvalds wrote:
> > [...]
> > > Or perhaps having the blanket sync be replaced a
> > > list of fsync()s for only the relevant git repository files?
> > [...]
> > Soemthing like this *may* work. THIS IS TOTALLY UNTESTED. And when I say
> > "TOTALLY UNTESTED", I mean it. Zero testing. None. Nada. Zilch. Testing is
> > for people who are actually interested in the feature (hint, hint).
>
> The patch does add an fsync or two into the mix, a "git gc" or
> "git repack -a" still goes through the "git-repack" shell script, which
> still did its "sync".
Yes.
But I actually think there is a simpler and more straightforward approach.
Instead of being careful when removing objects (whether old packs or loose
objects that are made redundant by a new pack), the simpler approach is to
just always fsync() the new pack when creating it.
I was always very careful to *not* make git depend on any serialized IO,
but the reason for that was literally the fact that I wanted to make sure
that I could batch up things efficiently, and do any serialization (if I
wanted to) later. So it was literally always about the whole "apply
several hundred patches in one go" kind of thing.
And the thing is, the repacking phase *is* the "serialize things later (if
you want)" thing, so doing things synchronously at that point is actually
perfectly fine.
And every single "let's remove objects" operation is literally always
about the fact that we have a new better pack-file, making old objects
redundant, so if we just create those new pack-files stably on disk, then
any subsequent action pretty much by definition doesn't need any sync.
Because we know that the only thing we can really care about *is* stable.
So this is a conceptually much more direct approach. Creating pack-files
really is the special occasion, since it's (a) literally the event that
causes other objects to potentially be stale (b) fairly rare and (c) not
normally limited by disk-IO anyway (ie a "git fetch" will create a new
pack-file, but it's normally limited by the network overhead or the cost
of creating the pack-file, not by adding a fsync() to make sure that the
end result is stable).
So I'll follow up with a two-patch series (the first to create pack-files
and their indexes stably on disk, the second to just remove the now
unnecessary 'sync()' calls). I'll give it *some* basic testing first,
though.
Linus
next prev parent reply other threads:[~2008-05-30 15:59 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-29 20:57 reducing prune sync()s Frank Ch. Eigler
2008-05-30 0:27 ` Linus Torvalds
2008-05-30 0:32 ` Linus Torvalds
2008-05-30 1:50 ` Frank Ch. Eigler
2008-05-30 20:07 ` Florian Weimer
2008-05-30 1:51 ` David Dillow
2008-05-30 2:17 ` Linus Torvalds
2008-05-30 2:30 ` Linus Torvalds
2008-05-30 15:25 ` Frank Ch. Eigler
2008-05-30 15:57 ` Linus Torvalds [this message]
2008-05-30 16:08 ` [PATCH 1/2] Make pack creation always fsync() the result Linus Torvalds
2008-05-30 16:11 ` [PATCH 2/2] Remove now unnecessary 'sync()' calls Linus Torvalds
2008-05-30 20:27 ` [PATCH 1/2] Make pack creation always fsync() the result Nicolas Pitre
2008-05-31 14:19 ` Frank Ch. Eigler
2008-06-02 22:23 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.1.10.0805300844310.3141@woody.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=fche@redhat.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=torvalds@linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).