git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>,
	Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: reducing prune sync()s
Date: Fri, 30 May 2008 08:57:21 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LFD.1.10.0805300844310.3141@woody.linux-foundation.org> (raw)
In-Reply-To: <20080530152527.GF4032@redhat.com>



On Fri, 30 May 2008, Frank Ch. Eigler wrote:
> 
> On Thu, May 29, 2008 at 05:27:35PM -0700, Linus Torvalds wrote:
> > [...]
> > >	  Or perhaps having the blanket sync be replaced a
> > > list of fsync()s for only the relevant git repository files?
> > [...]
> > Soemthing like this *may* work. THIS IS TOTALLY UNTESTED. And when I say 
> > "TOTALLY UNTESTED", I mean it. Zero testing. None. Nada. Zilch. Testing is 
> > for people who are actually interested in the feature (hint, hint).
> 
> The patch does add an fsync or two into the mix, a "git gc" or 
> "git repack -a" still goes through the "git-repack" shell script, which
> still did its "sync".

Yes.

But I actually think there is a simpler and more straightforward approach.

Instead of being careful when removing objects (whether old packs or loose 
objects that are made redundant by a new pack), the simpler approach is to 
just always fsync() the new pack when creating it.

I was always very careful to *not* make git depend on any serialized IO, 
but the reason for that was literally the fact that I wanted to make sure 
that I could batch up things efficiently, and do any serialization (if I 
wanted to) later. So it was literally always about the whole "apply 
several hundred patches in one go" kind of thing.

And the thing is, the repacking phase *is* the "serialize things later (if 
you want)" thing, so doing things synchronously at that point is actually 
perfectly fine.

And every single "let's remove objects" operation is literally always 
about the fact that we have a new better pack-file, making old objects 
redundant, so if we just create those new pack-files stably on disk, then 
any subsequent action pretty much by definition doesn't need any sync. 
Because we know that the only thing we can really care about *is* stable.

So this is a conceptually much more direct approach. Creating pack-files 
really is the special occasion, since it's (a) literally the event that 
causes other objects to potentially be stale (b) fairly rare and (c) not 
normally limited by disk-IO anyway (ie a "git fetch" will create a new 
pack-file, but it's normally limited by the network overhead or the cost 
of creating the pack-file, not by adding a fsync() to make sure that the 
end result is stable).

So I'll follow up with a two-patch series (the first to create pack-files 
and their indexes stably on disk, the second to just remove the now 
unnecessary 'sync()' calls). I'll give it *some* basic testing first, 
though.

		Linus

  reply	other threads:[~2008-05-30 15:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-29 20:57 reducing prune sync()s Frank Ch. Eigler
2008-05-30  0:27 ` Linus Torvalds
2008-05-30  0:32   ` Linus Torvalds
2008-05-30  1:50     ` Frank Ch. Eigler
2008-05-30 20:07     ` Florian Weimer
2008-05-30  1:51   ` David Dillow
2008-05-30  2:17     ` Linus Torvalds
2008-05-30  2:30       ` Linus Torvalds
2008-05-30 15:25   ` Frank Ch. Eigler
2008-05-30 15:57     ` Linus Torvalds [this message]
2008-05-30 16:08       ` [PATCH 1/2] Make pack creation always fsync() the result Linus Torvalds
2008-05-30 16:11         ` [PATCH 2/2] Remove now unnecessary 'sync()' calls Linus Torvalds
2008-05-30 20:27         ` [PATCH 1/2] Make pack creation always fsync() the result Nicolas Pitre
2008-05-31 14:19         ` Frank Ch. Eigler
2008-06-02 22:23           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.1.10.0805300844310.3141@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=fche@redhat.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=torvalds@linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).