git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: steve.norman@thomsonreuters.com
Cc: pclouds@gmail.com, git@vger.kernel.org,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir
Date: Fri, 22 May 2015 19:51:16 -0400	[thread overview]
Message-ID: <20150522235116.GA4300@peff.net> (raw)
In-Reply-To: <7FAE15F0A93C0144AD8B5FBD584E1C551975B851@C111KXTEMBX51.ERF.thomson.com>

On Fri, May 22, 2015 at 03:02:45PM +0000, steve.norman@thomsonreuters.com wrote:

> On Friday, May 22, 2015 @ 11:06 AM Duy Nguyen did write:
> 
> > Strange. Maybe there is something else... Anyway some numbers from me.
> > This is nfs3 hosted by Raspberry Pi, accessed over wireless. I just
> > run index-pack on git.git pack instead of full clone.
> > 
> >  - v1.8.4.1 34s
> >  - v1.8.4.2 519s (oh.. wireless..)
> >  - v1.8.4.2 without has_sha1_file() in index-pack.c 33s
> >  - v1.8.4.2 + Jeff's mtime patch 36s
> 
> Just ran the test again and it was 12 seconds.   Too many versions of git available on the 
> machine and I think I might have run the wrong one:

Thanks for re-checking. Here's a series that fixes the rough edges of
the patch I sent earlier. I'd appreciate it if you can re-confirm that
it improves things for you.

  [1/3]: stat_validity: handle non-regular files
  [2/3]: cache.h: move stat_validity definition up
  [3/3]: prepare_packed_git: use stat_validity to avoid re-reading packs

I'm adding Michael to the cc, as I'm abusing the stat_validity code
which we worked on in 2013.

My big concern here is that using stat_validity with a directory is
racy. It works for a regular file like packed-refs because that file is
replaced atomically.  We fill the validity using fstat() on the same
descriptor we read the data from, and nobody modifies an already-written
file. So we know it matches what we read.

For a directory, I don't think that atomicity guarantee is there.
Somebody can modify the direct while we're reading. For the most part, I
think that is OK. We fstat() before reading any entries, so our fstat
data will then become out of date, and we'll re-read next time. It would
be a problem if opendir() looked at the entries at all (e.g., if it
called getdents() under Linux before our first readdir() call, then our
fstat is happening after the first read, and wouldn't match what we
read). I don't have any reason to believe that any libc does that, but
it is making an assumption about how opendir() is implemented.

The other problem is that I'm not sure stat data is enough to notice
when a directory changes. Certainly the mtime should change, but if you
have only one-second resolution on your mtimes, we can be fooled. For a
regular file that is replaced atomically, the inode will change (and
probably the size, too). But I don't know if that is the case for a
directory. Can writing an entry go undetected between two stat() calls
(or something even more pathological, like deleting one entry and
writing another one)?

-Peff

  reply	other threads:[~2015-05-22 23:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 13:13 Troubleshoot clone issue to NFS steve.norman
2015-05-21 14:00 ` Christian Couder
2015-05-21 14:31 ` Duy Nguyen
2015-05-21 14:38   ` Duy Nguyen
2015-05-21 15:53     ` steve.norman
2015-05-22  0:16       ` Duy Nguyen
2015-05-22  7:12         ` Jeff King
2015-05-22  8:35           ` steve.norman
2015-05-22 10:05             ` Duy Nguyen
2015-05-22 14:37               ` Junio C Hamano
2015-05-22 15:02               ` steve.norman
2015-05-22 23:51                 ` Jeff King [this message]
2015-05-22 23:51                   ` [PATCH 1/3] stat_validity: handle non-regular files Jeff King
2015-05-23 11:00                     ` Michael Haggerty
2015-05-24  8:29                       ` Jeff King
2015-05-22 23:52                   ` [PATCH 2/3] cache.h: move stat_validity definition up Jeff King
2015-05-22 23:54                   ` [PATCH 3/3] prepare_packed_git: use stat_validity to avoid re-reading packs Jeff King
2015-05-23  1:19                   ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Duy Nguyen
2015-05-23  1:21                     ` Duy Nguyen
2015-05-24  8:20                     ` Jeff King
2015-05-24  9:00           ` Troubleshoot clone issue to NFS Duy Nguyen
2015-06-05 12:01             ` steve.norman
2015-06-05 12:18               ` Jeff King
2015-06-05 12:29                 ` [PATCH] index-pack: avoid excessive re-reading of pack directory Jeff King
2015-06-09 17:24                   ` Jeff King
2015-06-09 17:41                     ` Jeff King
2015-06-10  3:46                   ` Shawn Pearce
2015-06-10 14:00                     ` Jeff King
2015-06-10 14:36                       ` Duy Nguyen
2015-06-10 21:34                       ` Shawn Pearce
2015-06-05 14:20                 ` Troubleshoot clone issue to NFS steve.norman
2015-06-16 20:50                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150522235116.GA4300@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=steve.norman@thomsonreuters.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).