git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: steve.norman@thomsonreuters.com,
	Git Mailing List <git@vger.kernel.org>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir
Date: Sun, 24 May 2015 04:20:16 -0400	[thread overview]
Message-ID: <20150524082016.GA8718@peff.net> (raw)
In-Reply-To: <CACsJy8BjFM_OecoVU9DV3GmJafatSR2yPt6Xb6dETEpYjc1ODA@mail.gmail.com>

On Sat, May 23, 2015 at 08:19:03AM +0700, Duy Nguyen wrote:

> On Sat, May 23, 2015 at 6:51 AM, Jeff King <peff@peff.net> wrote:
> > The other problem is that I'm not sure stat data is enough to notice
> > when a directory changes. Certainly the mtime should change, but if you
> > have only one-second resolution on your mtimes, we can be fooled.
> 
> mtime may or may not change. I based my untracked-cache series
> entirely on this directory mtime and did some research about it. For
> UNIXy filesystems on UNIXy OSes, mtime should work as you expect. FAT
> on Windows does not (but FAT on Linux does..). NTFS works fine
> according to some MS document. No idea about CIFS. But people often
> just do open operation of a time and this racy is not an issue. It is
> only an issue on the busy server side, and you can make sure you run
> on the right fs+OS.

Even on Linux+ext4, I think there is some raciness.

For instance, the program at the end of this mail will loop forever,
running "stat" on an open directory fd, then enumerating the entries in
the directory.  If we ever get the same stat data as the last iteration
but different contents, then it complains. If you run it alongside
something simple, like touching 10,000 files in the directory, it fails
pretty quickly.

This is because we have no atomicity guarantees with directories. We can
stat() them, and then while we are reading them, somebody else can be
changing the entries. Whether we see the "before" or "after" state
depends on the timing.

I'm not 100% sure this translates into real-world problems for
packfiles. If you notice that an object is missing and re-scan the pack
directory (stat-ing it during the re-scan), then the change that made
the object go missing must have happened _before_ the stat, and would
presumably be reflected in it (modulo mtime resolution issues). But I
haven't thought that hard about it yet, and I have a nagging feeling
that there will be problem cases.

It might be that you could get an "atomic" read of the directory by
doing a stat before and after and making sure they're the same (and if
not, repeating the readdir() calls). But I think that suffers from the
same mtime-resolution problems.

Linux+ext4 _does_ have nanosecond mtimes, which perhaps is enough to
assume that any change will be reflected.

I dunno. I guess the most interesting test would be something closer to
the real world: one process repeatedly making sure the object pointed to
by "master" exists, and another one constantly rewriting "master" and
then repacking the object.

-- >8 --
#include "cache.h"
#include "string-list.h"

static void get_data(const char *path, struct string_list *entries,
		     struct stat_validity *validity)
{
	DIR *dir = opendir(path);
	struct dirent *de;

	stat_validity_update(validity, dirfd(dir));
	while ((de = readdir(dir)))
		string_list_insert(entries, de->d_name);
	closedir(dir);
}

static int list_eq(const struct string_list *a,
		    const struct string_list *b)
{
	int i;

	if (a->nr != b->nr)
		return 0;
	for (i = 0; i < a->nr; i++)
		if (strcmp(a->items[i].string, b->items[i].string))
			return 0;
	return 1;
}

static void monitor(const char *path)
{
	struct string_list last_entries = STRING_LIST_INIT_DUP;
	struct stat_validity last_validity;

	get_data(path, &last_entries, &last_validity);
	while (1) {
		struct string_list cur_entries = STRING_LIST_INIT_DUP;
		struct stat_validity cur_validity;

		get_data(path, &cur_entries, &cur_validity);
		if (!memcmp(&last_validity, &cur_validity, sizeof(last_validity)) &&
		    !list_eq(&cur_entries, &last_entries))
			die("mismatch");

		string_list_clear(&last_entries, 0);
		memcpy(&last_entries, &cur_entries, sizeof(last_entries));
		stat_validity_clear(&last_validity);
		memcpy(&last_validity, &cur_validity, sizeof(last_validity));
	}
}

int main(int argc, const char **argv)
{
	monitor(*++argv);
	return 0;
}

  parent reply	other threads:[~2015-05-24  8:20 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 13:13 Troubleshoot clone issue to NFS steve.norman
2015-05-21 14:00 ` Christian Couder
2015-05-21 14:31 ` Duy Nguyen
2015-05-21 14:38   ` Duy Nguyen
2015-05-21 15:53     ` steve.norman
2015-05-22  0:16       ` Duy Nguyen
2015-05-22  7:12         ` Jeff King
2015-05-22  8:35           ` steve.norman
2015-05-22 10:05             ` Duy Nguyen
2015-05-22 14:37               ` Junio C Hamano
2015-05-22 15:02               ` steve.norman
2015-05-22 23:51                 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Jeff King
2015-05-22 23:51                   ` [PATCH 1/3] stat_validity: handle non-regular files Jeff King
2015-05-23 11:00                     ` Michael Haggerty
2015-05-24  8:29                       ` Jeff King
2015-05-22 23:52                   ` [PATCH 2/3] cache.h: move stat_validity definition up Jeff King
2015-05-22 23:54                   ` [PATCH 3/3] prepare_packed_git: use stat_validity to avoid re-reading packs Jeff King
2015-05-23  1:19                   ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Duy Nguyen
2015-05-23  1:21                     ` Duy Nguyen
2015-05-24  8:20                     ` Jeff King [this message]
2015-05-24  9:00           ` Troubleshoot clone issue to NFS Duy Nguyen
2015-06-05 12:01             ` steve.norman
2015-06-05 12:18               ` Jeff King
2015-06-05 12:29                 ` [PATCH] index-pack: avoid excessive re-reading of pack directory Jeff King
2015-06-09 17:24                   ` Jeff King
2015-06-09 17:41                     ` Jeff King
2015-06-10  3:46                   ` Shawn Pearce
2015-06-10 14:00                     ` Jeff King
2015-06-10 14:36                       ` Duy Nguyen
2015-06-10 21:34                       ` Shawn Pearce
2015-06-05 14:20                 ` Troubleshoot clone issue to NFS steve.norman
2015-06-16 20:50                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150524082016.GA8718@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=steve.norman@thomsonreuters.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).