git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: git@vger.kernel.org, trast@student.ethz.ch, mhagger@alum.mit.edu,
	pclouds@gmail.com, robin.rosenberg@dewire.com
Subject: Re: [PATCH/RFC v3 01/13] Move index v2 specific functions to their own file
Date: Thu, 09 Aug 2012 17:13:16 -0700
Message-ID: <7v7gt7mwo3.fsf@alter.siamese.dyndns.org>
In-Reply-To: <7vtxwbn2qe.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Thu, 09 Aug 2012 15:02:17 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> If you found that an entry you read halfway has an inconsistent crc,
> and if you suspect that is because somebody else was writing to the
> same index, it is a _sure_ sign that you are not alone, and all the
> entries you have read into the core so far, even if they weren't
> touched by that somebody else when you read them, may be stale, and
> worse yet, what you are going to read may be inconsistent with what
> you've read and have in-core (e.g. you may have read "f" before
> somebody else that is racing with you has turned it into a
> directory, and your next read may find "f/d" in the index without
> crc error).
>
> One sane way to avoid reading such an inconsistent state may be to
> redo this whole function, starting from the part that calls mmap().
> IOW,
>
> 	do {
> 		fd = open();
> 		mmap = xmmap(fd);
> 		close(fd);
> 		verify_various_fields(mmap);
> 		status = istate->ops->read_index(istate, mmap, mmap_size);
> 	} while (status == READ_AGAIN);
>
> I do not think the "pass fd around so that we can redo the mapping
> deep inside the callchain" is either a good idea or necessary.
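
Fleshed out a bit, the loop quoted above might look like the sketch
below.  Only READ_AGAIN comes from the pseudocode; read_with_retry(),
parse_fn, retry_countdown() and the other status codes are made up
for illustration, and the bounded retry count is an assumption of
this sketch, not something the series does.  The point is only that
every attempt maps the file afresh and the fd never travels down the
callchain.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical status codes; only READ_AGAIN appears in the thread. */
enum read_status { READ_OK = 0, READ_AGAIN = 1, READ_ERROR = -1 };

/* Hypothetical parser callback: inspects the mapping, reports a status. */
typedef enum read_status (*parse_fn)(const void *map, size_t len, void *data);

/*
 * Map the file afresh on every attempt, so that a retry never mixes
 * entries parsed from an old mapping with entries from a new one.
 * Gives up after max_tries attempts and returns the last status.
 */
static enum read_status read_with_retry(const char *path, parse_fn parse,
					 void *data, int max_tries)
{
	enum read_status status = READ_ERROR;

	while (max_tries-- > 0) {
		struct stat st;
		void *map;
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			return READ_ERROR;
		if (fstat(fd, &st) < 0 || st.st_size == 0) {
			close(fd);
			return READ_ERROR;
		}
		map = mmap(NULL, (size_t)st.st_size, PROT_READ,
			   MAP_PRIVATE, fd, 0);
		close(fd);	/* the mapping outlives the descriptor */
		if (map == MAP_FAILED)
			return READ_ERROR;

		status = parse(map, (size_t)st.st_size, data);
		munmap(map, (size_t)st.st_size);
		if (status != READ_AGAIN)
			break;
	}
	return status;
}

/* Example callback: asks for a retry until the countdown in *data hits 0. */
static enum read_status retry_countdown(const void *map, size_t len, void *data)
{
	int *left = data;

	(void)map;
	(void)len;
	return (*left)-- > 0 ? READ_AGAIN : READ_OK;
}
```

A caller would pass its own parser in place of retry_countdown() and
treat a final READ_AGAIN as "gave up", which is a policy choice of
this sketch.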

By the way, you can only detect such inconsistency when you are
lucky enough that you catch the other person in the middle of
writing.

If the index you are looking at holds a large tree with very many
paths, it is possible that there are two large directories, and
after you have read all entries from one of them, the other process
starts modifying the paths in that directory, without you ever
finding out.  If the goal of the topic is to make the index work
better in projects with large trees, it may be wise to think about
locking the whole thing, so that you do not have to rely on the
per-entry crc and on being lucky enough to detect such a race.  The
per-entry crc, as far as I understand, may have been introduced
primarily to detect on-disk data corruption; it is not a suitable
mechanism to detect conflicting accesses.
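
To illustrate what locking the whole thing could mean, here is a
minimal sketch that uses a lock file created with O_CREAT|O_EXCL.
The helper names and the ".lock" suffix are assumptions made for the
example, not something the series provides.  Whoever wants a
consistent view takes the lock before touching the file, so a
conflict is noticed up front regardless of timing.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "<path>.lock" in lock_path and create it
 * exclusively.  Returns the lock fd, or -1 if the name does not fit
 * or somebody else already holds the lock (errno == EEXIST).
 */
static int take_whole_file_lock(const char *path, char *lock_path, size_t n)
{
	int len = snprintf(lock_path, n, "%s.lock", path);

	if (len < 0 || (size_t)len >= n)
		return -1;
	/* O_EXCL makes creation fail if the lock file already exists */
	return open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}

/* Hypothetical helper: drop the lock so that others can take it. */
static void release_whole_file_lock(int fd, const char *lock_path)
{
	close(fd);
	unlink(lock_path);
}
```

A second caller that gets -1 with errno set to EEXIST knows somebody
else is active and can wait or bail out, instead of hoping that a crc
mismatch shows up.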

