git@vger.kernel.org mailing list mirror (one of many)
From: Christoph Groth <christoph@grothesque.org>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org
Subject: Re: Stat cache in .git/index hinders syncing of repositories
Date: Tue, 21 Jan 2020 00:53:22 +0100
Message-ID: <87ftg9n0r1.fsf@drac>
In-Reply-To: <nycvar.QRO.7.76.6.2001201248480.46@tvgsbejvaqbjf.bet> (Johannes Schindelin's message of "Mon, 20 Jan 2020 13:01:54 +0100 (CET)")

Johannes Schindelin wrote:
>
> On Sat, 18 Jan 2020, Christoph Groth wrote:
>
> > OK, I see.  But please consider (one day) to split up the index file
> > to separate the local stat cache from the globally valid data.
>
> I am sure that this has been considered even before Git was publicly
> announced,

I would be very interested to hear the rationale for keeping the
information about what is staged and the stat cache together in the same
file.  I, or someone else, might actually work on a patch one day, but
before starting, it would be good to understand the reasoning behind the
current design.

> and I would wager a guess that it was determined that it would be
> better to keep all of Git's private data in one place.

My point is that it’s not just private data: When I excluded .git/index
from synchronization, staging files for a commit was no longer
synchronized.

> > (By the way, even after 12 years of using Git intensely I am
> > confused about what actually is the index.  I believed that it is
> > the "staging area", like in "git-add - Add file contents to the
> > index".  But then the .git/index file reflects all the tracked
> > files, and not just staged ones.  This usage is also reflected by
> > the command "git update-index".)
>
> The concept of the Git index is slightly different from what is
> actually stored inside `.git/index`. You should consider the latter to
> be an implementation detail that is of concern only if you want to
> work on internals. Otherwise the description of the index as a staging
> area is a pretty good image.

To me, it does not seem to be a mere implementation detail.  For
example, the command 'git update-index --refresh' is part of the
"public API", and its action is to update the stat cache.  It does not
modify what is staged.
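
To make the distinction concrete, here is a minimal Python sketch of how
I picture a single index entry.  It is a toy model under stated
assumptions, not Git's on-disk format or code: the names IndexEntry,
stage and refresh are made up for illustration, only a few stat fields
are kept, and blob ids are assumed to be SHA-1.  The point is only that
a refresh rewrites the stat fields and leaves the staged blob id alone.

```python
from dataclasses import dataclass, replace
import hashlib
import os
import tempfile


@dataclass(frozen=True)
class IndexEntry:
    """Toy stand-in for one index entry (hypothetical, simplified)."""
    path: str
    blob_id: str   # what is staged: meaningful on any machine
    mtime_ns: int  # stat cache: meaningful only on this machine
    size: int
    ino: int


def blob_id(data: bytes) -> str:
    # Git hashes a blob as "blob <size>\0<content>" (SHA-1 repositories).
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def stage(path: str) -> IndexEntry:
    """Record both the content hash and the current stat data."""
    with open(path, "rb") as f:
        data = f.read()
    st = os.stat(path)
    return IndexEntry(path, blob_id(data), st.st_mtime_ns, st.st_size, st.st_ino)


def refresh(entry: IndexEntry) -> IndexEntry:
    """Update the stat fields only if the content still matches the entry."""
    with open(entry.path, "rb") as f:
        if blob_id(f.read()) != entry.blob_id:
            return entry  # really modified: leave the entry alone
    st = os.stat(entry.path)
    return replace(entry, mtime_ns=st.st_mtime_ns, size=st.st_size, ino=st.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.txt")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        entry = stage(path)
        # Simulate a copy to another machine: same content, new timestamp.
        os.utime(path, ns=(10**18, 10**18))
        refreshed = refresh(entry)
        print(refreshed.blob_id == entry.blob_id)    # staged content compared
        print(refreshed.mtime_ns == entry.mtime_ns)  # stat cache compared
```

In this model the blob_id field is the part I would like to synchronize
between machines, while mtime_ns, size and ino are the part that cannot
be shared.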

> > Still, this is a workaround, and the price is reduced robustness of
> > file modification detection.
>
> You misunderstand how Git detects whether a file is modified or not.
>
> A file is re-hashed if its mtime is newer than, _or equal to_, the
> mtime of `.git/index`.

You must mean "the mtime in '.git/index'", but OK, I see.  Makes sense
of course.  So setting core.trustctime to false and core.checkstat to
minimal only means that some avoidable rehashings may be made.  A
missed change would require two modifications of a file within the
same second, without a change to the file size.
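
As a rough illustration of that corner case, the following Python
sketch compares only the whole-second mtime and the file size, which is
what core.checkstat=minimal with core.trustctime=false leaves to
compare.  It is not Git's code: cache_stat, looks_unchanged and write_at
are hypothetical helpers, and Git's additional handling of entries
whose mtime is not older than the index is deliberately left out.

```python
import os
import tempfile


def cache_stat(path):
    """Record the two values compared here: whole-second mtime and size."""
    st = os.stat(path)
    return int(st.st_mtime), st.st_size


def looks_unchanged(cached, path):
    """True if the file would be taken as unmodified without rehashing."""
    return cache_stat(path) == cached


def write_at(path, data, second):
    """Hypothetical helper: write data and force the mtime to `second`."""
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (second, second))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.txt")
        write_at(path, b"aaaa", 1_000_000_000)
        cached = cache_stat(path)
        write_at(path, b"bbbb", 1_000_000_000)   # same second, same size
        print(looks_unchanged(cached, path))     # stat data alone cannot tell
        write_at(path, b"ccccc", 1_000_000_000)  # same second, new size
        print(looks_unchanged(cached, path))     # size difference is visible
```

Only the first rewrite slips through, because neither the second nor
the size differs from the cached values.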

> In general, I am not sure that you are using the right tool for
> synchronizing. If you cannot guarantee that a snapshot of the
> directory is copied, you will always run the risk of inconsistent
> data, which is worse than not having a backup at all: at least without
> a backup you do not have a false sense of security.

I do not understand what makes you think so.

Unison is very robust software; I have never had any problems with it
and have never heard of anyone having any.  Moreover, as I noted in the
opening message of this thread, it recently gained an option to treat
chosen directories as atomic.  I’m using this for ".git" subdirectories.

Christoph
