git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Christoph Groth <christoph@grothesque.org>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: Stat cache in .git/index hinders syncing of repositories
Date: Mon, 20 Jan 2020 13:01:54 +0100 (CET)
Message-ID: <nycvar.QRO.7.76.6.2001201248480.46@tvgsbejvaqbjf.bet>
In-Reply-To: <87d0bgs9o4.fsf@drac>


Hi Christoph,

On Sat, 18 Jan 2020, Christoph Groth wrote:

> brian m. carlson wrote:
> > On 2020-01-18 at 19:06:21, Christoph Groth wrote:
> > > But if the above is not feasible for some reason, would it be
> > > possible to provide a switch for disabling stat caching
> > > optimization?
> >
> > Git is going to perform really terribly on repositories of any size if
> > you disable stat caching, so we're not very likely to implement such
> > a feature.  Even if we did implement it, you probably wouldn't want to
> > use it.
>
> OK, I see.  But please consider (one day) to split up the index file to
> separate the local stat cache from the globally valid data.

I am sure that this has been considered even before Git was publicly
announced, and I would wager a guess that it was determined that it would
be better to keep all of Git's private data in one place.

Now, you are totally free to disagree, and even to work on a patch series
to separate the stat cache and offer a compelling argument why this change
should be made. If I were you, I would not expect any other person to be
interested in working on this.

> (By the way, even after 12 years of using Git intensely I am confused
> about what actually is the index.  I believed that it is the "staging
> area", like in "git-add - Add file contents to the index".  But then the
> .git/index file reflects all the tracked files, and not just staged
> ones.  This usage is also reflected by the command "git update-index".)

The concept of the Git index is slightly different from what is actually
stored inside `.git/index`. You should consider the latter to be an
implementation detail that is of concern only if you want to work on
internals. Otherwise the description of the index as a staging area is a
pretty good image.

The staging area of course contains more than just the files you changed.
It contains the entire tree that is staged in order to become the next
commit.

If you asked a worker at a theater to make a minor change to the stage,
you would not expect the staging area to be empty, either.

> > However, there are the core.checkStat and core.trustctime options
> > which can control which information is used in the stat caching.  You
> > can restrict it to the whole second part of mtime and the file size if
> > you want.  See git-config(1) for more details.
>
> Thanks a lot, that did the trick!  I’ve been already syncing mtimes.
> Setting both core.checkStat and core.trustctime to the "weak" values
> made the spurious modifications go away.

And of course now you have a less performant setup because files have a
much better chance of being "racily clean", i.e. their mtime could be
identical to that of the `.git/index` file, in which case Git has to
assume that the file might have changed, and the index has to be refreshed.

Just saying that what you think of as a silver bullet comes at a price.
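
For readers following along: the "weak" values mentioned in the quoted text
are presumably `core.checkStat=minimal` and `core.trustctime=false`, as
described in git-config(1). A minimal sketch of applying them from a script
follows. The function names and the repository path are hypothetical; only
the `git config` invocations and the two option names come from Git itself.

```python
import subprocess


def weak_stat_config_commands(repo_path):
    """Build the `git config` invocations for the two "weak" settings.

    Assumption: "weak" means checkStat=minimal and trustctime=false.
    """
    return [
        # Compare only the whole-second part of mtime and the file size.
        ["git", "-C", repo_path, "config", "core.checkStat", "minimal"],
        # Ignore ctime differences between the index and the working tree.
        ["git", "-C", repo_path, "config", "core.trustctime", "false"],
    ]


def apply_weak_stat_config(repo_path):
    """Run the commands; raises CalledProcessError if git reports failure."""
    for argv in weak_stat_config_commands(repo_path):
        subprocess.run(argv, check=True)
```

Calling `apply_weak_stat_config("/path/to/repo")` writes both settings to
that repository's local configuration.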

> Still, this is a workaround, and the price is reduced robustness of file
> modification detection.

You misunderstand how Git detects whether a file is modified or not.

A file is re-hashed if its mtime is newer than, _or equal to_, the mtime
of `.git/index`.
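
The rule above can be sketched as follows. This is an illustration of the
rule as stated in this message, not Git's actual code: `Timestamp`,
`needs_rehash` and the `whole_seconds_only` switch are hypothetical names.
The switch models a comparison that only sees the whole-second part of the
mtime, to show why ties become more likely at coarser resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timestamp:
    """A modification time split into whole seconds and nanoseconds."""
    sec: int
    nsec: int = 0


def needs_rehash(file_mtime, index_mtime, whole_seconds_only=False):
    """Return True if the file mtime is newer than, or equal to, the index mtime."""
    if whole_seconds_only:
        # Coarse comparison: any write within the same second counts as a tie.
        return file_mtime.sec >= index_mtime.sec
    # Full-resolution comparison: tuples compare seconds first, then nanoseconds.
    return (file_mtime.sec, file_mtime.nsec) >= (index_mtime.sec, index_mtime.nsec)


# A file written 0.3 s before the index, within the same second.
file_time = Timestamp(sec=100, nsec=200_000_000)
index_time = Timestamp(sec=100, nsec=500_000_000)

print(needs_rehash(file_time, index_time))                           # False
print(needs_rehash(file_time, index_time, whole_seconds_only=True))  # True
```

With full resolution the file is clearly older than the index and can be
trusted; with whole seconds only, the two timestamps tie and the file has
to be hashed again.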

So no, it is not the robustness that is the problem. It is no less robust.
The problem is that you force re-hashing where it would not be necessary
otherwise.

In general, I am not sure that you are using the right tool for
synchronizing. If you cannot guarantee that a snapshot of the directory is
copied, you will always run the risk of inconsistent data, which is worse
than not having a backup at all: at least without a backup you do not have
a false sense of security.

Ciao,
Johannes
