git@vger.kernel.org mailing list mirror (one of many)
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Christoph Groth <christoph@grothesque.org>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>, git@vger.kernel.org
Subject: Re: Stat cache in .git/index hinders syncing of repositories
Date: Tue, 21 Jan 2020 02:53:11 +0000
Message-ID: <20200121025311.GA4113372@camp.crustytoothpaste.net>
In-Reply-To: <87ftg9n0r1.fsf@drac>


On 2020-01-20 at 23:53:22, Christoph Groth wrote:
> Johannes Schindelin wrote:
> >
> > On Sat, 18 Jan 2020, Christoph Groth wrote:
> >
> > > OK, I see.  But please consider (one day) to split up the index file
> > > to separate the local stat cache from the globally valid data.
> >
> > I am sure that this has been considered even before Git was publicly
> > announced,
> 
> I would be very interested to hear the rationale for keeping the
> information about what is staged and the stat cache together in the same
> file.  I, or someone else, might actually work on a patch one day, but
> before starting, it would be good to understand the reasoning behind the
> current design.
> 
> > and I would wager a guess that it was determined that it would be
> > better to keep all of Git's private data in one place.
> 
> My point is that it’s not just private data: When I excluded .git/index
> from synchronization, staging files for a commit was no longer
> synchronized.

To try to answer this question, Git stores all of its state about the
working tree in the index.  Bare repositories don't typically have an
index because they don't have a working tree.  Whether that state is
staged contents or stat information, all of it is in one file.
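To make the two kinds of state concrete, here is a simplified, hypothetical model of one index entry. Git's real on-disk format stores, per path, cached stat fields (ctime, mtime, device, inode, mode, uid, gid, size) next to the object ID of the staged blob. The `IndexEntry` class and the `entry_from_stat` and `stat_matches` helpers are illustrative names only, and the check deliberately ignores details such as racy-timestamp handling and `core.checkStat`.

```python
import dataclasses
import os


@dataclasses.dataclass(frozen=True)
class IndexEntry:
    """Hypothetical, simplified index entry (not Git's on-disk layout).

    One record mixes portable data (what is staged) with machine-local
    data (the stat cache)."""
    path: str
    oid: str        # object ID of the staged blob: valid on any machine
    mtime_ns: int   # everything below is cached stat data: local only
    ctime_ns: int
    size: int
    ino: int
    dev: int


def entry_from_stat(path: str, oid: str) -> IndexEntry:
    """Record the staged object ID together with the file's current stat data."""
    st = os.stat(path)
    return IndexEntry(path, oid, st.st_mtime_ns, st.st_ctime_ns,
                      st.st_size, st.st_ino, st.st_dev)


def stat_matches(entry: IndexEntry) -> bool:
    """Return True if the file's current stat data equals the cached copy.

    A match lets a Git-like tool skip re-reading and hashing the file;
    a mismatch only means the file must be re-read, not that it changed."""
    st = os.stat(entry.path)
    return (entry.mtime_ns == st.st_mtime_ns
            and entry.ctime_ns == st.st_ctime_ns
            and entry.size == st.st_size
            and entry.ino == st.st_ino
            and entry.dev == st.st_dev)
```

On another machine the inode and device numbers (and usually the timestamps) differ, so a synced copy of such an entry would fail the check and the file would have to be re-read. The staged `oid`, by contrast, stays meaningful everywhere.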

Storing all of this data in one file means that only one file need be
mapped into memory and rewritten.  Git writes to the index by atomically
creating a lock file alongside it, writing the new contents into the
lock file, and then atomically replacing the index with it.  This
approach wouldn't work with multiple files: a single replace covers only
one file, so an update spanning several of them wouldn't be atomic.
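A minimal sketch of the lock-file-then-replace pattern described above, assuming a POSIX file system where `os.replace` (a rename) is atomic. Git's real implementation is its C lockfile API; the function name `write_index_atomically` is hypothetical, and the `fsync` call is this sketch's own durability choice rather than a claim about Git's defaults.

```python
import os


def write_index_atomically(index_path: str, data: bytes) -> None:
    """Hypothetical sketch of writing a single state file atomically.

    1. Create "<index>.lock" exclusively; if it already exists, another
       writer holds the lock, so fail instead of clobbering its work.
    2. Write the complete new contents into the lock file and flush them.
    3. Rename the lock file over the real file, so readers see either
       the old contents or the new ones, never a half-written mix.
    """
    lock_path = index_path + ".lock"
    # O_EXCL makes creation fail with FileExistsError if the lock is held.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        # Roll back: drop our own lock and leave the old file untouched.
        os.unlink(lock_path)
        raise
    # One atomic rename publishes the new contents.
    os.replace(lock_path, index_path)
```

Because each rename swaps exactly one path, splitting the state across two files would need two renames, and a crash or a concurrent reader could observe the state between them.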

There is support for a split index mode which means that the main index
need not be rewritten as often, which is helpful when making small
updates to large trees, where the cost of rewriting the index is
significant.  I don't know how locking is handled there[0], but I assume
that it is, because the people who implemented and reviewed it are
capable and thoughtful.

However, having said that, nobody has provided a compelling case for
using multiple files for storing different types of working tree state.
The existing options are available for cases like yours and others', and
they work.  Since there are clear benefits to the current model,
including simplicity and robustness, and few downsides, nobody has
decided to change it.

I should add that even if, for some reason, we did add support for
splitting this data out, I'm not sure if we'd support syncing only part
of the repository state and blowing away other state.  We don't really
support that now (other than through tools like fetch and clone) and I
don't think we'd want to encourage that behavior in the future.

[0] And I have not had the interest to look at the moment.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204



Thread overview: 9+ messages
2020-01-17 23:57 Stat cache in .git/index hinders syncing of repositories Christoph Groth
2020-01-18 18:15 ` Junio C Hamano
2020-01-18 19:06   ` Christoph Groth
2020-01-18 19:42     ` brian m. carlson
2020-01-18 22:04       ` Christoph Groth
2020-01-20 12:01         ` Johannes Schindelin
2020-01-20 23:53           ` Christoph Groth
2020-01-21  2:53             ` brian m. carlson [this message]
2020-01-24  9:16               ` Christoph Groth
