git@vger.kernel.org mailing list mirror (one of many)
* Stat cache in .git/index hinders syncing of repositories
From: Christoph Groth @ 2020-01-17 23:57 UTC (permalink / raw)
  To: git

Hello,

I am using unison to sync home directories across multiple machines.
This includes a fair number of git repositories and works very well.
Unison recently acquired a new feature that allows treating selected
subdirectories (like .git) atomically.  This makes the syncing perfectly
safe.

Some people say that one should use git itself to sync git working
directories, but IMHO these people overlook the difference between
collaboration (using git) and being able to continue one’s own
unfinished work on a different machine, including uncommitted files,
stashes, and - if it has to be - in the middle of a merge.  Moreover, it
is simpler not to have to treat git repositories specially when syncing.
Syncing git repositories is thus clearly useful.

However, there is one problem with syncing git repositories that has
been noticed by multiple people [1]: the file .git/index contains not
only the “git index”, but also a cache of stat-data of the files in the
working directory.  Some file synchronizers are able to sync mtimes, but
syncing ctimes would be bizarre (if it is even possible).
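To illustrate the mtime/ctime point, here is a minimal Python sketch.  It
assumes a POSIX filesystem and uses shutil.copy2 as a stand-in for
a synchronizer that preserves mtimes; the file names are made up.  It is
not what unison or git actually do internally.

```python
import os
import shutil
import tempfile
import time


def copy_and_compare_stat(payload: bytes = b"hello\n"):
    """Copy a file while preserving its mtime; return (src_stat, dst_stat)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a.txt")  # hypothetical file names
        dst = os.path.join(tmp, "b.txt")
        with open(src, "wb") as f:
            f.write(payload)
        # Backdate the mtime by one hour so it is clearly distinct from "now".
        old = time.time() - 3600
        os.utime(src, (old, old))
        # copy2 copies the contents and then restores atime/mtime on the copy.
        shutil.copy2(src, dst)
        return os.stat(src), os.stat(dst)


src_stat, dst_stat = copy_and_compare_stat()
print("size equal: ", src_stat.st_size == dst_stat.st_size)
print("mtime equal:", abs(src_stat.st_mtime - dst_stat.st_mtime) < 1)
print("inode equal:", src_stat.st_ino == dst_stat.st_ino)
print("dst ctime - dst mtime (s):", dst_stat.st_ctime - dst_stat.st_mtime)
```

Size and mtime carry over to the copy, but the inode number differs, and
the copy's ctime reflects when the copy was made rather than the backdated
mtime.  User code cannot set ctime through the usual calls such as
os.utime.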

So, say that machines A and B are synced.  A new git repository appears
on machine A.  The synchronizer is run which results in copying all the
files of the new repo verbatim to machine B.  Note that now on machine
B the cache inside the file .git/index contains invalid stat
information.  So when "git status" is run on B, .git/index gets
rewritten, and the next sync operation copies it back to A, where again
it is rewritten even by something as harmless as "git status".  And so
on, and so forth...
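The ping-pong can be sketched with a toy model in Python.  Everything here
is an assumption made for illustration: the Machine class, the byte string
standing in for .git/index, and the single "ctime" number standing in for
machine-specific stat data.  It does not reproduce git's real index format
or refresh logic.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    local_ctime: int   # stands in for stat data only this machine can know
    index: bytes = b""  # stands in for the contents of .git/index

    def git_status(self, staged: str) -> bool:
        """Refresh the cached stat data; return True if the index changed."""
        fresh = f"staged={staged};ctime={self.local_ctime}".encode()
        changed = fresh != self.index
        self.index = fresh
        return changed


def sync(src: Machine, dst: Machine) -> bool:
    """Copy the index verbatim; return True if anything was transferred."""
    if src.index == dst.index:
        return False
    dst.index = src.index
    return True


def ping_pong(rounds: int):
    a, b = Machine("A", 1000), Machine("B", 2000)
    a.git_status("file.txt")  # the repository first appears on A
    log = []
    src, dst = a, b
    for _ in range(rounds):
        transferred = sync(src, dst)
        rewritten = dst.git_status("file.txt")
        log.append((src.name, dst.name, transferred, rewritten))
        src, dst = dst, src  # next sync goes the other way
    return log


for entry in ping_pong(4):
    print(entry)
```

In this model the staged content never changes, yet every round transfers
the index and every "git status" rewrites it, because the cached stat data
never matches the machine it lands on.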

In my opinion the root of this ping-pong problem is that .git/index
mixes information about the status of the repository (=what has been
staged) that should be synced with a cache of machine-specific
filesystem metadata.

I am not an expert on git internals, but perhaps it would be a good idea
to move the cache into a separate file that could be put on an "ignore"
list for synchronizers?  It seems to me that this has already been
proposed in a different context [2], and I would not be surprised if
factoring out the cache had other beneficial effects.

If it is not feasible to separate the cache, perhaps another possibility
would be to add a new possible value for core.checkStat that would
disable stat structure checking except for file sizes?
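A rough Python sketch of what such a size-only check would mean, compared
with a fuller comparison.  The "size" mode is the hypothetical new value
suggested above and does not exist in git; the "full" mode is a simplified
stand-in and not git's actual list of compared fields.  CachedStat and
looks_unchanged are made-up names.

```python
from typing import NamedTuple


class CachedStat(NamedTuple):
    size: int
    mtime: int
    ctime: int
    ino: int


def looks_unchanged(cached: CachedStat, current: CachedStat,
                    mode: str = "full") -> bool:
    """Decide from stat data alone whether a file can be assumed unchanged."""
    if mode == "size":   # hypothetical: compare file sizes only
        return cached.size == current.size
    if mode == "full":   # simplified stand-in for a multi-field comparison
        return cached == current
    raise ValueError(f"unknown mode: {mode}")


# Stat data cached on machine A versus what machine B sees after a sync
# that preserved size and mtime but not ctime or the inode number.
cached_on_a = CachedStat(size=120, mtime=1579305420, ctime=1579305420, ino=4711)
seen_on_b = CachedStat(size=120, mtime=1579305420, ctime=1579391820, ino=9042)

print(looks_unchanged(cached_on_a, seen_on_b, mode="full"))
print(looks_unchanged(cached_on_a, seen_on_b, mode="size"))
```

The trade-off is that a size-only check would not notice an edit that
leaves the file size unchanged.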

As a workaround for now, I exclude .git/index from syncing.  This seems
to work quite well, but I would be scared to sync unfinished merges like
this.

Thanks
Christoph

[1] https://stackoverflow.com/questions/12126247/why-does-git-index-change-when-i-havent-done-anything-to-my-repository
[2] https://www.mail-archive.com/git@vger.kernel.org/msg48065.html

Thread overview: 9+ messages
2020-01-17 23:57 Stat cache in .git/index hinders syncing of repositories Christoph Groth
2020-01-18 18:15 ` Junio C Hamano
2020-01-18 19:06   ` Christoph Groth
2020-01-18 19:42     ` brian m. carlson
2020-01-18 22:04       ` Christoph Groth
2020-01-20 12:01         ` Johannes Schindelin
2020-01-20 23:53           ` Christoph Groth
2020-01-21  2:53             ` brian m. carlson
2020-01-24  9:16               ` Christoph Groth
