git@vger.kernel.org mailing list mirror (one of many)
* Refreshing index timestamps without reading content
@ 2017-01-05 11:23 Quentin Casasnovas
  2017-01-09 12:02 ` Duy Nguyen
From: Quentin Casasnovas @ 2017-01-05 11:23 UTC (permalink / raw)
  To: git


Hi guys,

Apologies if this is documented somewhere, I have fairly bad search voodoo
skills.

I'm looking for a way to cause a full refresh of the index without causing
any read of the files, basically telling git "trust me, all worktree files
are matching the index, but their stat information has changed".  I have
read about the update-index --assume-unchanged and --skip-worktree flags in
the documentation, but these do not cause any index refresh - rather, they
fake that the respective worktree files are matching the index until you
remove those assume-unchanged/skip-worktree bits.

This might sound like a really weird thing to do, but I do have a use case
for it - we have some build farm setup where the resulting objects of a
compilation are stored on a shared server.  The source files are not stored
on the shared server, but locally on each of the build servers (so as to
decrease network load and make good use of local storage as caches).

We then use an onion filesystem to mount the compiled objects on top of the
local sources - and change the modification time of the source to be older
than the object files, so that on subsequent builds, make does not rebuild
the whole world.
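
The mtime trick above can be sketched on a toy directory.  This is only an
illustration, not our real setup: the file names foo.c and foo.o and the
timestamps are made up, and "find -newer" stands in for the timestamp
comparison make performs.

```shell
#!/bin/sh
# Sketch only: foo.c / foo.o and the timestamps are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

echo 'int x;' > foo.c          # local source file
echo 'prebuilt object' > foo.o # stands in for an object from the shared server

touch -t 202001010000 foo.o    # pretend the object was built at this time
touch -t 201901010000 foo.c    # push the source mtime behind the object

# make rebuilds a target only when a prerequisite is newer than it;
# "find -newer" performs the same mtime comparison here.
if [ -n "$(find foo.o -newer foo.c)" ]; then
  result="up to date"
else
  result="rebuild"
fi
echo "$result"
```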

This works fine except for one thing: after changing the mtime of the
source files, the first subsequent git command needing to compare the tree
with the index will take a LONG time since it will read all of the object
content:

  cd linux-2.6

  # Less than a second  when the index is up to date
  time git status > /dev/null
  git status 0.06s user 0.09s system 172% cpu 0.087 total
                                              ~~~~~~~~~~~

  # Change the mtime..
  git ls-tree -r --name-only HEAD | xargs -n 1024 touch

  # Now 30s..
  time git status > /dev/null
  git status  2.73s user 1.79s system 13% cpu 32.453 total
                                              ~~~~~~~~~~~~

The timing information above was captured on my laptop SSD and the penalty
is obviously much higher on spinning disks - especially when this operation
is done on *hundreds* of different work trees in parallel, all hosted on the
same filesystem (it can take tens of minutes!).
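
For anyone who wants to see the effect without a kernel tree, here is a
minimal sketch in a throwaway repository.  The file name main.c, the
identity settings and the timestamp are assumptions made for the demo.  It
uses "git ls-files --debug" to print the mtime cached in the index and
"git update-index --refresh" to bring it up to date; as far as I understand,
that refresh has to read the file content back to confirm it is unchanged,
which is exactly the cost described above.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical file name and timestamp.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"

echo 'int main(void) { return 0; }' > main.c
git add main.c
git commit -q -m 'initial'

# Print the seconds part of the mtime cached in the index entry.
index_mtime() {
  git ls-files --debug main.c | sed -n 's/^ *mtime: \([0-9]*\):.*/\1/p'
}

before=$(index_mtime)
touch -t 200101010000 main.c   # same content, different stat information
git update-index --refresh     # re-checks the file, then updates the stat cache
after=$(index_mtime)

echo "before=$before after=$after"
git status --porcelain         # prints nothing: the work tree is still clean
```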

Is there any way to tell git, after the git ls-tree command above, to
refresh its stat cache information and trust us that the file content has
not changed, so as to avoid any useless file reads?  It will obviously
still have to stat all of the files, but that's not something we can
really avoid.

If not, I am willing to implement an --assume-content-unchanged option for
git update-index, if you guys don't see something fundamentally wrong with
this approach.

Thanks for any hints you can give! :)

Q


end of thread, other threads:[~2017-01-10 14:13 UTC | newest]

Thread overview: 7+ messages
2017-01-05 11:23 Refreshing index timestamps without reading content Quentin Casasnovas
2017-01-09 12:02 ` Duy Nguyen
2017-01-09 12:17   ` Quentin Casasnovas
2017-01-09 12:22     ` Quentin Casasnovas
2017-01-09 15:01   ` Junio C Hamano
2017-01-09 15:55     ` Quentin Casasnovas
2017-01-10 14:17       ` Quentin Casasnovas
