git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Refreshing index timestamps without reading content
Date: Mon, 09 Jan 2017 07:01:36 -0800
Message-ID: <xmqqr34cuvjj.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <CACsJy8BRfJG6L49VyC+qsrQ9Arz0gCGpMATpK9uLq61Lx6_Jtg@mail.gmail.com> (Duy Nguyen's message of "Mon, 9 Jan 2017 19:02:45 +0700")

Duy Nguyen <pclouds@gmail.com> writes:

> On Thu, Jan 5, 2017 at 6:23 PM, Quentin Casasnovas
> <quentin.casasnovas@oracle.com> wrote:
>> Is there any way to tell git, after the git ls-tree command above, to
>> refresh its stat cache information and trust us that the file content has
>> not changed, so as to avoid any useless file read (though it will obviously
>> have to stat all of them, but that's not something we can really
>> avoid)?
>
> I don't think there's any way to do that, unfortunately.

Lose "unfortunately".

>> If not, I am willing to add an --assume-content-unchanged option to git
>> update-index if you guys don't see something fundamentally wrong with this
>> approach.
>
> If you do that, I think you should go with either of the following options
>
> - Extend git-update-index --index-info to take stat info as well (or
> maybe make a new option instead). Then you can feed stat info directly
> to git without a use-case-specific "assume-content-unchanged".
>
> - Add "git update-index --touch" that does what "touch" does. In this
> case, it blindly updates the stat info to the latest values. But like touch, we can
> also specify mtime from the command line if we need to. It's a bit less
> generic than the above option, but easier to use.

Even if we assume that it is a good idea to let people muck with the
index like this, only the former would be a usable addition,
because the cached stat information does not consist solely of
mtime.
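
Git's cached stat data covers more than mtime. The minimal Python sketch below models roughly the fields Git keeps per index entry (ctime, mtime, dev, ino, uid, gid and size, as in struct stat_data), ignoring nanoseconds and the file mode. CachedStat, from_stat() and mismatched_fields() are hypothetical names made up for this illustration, not Git APIs. It shows that overwriting mtime alone still leaves an entry that does not match a replaced file.

```python
import os
from dataclasses import dataclass, fields, replace


# Simplified, hypothetical model of the stat data Git caches per index
# entry (roughly struct stat_data); nanoseconds and mode are omitted.
@dataclass(frozen=True)
class CachedStat:
    ctime: int
    mtime: int
    dev: int
    ino: int
    uid: int
    gid: int
    size: int


def from_stat(st: os.stat_result) -> CachedStat:
    """Capture the modeled fields from a stat()/lstat() result."""
    return CachedStat(
        ctime=int(st.st_ctime),
        mtime=int(st.st_mtime),
        dev=st.st_dev,
        ino=st.st_ino,
        uid=st.st_uid,
        gid=st.st_gid,
        size=st.st_size,
    )


def mismatched_fields(cached: CachedStat, current: CachedStat) -> list:
    """Names of fields that differ; an empty list means 'stat-clean'."""
    return [f.name for f in fields(cached)
            if getattr(cached, f.name) != getattr(current, f.name)]


# A file replaced by a new one (e.g. written elsewhere and renamed):
# new ctime, mtime and inode, same size.
old = CachedStat(ctime=100, mtime=100, dev=1, ino=42, uid=1000, gid=1000, size=10)
new = CachedStat(ctime=200, mtime=200, dev=1, ino=43, uid=1000, gid=1000, size=10)

# Overwriting only mtime still leaves the entry mismatched.
print(mismatched_fields(replace(old, mtime=new.mtime), new))  # ['ctime', 'ino']
```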

"git update-index --index-info" was invented for the case where a
user or a script _knows_ the object ID of the blob that _would_
result if the contents of a file on the filesystem were run through
hash-object.  So from the interface's point of view, it may make
sense to teach it to take an extra/optional argument that is the
path to the file and take the stat info out of the named file when
the extra/optional argument is given.
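
For reference, the object ID that --index-info expects is what hash-object would compute: the SHA-1 of a "blob <size>\0" header followed by the content. The sketch below assumes the default SHA-1 object format and no content filters (such as CRLF conversion); it computes that ID and formats one line in the "mode SP sha1 TAB path" form that --index-info reads today. The extra path argument discussed above does not exist, and blob_oid() and index_info_line() are hypothetical helper names.

```python
import hashlib


def blob_oid(content: bytes) -> str:
    """SHA-1 object ID of a blob, as `git hash-object` computes it
    (assumes the SHA-1 object format and no clean/CRLF filters)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def index_info_line(mode: int, oid: str, path: str) -> str:
    """One LF-terminated line in the 'mode SP sha1 TAB path' form
    read by `git update-index --index-info` (no path quoting)."""
    return f"{mode:o} {oid}\t{path}\n"


# The empty blob has a well-known ID.
print(index_info_line(0o100644, blob_oid(b""), "empty.txt"), end="")
# prints: 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391<TAB>empty.txt
```

Lines like this would be fed to "git update-index --index-info" on standard input; the sketch does not handle paths that need quoting or NUL-terminated (-z) input.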

But that assumes that it is a good idea to do this in the first
place.  It was a deliberate design decision that setting the cached
stat info for the entry was protected behind actual content
comparison, and removing that protection will open the index to
abuse.
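
The protection can be sketched as follows. This is a simplified model, not Git's refresh code: Entry, refresh() and blind_touch() are hypothetical, the stat signature is an opaque tuple, and racy-timestamp handling is omitted. refresh() records new stat data only after rehashing the file and finding the same blob; blind_touch() is the unguarded variant that an "assume the content is unchanged" option would amount to.

```python
import hashlib
from dataclasses import dataclass


def blob_oid(content: bytes) -> str:
    """Blob ID as `git hash-object` would compute it (SHA-1 format)."""
    return hashlib.sha1((b"blob %d\0" % len(content)) + content).hexdigest()


@dataclass
class Entry:
    oid: str     # blob ID recorded in the index
    stat: tuple  # cached stat signature (hypothetical, simplified)


def refresh(entry: Entry, content: bytes, stat: tuple) -> bool:
    """Guarded refresh: True if the entry matches the working file."""
    if entry.stat == stat:
        return True      # stat unchanged: trust the cache without reading
    if blob_oid(content) != entry.oid:
        return False     # content really differs: keep the old stat data
    entry.stat = stat    # content verified, so recording new stat is safe
    return True


def blind_touch(entry: Entry, stat: tuple) -> None:
    """Hypothetical unguarded update: take the stat data on trust."""
    entry.stat = stat


e = Entry(oid=blob_oid(b"v1\n"), stat=(1, 3))
print(refresh(e, b"v2\n", (2, 3)))  # False: the modification is noticed
blind_touch(e, (2, 3))
print(refresh(e, b"v2\n", (2, 3)))  # True: the stale entry now looks clean
```

Once the stat data is overwritten without a content check, the modified file is reported as up to date, and later operations keep trusting the stale blob recorded in the index.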

The userbase of Git has grown wide enough that it is harder to say
"If you lie that a file whose contents do not match the index is
up to date using this mechanism, you will lose data and all bad
things happen---you can keep both halves".  Once we release a
version of Git with such a "feature", the first bug report will be
"I did not want to run 'update-index --refresh' because it takes
time, and some index entries apparently did not match what is on the
filesystem, and I got a corrupt working file after a merge.  Git
should make sure that the contents match when using the new 'path to
the file' argument when updating the cached stat info!".  I do not
have a good answer to such a bug report.

So...

Thread overview: 7+ messages
2017-01-05 11:23 Refreshing index timestamps without reading content Quentin Casasnovas
2017-01-09 12:02 ` Duy Nguyen
2017-01-09 12:17   ` Quentin Casasnovas
2017-01-09 12:22     ` Quentin Casasnovas
2017-01-09 15:01   ` Junio C Hamano [this message]
2017-01-09 15:55     ` Quentin Casasnovas
2017-01-10 14:17       ` Quentin Casasnovas