git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] read-cache: close index.lock in do_write_index
@ 2017-04-26 20:05 Johannes Schindelin
  2017-04-27  3:13 ` Jeff King
  2017-04-27  3:21 ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Johannes Schindelin @ 2017-04-26 20:05 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Junio C Hamano

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach do_write_index() to close the index.lock file
before getting the mtime and updating the istate.timestamp
fields.

On Windows, a file's mtime is not updated until the file is
closed.  On Linux, the mtime is set after the last flush.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Published-As: https://github.com/dscho/git/releases/tag/do-write-index-mtime-v1
Fetch-It-Via: git fetch https://github.com/dscho/git do-write-index-mtime-v1

 read-cache.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 008b335844c..b0276fd5510 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2051,9 +2051,10 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-static int do_write_index(struct index_state *istate, int newfd,
+static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 			  int strip_extensions)
 {
+	int newfd = tempfile->fd;
 	git_SHA_CTX c;
 	struct cache_header hdr;
 	int i, err, removed, extended, hdr_version;
@@ -2162,7 +2163,11 @@ static int do_write_index(struct index_state *istate, int newfd,
 			return -1;
 	}
 
-	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
+	if (ce_flush(&c, newfd, istate->sha1))
+		return -1;
+	if (close_tempfile(tempfile))
+		return error(_("could not close '%s'"), tempfile->filename.buf);
+	if (lstat(tempfile->filename.buf, &st))
 		return -1;
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
@@ -2185,7 +2190,7 @@ static int commit_locked_index(struct lock_file *lk)
 static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
 				 unsigned flags)
 {
-	int ret = do_write_index(istate, get_lock_file_fd(lock), 0);
+	int ret = do_write_index(istate, &lock->tempfile, 0);
 	if (ret)
 		return ret;
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
@@ -2282,7 +2287,7 @@ static int write_shared_index(struct index_state *istate,
 		return do_write_locked_index(istate, lock, flags);
 	}
 	move_cache_to_base_index(istate);
-	ret = do_write_index(si->base, fd, 1);
+	ret = do_write_index(si->base, &temporary_sharedindex, 1);
 	if (ret) {
 		delete_tempfile(&temporary_sharedindex);
 		return ret;

base-commit: e2cb6ab84c94f147f1259260961513b40c36108a
-- 
2.12.2.windows.2.800.gede8f145e06

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] read-cache: close index.lock in do_write_index
  2017-04-26 20:05 [PATCH] read-cache: close index.lock in do_write_index Johannes Schindelin
@ 2017-04-27  3:13 ` Jeff King
  2017-04-27 16:51   ` Jeff Hostetler
  2017-04-27  3:21 ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff King @ 2017-04-27  3:13 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff Hostetler, Junio C Hamano

On Wed, Apr 26, 2017 at 10:05:23PM +0200, Johannes Schindelin wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Teach do_write_index() to close the index.lock file
> before getting the mtime and updating the istate.timestamp
> fields.
> 
> On Windows, a file's mtime is not updated until the file is
> closed.  On Linux, the mtime is set after the last flush.

I wondered at first what this would mean for atomicity. The original
code does an fstat, so we're sure to get the timestamp of what we just
wrote.

I think we should be OK after your change, though. We're stat()ing the
lockfile itself, so nobody else should be touching it (because they'd be
violating the lock to do so).

> -static int do_write_index(struct index_state *istate, int newfd,
> +static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  			  int strip_extensions)
> [...]
> -	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
> +	if (ce_flush(&c, newfd, istate->sha1))
> +		return -1;
> +	if (close_tempfile(tempfile))
> +		return error(_("could not close '%s'"), tempfile->filename.buf);
> +	if (lstat(tempfile->filename.buf, &st))
>  		return -1;

So now we unconditionally close in do_write_index(), but I don't see any
close_tempfile() calls going away. For the call in write_shared_index(),
that's because we either call delete_tempfile() or rename_tempfile(),
either of which would close as needed, but can handle an already-closed
file.

The other caller is do_write_locked_index(), which accepts either a
flag: either COMMIT_LOCK, CLOSE_LOCK, or neither. COMMIT_LOCK is OK; it
can handle the already-closed file. CLOSE_LOCK is obviously fine. It
just becomes a noop. But when neither flag is set, now we close the
lock. Are there any callers that will be affected?

There are three callers, but I think they all eventually trace up to
write_locked_index(). And grepping for callers of that function, it
looks like each one uses either COMMIT_LOCK or CLOSE_LOCK.

So perhaps we'd want to squash in (or perhaps do as a preparatory
patch) something like:

diff --git a/read-cache.c b/read-cache.c
index b0276fd55..db7a812af 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2193,14 +2193,16 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 	int ret = do_write_index(istate, &lock->tempfile, 0);
 	if (ret)
 		return ret;
+
+	/* Callers must specify exactly one of COMMIT/CLOSE */
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
 	       (COMMIT_LOCK | CLOSE_LOCK));
+	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) != 0);
+
 	if (flags & COMMIT_LOCK)
 		return commit_locked_index(lock);
-	else if (flags & CLOSE_LOCK)
-		return close_lock_file(lock);
 	else
-		return ret;
+		return close_lock_file(lock);
 }
 
 static int write_split_index(struct index_state *istate,

We could also get rid of CLOSE_LOCK entirely at this point. Or since
these are the only two flags, just turn the flags field into a boolean
"int commit_lock". But doing it as above is perhaps more readable
(callers say CLOSE_LOCK instead of an unannotated "0"), and the extra
assert will catch any topics in flight that add calls using "0" for
flags.

-Peff

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] read-cache: close index.lock in do_write_index
  2017-04-26 20:05 [PATCH] read-cache: close index.lock in do_write_index Johannes Schindelin
  2017-04-27  3:13 ` Jeff King
@ 2017-04-27  3:21 ` Junio C Hamano
  2017-04-27 16:45   ` Jeff Hostetler
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2017-04-27  3:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff Hostetler

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach do_write_index() to close the index.lock file
> before getting the mtime and updating the istate.timestamp
> fields.
>
> On Windows, a file's mtime is not updated until the file is
> closed.  On Linux, the mtime is set after the last flush.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> Published-As: https://github.com/dscho/git/releases/tag/do-write-index-mtime-v1
> Fetch-It-Via: git fetch https://github.com/dscho/git do-write-index-mtime-v1
>
>  read-cache.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index 008b335844c..b0276fd5510 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2051,9 +2051,10 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
>  		rollback_lock_file(lockfile);
>  }
>  
> -static int do_write_index(struct index_state *istate, int newfd,
> +static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  			  int strip_extensions)
>  {
> +	int newfd = tempfile->fd;
>  	git_SHA_CTX c;
>  	struct cache_header hdr;
>  	int i, err, removed, extended, hdr_version;
> @@ -2162,7 +2163,11 @@ static int do_write_index(struct index_state *istate, int newfd,
>  			return -1;
>  	}
>  
> -	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
> +	if (ce_flush(&c, newfd, istate->sha1))
> +		return -1;
> +	if (close_tempfile(tempfile))
> +		return error(_("could not close '%s'"), tempfile->filename.buf);
> +	if (lstat(tempfile->filename.buf, &st))
>  		return -1;


stat/lstat with path may be slower than fstat on an open file
descriptor, and I think that is the reason why we do it this way,
but the performance difference would probably be unmeasurable and
does not matter in practice.

As we are not using the fact that we still have the file descriptor
open when we do the stat for any purpose (e.g. like locking other
processes out), this move to "close first and then stat" is a good
workaround for the problem.  I wonder if we have been seeing false
"racy git" problem more often due to this issue on Windows than
other platforms.

When code uses lstat, it gives a signal to the readers of the code
that the code is prepared to see a symlink and when it is a symlink
it wants to grab the property of the link itself, not the target of
the link.  I do not think the temporary index can be a symbolic
link, and even if that were the case, we do want the mtime of the
link target, so it is a wrong signal to give to the readers.  Hence,
it would be better to use stat() here from the readability's point
of view.  Of course, when the path is always a regular file, lstat()
vs stat() technically does not give any different result, so this
comment is purely about the maintainability, not about correctness.

Other than that, looks good to me.

>  	istate->timestamp.sec = (unsigned int)st.st_mtime;
>  	istate->timestamp.nsec = ST_MTIME_NSEC(st);
> @@ -2185,7 +2190,7 @@ static int commit_locked_index(struct lock_file *lk)
>  static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
>  				 unsigned flags)
>  {
> -	int ret = do_write_index(istate, get_lock_file_fd(lock), 0);
> +	int ret = do_write_index(istate, &lock->tempfile, 0);
>  	if (ret)
>  		return ret;
>  	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
> @@ -2282,7 +2287,7 @@ static int write_shared_index(struct index_state *istate,
>  		return do_write_locked_index(istate, lock, flags);
>  	}
>  	move_cache_to_base_index(istate);
> -	ret = do_write_index(si->base, fd, 1);
> +	ret = do_write_index(si->base, &temporary_sharedindex, 1);
>  	if (ret) {
>  		delete_tempfile(&temporary_sharedindex);
>  		return ret;
>
> base-commit: e2cb6ab84c94f147f1259260961513b40c36108a

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] read-cache: close index.lock in do_write_index
  2017-04-27  3:21 ` Junio C Hamano
@ 2017-04-27 16:45   ` Jeff Hostetler
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Hostetler @ 2017-04-27 16:45 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin; +Cc: git, Jeff Hostetler



On 4/26/2017 11:21 PM, Junio C Hamano wrote:
> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach do_write_index() to close the index.lock file
>> before getting the mtime and updating the istate.timestamp
>> fields.
>>
>> On Windows, a file's mtime is not updated until the file is
>> closed.  On Linux, the mtime is set after the last flush.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> ---
>> Published-As: https://github.com/dscho/git/releases/tag/do-write-index-mtime-v1
>> Fetch-It-Via: git fetch https://github.com/dscho/git do-write-index-mtime-v1
>>
>>  read-cache.c | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/read-cache.c b/read-cache.c
>> index 008b335844c..b0276fd5510 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -2051,9 +2051,10 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
>>  		rollback_lock_file(lockfile);
>>  }
>>
>> -static int do_write_index(struct index_state *istate, int newfd,
>> +static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>>  			  int strip_extensions)
>>  {
>> +	int newfd = tempfile->fd;
>>  	git_SHA_CTX c;
>>  	struct cache_header hdr;
>>  	int i, err, removed, extended, hdr_version;
>> @@ -2162,7 +2163,11 @@ static int do_write_index(struct index_state *istate, int newfd,
>>  			return -1;
>>  	}
>>
>> -	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
>> +	if (ce_flush(&c, newfd, istate->sha1))
>> +		return -1;
>> +	if (close_tempfile(tempfile))
>> +		return error(_("could not close '%s'"), tempfile->filename.buf);
>> +	if (lstat(tempfile->filename.buf, &st))
>>  		return -1;
>
>
> stat/lstat with path may be slower than fstat on an open file
> descriptor, and I think that is the reason why we do it this way,
> but the performance difference would probably be unmeasurable and
> does not matter in practice.
>
> As we are not using the fact that we still have the file descriptor
> open when we do the stat for any purpose (e.g. like locking other
> processes out), this move to "close first and then stat" is a good
> workaround for the problem.  I wonder if we have been seeing false
> "racy git" problem more often due to this issue on Windows than
> other platforms.

I was wondering about that too.

>
> When code uses lstat, it gives a signal to the readers of the code
> that the code is prepared to see a symlink and when it is a symlink
> it wants to grab the property of the link itself, not the target of
> the link.  I do not think the temporary index can be a symbolic
> link, and even if that were the case, we do want the mtime of the
> link target, so it is a wrong signal to give to the readers.  Hence,
> it would be better to use stat() here from the readability's point
> of view.  Of course, when the path is always a regular file, lstat()
> vs stat() technically does not give any different result, so this
> comment is purely about the maintainability, not about correctness.
>
> Other than that, looks good to me.

Force of habit using lstat() rather than stat().  I'll change it.
Thanks.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] read-cache: close index.lock in do_write_index
  2017-04-27  3:13 ` Jeff King
@ 2017-04-27 16:51   ` Jeff Hostetler
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Hostetler @ 2017-04-27 16:51 UTC (permalink / raw)
  To: Jeff King, Johannes Schindelin; +Cc: git, Jeff Hostetler, Junio C Hamano



On 4/26/2017 11:13 PM, Jeff King wrote:
> On Wed, Apr 26, 2017 at 10:05:23PM +0200, Johannes Schindelin wrote:
>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach do_write_index() to close the index.lock file
>> before getting the mtime and updating the istate.timestamp
>> fields.
>>
>> On Windows, a file's mtime is not updated until the file is
>> closed.  On Linux, the mtime is set after the last flush.
>
> I wondered at first what this would mean for atomicity. The original
> code does an fstat, so we're sure to get the timestamp of what we just
> wrote.
>
> I think we should be OK after your change, though. We're stat()ing the
> lockfile itself, so nobody else should be touching it (because they'd be
> violating the lock to do so).
>
>> -static int do_write_index(struct index_state *istate, int newfd,
>> +static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>>  			  int strip_extensions)
>> [...]
>> -	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
>> +	if (ce_flush(&c, newfd, istate->sha1))
>> +		return -1;
>> +	if (close_tempfile(tempfile))
>> +		return error(_("could not close '%s'"), tempfile->filename.buf);
>> +	if (lstat(tempfile->filename.buf, &st))
>>  		return -1;
>
> So now we unconditionally close in do_write_index(), but I don't see any
> close_tempfile() calls going away. For the call in write_shared_index(),
> that's because we either call delete_tempfile() or rename_tempfile(),
> either of which would close as needed, but can handle an already-closed
> file.
>
> The other caller is do_write_locked_index(), which accepts either a
> flag: either COMMIT_LOCK, CLOSE_LOCK, or neither. COMMIT_LOCK is OK; it
> can handle the already-closed file. CLOSE_LOCK is obviously fine. It
> just becomes a noop. But when neither flag is set, now we close the
> lock. Are there any callers that will be affected?
>
> There are three callers, but I think they all eventually trace up to
> write_locked_index(). And grepping for callers of that function, it
> looks like each one uses either COMMIT_LOCK or CLOSE_LOCK.
>

Yes, we only took a casual look at the calling environment(s)
and didn't try to do a full reduction/refactoring.  In the
absence of any other red-flags, I'll look at doing this.

Thanks!

> So perhaps we'd want to squash in (or perhaps do as a preparatory
> patch) something like:
>
> diff --git a/read-cache.c b/read-cache.c
> index b0276fd55..db7a812af 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2193,14 +2193,16 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>  	int ret = do_write_index(istate, &lock->tempfile, 0);
>  	if (ret)
>  		return ret;
> +
> +	/* Callers must specify exactly one of COMMIT/CLOSE */
>  	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
>  	       (COMMIT_LOCK | CLOSE_LOCK));
> +	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) != 0);
> +
>  	if (flags & COMMIT_LOCK)
>  		return commit_locked_index(lock);
> -	else if (flags & CLOSE_LOCK)
> -		return close_lock_file(lock);
>  	else
> -		return ret;
> +		return close_lock_file(lock);
>  }
>
>  static int write_split_index(struct index_state *istate,
>
> We could also get rid of CLOSE_LOCK entirely at this point. Or since
> these are the only two flags, just turn the flags field into a boolean
> "int commit_lock". But doing it as above is perhaps more readable
> (callers say CLOSE_LOCK instead of an unannotated "0"), and the extra
> assert will catch any topics in flight that add calls using "0" for
> flags.
>
> -Peff
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-04-27 16:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-26 20:05 [PATCH] read-cache: close index.lock in do_write_index Johannes Schindelin
2017-04-27  3:13 ` Jeff King
2017-04-27 16:51   ` Jeff Hostetler
2017-04-27  3:21 ` Junio C Hamano
2017-04-27 16:45   ` Jeff Hostetler

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).