[PATCH] entry: check for fstat() errors after checkout

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH] entry: check for fstat() errors after checkout
@ 2020-07-09  2:10 Matheus Tavares
  2020-07-09 11:41 ` Derrick Stolee
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Matheus Tavares @ 2020-07-09  2:10 UTC (permalink / raw)
  To: git

In 11179eb311 ("entry.c: check if file exists after checkout",
2017-10-05) we started checking the result of the lstat() call done
after writing a file, to avoid writing garbage to the corresponding
cache entry. However, the code skips calling lstat() if it's possible
to use fstat() when it still has the file descriptor open. And when
calling fstat() we don't do the same error checking. To fix that, let
the callers of fstat_output() know when fstat() fails. In this case,
write_entry() will try to use lstat() and properly report an error if
that fails as well.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 entry.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/entry.c b/entry.c
index 00b4903366..449bd32dee 100644
--- a/entry.c
+++ b/entry.c
@@ -113,8 +113,7 @@ static int fstat_output(int fd, const struct checkout *state, struct stat *st)
 	/* use fstat() only when path == ce->name */
 	if (fstat_is_reliable() &&
 	    state->refresh_cache && !state->base_dir_len) {
-		fstat(fd, st);
-		return 1;
+		return !fstat(fd, st);
 	}
 	return 0;
 }
-- 
2.27.0


* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-09  2:10 [PATCH] entry: check for fstat() errors after checkout Matheus Tavares
@ 2020-07-09 11:41 ` Derrick Stolee
  2020-07-09 14:08   ` Junio C Hamano
  2020-07-09 17:08 ` Junio C Hamano
  2020-07-21 15:39 ` Matheus Tavares Bernardino
  2 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2020-07-09 11:41 UTC (permalink / raw)
  To: Matheus Tavares, git

On 7/8/2020 10:10 PM, Matheus Tavares wrote:
> In 11179eb311 ("entry.c: check if file exists after checkout",
> 2017-10-05) we started checking the result of the lstat() call done
> after writing a file, to avoid writing garbage to the corresponding
> cache entry. However, the code skips calling lstat() if it's possible
> to use fstat() when it still has the file descriptor open. And when
> calling fstat() we don't do the same error checking. To fix that, let
> the callers of fstat_output() know when fstat() fails. In this case,
> write_entry() will try to use lstat() and properly report an error if
> that fails as well.

Looking at this for the first time, I was confused because 11179eb311
doesn't touch these lines. But that's the point: it should have.

Thanks for finding this! I wonder if there is a way to expose this
behavior in a test... it definitely seems like this is only something
that happens if there is a failure in the filesystem, so I'm not sure
such a thing is possible.

It would just be nice to know the ramifications of this change in
behavior, keeping in mind that this behavior started way back in
e4c7292353 (write_entry(): use fstat() instead of lstat() when file is
open, 2009-02-09), over 11 years ago!

Thanks,
-Stolee

> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  entry.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/entry.c b/entry.c
> index 00b4903366..449bd32dee 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -113,8 +113,7 @@ static int fstat_output(int fd, const struct checkout *state, struct stat *st)
>  	/* use fstat() only when path == ce->name */
>  	if (fstat_is_reliable() &&
>  	    state->refresh_cache && !state->base_dir_len) {
> -		fstat(fd, st);
> -		return 1;
> +		return !fstat(fd, st);
>  	}
>  	return 0;
>  }
> 



* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-09 11:41 ` Derrick Stolee
@ 2020-07-09 14:08   ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2020-07-09 14:08 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Matheus Tavares, git

Derrick Stolee <stolee@gmail.com> writes:

> On 7/8/2020 10:10 PM, Matheus Tavares wrote:
>> In 11179eb311 ("entry.c: check if file exists after checkout",
>> 2017-10-05) we started checking the result of the lstat() call done
>> after writing a file, to avoid writing garbage to the corresponding
>> cache entry. However, the code skips calling lstat() if it's possible
>> to use fstat() when it still has the file descriptor open. And when
>> calling fstat() we don't do the same error checking. To fix that, let
>> the callers of fstat_output() know when fstat() fails. In this case,
>> write_entry() will try to use lstat() and properly report an error if
>> that fails as well.
>
> Looking at this for the first time, I was confused because 11179eb311
> doesn't touch these lines. But that's the point: it should have.
>
> Thanks for finding this! I wonder if there is a way to expose this
> behavior in a test... it definitely seems like this is only something
> that happens if there is a failure in the filesystem, so I'm not sure
> such a thing is possible.

If another process removed the path from the filesystem after this
process created it and before this codepath used fstat() on it,
fstat() may succeed while lstat() would definitely fail.  There is
no "failure in the filesystem", but it would be harder to arrange.
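
For illustration, the fstat()/lstat() asymmetry described above can be
reproduced in a single process. This is only a sketch: the unlink() here
stands in for "another process removed the path", and the helper name
stat_after_unlink() is hypothetical, not something from entry.c.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path`, unlink it while the descriptor is still open, then
 * stat it both ways.  On return, *fstat_ret and *lstat_ret hold the
 * return values of fstat(fd) and lstat(path), and *lstat_errno holds
 * the errno left behind by lstat().
 *
 * Returns 0 if the setup (create + unlink) worked, -1 otherwise.
 */
static int stat_after_unlink(const char *path, int *fstat_ret,
			     int *lstat_ret, int *lstat_errno)
{
	struct stat st;
	int fd;

	assert(path && fstat_ret && lstat_ret && lstat_errno);

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;
	if (unlink(path) < 0) {
		close(fd);
		return -1;
	}

	/* The inode is still alive through fd, so this can succeed. */
	*fstat_ret = fstat(fd, &st);

	/* The name is gone, so a path-based stat fails. */
	errno = 0;
	*lstat_ret = lstat(path, &st);
	*lstat_errno = errno;

	close(fd);
	return 0;
}
```

On a POSIX filesystem one would expect fstat() to return 0 and lstat()
to fail with ENOENT, which is why a successful fstat() says nothing
about whether the path still exists.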

> It would just be nice to know the ramifications of this change in
> behavior, keeping in mind that this behavior started way back in
> e4c7292353 (write_entry(): use fstat() instead of lstat() when file is
> open, 2009-02-09), over 11 years ago!

Yup.  I am curious how this was found ;-)

Thanks.


* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-09  2:10 [PATCH] entry: check for fstat() errors after checkout Matheus Tavares
  2020-07-09 11:41 ` Derrick Stolee
@ 2020-07-09 17:08 ` Junio C Hamano
  2020-07-09 17:39   ` Matheus Tavares Bernardino
  2020-07-21 15:39 ` Matheus Tavares Bernardino
  2 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2020-07-09 17:08 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git

Matheus Tavares <matheus.bernardino@usp.br> writes:

> In 11179eb311 ("entry.c: check if file exists after checkout",
> 2017-10-05) we started checking the result of the lstat() call done
> after writing a file, to avoid writing garbage to the corresponding
> cache entry. However, the code skips calling lstat() if it's possible
> to use fstat() when it still has the file descriptor open. And when
> calling fstat() we don't do the same error checking. To fix that, let
> the callers of fstat_output() know when fstat() fails. In this case,
> write_entry() will try to use lstat() and properly report an error if
> that fails as well.

The original is not correct as you point out, as it loses the error
return from fstat(), but I do not think this is right, either.

The returned value from fstat_output() is supposed to be "have we
done fstat() so that we do not need to do a lstat()?"  Don't you
instead want to extend it to "0 means we didn't, 1 means we did
successfully, and -1 means we did and failed"?  At least, the way
_this_ function is modified by this patch is in line with that.

Which means that we'd need to update the caller(s) to match, to
avoid risking this change to be just half a change, very similarly
to how the change in 11179eb311 was just half a change.

Perhaps like this?

 entry.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/entry.c b/entry.c
index 53380bb614..f48507ca42 100644
--- a/entry.c
+++ b/entry.c
@@ -108,14 +108,21 @@ static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempf
 	}
 }
 
+/*
+ * We have an open fd to a file that we may use lstat() on later. 
+ * When able, try doing a fstat(fd) instead and tell the caller it
+ * does not have to do an extra lstat()
+ *
+ * Return 1 if we successfully ran fstat() and *st is valid.
+ * Return 0 if we did not do fstat() and the caller should do lstat().
+ * Return -1 if we got failure from fstat()---the caller can skip lstat().
+ */
 static int fstat_output(int fd, const struct checkout *state, struct stat *st)
 {
 	/* use fstat() only when path == ce->name */
 	if (fstat_is_reliable() &&
-	    state->refresh_cache && !state->base_dir_len) {
-		fstat(fd, st);
-		return 1;
-	}
+	    state->refresh_cache && !state->base_dir_len)
+		return (fstat(fd, st) < 0) ? -1 : 1;
 	return 0;
 }
 
@@ -369,10 +376,10 @@ static int write_entry(struct cache_entry *ce,
 finish:
 	if (state->refresh_cache) {
 		assert(state->istate);
-		if (!fstat_done)
-			if (lstat(ce->name, &st) < 0)
-				return error_errno("unable to stat just-written file %s",
-						   ce->name);
+		if (fstat_done < 0 ||
+		    (!fstat_done && lstat(ce->name, &st) < 0))
+			return error_errno("unable to stat just-written file %s",
+					   ce->name);
 		fill_stat_cache_info(state->istate, ce, &st);
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
 		mark_fsmonitor_invalid(state->istate, ce);
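
To make the three-way return convention easier to see in isolation,
here is a standalone sketch of the same control flow. The names
try_fstat() and stat_written_file(), and the fstat_allowed flag that
stands in for the fstat_is_reliable()/refresh_cache/base_dir_len
condition, are made up for this example and are not part of entry.c.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Return 1 if fstat() ran and *st is valid.
 * Return 0 if fstat() was not attempted; the caller should lstat().
 * Return -1 if fstat() was attempted and failed.
 */
static int try_fstat(int fd, int fstat_allowed, struct stat *st)
{
	if (fstat_allowed)
		return (fstat(fd, st) < 0) ? -1 : 1;
	return 0;
}

/*
 * Caller side: an fstat() failure is reported right away, and lstat()
 * is used only when fstat() was never attempted.
 * Returns 0 with *st filled in, or -1 with errno set by the failing call.
 */
static int stat_written_file(int fd, const char *path,
			     int fstat_allowed, struct stat *st)
{
	int fstat_done;

	assert(path && st);

	fstat_done = try_fstat(fd, fstat_allowed, st);
	if (fstat_done < 0 ||
	    (!fstat_done && lstat(path, st) < 0))
		return -1;
	return 0;
}
```

With this shape, passing an invalid descriptor while fstat() is allowed
yields an error even if the path itself exists, because lstat() is
skipped on purpose in the -1 case.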


* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-09 17:08 ` Junio C Hamano
@ 2020-07-09 17:39   ` Matheus Tavares Bernardino
  2020-07-09 18:09     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Matheus Tavares Bernardino @ 2020-07-09 17:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Jul 9, 2020 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> The returned value from fstat_output() is supposed to be "have we
> done fstat() so that we do not need to do a lstat()?"  Don't you
> instead want to extend it to "0 means we didn't, 1 means we did
> successfully, and -1 means we did and failed"?  At least, the way
> _this_ function is modified by this patch is in line with that.

Makes sense, thanks for spotting this issue.

> Which means that we'd need to update the caller(s) to match, to
> avoid risking this change to be just half a change, very similarly
> to how the change in 11179eb311 was just half a change.
>
> Perhaps like this?
>
>  entry.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/entry.c b/entry.c
> index 53380bb614..f48507ca42 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -108,14 +108,21 @@ static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempf
>         }
>  }
>
> +/*
> + * We have an open fd to a file that we may use lstat() on later.
> + * When able, try doing a fstat(fd) instead and tell the caller it
> + * does not have to do an extra lstat()
> + *
> + * Return 1 if we successfully ran fstat() and *st is valid.
> + * Return 0 if we did not do fstat() and the caller should do lstat().
> + * Return -1 if we got failure from fstat()---the caller can skip lstat().
> + */
>  static int fstat_output(int fd, const struct checkout *state, struct stat *st)
>  {
>         /* use fstat() only when path == ce->name */
>         if (fstat_is_reliable() &&
> -           state->refresh_cache && !state->base_dir_len) {
> -               fstat(fd, st);
> -               return 1;
> -       }
> +           state->refresh_cache && !state->base_dir_len)
> +               return (fstat(fd, st) < 0) ? -1 : 1;
>         return 0;
>  }
>
> @@ -369,10 +376,10 @@ static int write_entry(struct cache_entry *ce,
>  finish:
>         if (state->refresh_cache) {
>                 assert(state->istate);
> -               if (!fstat_done)
> -                       if (lstat(ce->name, &st) < 0)
> -                               return error_errno("unable to stat just-written file %s",
> -                                                  ce->name);
> +               if (fstat_done < 0 ||
> +                   (!fstat_done && lstat(ce->name, &st) < 0))
> +                       return error_errno("unable to stat just-written file %s",
> +                                          ce->name);

If fstat() failed or we couldn't fstat() but lstat() failed, we return
an error. Nice! Thanks for the correction.


* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-09 17:39   ` Matheus Tavares Bernardino
@ 2020-07-09 18:09     ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2020-07-09 18:09 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: git

Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> On Thu, Jul 9, 2020 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> The returned value from fstat_output() is supposed to be "have we
>> done fstat() so that we do not need to do a lstat()?"  Don't you
>> instead want to extend it to "0 means we didn't, 1 means we did
>> successfully, and -1 means we did and failed"?  At least, the way
>> _this_ function is modified by this patch is in line with that.
>
> Makes sense, thanks for spotting this issue.
>
>> Which means that we'd need to update the caller(s) to match, to
>> avoid risking this change to be just half a change, very similarly
>> to how the change in 11179eb311 was just half a change.

Thinking about this again, you _could_ argue that your version is
being more defensive.  fstat_is_reliable() might lie and tell us it
is OK to use fstat() when we should do lstat(), and in such a case,
we take a failure from fstat() as a sign to pretend that we didn't
even call it, and tell the caller to do an lstat().  I am actually
OK to go in that direction, but then we probably should save away
errno before making this fstat() call, and restore it after it when
we see an error, if we were to truly pretend that we didn't make a
call.  Otherwise error_errno() call we will make later in the flow
would end up reporting the error from the fstat() we chose to pretend
that we didn't call.

And having said all that, I think fstat_is_reliable() can be trusted
(it says false on Windows and says true on all others).
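
A rough sketch of the "pretend we never called fstat()" variant, with
errno saved before the call and restored on failure, might look like
the following. The helper names fstat_or_pretend() and
stat_with_fallback() are hypothetical; this is not the code that was
applied, only an illustration of the errno handling discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/*
 * Return 1 if fstat() filled in *st.
 * Return 0 if the caller should fall back to lstat().  A failed fstat()
 * is reported as 0 as well, and errno is put back to what it was before
 * the call so that the failure leaves no trace.
 */
static int fstat_or_pretend(int fd, struct stat *st)
{
	int saved_errno = errno;

	assert(st);

	if (!fstat(fd, st))
		return 1;
	errno = saved_errno;
	return 0;
}

/*
 * Caller side: any errno seen after a -1 return comes from lstat(),
 * never from the fstat() attempt that was silently discarded.
 */
static int stat_with_fallback(int fd, const char *path, struct stat *st)
{
	if (fstat_or_pretend(fd, st))
		return 0;
	return lstat(path, st);
}
```

Without the save/restore, a later error message built from errno could
describe the discarded fstat() failure instead of the lstat() result the
caller actually acted on.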


* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-09  2:10 [PATCH] entry: check for fstat() errors after checkout Matheus Tavares
  2020-07-09 11:41 ` Derrick Stolee
  2020-07-09 17:08 ` Junio C Hamano
@ 2020-07-21 15:39 ` Matheus Tavares Bernardino
  2020-07-21 20:00   ` Junio C Hamano
  2 siblings, 1 reply; 9+ messages in thread
From: Matheus Tavares Bernardino @ 2020-07-21 15:39 UTC (permalink / raw)
  To: git, Junio C Hamano, Derrick Stolee

Hi, Junio and Stolee

I was looking further at this code and noticed that the conditions
under which we fstat() (or lstat()) an entry are slightly different
throughout entry.c:

- In write_entry()'s footer, we call lstat() iff state->refresh_cache.
- In write_entry()'s `write_file_entry` label, we call fstat_output()
when !to_tempfile.
- In streaming_write_entry() we call fstat_output() without checking
if !to_tempfile.
- And, finally, in fstat_output() itself, we check
`state->refresh_cache && !state->base_dir_len`.

I understand we always check state->refresh_cache to avoid getting
stat information we won't really need later, as we are not updating
the index. But why do we check !to_tempfile and !state->base_dir_len?
Doesn't writing to a tempfile or using a checkout prefix already imply
!state->refresh_cache?

Thanks,
Matheus


* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-21 15:39 ` Matheus Tavares Bernardino
@ 2020-07-21 20:00   ` Junio C Hamano
  2020-07-21 20:57     ` Derrick Stolee
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2020-07-21 20:00 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: git, Derrick Stolee

Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> I was looking further at this code and noticed that the conditions
> under which we fstat() (or lstat()) an entry are slightly different
> throughout entry.c:
>
> - In write_entry()'s footer, we call lstat() iff state->refresh_cache.
> - In write_entry()'s `write_file_entry` label, we call fstat_output()
> when !to_tempfile.
> - In streaming_write_entry() we call fstat_output() without checking
> if !to_tempfile.
> - And, finally, in fstat_output() itself, we check
> `state->refresh_cache && !state->base_dir_len`.
>
> I understand we always check state->refresh_cache to avoid getting
> stat information we won't really need later, as we are not updating
> the index. But why do we check !to_tempfile and !state->base_dir_len?
> Doesn't writing to a tempfile or using a checkout prefix already imply
> !state->refresh_cache?

You can easily blame the code back to e4c72923 (write_entry(): use
fstat() instead of lstat() when file is open, 2009-02-09).  Back
then, only a single place assigned 0 to state.refresh_cache and that
is in "checkout-index" with either base_dir_len or to_tempfile set.

I do not remember, and I am fairly sure Stolee does not remember
either.  If I have to guess, this was done merely to be extra
cautious, perhaps? As refresh_cache bit is checked first, check for
!to_tempfile and !base_dir_len would be dead at best and redundant
at worst.


* Re: [PATCH] entry: check for fstat() errors after checkout
  2020-07-21 20:00   ` Junio C Hamano
@ 2020-07-21 20:57     ` Derrick Stolee
  0 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2020-07-21 20:57 UTC (permalink / raw)
  To: Junio C Hamano, Matheus Tavares Bernardino; +Cc: git

On 7/21/2020 4:00 PM, Junio C Hamano wrote:
> Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:
> 
>> I was looking further at this code and noticed that the conditions
>> under which we fstat() (or lstat()) an entry are slightly different
>> throughout entry.c:
>>
>> - In write_entry()'s footer, we call lstat() iff state->refresh_cache.
>> - In write_entry()'s `write_file_entry` label, we call fstat_output()
>> when !to_tempfile.
>> - In streaming_write_entry() we call fstat_output() without checking
>> if !to_tempfile.
>> - And, finally, in fstat_output() itself, we check
>> `state->refresh_cache && !state->base_dir_len`.
>>
>> I understand we always check state->refresh_cache to avoid getting
>> stat information we won't really need later, as we are not updating
>> the index. But why do we check !to_tempfile and !state->base_dir_len?
>> Doesn't writing to a tempfile or using a checkout prefix already imply
>> !state->refresh_cache?
> 
> You can easily blame the code back to e4c72923 (write_entry(): use
> fstat() instead of lstat() when file is open, 2009-02-09).  Back
> then, only a single place assigned 0 to state.refresh_cache and that
> is in "checkout-index" with either base_dir_len or to_tempfile set.
> 
> I do not remember, and I am fairly sure Stolee does not remember
> either.  If I have to guess, this was done merely to be extra
> cautious, perhaps? As refresh_cache bit is checked first, check for
> !to_tempfile and !base_dir_len would be dead at best and redundant
> at worst.

Yeah, this portion is way outside of my expertise. I'm happy
to _try_ reading patches, but I'd have difficulty being
confident in any change in this area.

Thanks,
-Stolee

