git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/2] fix temporary garbage in the cache entry
@ 2017-10-05 10:44 lars.schneider
  2017-10-05 10:44 ` [PATCH v1 1/2] entry.c: update cache entry only for existing files lars.schneider
  2017-10-05 10:44 ` [PATCH v1 2/2] entry.c: check if file exists after checkout lars.schneider
  0 siblings, 2 replies; 26+ messages in thread
From: lars.schneider @ 2017-10-05 10:44 UTC (permalink / raw)
  To: git; +Cc: peff, t.gummerer, jrnieder, gitster, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Hi,

in [1] Peff noticed that the delayed filter mechanism causes garbage
in the cache entry for a brief moment during a delayed checkout process.

The first patch fixes this issue. The second patch ensures that we don't
write garbage in that way to the cache entry caused by any other unknown
or external reason ever again.

Please review carefully in case I overlooked some unintended side
effect.

Thank you,
Lars

[1] https://public-inbox.org/git/20171005034632.kzsspk7wsuk23kf2@sigill.intra.peff.net/



Base Ref: v2.15.0-rc0
Web-Diff: https://github.com/larsxschneider/git/commit/d6fff1a941
Checkout: git fetch https://github.com/larsxschneider/git filter-process/fix-cache-entry-v1 && git checkout d6fff1a941

Lars Schneider (2):
  entry.c: update cache entry only for existing files
  entry.c: check if file exists after checkout

 entry.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


base-commit: 217f2767cbcb562872437eed4dec62e00846d90c
--
2.14.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-05 10:44 [PATCH v1 0/2] fix temporary garbage in the cache entry lars.schneider
@ 2017-10-05 10:44 ` lars.schneider
  2017-10-05 11:12   ` Jeff King
  2017-10-05 11:19   ` Junio C Hamano
  2017-10-05 10:44 ` [PATCH v1 2/2] entry.c: check if file exists after checkout lars.schneider
  1 sibling, 2 replies; 26+ messages in thread
From: lars.schneider @ 2017-10-05 10:44 UTC (permalink / raw)
  To: git; +Cc: peff, t.gummerer, jrnieder, gitster, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

In 2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) we taught the filter process protocol to delay responses.

That means an external filter might answer in the first write_entry()
call on a file that requires filtering  "I got your request, but I
can't answer right now. Ask again later!". As Git got no answer, we do
not write anything to the filesystem. Consequently, the lstat() call in
the finish block of the function writes garbage to the cache entry.
The garbage is eventually overwritten when the filter answers with
the final file content in a subsequent write_entry() call.

Fix the brief time window of garbage in the cache entry by adding a
special finish block that does nothing for delayed responses. The cache
entry is written properly in a subsequent write_entry() call where
the filter responds with the final file content.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 entry.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/entry.c b/entry.c
index 1c7e3c11d5..5dab656364 100644
--- a/entry.c
+++ b/entry.c
@@ -304,7 +304,7 @@ static int write_entry(struct cache_entry *ce,
 					ce->name, new, size, &buf, dco);
 				if (ret && string_list_has_string(&dco->paths, ce->name)) {
 					free(new);
-					goto finish;
+					goto delayed;
 				}
 			} else
 				ret = convert_to_working_tree(
@@ -360,6 +360,7 @@ static int write_entry(struct cache_entry *ce,
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
 		state->istate->cache_changed |= CE_ENTRY_CHANGED;
 	}
+delayed:
 	return 0;
 }
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-05 10:44 [PATCH v1 0/2] fix temporary garbage in the cache entry lars.schneider
  2017-10-05 10:44 ` [PATCH v1 1/2] entry.c: update cache entry only for existing files lars.schneider
@ 2017-10-05 10:44 ` lars.schneider
  2017-10-05 11:23   ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: lars.schneider @ 2017-10-05 10:44 UTC (permalink / raw)
  To: git; +Cc: peff, t.gummerer, jrnieder, gitster, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

If we are checking out a file and somebody else racily deletes our file,
then we would write garbage to the cache entry. Fix that by checking
the result of the lstat() call on that file. Print an error to the user
if the file does not exist.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 entry.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/entry.c b/entry.c
index 5dab656364..2252d96756 100644
--- a/entry.c
+++ b/entry.c
@@ -355,7 +355,8 @@ static int write_entry(struct cache_entry *ce,
 	if (state->refresh_cache) {
 		assert(state->istate);
 		if (!fstat_done)
-			lstat(ce->name, &st);
+			if (lstat(ce->name, &st) < 0)
+				return error("unable to get status of file %s", ce->name);
 		fill_stat_cache_info(ce, &st);
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
 		state->istate->cache_changed |= CE_ENTRY_CHANGED;
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-05 10:44 ` [PATCH v1 1/2] entry.c: update cache entry only for existing files lars.schneider
@ 2017-10-05 11:12   ` Jeff King
  2017-10-05 11:19   ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-10-05 11:12 UTC (permalink / raw)
  To: lars.schneider; +Cc: git, t.gummerer, jrnieder, gitster, Lars Schneider

On Thu, Oct 05, 2017 at 12:44:06PM +0200, lars.schneider@autodesk.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> In 2841e8f ("convert: add "status=delayed" to filter process protocol",
> 2017-06-30) we taught the filter process protocol to delay responses.
> 
> That means an external filter might answer in the first write_entry()
> call on a file that requires filtering  "I got your request, but I
> can't answer right now. Ask again later!". As Git got no answer, we do
> not write anything to the filesystem. Consequently, the lstat() call in
> the finish block of the function writes garbage to the cache entry.
> The garbage is eventually overwritten when the filter answers with
> the final file content in a subsequent write_entry() call.
> 
> Fix the brief time window of garbage in the cache entry by adding a
> special finish block that does nothing for delayed responses. The cache
> entry is written properly in a subsequent write_entry() call where
> the filter responds with the final file content.

Nicely explained and the patch looks correct. I also verified that MSan
is happy with t0021 after this.

Thanks for the quick turnaround.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-05 10:44 ` [PATCH v1 1/2] entry.c: update cache entry only for existing files lars.schneider
  2017-10-05 11:12   ` Jeff King
@ 2017-10-05 11:19   ` Junio C Hamano
  2017-10-05 11:26     ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-10-05 11:19 UTC (permalink / raw)
  To: lars.schneider; +Cc: git, peff, t.gummerer, jrnieder, Lars Schneider

lars.schneider@autodesk.com writes:

> diff --git a/entry.c b/entry.c
> index 1c7e3c11d5..5dab656364 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -304,7 +304,7 @@ static int write_entry(struct cache_entry *ce,
>  					ce->name, new, size, &buf, dco);
>  				if (ret && string_list_has_string(&dco->paths, ce->name)) {
>  					free(new);
> -					goto finish;
> +					goto delayed;
>  				}
>  			} else
>  				ret = convert_to_working_tree(

This is unrelated to the main topic of this patch, but we see this
just before the precontext of this hunk:

			if (dco && dco->state != CE_NO_DELAY) {
				/* Do not send the blob in case of a retry. */
				if (dco->state == CE_RETRY) {
					new = NULL;
					size = 0;
				}
				ret = async_convert_to_working_tree(
					ce->name, new, size, &buf, dco);

Aren't we leaking "new" in that CE_RETRY case?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-05 10:44 ` [PATCH v1 2/2] entry.c: check if file exists after checkout lars.schneider
@ 2017-10-05 11:23   ` Jeff King
  2017-10-06  4:26     ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-05 11:23 UTC (permalink / raw)
  To: lars.schneider; +Cc: git, t.gummerer, jrnieder, gitster, Lars Schneider

On Thu, Oct 05, 2017 at 12:44:07PM +0200, lars.schneider@autodesk.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> If we are checking out a file and somebody else racily deletes our file,
> then we would write garbage to the cache entry. Fix that by checking
> the result of the lstat() call on that file. Print an error to the user
> if the file does not exist.

My gut tells me this is the right thing to be doing, but this commit
message gives very little analysis. Let's see if we can talk it out a
bit.

Aside from bizarre lstat failures, the plausible reason for seeing this
is that somebody racily deleted the file. I.e.,:

  1. We wrote the file.

  2. They deleted it.

  3. We ran lstat() on it and found that it went away.

But imagine that the race went the other way, and (3) happened before
(2). Then we'd actually get a real index entry, but the file would
appear deleted to anybody who checks the filesystem against the stat
data.

So I guess my question is: is step 3 an integral part of the checkout
procedure, or is it simply an opportunity to refresh the index (since we
know we just wrote out the content)?

If it's an integral part, then I agree that the error return you add
here is the right thing to do. But if it's just an index refresh, then I
wonder if we should report a successful checkout, but mark the entry as
stat-dirty.

I dunno. It's pretty philosophical, and I have a feeling that nobody
really cares all that much in practice. Certainly the error return seems
like the easiest fix.

> diff --git a/entry.c b/entry.c
> index 5dab656364..2252d96756 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -355,7 +355,8 @@ static int write_entry(struct cache_entry *ce,
>  	if (state->refresh_cache) {
>  		assert(state->istate);
>  		if (!fstat_done)
> -			lstat(ce->name, &st);
> +			if (lstat(ce->name, &st) < 0)
> +				return error("unable to get status of file %s", ce->name);

We could probably be a bit more specific about the situation, since the
user will see this message with no context. Maybe something like:

  unable to stat just-written file %s

or something. We should probably also use error_errno(). I'd bet if this
ever triggers that it's likely to be ENOENT, but certainly if it _isn't_
that would be interesting information.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-05 11:19   ` Junio C Hamano
@ 2017-10-05 11:26     ` Jeff King
  2017-10-05 23:01       ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-05 11:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: lars.schneider, git, t.gummerer, jrnieder, Lars Schneider

On Thu, Oct 05, 2017 at 08:19:13PM +0900, Junio C Hamano wrote:

> > diff --git a/entry.c b/entry.c
> > index 1c7e3c11d5..5dab656364 100644
> > --- a/entry.c
> > +++ b/entry.c
> > @@ -304,7 +304,7 @@ static int write_entry(struct cache_entry *ce,
> >  					ce->name, new, size, &buf, dco);
> >  				if (ret && string_list_has_string(&dco->paths, ce->name)) {
> >  					free(new);
> > -					goto finish;
> > +					goto delayed;
> >  				}
> >  			} else
> >  				ret = convert_to_working_tree(
> 
> This is unrelated to the main topic of this patch, but we see this
> just before the precontext of this hunk:
> 
> 			if (dco && dco->state != CE_NO_DELAY) {
> 				/* Do not send the blob in case of a retry. */
> 				if (dco->state == CE_RETRY) {
> 					new = NULL;
> 					size = 0;
> 				}
> 				ret = async_convert_to_working_tree(
> 					ce->name, new, size, &buf, dco);
> 
> Aren't we leaking "new" in that CE_RETRY case?

Yes, it certainly looks like it. Wouldn't we want to avoid reading the
file from disk entirely in that case?

I.e., I think free(new) is sufficient to fix the leak you mentioned. But
I think we'd want to protect the read_blob_entry() call at the top of
the case with a check for dco->state == CE_RETRY.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-05 11:26     ` Jeff King
@ 2017-10-05 23:01       ` Junio C Hamano
  2017-10-06  4:54         ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-10-05 23:01 UTC (permalink / raw)
  To: Jeff King; +Cc: lars.schneider, git, t.gummerer, jrnieder, Lars Schneider

Jeff King <peff@peff.net> writes:

> On Thu, Oct 05, 2017 at 08:19:13PM +0900, Junio C Hamano wrote:
>
>> This is unrelated to the main topic of this patch, but we see this
>> just before the precontext of this hunk:
>> 
>> 			if (dco && dco->state != CE_NO_DELAY) {
>> 				/* Do not send the blob in case of a retry. */
>> 				if (dco->state == CE_RETRY) {
>> 					new = NULL;
>> 					size = 0;
>> 				}
>> 				ret = async_convert_to_working_tree(
>> 					ce->name, new, size, &buf, dco);
>> 
>> Aren't we leaking "new" in that CE_RETRY case?
>
> Yes, it certainly looks like it. Wouldn't we want to avoid reading the
> file from disk entirely in that case?

Probably.  But that is more of a removal of pessimization than a fix ;-)

> I.e., I think free(new) is sufficient to fix the leak you
> mentioned.

In addition to keeping the new = NULL assignment, of course.

> But
> I think we'd want to protect the read_blob_entry() call at the top of
> the case with a check for dco->state == CE_RETRY.

Yeah, I think that makes more sense.

A patch may look like this on top of these two patches, but I'd
prefer to see Lars's eyeballing and possibly wrapping it up in an
applicable patch after taking the authorship.

I considered initializing new to NULL and size to 0 but decided
against it, as that would lose the justification to have an if
statement that marks that "dco->state == CE_RETRY" is a special
case.  I think explicit if() with clearing these two variables makes
it clearer to show what is going on.

By the way, the S_IFLNK handling seems iffy with or without this
change (or for that matter, I suspect this iffy-ness existed before
Lars's delayed filtering change).  On a platform without symlinks,
we do the same as S_IFREG, but obviously we do not want any content
conversion that happens to the regular files in such a case.  So we
may further want to fix that, but I left it outside the scope of
fixing the leak of NULL and optimizing the blob reading out.


 entry.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/entry.c b/entry.c
index cac5bf5af2..74e35f942c 100644
--- a/entry.c
+++ b/entry.c
@@ -274,14 +274,12 @@ static int write_entry(struct cache_entry *ce,
 	}
 
 	switch (ce_mode_s_ifmt) {
-	case S_IFREG:
 	case S_IFLNK:
 		new = read_blob_entry(ce, &size);
 		if (!new)
 			return error("unable to read sha1 file of %s (%s)",
 				path, oid_to_hex(&ce->oid));
-
-		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
+		if (has_symlinks && !to_tempfile) {
 			ret = symlink(new, path);
 			free(new);
 			if (ret)
@@ -289,18 +287,28 @@ static int write_entry(struct cache_entry *ce,
 						   path);
 			break;
 		}
-
+		/* fallthru */
+	case S_IFREG:
 		/*
 		 * Convert from git internal format to working tree format
 		 */
 		if (ce_mode_s_ifmt == S_IFREG) {
 			struct delayed_checkout *dco = state->delayed_checkout;
+
+			/* 
+			 * In case of a retry, we do not send blob, hence no
+			 * need to read it, either.
+			 */
+			if (dco && dco->state == CE_RETRY) {
+				new = NULL;
+				size = 0;
+			} else {
+				new = read_blob_entry(ce, &size);
+				if (!new)
+					return error("unable to read sha1 file of %s (%s)",
+						     path, oid_to_hex(&ce->oid));
+			}
 			if (dco && dco->state != CE_NO_DELAY) {
-				/* Do not send the blob in case of a retry. */
-				if (dco->state == CE_RETRY) {
-					new = NULL;
-					size = 0;
-				}
 				ret = async_convert_to_working_tree(
 					ce->name, new, size, &buf, dco);
 				if (ret && string_list_has_string(&dco->paths, ce->name)) {

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-05 11:23   ` Jeff King
@ 2017-10-06  4:26     ` Junio C Hamano
  2017-10-06  4:56       ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-10-06  4:26 UTC (permalink / raw)
  To: Jeff King; +Cc: lars.schneider, git, t.gummerer, jrnieder, Lars Schneider

Jeff King <peff@peff.net> writes:

>> diff --git a/entry.c b/entry.c
>> index 5dab656364..2252d96756 100644
>> --- a/entry.c
>> +++ b/entry.c
>> @@ -355,7 +355,8 @@ static int write_entry(struct cache_entry *ce,
>>  	if (state->refresh_cache) {
>>  		assert(state->istate);
>>  		if (!fstat_done)
>> -			lstat(ce->name, &st);
>> +			if (lstat(ce->name, &st) < 0)
>> +				return error("unable to get status of file %s", ce->name);
>
> We could probably be a bit more specific about the situation, since the
> user will see this message with no context. Maybe something like:
>
>   unable to stat just-written file %s
>
> or something. We should probably also use error_errno(). I'd bet if this
> ever triggers that it's likely to be ENOENT, but certainly if it _isn't_
> that would be interesting information.

ENOTDIR and to a lesser degree EACCES and ELOOP are also
uninteresting, as we are talking about somebody else mucking with
the filesystem.

To tie the loose end, here is what will be queued and merged to
'next' soonish.

Thanks.

-- >8 --
From: Lars Schneider <larsxschneider@gmail.com>
Date: Thu, 5 Oct 2017 12:44:07 +0200
Subject: [PATCH] entry.c: check if file exists after checkout

If we are checking out a file and somebody else racily deletes our file,
then we would write garbage to the cache entry. Fix that by checking
the result of the lstat() call on that file. Print an error to the user
if the file does not exist.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 entry.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/entry.c b/entry.c
index f879758c73..6d9de3a5aa 100644
--- a/entry.c
+++ b/entry.c
@@ -341,7 +341,9 @@ static int write_entry(struct cache_entry *ce,
 	if (state->refresh_cache) {
 		assert(state->istate);
 		if (!fstat_done)
-			lstat(ce->name, &st);
+			if (lstat(ce->name, &st) < 0)
+				return error_errno("unable stat just-written file %s",
+						   ce->name);
 		fill_stat_cache_info(ce, &st);
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
 		state->istate->cache_changed |= CE_ENTRY_CHANGED;
-- 
2.15.0-rc0-155-g07e9c1a78d


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-05 23:01       ` Junio C Hamano
@ 2017-10-06  4:54         ` Jeff King
  2017-10-08 21:37           ` Lars Schneider
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-06  4:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: lars.schneider, git, t.gummerer, jrnieder, Lars Schneider

On Fri, Oct 06, 2017 at 08:01:48AM +0900, Junio C Hamano wrote:

> > But
> > I think we'd want to protect the read_blob_entry() call at the top of
> > the case with a check for dco->state == CE_RETRY.
> 
> Yeah, I think that makes more sense.
> 
> A patch may look like this on top of these two patches, but I'd
> prefer to see Lars's eyeballing and possibly wrapping it up in an
> applicable patch after taking the authorship.

Yeah, agreed.

> I considered initializing new to NULL and size to 0 but decided
> against it, as that would lose the justification to have an if
> statement that marks that "dco->state == CE_RETRY" is a special
> case.  I think explicit if() with clearing these two variables makes
> it clearer to show what is going on.

Also agreed.

> By the way, the S_IFLNK handling seems iffy with or without this
> change (or for that matter, I suspect this iffy-ness existed before
> Lars's delayed filtering change).  On a platform without symlinks,
> we do the same as S_IFREG, but obviously we do not want any content
> conversion that happens to the regular files in such a case.  So we
> may further want to fix that, but I left it outside the scope of
> fixing the leak of NULL and optimizing the blob reading out.

I think the current code is correct because the conversion all happens
in the S_IFREG if-block. We just fall-through down to the actual write
phase for the symlink case.

That said, I found the fall-through here confusing as hell. I actually
think it would be a lot clearer with a goto, which is saying something.
Here's the "diff -w" of what I mean, for readability (the real patch is
at the bottom for reference, but it adjusts the indentation quite a
bit).

diff --git a/entry.c b/entry.c
index 1c7e3c11d5..d28b42d82d 100644
--- a/entry.c
+++ b/entry.c
@@ -261,6 +261,7 @@ static int write_entry(struct cache_entry *ce,
 	size_t newsize = 0;
 	struct stat st;
 	const struct submodule *sub;
+	struct delayed_checkout *dco = state->delayed_checkout;
 
 	if (ce_mode_s_ifmt == S_IFREG) {
 		struct stream_filter *filter = get_stream_filter(ce->name,
@@ -273,33 +274,39 @@ static int write_entry(struct cache_entry *ce,
 	}
 
 	switch (ce_mode_s_ifmt) {
-	case S_IFREG:
 	case S_IFLNK:
 		new = read_blob_entry(ce, &size);
 		if (!new)
 			return error("unable to read sha1 file of %s (%s)",
 				path, oid_to_hex(&ce->oid));
 
-		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
+		/* fallback to handling it like a regular file if we must */
+		if (!has_symlinks || to_tempfile)
+			goto write_out_file;
+
 		ret = symlink(new, path);
 		free(new);
 		if (ret)
 			return error_errno("unable to create symlink %s",
 					   path);
 		break;
-		}
 
+	case S_IFREG:
 		/*
 		 * Convert from git internal format to working tree format
 		 */
-		if (ce_mode_s_ifmt == S_IFREG) {
-			struct delayed_checkout *dco = state->delayed_checkout;
-			if (dco && dco->state != CE_NO_DELAY) {
-				/* Do not send the blob in case of a retry. */
-				if (dco->state == CE_RETRY) {
+
+		if (dco && dco->state == CE_RETRY) {
 			new = NULL;
 			size = 0;
+		} else {
+			new = read_blob_entry(ce, &size);
+			if (!new)
+				return error ("unable to read sha1 file of %s (%s)",
+					      path, oid_to_hex(&ce->oid));
 		}
+
+		if (dco && dco->state != CE_NO_DELAY) {
 			ret = async_convert_to_working_tree(
 							    ce->name, new, size, &buf, dco);
 			if (ret && string_list_has_string(&dco->paths, ce->name)) {
@@ -320,8 +327,8 @@ static int write_entry(struct cache_entry *ce,
 		 * point. If the error would have been fatal (e.g.
 		 * filter is required), then we would have died already.
 		 */
-		}
 
+write_out_file:
 		fd = open_output_fd(path, ce, to_tempfile);
 		if (fd < 0) {
 			free(new);

The "structured" way, of course, would be to put everything under
write_out_file into a helper function and just call it from both places
rather than relying on a spaghetti of gotos and switch-breaks.

I'm OK with whatever structure we end up with, as long as it fixes the
leak (and ideally the pessimization).

Anyway, here's the real patch in case anybody wants to apply it and play
with it further.

-- >8 --
diff --git a/entry.c b/entry.c
index 1c7e3c11d5..d28b42d82d 100644
--- a/entry.c
+++ b/entry.c
@@ -261,6 +261,7 @@ static int write_entry(struct cache_entry *ce,
 	size_t newsize = 0;
 	struct stat st;
 	const struct submodule *sub;
+	struct delayed_checkout *dco = state->delayed_checkout;
 
 	if (ce_mode_s_ifmt == S_IFREG) {
 		struct stream_filter *filter = get_stream_filter(ce->name,
@@ -273,55 +274,61 @@ static int write_entry(struct cache_entry *ce,
 	}
 
 	switch (ce_mode_s_ifmt) {
-	case S_IFREG:
 	case S_IFLNK:
 		new = read_blob_entry(ce, &size);
 		if (!new)
 			return error("unable to read sha1 file of %s (%s)",
 				path, oid_to_hex(&ce->oid));
 
-		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
-			ret = symlink(new, path);
-			free(new);
-			if (ret)
-				return error_errno("unable to create symlink %s",
-						   path);
-			break;
-		}
+		/* fallback to handling it like a regular file if we must */
+		if (!has_symlinks || to_tempfile)
+			goto write_out_file;
 
+		ret = symlink(new, path);
+		free(new);
+		if (ret)
+			return error_errno("unable to create symlink %s",
+					   path);
+		break;
+
+	case S_IFREG:
 		/*
 		 * Convert from git internal format to working tree format
 		 */
-		if (ce_mode_s_ifmt == S_IFREG) {
-			struct delayed_checkout *dco = state->delayed_checkout;
-			if (dco && dco->state != CE_NO_DELAY) {
-				/* Do not send the blob in case of a retry. */
-				if (dco->state == CE_RETRY) {
-					new = NULL;
-					size = 0;
-				}
-				ret = async_convert_to_working_tree(
-					ce->name, new, size, &buf, dco);
-				if (ret && string_list_has_string(&dco->paths, ce->name)) {
-					free(new);
-					goto finish;
-				}
-			} else
-				ret = convert_to_working_tree(
-					ce->name, new, size, &buf);
 
-			if (ret) {
+		if (dco && dco->state == CE_RETRY) {
+			new = NULL;
+			size = 0;
+		} else {
+			new = read_blob_entry(ce, &size);
+			if (!new)
+				return error ("unable to read sha1 file of %s (%s)",
+					      path, oid_to_hex(&ce->oid));
+		}
+
+		if (dco && dco->state != CE_NO_DELAY) {
+			ret = async_convert_to_working_tree(
+							    ce->name, new, size, &buf, dco);
+			if (ret && string_list_has_string(&dco->paths, ce->name)) {
 				free(new);
-				new = strbuf_detach(&buf, &newsize);
-				size = newsize;
+				goto finish;
 			}
-			/*
-			 * No "else" here as errors from convert are OK at this
-			 * point. If the error would have been fatal (e.g.
-			 * filter is required), then we would have died already.
-			 */
+		} else
+			ret = convert_to_working_tree(
+						      ce->name, new, size, &buf);
+
+		if (ret) {
+			free(new);
+			new = strbuf_detach(&buf, &newsize);
+			size = newsize;
 		}
+		/*
+		 * No "else" here as errors from convert are OK at this
+		 * point. If the error would have been fatal (e.g.
+		 * filter is required), then we would have died already.
+		 */
 
+write_out_file:
 		fd = open_output_fd(path, ce, to_tempfile);
 		if (fd < 0) {
 			free(new);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-06  4:26     ` Junio C Hamano
@ 2017-10-06  4:56       ` Jeff King
  2017-10-06  6:03         ` Junio C Hamano
  2017-10-08 21:41         ` Lars Schneider
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff King @ 2017-10-06  4:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: lars.schneider, git, t.gummerer, jrnieder, Lars Schneider

On Fri, Oct 06, 2017 at 01:26:48PM +0900, Junio C Hamano wrote:

> > We could probably be a bit more specific about the situation, since the
> > user will see this message with no context. Maybe something like:
> >
> >   unable to stat just-written file %s
> >
> > or something. We should probably also use error_errno(). I'd bet if this
> > ever triggers that it's likely to be ENOENT, but certainly if it _isn't_
> > that would be interesting information.
> 
> ENOTDIR and to a lesser degree EACCES and ELOOP are also
> uninteresting, as we are talking about somebody else mucking with
> the filesystem.

True. The nice thing about the error() route is that we don't need to
make such judgements. The user can decide what is unexpected.

> -- >8 --
> From: Lars Schneider <larsxschneider@gmail.com>
> Date: Thu, 5 Oct 2017 12:44:07 +0200
> Subject: [PATCH] entry.c: check if file exists after checkout
> 
> If we are checking out a file and somebody else racily deletes our file,
> then we would write garbage to the cache entry. Fix that by checking
> the result of the lstat() call on that file. Print an error to the user
> if the file does not exist.

I don't know if we wanted to capture any of the reasoning behind using
error() here or not. Frankly, I'm not sure how to argue for it
succinctly. :) I'm happy with letting it live on in the list archive.

> diff --git a/entry.c b/entry.c
> index f879758c73..6d9de3a5aa 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -341,7 +341,9 @@ static int write_entry(struct cache_entry *ce,
>  	if (state->refresh_cache) {
>  		assert(state->istate);
>  		if (!fstat_done)
> -			lstat(ce->name, &st);
> +			if (lstat(ce->name, &st) < 0)
> +				return error_errno("unable stat just-written file %s",
> +						   ce->name);

s/unable stat/unable to stat/, I think.

Other than that, this looks fine to me.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-06  4:56       ` Jeff King
@ 2017-10-06  6:03         ` Junio C Hamano
  2017-10-06  6:05           ` Jeff King
  2017-10-08 21:41         ` Lars Schneider
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-10-06  6:03 UTC (permalink / raw)
  To: Jeff King; +Cc: lars.schneider, git, t.gummerer, jrnieder, Lars Schneider

Jeff King <peff@peff.net> writes:

> I don't know if we wanted to capture any of the reasoning behind using
> error() here or not. Frankly, I'm not sure how to argue for it
> succinctly. :) I'm happy with letting it live on in the list archive.

Are you talking about the "philosophical" thing?  

Because we cannot quite tell between the two cases (one is error--we
wrote or we thought we wrote, but we cannot find it, the other is
dubious--somebody was racing with us in the filesystem), I think it
is reasonable to err on the safer side, even though an error abort
while doing "as we know we wrote the thing that match the index, we
might as well lstat and mark the cache entry as up-to-date" might be
a bit irritating.


>> diff --git a/entry.c b/entry.c
>> index f879758c73..6d9de3a5aa 100644
>> --- a/entry.c
>> +++ b/entry.c
>> @@ -341,7 +341,9 @@ static int write_entry(struct cache_entry *ce,
>>  	if (state->refresh_cache) {
>>  		assert(state->istate);
>>  		if (!fstat_done)
>> -			lstat(ce->name, &st);
>> +			if (lstat(ce->name, &st) < 0)
>> +				return error_errno("unable stat just-written file %s",
>> +						   ce->name);
>
> s/unable stat/unable to stat/, I think.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-06  6:03         ` Junio C Hamano
@ 2017-10-06  6:05           ` Jeff King
  2017-10-06  7:58             ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-06  6:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: lars.schneider, git, t.gummerer, jrnieder, Lars Schneider

On Fri, Oct 06, 2017 at 03:03:49PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I don't know if we wanted to capture any of the reasoning behind using
> > error() here or not. Frankly, I'm not sure how to argue for it
> > succinctly. :) I'm happy with letting it live on in the list archive.
> 
> Are you talking about the "philosophical" thing?

Right, whether we ought to just mark the entry as stat-dirty and return
success.

> Because we cannot quite tell between the two cases (one is error--we
> wrote or we thought we wrote, but we cannot find it, the other is
> dubious--somebody was racing with us in the filesystem), I think it
> is reasonable to err on the safer side, even though an error abort
> while doing "as we know we wrote the thing that match the index, we
> might as well lstat and mark the cache entry as up-to-date" might be
> a bit irritating.

OK. I can live with that line of thought.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-06  6:05           ` Jeff King
@ 2017-10-06  7:58             ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-10-06  7:58 UTC (permalink / raw)
  To: Jeff King
  Cc: lars.schneider, Git Mailing List, Thomas Gummerer,
	Jonathan Nieder, Lars Schneider

On Fri, Oct 6, 2017 at 3:05 PM, Jeff King <peff@peff.net> wrote:
>
>> Because we cannot quite tell between the two cases (one is error--we
>> wrote or we thought we wrote, but we cannot find it, the other is
>> dubious--somebody was racing with us in the filesystem), I think it
>> is reasonable to err on the safer side, even though an error abort
>> while doing "as we know we wrote the thing that match the index, we
>> might as well lstat and mark the cache entry as up-to-date" might be
>> a bit irritating.
>
> OK. I can live with that line of thought.

Still that, or any other, line of thought we follow to declare that it
is a good change should be recorded in the log ;-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-06  4:54         ` Jeff King
@ 2017-10-08 21:37           ` Lars Schneider
  2017-10-09 17:47             ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Lars Schneider @ 2017-10-08 21:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Users, Thomas Gummerer, Jonathan Nieder


> On 06 Oct 2017, at 06:54, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Oct 06, 2017 at 08:01:48AM +0900, Junio C Hamano wrote:
> 
>>> But
>>> I think we'd want to protect the read_blob_entry() call at the top of
>>> the case with a check for dco->state == CE_RETRY.
>> 
>> Yeah, I think that makes more sense.
>> 
>> A patch may look like this on top of these two patches, but I'd
>> prefer to see Lars's eyeballing and possibly wrapping it up in an
>> applicable patch after taking the authorship.
> 

This looks all good to me. Thank you!
A few minor style suggestions below.


> ...
> 
> The "structured" way, of course, would be to put everything under
> write_out_file into a helper function and just call it from both places
> rather than relying on a spaghetti of gotos and switch-breaks.
> 
> I'm OK with whatever structure we end up with, as long as it fixes the
> leak (and ideally the pessimization).
> 
> Anyway, here's the real patch in case anybody wants to apply it and play
> with it further.
> 
> -- >8 --
> diff --git a/entry.c b/entry.c
> index 1c7e3c11d5..d28b42d82d 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -261,6 +261,7 @@ static int write_entry(struct cache_entry *ce,
> 	size_t newsize = 0;
> 	struct stat st;
> 	const struct submodule *sub;
> +	struct delayed_checkout *dco = state->delayed_checkout;
> 
> 	if (ce_mode_s_ifmt == S_IFREG) {
> 		struct stream_filter *filter = get_stream_filter(ce->name,
> @@ -273,55 +274,61 @@ static int write_entry(struct cache_entry *ce,
> 	}
> 
> 	switch (ce_mode_s_ifmt) {
> -	case S_IFREG:
> 	case S_IFLNK:
> 		new = read_blob_entry(ce, &size);
> 		if (!new)
> 			return error("unable to read sha1 file of %s (%s)",
> 				path, oid_to_hex(&ce->oid));
> 
> -		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
> -			ret = symlink(new, path);
> -			free(new);
> -			if (ret)
> -				return error_errno("unable to create symlink %s",
> -						   path);

Nit: This could go into one line now.


> -			break;
> -		}
> +		/* fallback to handling it like a regular file if we must */
> +		if (!has_symlinks || to_tempfile)
> +			goto write_out_file;
> 
> +		ret = symlink(new, path);
> +		free(new);
> +		if (ret)
> +			return error_errno("unable to create symlink %s",
> +					   path);
> +		break;
> +
> +	case S_IFREG:
> 		/*
> 		 * Convert from git internal format to working tree format
> 		 */
> -		if (ce_mode_s_ifmt == S_IFREG) {
> -			struct delayed_checkout *dco = state->delayed_checkout;
> -			if (dco && dco->state != CE_NO_DELAY) {
> -				/* Do not send the blob in case of a retry. */
> -				if (dco->state == CE_RETRY) {

Maybe we could add here something like:
            /* The filer process got the blob already in case of a retry. Unnecessary to send it, again! */

> -					new = NULL;
> -					size = 0;
> -				}
> -				ret = async_convert_to_working_tree(
> -					ce->name, new, size, &buf, dco);

Nit: This could go into one line now.


> -				if (ret && string_list_has_string(&dco->paths, ce->name)) {
> -					free(new);
> -					goto finish;
> -				}
> -			} else
> -				ret = convert_to_working_tree(
> -					ce->name, new, size, &buf);

Nit: This could go into one line now.


> 
> -			if (ret) {
> +		if (dco && dco->state == CE_RETRY) {
> +			new = NULL;
> +			size = 0;
> +		} else {
> +			new = read_blob_entry(ce, &size);
> +			if (!new)
> +				return error ("unable to read sha1 file of %s (%s)",
> +					      path, oid_to_hex(&ce->oid));
> +		}
> +
> +		if (dco && dco->state != CE_NO_DELAY) {
> +			ret = async_convert_to_working_tree(
> +							    ce->name, new, size, &buf, dco);
> +			if (ret && string_list_has_string(&dco->paths, ce->name)) {
> 				free(new);
> -				new = strbuf_detach(&buf, &newsize);
> -				size = newsize;
> +				goto finish;
> 			}
> -			/*
> -			 * No "else" here as errors from convert are OK at this
> -			 * point. If the error would have been fatal (e.g.
> -			 * filter is required), then we would have died already.
> -			 */
> +		} else
> +			ret = convert_to_working_tree(
> +						      ce->name, new, size, &buf);
> +
> +		if (ret) {
> +			free(new);
> +			new = strbuf_detach(&buf, &newsize);
> +			size = newsize;
> 		}
> +		/*
> +		 * No "else" here as errors from convert are OK at this
> +		 * point. If the error would have been fatal (e.g.
> +		 * filter is required), then we would have died already.
> +		 */
> 
> +write_out_file:
> 		fd = open_output_fd(path, ce, to_tempfile);
> 		if (fd < 0) {
> 			free(new);

...

>		break;
>	case S_IFGITLINK:

Maybe add a newline above "S_IFGITLINK" and "default" for readability. 
Above "case S_IFREG:" we have a newline, too. 


- Lars







^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/2] entry.c: check if file exists after checkout
  2017-10-06  4:56       ` Jeff King
  2017-10-06  6:03         ` Junio C Hamano
@ 2017-10-08 21:41         ` Lars Schneider
  1 sibling, 0 replies; 26+ messages in thread
From: Lars Schneider @ 2017-10-08 21:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Users, Thomas Gummerer, Jonathan Nieder, Jeff King


> On 06 Oct 2017, at 06:56, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Oct 06, 2017 at 01:26:48PM +0900, Junio C Hamano wrote:
> 
> ...
>> -- >8 --
>> From: Lars Schneider <larsxschneider@gmail.com>
>> Date: Thu, 5 Oct 2017 12:44:07 +0200
>> Subject: [PATCH] entry.c: check if file exists after checkout
>> 
>> If we are checking out a file and somebody else racily deletes our file,
>> then we would write garbage to the cache entry. Fix that by checking
>> the result of the lstat() call on that file. Print an error to the user
>> if the file does not exist.
> 
> I don't know if we wanted to capture any of the reasoning behind using
> error() here or not. Frankly, I'm not sure how to argue for it
> succinctly. :) I'm happy with letting it live on in the list archive.
> 
>> diff --git a/entry.c b/entry.c
>> index f879758c73..6d9de3a5aa 100644
>> --- a/entry.c
>> +++ b/entry.c
>> @@ -341,7 +341,9 @@ static int write_entry(struct cache_entry *ce,
>> 	if (state->refresh_cache) {
>> 		assert(state->istate);
>> 		if (!fstat_done)
>> -			lstat(ce->name, &st);
>> +			if (lstat(ce->name, &st) < 0)
>> +				return error_errno("unable stat just-written file %s",
>> +						   ce->name);
> 
> s/unable stat/unable to stat/, I think.
> 
> Other than that, this looks fine to me.
> 
> -Peff

Looks fine to me, too.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/2] entry.c: update cache entry only for existing files
  2017-10-08 21:37           ` Lars Schneider
@ 2017-10-09 17:47             ` Jeff King
  2017-10-09 17:48               ` [PATCH 1/3] write_entry: fix leak when retrying delayed filter Jeff King
                                 ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jeff King @ 2017-10-09 17:47 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Junio C Hamano, Git Users, Thomas Gummerer, Jonathan Nieder

On Sun, Oct 08, 2017 at 11:37:14PM +0200, Lars Schneider wrote:

> >> Yeah, I think that makes more sense.
> >> 
> >> A patch may look like this on top of these two patches, but I'd
> >> prefer to see Lars's eyeballing and possibly wrapping it up in an
> >> applicable patch after taking the authorship.
> > 
> 
> This looks all good to me. Thank you!
> A few minor style suggestions below.

Thanks, these were all reasonable (I actually avoided unwrapping some of
the lines in the original to make the "-w" diff more readable :) ).

I ended up breaking this into three commits, since I think that makes it
easier to see what each of the changes is doing. Here's what I have (on
top of what Junio has already queued in ls/filter-process-delayed):

  [v2 1/3]: write_entry: fix leak when retrying delayed filter
  [v2 2/3]: write_entry: avoid reading blobs in CE_RETRY case
  [v2 3/3]: write_entry: untangle symlink and regular-file cases

 entry.c | 83 ++++++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 48 insertions(+), 35 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] write_entry: fix leak when retrying delayed filter
  2017-10-09 17:47             ` Jeff King
@ 2017-10-09 17:48               ` Jeff King
  2017-10-10  0:00                 ` Junio C Hamano
  2017-10-09 17:48               ` [PATCH 2/3] write_entry: avoid reading blobs in CE_RETRY case Jeff King
  2017-10-09 17:50               ` [PATCH 3/3] write_entry: untangle symlink and regular-file cases Jeff King
  2 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-09 17:48 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Junio C Hamano, Git Users, Thomas Gummerer, Jonathan Nieder

When write_entry() retries a delayed filter request, we
don't need to send the blob content to the filter again, and
set the pointer to NULL. But doing so means we leak the
contents we read earlier from read_blob_entry(). Let's make
sure to free it before dropping the pointer.

Signed-off-by: Jeff King <peff@peff.net>
---
 entry.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/entry.c b/entry.c
index ab79f1f69c..637c5958b0 100644
--- a/entry.c
+++ b/entry.c
@@ -283,6 +283,7 @@ static int write_entry(struct cache_entry *ce,
 			if (dco && dco->state != CE_NO_DELAY) {
 				/* Do not send the blob in case of a retry. */
 				if (dco->state == CE_RETRY) {
+					free(new);
 					new = NULL;
 					size = 0;
 				}
-- 
2.15.0.rc0.421.gf5a676fd56


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] write_entry: avoid reading blobs in CE_RETRY case
  2017-10-09 17:47             ` Jeff King
  2017-10-09 17:48               ` [PATCH 1/3] write_entry: fix leak when retrying delayed filter Jeff King
@ 2017-10-09 17:48               ` Jeff King
  2017-10-10  0:00                 ` Junio C Hamano
  2017-10-09 17:50               ` [PATCH 3/3] write_entry: untangle symlink and regular-file cases Jeff King
  2 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-09 17:48 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Junio C Hamano, Git Users, Thomas Gummerer, Jonathan Nieder

When retrying a delayed filter-process request, we don't
need to send the blob to the filter a second time. However,
we read it unconditionally into a buffer, only to later
throw away that buffer. We can make this more efficient by
skipping the read in the first place when it isn't
necessary.

Signed-off-by: Jeff King <peff@peff.net>
---
 entry.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/entry.c b/entry.c
index 637c5958b0..bec51e37a2 100644
--- a/entry.c
+++ b/entry.c
@@ -240,6 +240,7 @@ static int write_entry(struct cache_entry *ce,
 		       char *path, const struct checkout *state, int to_tempfile)
 {
 	unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
+	struct delayed_checkout *dco = state->delayed_checkout;
 	int fd, ret, fstat_done = 0;
 	char *new;
 	struct strbuf buf = STRBUF_INIT;
@@ -261,10 +262,19 @@ static int write_entry(struct cache_entry *ce,
 	switch (ce_mode_s_ifmt) {
 	case S_IFREG:
 	case S_IFLNK:
-		new = read_blob_entry(ce, &size);
-		if (!new)
-			return error("unable to read sha1 file of %s (%s)",
-				path, oid_to_hex(&ce->oid));
+		/*
+		 * We do not send the blob in case of a retry, so do not
+		 * bother reading it at all.
+		 */
+		if (ce_mode_s_ifmt == S_IFREG && dco && dco->state == CE_RETRY) {
+			new = NULL;
+			size = 0;
+		} else {
+			new = read_blob_entry(ce, &size);
+			if (!new)
+				return error("unable to read sha1 file of %s (%s)",
+					     path, oid_to_hex(&ce->oid));
+		}
 
 		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
 			ret = symlink(new, path);
@@ -279,14 +289,7 @@ static int write_entry(struct cache_entry *ce,
 		 * Convert from git internal format to working tree format
 		 */
 		if (ce_mode_s_ifmt == S_IFREG) {
-			struct delayed_checkout *dco = state->delayed_checkout;
 			if (dco && dco->state != CE_NO_DELAY) {
-				/* Do not send the blob in case of a retry. */
-				if (dco->state == CE_RETRY) {
-					free(new);
-					new = NULL;
-					size = 0;
-				}
 				ret = async_convert_to_working_tree(
 					ce->name, new, size, &buf, dco);
 				if (ret && string_list_has_string(&dco->paths, ce->name)) {
-- 
2.15.0.rc0.421.gf5a676fd56


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] write_entry: untangle symlink and regular-file cases
  2017-10-09 17:47             ` Jeff King
  2017-10-09 17:48               ` [PATCH 1/3] write_entry: fix leak when retrying delayed filter Jeff King
  2017-10-09 17:48               ` [PATCH 2/3] write_entry: avoid reading blobs in CE_RETRY case Jeff King
@ 2017-10-09 17:50               ` Jeff King
  2017-10-10  0:03                 ` Junio C Hamano
  2 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-09 17:50 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Junio C Hamano, Git Users, Thomas Gummerer, Jonathan Nieder

The write_entry() function switches on the mode of the entry
we're going to write out. The cases for S_IFLNK and S_IFREG
are lumped together. In earlier versions of the code, this
made some sense. They have a shared preamble (which reads
the blob content), a short type-specific body, and a shared
conclusion (which writes out the file contents; always for
S_IFREG and only sometimes for S_IFLNK).

But over time this has grown to make less sense. The preamble
now has conditional bits for each type, and the S_IFREG body
has grown a lot more complicated. It's hard to follow the
logic of which code is running for which mode.

Let's give each mode its own case arm. We will still share
the conclusion code, which means we now jump to it with a
goto. Ideally we'd pull that shared code into its own
function, but it touches so much internal state in the
write_entry() function that the end result is actually
harder to follow than the goto.

While we're here, we'll touch up a few bits of whitespace to
make the beginning and endings of the cases easier to read.

Signed-off-by: Jeff King <peff@peff.net>
---
 entry.c | 71 +++++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 40 insertions(+), 31 deletions(-)

diff --git a/entry.c b/entry.c
index bec51e37a2..206363fd15 100644
--- a/entry.c
+++ b/entry.c
@@ -260,13 +260,31 @@ static int write_entry(struct cache_entry *ce,
 	}
 
 	switch (ce_mode_s_ifmt) {
-	case S_IFREG:
 	case S_IFLNK:
+		new = read_blob_entry(ce, &size);
+		if (!new)
+			return error("unable to read sha1 file of %s (%s)",
+				     path, oid_to_hex(&ce->oid));
+
+		/*
+		 * We can't make a real symlink; write out a regular file entry
+		 * with the symlink destination as its contents.
+		 */
+		if (!has_symlinks || to_tempfile)
+			goto write_file_entry;
+
+		ret = symlink(new, path);
+		free(new);
+		if (ret)
+			return error_errno("unable to create symlink %s", path);
+		break;
+
+	case S_IFREG:
 		/*
 		 * We do not send the blob in case of a retry, so do not
 		 * bother reading it at all.
 		 */
-		if (ce_mode_s_ifmt == S_IFREG && dco && dco->state == CE_RETRY) {
+		if (dco && dco->state == CE_RETRY) {
 			new = NULL;
 			size = 0;
 		} else {
@@ -276,42 +294,31 @@ static int write_entry(struct cache_entry *ce,
 					     path, oid_to_hex(&ce->oid));
 		}
 
-		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
-			ret = symlink(new, path);
-			free(new);
-			if (ret)
-				return error_errno("unable to create symlink %s",
-						   path);
-			break;
-		}
-
 		/*
 		 * Convert from git internal format to working tree format
 		 */
-		if (ce_mode_s_ifmt == S_IFREG) {
-			if (dco && dco->state != CE_NO_DELAY) {
-				ret = async_convert_to_working_tree(
-					ce->name, new, size, &buf, dco);
-				if (ret && string_list_has_string(&dco->paths, ce->name)) {
-					free(new);
-					goto delayed;
-				}
-			} else
-				ret = convert_to_working_tree(
-					ce->name, new, size, &buf);
-
-			if (ret) {
+		if (dco && dco->state != CE_NO_DELAY) {
+			ret = async_convert_to_working_tree(ce->name, new,
+							    size, &buf, dco);
+			if (ret && string_list_has_string(&dco->paths, ce->name)) {
 				free(new);
-				new = strbuf_detach(&buf, &newsize);
-				size = newsize;
+				goto delayed;
 			}
-			/*
-			 * No "else" here as errors from convert are OK at this
-			 * point. If the error would have been fatal (e.g.
-			 * filter is required), then we would have died already.
-			 */
+		} else
+			ret = convert_to_working_tree(ce->name, new, size, &buf);
+
+		if (ret) {
+			free(new);
+			new = strbuf_detach(&buf, &newsize);
+			size = newsize;
 		}
+		/*
+		 * No "else" here as errors from convert are OK at this
+		 * point. If the error would have been fatal (e.g.
+		 * filter is required), then we would have died already.
+		 */
 
+	write_file_entry:
 		fd = open_output_fd(path, ce, to_tempfile);
 		if (fd < 0) {
 			free(new);
@@ -326,6 +333,7 @@ static int write_entry(struct cache_entry *ce,
 		if (wrote != size)
 			return error("unable to write file %s", path);
 		break;
+
 	case S_IFGITLINK:
 		if (to_tempfile)
 			return error("cannot create temporary submodule %s", path);
@@ -337,6 +345,7 @@ static int write_entry(struct cache_entry *ce,
 				NULL, oid_to_hex(&ce->oid),
 				state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
 		break;
+
 	default:
 		return error("unknown file mode for %s in index", path);
 	}
-- 
2.15.0.rc0.421.gf5a676fd56

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] write_entry: fix leak when retrying delayed filter
  2017-10-09 17:48               ` [PATCH 1/3] write_entry: fix leak when retrying delayed filter Jeff King
@ 2017-10-10  0:00                 ` Junio C Hamano
  2017-10-10  9:23                   ` Simon Ruderich
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-10-10  0:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Git Users, Thomas Gummerer, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> When write_entry() retries a delayed filter request, we
> don't need to send the blob content to the filter again, and
> set the pointer to NULL. But doing so means we leak the
> contents we read earlier from read_blob_entry(). Let's make
> sure to free it before dropping the pointer.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  entry.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/entry.c b/entry.c
> index ab79f1f69c..637c5958b0 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -283,6 +283,7 @@ static int write_entry(struct cache_entry *ce,
>  			if (dco && dco->state != CE_NO_DELAY) {
>  				/* Do not send the blob in case of a retry. */
>  				if (dco->state == CE_RETRY) {
> +					free(new);
>  					new = NULL;
>  					size = 0;
>  				}

Looks good to me.  Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] write_entry: avoid reading blobs in CE_RETRY case
  2017-10-09 17:48               ` [PATCH 2/3] write_entry: avoid reading blobs in CE_RETRY case Jeff King
@ 2017-10-10  0:00                 ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-10-10  0:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Git Users, Thomas Gummerer, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> When retrying a delayed filter-process request, we don't
> need to send the blob to the filter a second time. However,
> we read it unconditionally into a buffer, only to later
> throw away that buffer. We can make this more efficient by
> skipping the read in the first place when it isn't
> necessary.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  entry.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)

Again, looks obviously correct.  Thanks.

>
> diff --git a/entry.c b/entry.c
> index 637c5958b0..bec51e37a2 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -240,6 +240,7 @@ static int write_entry(struct cache_entry *ce,
>  		       char *path, const struct checkout *state, int to_tempfile)
>  {
>  	unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
> +	struct delayed_checkout *dco = state->delayed_checkout;
>  	int fd, ret, fstat_done = 0;
>  	char *new;
>  	struct strbuf buf = STRBUF_INIT;
> @@ -261,10 +262,19 @@ static int write_entry(struct cache_entry *ce,
>  	switch (ce_mode_s_ifmt) {
>  	case S_IFREG:
>  	case S_IFLNK:
> -		new = read_blob_entry(ce, &size);
> -		if (!new)
> -			return error("unable to read sha1 file of %s (%s)",
> -				path, oid_to_hex(&ce->oid));
> +		/*
> +		 * We do not send the blob in case of a retry, so do not
> +		 * bother reading it at all.
> +		 */
> +		if (ce_mode_s_ifmt == S_IFREG && dco && dco->state == CE_RETRY) {
> +			new = NULL;
> +			size = 0;
> +		} else {
> +			new = read_blob_entry(ce, &size);
> +			if (!new)
> +				return error("unable to read sha1 file of %s (%s)",
> +					     path, oid_to_hex(&ce->oid));
> +		}
>  
>  		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
>  			ret = symlink(new, path);
> @@ -279,14 +289,7 @@ static int write_entry(struct cache_entry *ce,
>  		 * Convert from git internal format to working tree format
>  		 */
>  		if (ce_mode_s_ifmt == S_IFREG) {
> -			struct delayed_checkout *dco = state->delayed_checkout;
>  			if (dco && dco->state != CE_NO_DELAY) {
> -				/* Do not send the blob in case of a retry. */
> -				if (dco->state == CE_RETRY) {
> -					free(new);
> -					new = NULL;
> -					size = 0;
> -				}
>  				ret = async_convert_to_working_tree(
>  					ce->name, new, size, &buf, dco);
>  				if (ret && string_list_has_string(&dco->paths, ce->name)) {

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] write_entry: untangle symlink and regular-file cases
  2017-10-09 17:50               ` [PATCH 3/3] write_entry: untangle symlink and regular-file cases Jeff King
@ 2017-10-10  0:03                 ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-10-10  0:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Git Users, Thomas Gummerer, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> The write_entry() function switches on the mode of the entry
> we're going to write out. The cases for S_IFLNK and S_IFREG
> are lumped together. In earlier versions of the code, this
> made some sense. They have a shared preamble (which reads
> the blob content), a short type-specific body, and a shared
> conclusion (which writes out the file contents; always for
> S_IFREG and only sometimes for S_IFLNK).
>
> But over time this has grown to make less sense. The preamble
> now has conditional bits for each type, and the S_IFREG body
> has grown a lot more complicated. It's hard to follow the
> logic of which code is running for which mode.

Nicely done and looks correct.  Thanks.

>
> Let's give each mode its own case arm. We will still share
> the conclusion code, which means we now jump to it with a
> goto. Ideally we'd pull that shared code into its own
> function, but it touches so much internal state in the
> write_entry() function that the end result is actually
> harder to follow than the goto.

>
> While we're here, we'll touch up a few bits of whitespace to
> make the beginning and endings of the cases easier to read.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  entry.c | 71 +++++++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 40 insertions(+), 31 deletions(-)
>
> diff --git a/entry.c b/entry.c
> index bec51e37a2..206363fd15 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -260,13 +260,31 @@ static int write_entry(struct cache_entry *ce,
>  	}
>  
>  	switch (ce_mode_s_ifmt) {
> -	case S_IFREG:
>  	case S_IFLNK:
> +		new = read_blob_entry(ce, &size);
> +		if (!new)
> +			return error("unable to read sha1 file of %s (%s)",
> +				     path, oid_to_hex(&ce->oid));
> +
> +		/*
> +		 * We can't make a real symlink; write out a regular file entry
> +		 * with the symlink destination as its contents.
> +		 */
> +		if (!has_symlinks || to_tempfile)
> +			goto write_file_entry;
> +
> +		ret = symlink(new, path);
> +		free(new);
> +		if (ret)
> +			return error_errno("unable to create symlink %s", path);
> +		break;
> +
> +	case S_IFREG:
>  		/*
>  		 * We do not send the blob in case of a retry, so do not
>  		 * bother reading it at all.
>  		 */
> -		if (ce_mode_s_ifmt == S_IFREG && dco && dco->state == CE_RETRY) {
> +		if (dco && dco->state == CE_RETRY) {
>  			new = NULL;
>  			size = 0;
>  		} else {
> @@ -276,42 +294,31 @@ static int write_entry(struct cache_entry *ce,
>  					     path, oid_to_hex(&ce->oid));
>  		}
>  
> -		if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
> -			ret = symlink(new, path);
> -			free(new);
> -			if (ret)
> -				return error_errno("unable to create symlink %s",
> -						   path);
> -			break;
> -		}
> -
>  		/*
>  		 * Convert from git internal format to working tree format
>  		 */
> -		if (ce_mode_s_ifmt == S_IFREG) {
> -			if (dco && dco->state != CE_NO_DELAY) {
> -				ret = async_convert_to_working_tree(
> -					ce->name, new, size, &buf, dco);
> -				if (ret && string_list_has_string(&dco->paths, ce->name)) {
> -					free(new);
> -					goto delayed;
> -				}
> -			} else
> -				ret = convert_to_working_tree(
> -					ce->name, new, size, &buf);
> -
> -			if (ret) {
> +		if (dco && dco->state != CE_NO_DELAY) {
> +			ret = async_convert_to_working_tree(ce->name, new,
> +							    size, &buf, dco);
> +			if (ret && string_list_has_string(&dco->paths, ce->name)) {
>  				free(new);
> -				new = strbuf_detach(&buf, &newsize);
> -				size = newsize;
> +				goto delayed;
>  			}
> -			/*
> -			 * No "else" here as errors from convert are OK at this
> -			 * point. If the error would have been fatal (e.g.
> -			 * filter is required), then we would have died already.
> -			 */
> +		} else
> +			ret = convert_to_working_tree(ce->name, new, size, &buf);
> +
> +		if (ret) {
> +			free(new);
> +			new = strbuf_detach(&buf, &newsize);
> +			size = newsize;
>  		}
> +		/*
> +		 * No "else" here as errors from convert are OK at this
> +		 * point. If the error would have been fatal (e.g.
> +		 * filter is required), then we would have died already.
> +		 */
>  
> +	write_file_entry:
>  		fd = open_output_fd(path, ce, to_tempfile);
>  		if (fd < 0) {
>  			free(new);
> @@ -326,6 +333,7 @@ static int write_entry(struct cache_entry *ce,
>  		if (wrote != size)
>  			return error("unable to write file %s", path);
>  		break;
> +
>  	case S_IFGITLINK:
>  		if (to_tempfile)
>  			return error("cannot create temporary submodule %s", path);
> @@ -337,6 +345,7 @@ static int write_entry(struct cache_entry *ce,
>  				NULL, oid_to_hex(&ce->oid),
>  				state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
>  		break;
> +
>  	default:
>  		return error("unknown file mode for %s in index", path);
>  	}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] write_entry: fix leak when retrying delayed filter
  2017-10-10  0:00                 ` Junio C Hamano
@ 2017-10-10  9:23                   ` Simon Ruderich
  2017-10-10  9:25                     ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Simon Ruderich @ 2017-10-10  9:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Lars Schneider, Git Users, Thomas Gummerer,
	Jonathan Nieder

On Tue, Oct 10, 2017 at 09:00:09AM +0900, Junio C Hamano wrote:
>> --- a/entry.c
>> +++ b/entry.c
>> @@ -283,6 +283,7 @@ static int write_entry(struct cache_entry *ce,
>>  			if (dco && dco->state != CE_NO_DELAY) {
>>  				/* Do not send the blob in case of a retry. */
>>  				if (dco->state == CE_RETRY) {
>> +					free(new);
>>  					new = NULL;
>>  					size = 0;
>>  				}

FREE_AND_NULL(new)?

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] write_entry: fix leak when retrying delayed filter
  2017-10-10  9:23                   ` Simon Ruderich
@ 2017-10-10  9:25                     ` Jeff King
  2017-10-10  9:49                       ` Simon Ruderich
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-10-10  9:25 UTC (permalink / raw)
  To: Simon Ruderich
  Cc: Junio C Hamano, Lars Schneider, Git Users, Thomas Gummerer,
	Jonathan Nieder

On Tue, Oct 10, 2017 at 11:23:19AM +0200, Simon Ruderich wrote:

> On Tue, Oct 10, 2017 at 09:00:09AM +0900, Junio C Hamano wrote:
> >> --- a/entry.c
> >> +++ b/entry.c
> >> @@ -283,6 +283,7 @@ static int write_entry(struct cache_entry *ce,
> >>  			if (dco && dco->state != CE_NO_DELAY) {
> >>  				/* Do not send the blob in case of a retry. */
> >>  				if (dco->state == CE_RETRY) {
> >> +					free(new);
> >>  					new = NULL;
> >>  					size = 0;
> >>  				}
> 
> FREE_AND_NULL(new)?

Ah, yeah, I forgot we had that now. It would work here, but note that
this code ends up going away later in the series.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] write_entry: fix leak when retrying delayed filter
  2017-10-10  9:25                     ` Jeff King
@ 2017-10-10  9:49                       ` Simon Ruderich
  0 siblings, 0 replies; 26+ messages in thread
From: Simon Ruderich @ 2017-10-10  9:49 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Lars Schneider, Git Users, Thomas Gummerer,
	Jonathan Nieder

On Tue, Oct 10, 2017 at 05:25:43AM -0400, Jeff King wrote:
> On Tue, Oct 10, 2017 at 11:23:19AM +0200, Simon Ruderich wrote:
>> On Tue, Oct 10, 2017 at 09:00:09AM +0900, Junio C Hamano wrote:
>>>> --- a/entry.c
>>>> +++ b/entry.c
>>>> @@ -283,6 +283,7 @@ static int write_entry(struct cache_entry *ce,
>>>>  			if (dco && dco->state != CE_NO_DELAY) {
>>>>  				/* Do not send the blob in case of a retry. */
>>>>  				if (dco->state == CE_RETRY) {
>>>> +					free(new);
>>>>  					new = NULL;
>>>>  					size = 0;
>>>>  				}
>>
>> FREE_AND_NULL(new)?
>
> Ah, yeah, I forgot we had that now. It would work here, but note that
> this code ends up going away later in the series.

Ah sorry, missed that. Then never mind.

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-10-10  9:49 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-05 10:44 [PATCH v1 0/2] fix temporary garbage in the cache entry lars.schneider
2017-10-05 10:44 ` [PATCH v1 1/2] entry.c: update cache entry only for existing files lars.schneider
2017-10-05 11:12   ` Jeff King
2017-10-05 11:19   ` Junio C Hamano
2017-10-05 11:26     ` Jeff King
2017-10-05 23:01       ` Junio C Hamano
2017-10-06  4:54         ` Jeff King
2017-10-08 21:37           ` Lars Schneider
2017-10-09 17:47             ` Jeff King
2017-10-09 17:48               ` [PATCH 1/3] write_entry: fix leak when retrying delayed filter Jeff King
2017-10-10  0:00                 ` Junio C Hamano
2017-10-10  9:23                   ` Simon Ruderich
2017-10-10  9:25                     ` Jeff King
2017-10-10  9:49                       ` Simon Ruderich
2017-10-09 17:48               ` [PATCH 2/3] write_entry: avoid reading blobs in CE_RETRY case Jeff King
2017-10-10  0:00                 ` Junio C Hamano
2017-10-09 17:50               ` [PATCH 3/3] write_entry: untangle symlink and regular-file cases Jeff King
2017-10-10  0:03                 ` Junio C Hamano
2017-10-05 10:44 ` [PATCH v1 2/2] entry.c: check if file exists after checkout lars.schneider
2017-10-05 11:23   ` Jeff King
2017-10-06  4:26     ` Junio C Hamano
2017-10-06  4:56       ` Jeff King
2017-10-06  6:03         ` Junio C Hamano
2017-10-06  6:05           ` Jeff King
2017-10-06  7:58             ` Junio C Hamano
2017-10-08 21:41         ` Lars Schneider

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).