git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] apply: reload .gitattributes after patching it
@ 2019-08-09 10:02 brian m. carlson
  2019-08-09 11:14 ` Taylor Blau
                   ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-09 10:02 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Duy Nguyen, Junio C Hamano

When applying multiple patches with git am, or when rebasing using the
am backend, it's possible that one of our patches has updated a
gitattributes file. Currently, we cache this information, so if a
file in a subsequent patch has attributes applied, the file will be
written out with the attributes in place as of the time we started the
rebase or am operation, not with the attributes applied by the previous
patch. This problem does not occur when using the -m or -i flags to
rebase.

To ensure we write the correct data into the working tree, expire the
cache after each patch that touches a path ending in ".gitattributes".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 apply.c           | 11 +++++++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 t/t3400-rebase.sh | 23 +++++++++++++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index cde95369bb..b959b88b8e 100644
--- a/apply.c
+++ b/apply.c
@@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
 	struct patch *list = NULL, **listp = &list;
 	int skipped_patch = 0;
 	int res = 0;
+	int flush_attributes = 0;
 
 	state->patch_input_file = filename;
 	if (read_patch_file(&buf, fd) < 0)
@@ -4670,6 +4671,14 @@ static int apply_patch(struct apply_state *state,
 			patch_stats(state, patch);
 			*listp = patch;
 			listp = &patch->next;
+
+			if (!flush_attributes && patch->new_name) {
+				char *dummy = strip_path_suffix(patch->new_name, GITATTRIBUTES_FILE);
+				/* The patch applied to a .gitattributes file. */
+				if (dummy)
+					flush_attributes = 1;
+				free(dummy);
+			}
 		}
 		else {
 			if (state->apply_verbosity > verbosity_normal)
@@ -4746,6 +4755,8 @@ static int apply_patch(struct apply_state *state,
 	if (state->summary && state->apply_verbosity > verbosity_silent)
 		summary_patch_list(list);
 
+	if (flush_attributes)
+		reset_parsed_attributes();
 end:
 	free_patch_list(list);
 	strbuf_release(&buf);
diff --git a/convert.c b/convert.c
index 94ff837649..030e9b81b9 100644
--- a/convert.c
+++ b/convert.c
@@ -1293,10 +1293,11 @@ struct conv_attrs {
 	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
 };
 
+static struct attr_check *check;
+
 static void convert_attrs(const struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
-	static struct attr_check *check;
 	struct attr_check_item *ccheck = NULL;
 
 	if (!check) {
@@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
 		ca->crlf_action = CRLF_AUTO_INPUT;
 }
 
+void reset_parsed_attributes(void)
+{
+	attr_check_free(check);
+	check = NULL;
+}
+
 int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
diff --git a/convert.h b/convert.h
index 831559f10d..3710969d43 100644
--- a/convert.h
+++ b/convert.h
@@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 int would_convert_to_git_filter_fd(const struct index_state *istate,
 				   const char *path);
 
+/*
+ * Reset the internal list of attributes used by convert_to_git and
+ * convert_to_working_tree.
+ */
+void reset_parsed_attributes(void);
+
 /*****************************************************************
  *
  * Streaming conversion support
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index 80b23fd326..062dc41df7 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -301,6 +301,29 @@ test_expect_success 'rebase --am and --show-current-patch' '
 	)
 '
 
+test_expect_success 'rebase --am and .gitattributes' '
+	test_create_repo attributes &&
+	(
+		cd attributes &&
+		test_commit init &&
+		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
+		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
+
+		test_commit second &&
+		git checkout -b test HEAD^ &&
+
+		echo "*.txt filter=test" >.gitattributes &&
+		git add .gitattributes &&
+		test_commit third &&
+
+		echo "This text is smudged." >a.txt &&
+		git add a.txt &&
+		test_commit fourth &&
+		git rebase master &&
+		grep "smudged" a.txt
+	)
+'
+
 test_expect_success 'rebase--merge.sh and --show-current-patch' '
 	test_create_repo conflict-merge &&
 	(

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] apply: reload .gitattributes after patching it
  2019-08-09 10:02 [PATCH] apply: reload .gitattributes after patching it brian m. carlson
@ 2019-08-09 11:14 ` Taylor Blau
  2019-08-09 11:25   ` brian m. carlson
  2019-08-11 17:47 ` [PATCH v2 0/2] Honor .gitattributes with rebase --am brian m. carlson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2019-08-09 11:14 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Stefan Beller, Duy Nguyen, Junio C Hamano

Hi brian,

On Fri, Aug 09, 2019 at 10:02:17AM +0000, brian m. carlson wrote:
> When applying multiple patches with git am, or when rebasing using the
> am backend, it's possible that one of our patches has updated a
> gitattributes file. Currently, we cache this information, so if a
> file in a subsequent patch has attributes applied, the file will be
> written out with the attributes in place as of the time we started the
> rebase or am operation, not with the attributes applied by the previous
> patch. This problem does not occur when using the -m or -i flags to
> rebase.

Ah. Thanks for a description of the issue.

> To ensure we write the correct data into the working tree, expire the
> cache after each patch that touches a path ending in ".gitattributes".

OK, that seems like a sensible direction...

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  apply.c           | 11 +++++++++++
>  convert.c         |  9 ++++++++-
>  convert.h         |  6 ++++++
>  t/t3400-rebase.sh | 23 +++++++++++++++++++++++
>  4 files changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/apply.c b/apply.c
> index cde95369bb..b959b88b8e 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
>  	struct patch *list = NULL, **listp = &list;
>  	int skipped_patch = 0;
>  	int res = 0;
> +	int flush_attributes = 0;
>
>  	state->patch_input_file = filename;
>  	if (read_patch_file(&buf, fd) < 0)
> @@ -4670,6 +4671,14 @@ static int apply_patch(struct apply_state *state,
>  			patch_stats(state, patch);
>  			*listp = patch;
>  			listp = &patch->next;
> +
> +			if (!flush_attributes && patch->new_name) {
> +				char *dummy = strip_path_suffix(patch->new_name, GITATTRIBUTES_FILE);

It's a shame that 'strip_path_suffix' doesn't take a 'char *out', and
accept NULL for that (which would save us the assignment and subsequent
'free'). In either case, this is certainly the appropriate function.

But, I'm not sure if 'dummy' is the best name for this variable or not.
My first thought was that I'd expect this to be named 'suffix' (or the
less descriptive 'p'), but I don't know if the change is warranted.
What *are* we supposed to call this variable?
"path_after_gitattributes" ;-)?

> +				/* The patch applied to a .gitattributes file. */
> +				if (dummy)
> +					flush_attributes = 1;
> +				free(dummy);
> +			}
>  		}
>  		else {
>  			if (state->apply_verbosity > verbosity_normal)
> @@ -4746,6 +4755,8 @@ static int apply_patch(struct apply_state *state,
>  	if (state->summary && state->apply_verbosity > verbosity_silent)
>  		summary_patch_list(list);
>
> +	if (flush_attributes)
> +		reset_parsed_attributes();
>  end:
>  	free_patch_list(list);
>  	strbuf_release(&buf);
> diff --git a/convert.c b/convert.c
> index 94ff837649..030e9b81b9 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -1293,10 +1293,11 @@ struct conv_attrs {
>  	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
>  };
>
> +static struct attr_check *check;
> +
>  static void convert_attrs(const struct index_state *istate,
>  			  struct conv_attrs *ca, const char *path)
>  {
> -	static struct attr_check *check;
>  	struct attr_check_item *ccheck = NULL;
>
>  	if (!check) {
> @@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
>  		ca->crlf_action = CRLF_AUTO_INPUT;
>  }
>
> +void reset_parsed_attributes(void)
> +{
> +	attr_check_free(check);
> +	check = NULL;
> +}
> +

Makes sense.

>  int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
>  {
>  	struct conv_attrs ca;
> diff --git a/convert.h b/convert.h
> index 831559f10d..3710969d43 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct index_state *istate,
>  int would_convert_to_git_filter_fd(const struct index_state *istate,
>  				   const char *path);
>
> +/*
> + * Reset the internal list of attributes used by convert_to_git and
> + * convert_to_working_tree.
> + */
> +void reset_parsed_attributes(void);
> +
>  /*****************************************************************
>   *
>   * Streaming conversion support
> diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
> index 80b23fd326..062dc41df7 100755
> --- a/t/t3400-rebase.sh
> +++ b/t/t3400-rebase.sh
> @@ -301,6 +301,29 @@ test_expect_success 'rebase --am and --show-current-patch' '
>  	)
>  '
>
> +test_expect_success 'rebase --am and .gitattributes' '
> +	test_create_repo attributes &&
> +	(
> +		cd attributes &&
> +		test_commit init &&
> +		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
> +		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
> +
> +		test_commit second &&
> +		git checkout -b test HEAD^ &&
> +
> +		echo "*.txt filter=test" >.gitattributes &&
> +		git add .gitattributes &&
> +		test_commit third &&
> +
> +		echo "This text is smudged." >a.txt &&
> +		git add a.txt &&
> +		test_commit fourth &&
> +		git rebase master &&
> +		grep "smudged" a.txt
> +	)
> +'
> +

This definitely exercise the behavior here. Thanks for adding a test
while you're at it.

>  test_expect_success 'rebase--merge.sh and --show-current-patch' '
>  	test_create_repo conflict-merge &&
>  	(

I wouldn't be opposed to renaming 'dummy' to be something else, but in
the case that you don't feel a re-roll is necessary, please consider my:

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] apply: reload .gitattributes after patching it
  2019-08-09 11:14 ` Taylor Blau
@ 2019-08-09 11:25   ` brian m. carlson
  2019-08-09 11:36     ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-08-09 11:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Stefan Beller, Duy Nguyen, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2154 bytes --]

On 2019-08-09 at 11:14:52, Taylor Blau wrote:
> > diff --git a/apply.c b/apply.c
> > index cde95369bb..b959b88b8e 100644
> > --- a/apply.c
> > +++ b/apply.c
> > @@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
> >  	struct patch *list = NULL, **listp = &list;
> >  	int skipped_patch = 0;
> >  	int res = 0;
> > +	int flush_attributes = 0;
> >
> >  	state->patch_input_file = filename;
> >  	if (read_patch_file(&buf, fd) < 0)
> > @@ -4670,6 +4671,14 @@ static int apply_patch(struct apply_state *state,
> >  			patch_stats(state, patch);
> >  			*listp = patch;
> >  			listp = &patch->next;
> > +
> > +			if (!flush_attributes && patch->new_name) {
> > +				char *dummy = strip_path_suffix(patch->new_name, GITATTRIBUTES_FILE);
> 
> It's a shame that 'strip_path_suffix' doesn't take a 'char *out', and
> accept NULL for that (which would save us the assignment and subsequent
> 'free'). In either case, this is certainly the appropriate function.

Yeah, I felt the same way. We could refactor this out into two separate
functions, one which returns an ssize_t and one which actually
allocates, but I'm not sure it makes a huge amount of sense with just
one caller. The allocation is relatively small, and I've tried to make
sure it's called exactly once per patch so as not to be wasteful and
inefficient.

> But, I'm not sure if 'dummy' is the best name for this variable or not.
> My first thought was that I'd expect this to be named 'suffix' (or the
> less descriptive 'p'), but I don't know if the change is warranted.
> What *are* we supposed to call this variable?
> "path_after_gitattributes" ;-)?

I struggled with this variable name, as I'm sure you guessed. I could
call it "has_suffix" or something similar. I have mixed feelings about
naming pointer variables like booleans since it makes things prone to
misreading, but I think that might be the sanest thing to do here. I can
certainly do "p" as well.

I'll see if there are other comments about directions to take here, and
then reroll.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] apply: reload .gitattributes after patching it
  2019-08-09 11:25   ` brian m. carlson
@ 2019-08-09 11:36     ` Jeff King
  2019-08-09 11:47       ` brian m. carlson
  2019-08-09 12:43       ` SZEDER Gábor
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-08-09 11:36 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Taylor Blau, git, Stefan Beller, Duy Nguyen, Junio C Hamano

On Fri, Aug 09, 2019 at 11:25:52AM +0000, brian m. carlson wrote:

> > > +			if (!flush_attributes && patch->new_name) {
> > > +				char *dummy = strip_path_suffix(patch->new_name, GITATTRIBUTES_FILE);
> > 
> > It's a shame that 'strip_path_suffix' doesn't take a 'char *out', and
> > accept NULL for that (which would save us the assignment and subsequent
> > 'free'). In either case, this is certainly the appropriate function.
> 
> Yeah, I felt the same way. We could refactor this out into two separate
> functions, one which returns an ssize_t and one which actually
> allocates, but I'm not sure it makes a huge amount of sense with just
> one caller. The allocation is relatively small, and I've tried to make
> sure it's called exactly once per patch so as not to be wasteful and
> inefficient.

I think you could do this with:

  size_t len;
  if (strip_suffix(patch->new_name, GITATTRIBUTES_FILE, &len) &&
      len > 0 && is_dir_sep(patch->new_name[len-1]))
          flush_attributes = 1;

Not sure if that is better or worse. It avoids the extra memory
boilerplate, but the logic is a slightly more subtle.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] apply: reload .gitattributes after patching it
  2019-08-09 11:36     ` Jeff King
@ 2019-08-09 11:47       ` brian m. carlson
  2019-08-09 12:43       ` SZEDER Gábor
  1 sibling, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-09 11:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Stefan Beller, Duy Nguyen, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

On 2019-08-09 at 11:36:14, Jeff King wrote:
> I think you could do this with:
> 
>   size_t len;
>   if (strip_suffix(patch->new_name, GITATTRIBUTES_FILE, &len) &&
>       len > 0 && is_dir_sep(patch->new_name[len-1]))
>           flush_attributes = 1;
> 
> Not sure if that is better or worse. It avoids the extra memory
> boilerplate, but the logic is a slightly more subtle.

I think an approach like that would be fine with a comment. I'm not sure
this exact approach works with -p0, though, so I'll have to tweak it a
bit.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] apply: reload .gitattributes after patching it
  2019-08-09 11:36     ` Jeff King
  2019-08-09 11:47       ` brian m. carlson
@ 2019-08-09 12:43       ` SZEDER Gábor
  2019-08-09 13:51         ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: SZEDER Gábor @ 2019-08-09 12:43 UTC (permalink / raw)
  To: Jeff King
  Cc: brian m. carlson, Taylor Blau, git, Stefan Beller, Duy Nguyen,
	Junio C Hamano

On Fri, Aug 09, 2019 at 07:36:14AM -0400, Jeff King wrote:
> On Fri, Aug 09, 2019 at 11:25:52AM +0000, brian m. carlson wrote:
> 
> > > > +			if (!flush_attributes && patch->new_name) {
> > > > +				char *dummy = strip_path_suffix(patch->new_name, GITATTRIBUTES_FILE);
> > > 
> > > It's a shame that 'strip_path_suffix' doesn't take a 'char *out', and
> > > accept NULL for that (which would save us the assignment and subsequent
> > > 'free'). In either case, this is certainly the appropriate function.
> > 
> > Yeah, I felt the same way. We could refactor this out into two separate
> > functions, one which returns an ssize_t and one which actually
> > allocates, but I'm not sure it makes a huge amount of sense with just
> > one caller. The allocation is relatively small, and I've tried to make
> > sure it's called exactly once per patch so as not to be wasteful and
> > inefficient.
> 
> I think you could do this with:
> 
>   size_t len;
>   if (strip_suffix(patch->new_name, GITATTRIBUTES_FILE, &len) &&
>       len > 0 && is_dir_sep(patch->new_name[len-1]))
>           flush_attributes = 1;
> 
> Not sure if that is better or worse. It avoids the extra memory
> boilerplate, but the logic is a slightly more subtle.

Subtle indeed :) because you have to allow len == 0 to catch the
toplevel .gitattributes file.

But there is an other subtlety here: when I read the commit message
saying "patch that touches a path ending in ".gitattributes"." and saw
the new call to strip_path_suffix(), I immediately thought what would
happen with a file called 'foo.gitattributes'.  Only when I looked
into strip_path_suffix() became it clear that it only removes full
path components, so such a filename won't cause any trouble (though
perhaps the worst thing that could happen is that we unnecessarily
flush the attributes cache).


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] apply: reload .gitattributes after patching it
  2019-08-09 12:43       ` SZEDER Gábor
@ 2019-08-09 13:51         ` Jeff King
  2019-08-09 22:08           ` brian m. carlson
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-08-09 13:51 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: brian m. carlson, Taylor Blau, git, Stefan Beller, Duy Nguyen,
	Junio C Hamano

On Fri, Aug 09, 2019 at 02:43:18PM +0200, SZEDER Gábor wrote:

> > I think you could do this with:
> > 
> >   size_t len;
> >   if (strip_suffix(patch->new_name, GITATTRIBUTES_FILE, &len) &&
> >       len > 0 && is_dir_sep(patch->new_name[len-1]))
> >           flush_attributes = 1;
> > 
> > Not sure if that is better or worse. It avoids the extra memory
> > boilerplate, but the logic is a slightly more subtle.
> 
> Subtle indeed :) because you have to allow len == 0 to catch the
> toplevel .gitattributes file.

I'll pretend I was leaving that as a puzzle for the readers. :)

> But there is an other subtlety here: when I read the commit message
> saying "patch that touches a path ending in ".gitattributes"." and saw
> the new call to strip_path_suffix(), I immediately thought what would
> happen with a file called 'foo.gitattributes'.  Only when I looked
> into strip_path_suffix() became it clear that it only removes full
> path components, so such a filename won't cause any trouble (though
> perhaps the worst thing that could happen is that we unnecessarily
> flush the attributes cache).

Right. I think the term we want here is really "basename". So in fact:

  if (!strcmp(basename(patch->new_name), GITATTRIBUTES_FILE))

would do what we want, except for the annoying caveat that basename() is
allowed to modify its parameter (to remove trailing directory
separators, but we know we wouldn't have them here).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] apply: reload .gitattributes after patching it
  2019-08-09 13:51         ` Jeff King
@ 2019-08-09 22:08           ` brian m. carlson
  0 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-09 22:08 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, Taylor Blau, git, Stefan Beller, Duy Nguyen,
	Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1490 bytes --]

On 2019-08-09 at 13:51:49, Jeff King wrote:
> On Fri, Aug 09, 2019 at 02:43:18PM +0200, SZEDER Gábor wrote:
> > But there is an other subtlety here: when I read the commit message
> > saying "patch that touches a path ending in ".gitattributes"." and saw
> > the new call to strip_path_suffix(), I immediately thought what would
> > happen with a file called 'foo.gitattributes'.  Only when I looked
> > into strip_path_suffix() became it clear that it only removes full
> > path components, so such a filename won't cause any trouble (though
> > perhaps the worst thing that could happen is that we unnecessarily
> > flush the attributes cache).
> 
> Right. I think the term we want here is really "basename". So in fact:
> 
>   if (!strcmp(basename(patch->new_name), GITATTRIBUTES_FILE))
> 
> would do what we want, except for the annoying caveat that basename() is
> allowed to modify its parameter (to remove trailing directory
> separators, but we know we wouldn't have them here).

I think this is exactly the function I'm looking for. I'm a little
uncomfortable relying on the fact that typical implementations don't
modify the string when there's no trailing slash, though.

I think I'm going to end up refactoring out the strip_path_suffix
function into a has_path_suffix and use that. That will avoid the
allocation and it doesn't rely on the generosity of OS implementers.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/2] Honor .gitattributes with rebase --am
  2019-08-09 10:02 [PATCH] apply: reload .gitattributes after patching it brian m. carlson
  2019-08-09 11:14 ` Taylor Blau
@ 2019-08-11 17:47 ` brian m. carlson
  2019-08-11 17:47   ` [PATCH v2 1/2] path: add a function to check for path suffix brian m. carlson
  2019-08-11 17:47   ` [PATCH v2 2/2] apply: reload .gitattributes after patching it brian m. carlson
  2019-08-13  2:43 ` [PATCH v3 0/2] Honor .gitattributes with rebase --am brian m. carlson
  2019-08-18 18:44 ` [PATCH v4 0/2] Honor .gitattributes with rebase --am brian m. carlson
  3 siblings, 2 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-11 17:47 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

This series makes rebase --am honor the .gitattributes file for
subsequent patches when a patch changes it.

Changes from v1:
* Add has_path_suffix in a separate commit.

brian m. carlson (2):
  path: add a function to check for path suffix
  apply: reload .gitattributes after patching it

 apply.c           |  7 +++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 path.c            | 39 ++++++++++++++++++++++++++++++---------
 path.h            |  3 +++
 t/t3400-rebase.sh | 23 +++++++++++++++++++++++
 6 files changed, 77 insertions(+), 10 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  e865872fc3 :rebase-gitattributes
-:  ---------- > 2:  125fda966c path: add a function to check for path suffix
1:  d7ea19aeef ! 3:  f54af7e595 apply: reload .gitattributes after patching it
    @@ apply.c: static int apply_patch(struct apply_state *state,
      			*listp = patch;
      			listp = &patch->next;
     +
    -+			if (!flush_attributes && patch->new_name) {
    -+				char *dummy = strip_path_suffix(patch->new_name, GITATTRIBUTES_FILE);
    -+				/* The patch applied to a .gitattributes file. */
    -+				if (dummy)
    -+					flush_attributes = 1;
    -+				free(dummy);
    -+			}
    ++			if (!flush_attributes && patch->new_name &&
    ++			    has_path_suffix(patch->new_name, GITATTRIBUTES_FILE))
    ++				flush_attributes = 1;
      		}
      		else {
      			if (state->apply_verbosity > verbosity_normal)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-11 17:47 ` [PATCH v2 0/2] Honor .gitattributes with rebase --am brian m. carlson
@ 2019-08-11 17:47   ` brian m. carlson
  2019-08-12  0:32     ` Junio C Hamano
  2019-08-11 17:47   ` [PATCH v2 2/2] apply: reload .gitattributes after patching it brian m. carlson
  1 sibling, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-08-11 17:47 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

We have a function to strip the path suffix from a commit, but we don't
have one to check for a path suffix. For a plain filename, we can use
basename, but that requires an allocation, since POSIX allows it to
modify its argument. Refactor strip_path_suffix into a helper function
and a new function, has_path_suffix to meet this need.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 path.c | 39 ++++++++++++++++++++++++++++++---------
 path.h |  3 +++
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/path.c b/path.c
index 25e97b8c3f..e193c62b7d 100644
--- a/path.c
+++ b/path.c
@@ -1221,31 +1221,52 @@ static inline int chomp_trailing_dir_sep(const char *path, int len)
 }
 
 /*
- * If path ends with suffix (complete path components), returns the
- * part before suffix (sans trailing directory separators).
- * Otherwise returns NULL.
+ * If path ends with suffix (complete path components), returns the offset of
+ * the last character in the path before the suffix (sans trailing directory
+ * separators), and -1 otherwise.
  */
-char *strip_path_suffix(const char *path, const char *suffix)
+static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
 {
 	int path_len = strlen(path), suffix_len = strlen(suffix);
 
 	while (suffix_len) {
 		if (!path_len)
-			return NULL;
+			return -1;
 
 		if (is_dir_sep(path[path_len - 1])) {
 			if (!is_dir_sep(suffix[suffix_len - 1]))
-				return NULL;
+				return -1;
 			path_len = chomp_trailing_dir_sep(path, path_len);
 			suffix_len = chomp_trailing_dir_sep(suffix, suffix_len);
 		}
 		else if (path[--path_len] != suffix[--suffix_len])
-			return NULL;
+			return -1;
 	}
 
 	if (path_len && !is_dir_sep(path[path_len - 1]))
-		return NULL;
-	return xstrndup(path, chomp_trailing_dir_sep(path, path_len));
+		return -1;
+	return chomp_trailing_dir_sep(path, path_len);
+}
+
+/*
+ * Returns true if the path ends with suffix, considering only complete path
+ * components and false otherwise.
+ */
+int has_path_suffix(const char *path, const char *suffix)
+{
+	return stripped_path_suffix_offset(path, suffix) != -1;
+}
+
+/*
+ * If path ends with suffix (complete path components), returns the
+ * part before suffix (sans trailing directory separators).
+ * Otherwise returns NULL.
+ */
+char *strip_path_suffix(const char *path, const char *suffix)
+{
+	ssize_t offset = stripped_path_suffix_offset(path, suffix);
+
+	return offset == -1 ? NULL : xstrndup(path, offset);
 }
 
 int daemon_avoid_alias(const char *p)
diff --git a/path.h b/path.h
index 2ba6ca58c8..c01d045786 100644
--- a/path.h
+++ b/path.h
@@ -193,4 +193,7 @@ const char *git_path_merge_head(struct repository *r);
 const char *git_path_fetch_head(struct repository *r);
 const char *git_path_shallow(struct repository *r);
 
+
+int has_path_suffix(const char *path, const char *suffix);
+
 #endif /* PATH_H */

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 2/2] apply: reload .gitattributes after patching it
  2019-08-11 17:47 ` [PATCH v2 0/2] Honor .gitattributes with rebase --am brian m. carlson
  2019-08-11 17:47   ` [PATCH v2 1/2] path: add a function to check for path suffix brian m. carlson
@ 2019-08-11 17:47   ` brian m. carlson
  1 sibling, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-11 17:47 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

When applying multiple patches with git am, or when rebasing using the
am backend, it's possible that one of our patches has updated a
gitattributes file. Currently, we cache this information, so if a
file in a subsequent patch has attributes applied, the file will be
written out with the attributes in place as of the time we started the
rebase or am operation, not with the attributes applied by the previous
patch. This problem does not occur when using the -m or -i flags to
rebase.

To ensure we write the correct data into the working tree, expire the
cache after each patch that touches a path ending in ".gitattributes".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 apply.c           |  7 +++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 t/t3400-rebase.sh | 23 +++++++++++++++++++++++
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index cde95369bb..201ee12e21 100644
--- a/apply.c
+++ b/apply.c
@@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
 	struct patch *list = NULL, **listp = &list;
 	int skipped_patch = 0;
 	int res = 0;
+	int flush_attributes = 0;
 
 	state->patch_input_file = filename;
 	if (read_patch_file(&buf, fd) < 0)
@@ -4670,6 +4671,10 @@ static int apply_patch(struct apply_state *state,
 			patch_stats(state, patch);
 			*listp = patch;
 			listp = &patch->next;
+
+			if (!flush_attributes && patch->new_name &&
+			    has_path_suffix(patch->new_name, GITATTRIBUTES_FILE))
+				flush_attributes = 1;
 		}
 		else {
 			if (state->apply_verbosity > verbosity_normal)
@@ -4746,6 +4751,8 @@ static int apply_patch(struct apply_state *state,
 	if (state->summary && state->apply_verbosity > verbosity_silent)
 		summary_patch_list(list);
 
+	if (flush_attributes)
+		reset_parsed_attributes();
 end:
 	free_patch_list(list);
 	strbuf_release(&buf);
diff --git a/convert.c b/convert.c
index 94ff837649..030e9b81b9 100644
--- a/convert.c
+++ b/convert.c
@@ -1293,10 +1293,11 @@ struct conv_attrs {
 	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
 };
 
+static struct attr_check *check;
+
 static void convert_attrs(const struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
-	static struct attr_check *check;
 	struct attr_check_item *ccheck = NULL;
 
 	if (!check) {
@@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
 		ca->crlf_action = CRLF_AUTO_INPUT;
 }
 
+void reset_parsed_attributes(void)
+{
+	attr_check_free(check);
+	check = NULL;
+}
+
 int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
diff --git a/convert.h b/convert.h
index 831559f10d..3710969d43 100644
--- a/convert.h
+++ b/convert.h
@@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 int would_convert_to_git_filter_fd(const struct index_state *istate,
 				   const char *path);
 
+/*
+ * Reset the internal list of attributes used by convert_to_git and
+ * convert_to_working_tree.
+ */
+void reset_parsed_attributes(void);
+
 /*****************************************************************
  *
  * Streaming conversion support
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index 80b23fd326..062dc41df7 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -301,6 +301,29 @@ test_expect_success 'rebase --am and --show-current-patch' '
 	)
 '
 
+test_expect_success 'rebase --am and .gitattributes' '
+	test_create_repo attributes &&
+	(
+		cd attributes &&
+		test_commit init &&
+		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
+		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
+
+		test_commit second &&
+		git checkout -b test HEAD^ &&
+
+		echo "*.txt filter=test" >.gitattributes &&
+		git add .gitattributes &&
+		test_commit third &&
+
+		echo "This text is smudged." >a.txt &&
+		git add a.txt &&
+		test_commit fourth &&
+		git rebase master &&
+		grep "smudged" a.txt
+	)
+'
+
 test_expect_success 'rebase--merge.sh and --show-current-patch' '
 	test_create_repo conflict-merge &&
 	(

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-11 17:47   ` [PATCH v2 1/2] path: add a function to check for path suffix brian m. carlson
@ 2019-08-12  0:32     ` Junio C Hamano
  2019-08-12  1:10       ` brian m. carlson
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-08-12  0:32 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We have a function to strip the path suffix from a commit, but we don't
> have one to check for a path suffix. For a plain filename, we can use
> basename, but that requires an allocation, since POSIX allows it to
> modify its argument. Refactor strip_path_suffix into a helper function
> and a new function, has_path_suffix to meet this need.

I wish we did not use a crazy phrase "path suffix", that would
inevitably confuse ourselves with things like ".exe".

>  /*
> + * If path ends with suffix (complete path components), returns the offset of
> + * the last character in the path before the suffix (sans trailing directory
> + * separators), and -1 otherwise.

i.e. this is offset to the last path component.

> +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)

Perhaps

    static ssize_t last_path_component_offset(const char *path, const char *name)

I am tempted to also call the second parameter to this function
"basename", as we know from the proposed log message that you wish
"basename" were usable for this purpose, but basename itself has
another confusing factor (i.e. "are we stripping ".exe" extension?",
to which the answer is no in the context of these functions).

If we agree with the "last path component" phrasing, has_path_suffix()
would become something like:

    int last_path_component_equals(const char *path, const char *name);

perhaps.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-12  0:32     ` Junio C Hamano
@ 2019-08-12  1:10       ` brian m. carlson
  2019-08-12  4:36         ` SZEDER Gábor
  2019-08-12 16:49         ` Junio C Hamano
  0 siblings, 2 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-12  1:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]

On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> 
> Perhaps
> 
>     static ssize_t last_path_component_offset(const char *path, const char *name)
> 
> I am tempted to also call the second parameter to this function
> "basename", as we know from the proposed log message that you wish
> "basename" were usable for this purpose, but basename itself has
> another confusing factor (i.e. "are we stripping ".exe" extension?",
> to which the answer is no in the context of these functions).
> 
> If we agree with the "last path component" phrasing, has_path_suffix()
> would become something like:
> 
>     int last_path_component_equals(const char *path, const char *name);

Except this is not necessarily the last path component. It could match
one or more path components with the way the function is written. If you
want to ignore that and name the function accordingly, I won't object,
but we could theoretically handle a name like "foo/.gitattributes" as
well.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-12  1:10       ` brian m. carlson
@ 2019-08-12  4:36         ` SZEDER Gábor
  2019-08-12 16:49         ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: SZEDER Gábor @ 2019-08-12  4:36 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, git, Taylor Blau, Jeff King

On Mon, Aug 12, 2019 at 01:10:54AM +0000, brian m. carlson wrote:
> On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> > 
> > Perhaps
> > 
> >     static ssize_t last_path_component_offset(const char *path, const char *name)
> > 
> > I am tempted to also call the second parameter to this function
> > "basename", as we know from the proposed log message that you wish
> > "basename" were usable for this purpose, but basename itself has
> > another confusing factor (i.e. "are we stripping ".exe" extension?",
> > to which the answer is no in the context of these functions).
> > 
> > If we agree with the "last path component" phrasing, has_path_suffix()
> > would become something like:
> > 
> >     int last_path_component_equals(const char *path, const char *name);
> 
> Except this is not necessarily the last path component. It could match
> one or more path components with the way the function is written. If you
> want to ignore that and name the function accordingly, I won't object,
> but we could theoretically handle a name like "foo/.gitattributes" as
> well.

ends_with_path_components(), perhaps?

I think having "path_component" in some form in the function name
would have avoided my confusion mentioned earlier in a reply to the
first version.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-12  1:10       ` brian m. carlson
  2019-08-12  4:36         ` SZEDER Gábor
@ 2019-08-12 16:49         ` Junio C Hamano
  2019-08-12 22:40           ` brian m. carlson
  2019-08-13  6:36           ` SZEDER Gábor
  1 sibling, 2 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-08-12 16:49 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
>> 
>> Perhaps
>> 
>>     static ssize_t last_path_component_offset(const char *path, const char *name)
>> 
>> I am tempted to also call the second parameter to this function
>> "basename", as we know from the proposed log message that you wish
>> "basename" were usable for this purpose, but basename itself has
>> another confusing factor (i.e. "are we stripping ".exe" extension?",
>> to which the answer is no in the context of these functions).
>> 
>> If we agree with the "last path component" phrasing, has_path_suffix()
>> would become something like:
>> 
>>     int last_path_component_equals(const char *path, const char *name);
>
> Except this is not necessarily the last path component. It could match
> one or more path components with the way the function is written.

That's fair.  Is the feature that allows the function called
ends_with_component*S* like Szeder suggests designed one, i.e. with
an explicit purpose of supporting callers that pass "foo/bar" as the
"suffix" to it, or is it merely that the implementation happens to
work that way, even though the expected use that is supported is to
pass only one level component but the implementation did not even
bother asserting that the "suffix" does not have a slash in it?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-12 16:49         ` Junio C Hamano
@ 2019-08-12 22:40           ` brian m. carlson
  2019-08-13  1:13             ` Jeff King
  2019-08-13 16:40             ` Junio C Hamano
  2019-08-13  6:36           ` SZEDER Gábor
  1 sibling, 2 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-12 22:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 2190 bytes --]

On 2019-08-12 at 16:49:20, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> >> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> >> 
> >> Perhaps
> >> 
> >>     static ssize_t last_path_component_offset(const char *path, const char *name)
> >> 
> >> I am tempted to also call the second parameter to this function
> >> "basename", as we know from the proposed log message that you wish
> >> "basename" were usable for this purpose, but basename itself has
> >> another confusing factor (i.e. "are we stripping ".exe" extension?",
> >> to which the answer is no in the context of these functions).
> >> 
> >> If we agree with the "last path component" phrasing, has_path_suffix()
> >> would become something like:
> >> 
> >>     int last_path_component_equals(const char *path, const char *name);
> >
> > Except this is not necessarily the last path component. It could match
> > one or more path components with the way the function is written.
> 
> That's fair.  Is the feature that allows the function called
> ends_with_component*S* like Szeder suggests designed one, i.e. with
> an explicit purpose of supporting callers that pass "foo/bar" as the
> "suffix" to it, or is it merely that the implementation happens to
> work that way, even though the expected use that is supported is to
> pass only one level component but the implementation did not even
> bother asserting that the "suffix" does not have a slash in it?

Well, I split it out from a function that handles multiple path
components, mostly so that I could leverage existing work (and not have
to worry about getting it wrong). It wasn't explicitly intended that it
support multiple components, since I don't require that for my
implementation, but I could see future users taking advantage of that.

I think "ends_with_path_components" might be the way forward, unless
you think something else would be better.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-12 22:40           ` brian m. carlson
@ 2019-08-13  1:13             ` Jeff King
  2019-08-13 16:40             ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-08-13  1:13 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, git, SZEDER Gábor, Taylor Blau

On Mon, Aug 12, 2019 at 10:40:21PM +0000, brian m. carlson wrote:

> I think "ends_with_path_components" might be the way forward, unless
> you think something else would be better.

FWIW, having read the rest of the thread, that was the name that clicked
for me.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 0/2] Honor .gitattributes with rebase --am
  2019-08-09 10:02 [PATCH] apply: reload .gitattributes after patching it brian m. carlson
  2019-08-09 11:14 ` Taylor Blau
  2019-08-11 17:47 ` [PATCH v2 0/2] Honor .gitattributes with rebase --am brian m. carlson
@ 2019-08-13  2:43 ` brian m. carlson
  2019-08-13  2:43   ` [PATCH v3 1/2] path: add a function to check for path suffix brian m. carlson
  2019-08-13  2:43   ` [PATCH v3 2/2] apply: reload .gitattributes after patching it brian m. carlson
  2019-08-18 18:44 ` [PATCH v4 0/2] Honor .gitattributes with rebase --am brian m. carlson
  3 siblings, 2 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-13  2:43 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

This series makes rebase --am honor the .gitattributes file for
subsequent patches when a patch changes it.

Changes from v2:
* Rename has_path_suffix to ends_with_path_components.

Changes from v1:
* Add has_path_suffix in a separate commit.

brian m. carlson (2):
  path: add a function to check for path suffix
  apply: reload .gitattributes after patching it

 apply.c           |  7 +++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 path.c            | 39 ++++++++++++++++++++++++++++++---------
 path.h            |  3 +++
 t/t3400-rebase.sh | 23 +++++++++++++++++++++++
 6 files changed, 77 insertions(+), 10 deletions(-)

Range-diff against v2:
1:  125fda966c ! 1:  14c853420b path: add a function to check for path suffix
    @@ Commit message
         have one to check for a path suffix. For a plain filename, we can use
         basename, but that requires an allocation, since POSIX allows it to
         modify its argument. Refactor strip_path_suffix into a helper function
    -    and a new function, has_path_suffix to meet this need.
    +    and a new function, ends_with_path_components, to meet this need.
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
    @@ path.c: static inline int chomp_trailing_dir_sep(const char *path, int len)
     +}
     +
     +/*
    -+ * Returns true if the path ends with suffix, considering only complete path
    -+ * components and false otherwise.
    ++ * Returns true if the path ends with components, considering only complete path
    ++ * components, and false otherwise.
     + */
    -+int has_path_suffix(const char *path, const char *suffix)
    ++int ends_with_path_components(const char *path, const char *components)
     +{
    -+	return stripped_path_suffix_offset(path, suffix) != -1;
    ++	return stripped_path_suffix_offset(path, components) != -1;
     +}
     +
      /*
    @@ path.h: const char *git_path_merge_head(struct repository *r);
      const char *git_path_shallow(struct repository *r);
      
     +
    -+int has_path_suffix(const char *path, const char *suffix);
    ++int ends_with_path_components(const char *path, const char *components);
     +
      #endif /* PATH_H */
2:  f54af7e595 ! 2:  5f4402b38d apply: reload .gitattributes after patching it
    @@ apply.c: static int apply_patch(struct apply_state *state,
      			listp = &patch->next;
     +
     +			if (!flush_attributes && patch->new_name &&
    -+			    has_path_suffix(patch->new_name, GITATTRIBUTES_FILE))
    ++			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
     +				flush_attributes = 1;
      		}
      		else {

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 1/2] path: add a function to check for path suffix
  2019-08-13  2:43 ` [PATCH v3 0/2] Honor .gitattributes with rebase --am brian m. carlson
@ 2019-08-13  2:43   ` brian m. carlson
  2019-08-13  2:43   ` [PATCH v3 2/2] apply: reload .gitattributes after patching it brian m. carlson
  1 sibling, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-13  2:43 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

We have a function to strip the path suffix from a commit, but we don't
have one to check for a path suffix. For a plain filename, we can use
basename, but that requires an allocation, since POSIX allows it to
modify its argument. Refactor strip_path_suffix into a helper function
and a new function, ends_with_path_components, to meet this need.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 path.c | 39 ++++++++++++++++++++++++++++++---------
 path.h |  3 +++
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/path.c b/path.c
index 25e97b8c3f..e3da1f3c4e 100644
--- a/path.c
+++ b/path.c
@@ -1221,31 +1221,52 @@ static inline int chomp_trailing_dir_sep(const char *path, int len)
 }
 
 /*
- * If path ends with suffix (complete path components), returns the
- * part before suffix (sans trailing directory separators).
- * Otherwise returns NULL.
+ * If path ends with suffix (complete path components), returns the offset of
+ * the last character in the path before the suffix (sans trailing directory
+ * separators), and -1 otherwise.
  */
-char *strip_path_suffix(const char *path, const char *suffix)
+static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
 {
 	int path_len = strlen(path), suffix_len = strlen(suffix);
 
 	while (suffix_len) {
 		if (!path_len)
-			return NULL;
+			return -1;
 
 		if (is_dir_sep(path[path_len - 1])) {
 			if (!is_dir_sep(suffix[suffix_len - 1]))
-				return NULL;
+				return -1;
 			path_len = chomp_trailing_dir_sep(path, path_len);
 			suffix_len = chomp_trailing_dir_sep(suffix, suffix_len);
 		}
 		else if (path[--path_len] != suffix[--suffix_len])
-			return NULL;
+			return -1;
 	}
 
 	if (path_len && !is_dir_sep(path[path_len - 1]))
-		return NULL;
-	return xstrndup(path, chomp_trailing_dir_sep(path, path_len));
+		return -1;
+	return chomp_trailing_dir_sep(path, path_len);
+}
+
+/*
+ * Returns true if the path ends with components, considering only complete path
+ * components, and false otherwise.
+ */
+int ends_with_path_components(const char *path, const char *components)
+{
+	return stripped_path_suffix_offset(path, components) != -1;
+}
+
+/*
+ * If path ends with suffix (complete path components), returns the
+ * part before suffix (sans trailing directory separators).
+ * Otherwise returns NULL.
+ */
+char *strip_path_suffix(const char *path, const char *suffix)
+{
+	ssize_t offset = stripped_path_suffix_offset(path, suffix);
+
+	return offset == -1 ? NULL : xstrndup(path, offset);
 }
 
 int daemon_avoid_alias(const char *p)
diff --git a/path.h b/path.h
index 2ba6ca58c8..14d6dcad16 100644
--- a/path.h
+++ b/path.h
@@ -193,4 +193,7 @@ const char *git_path_merge_head(struct repository *r);
 const char *git_path_fetch_head(struct repository *r);
 const char *git_path_shallow(struct repository *r);
 
+
+int ends_with_path_components(const char *path, const char *components);
+
 #endif /* PATH_H */

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 2/2] apply: reload .gitattributes after patching it
  2019-08-13  2:43 ` [PATCH v3 0/2] Honor .gitattributes with rebase --am brian m. carlson
  2019-08-13  2:43   ` [PATCH v3 1/2] path: add a function to check for path suffix brian m. carlson
@ 2019-08-13  2:43   ` brian m. carlson
  2019-08-13 18:08     ` Junio C Hamano
  2019-08-13 22:37     ` Junio C Hamano
  1 sibling, 2 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-13  2:43 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

When applying multiple patches with git am, or when rebasing using the
am backend, it's possible that one of our patches has updated a
gitattributes file. Currently, we cache this information, so if a
file in a subsequent patch has attributes applied, the file will be
written out with the attributes in place as of the time we started the
rebase or am operation, not with the attributes applied by the previous
patch. This problem does not occur when using the -m or -i flags to
rebase.

To ensure we write the correct data into the working tree, expire the
cache after each patch that touches a path ending in ".gitattributes".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 apply.c           |  7 +++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 t/t3400-rebase.sh | 23 +++++++++++++++++++++++
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index cde95369bb..e5373452ab 100644
--- a/apply.c
+++ b/apply.c
@@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
 	struct patch *list = NULL, **listp = &list;
 	int skipped_patch = 0;
 	int res = 0;
+	int flush_attributes = 0;
 
 	state->patch_input_file = filename;
 	if (read_patch_file(&buf, fd) < 0)
@@ -4670,6 +4671,10 @@ static int apply_patch(struct apply_state *state,
 			patch_stats(state, patch);
 			*listp = patch;
 			listp = &patch->next;
+
+			if (!flush_attributes && patch->new_name &&
+			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
+				flush_attributes = 1;
 		}
 		else {
 			if (state->apply_verbosity > verbosity_normal)
@@ -4746,6 +4751,8 @@ static int apply_patch(struct apply_state *state,
 	if (state->summary && state->apply_verbosity > verbosity_silent)
 		summary_patch_list(list);
 
+	if (flush_attributes)
+		reset_parsed_attributes();
 end:
 	free_patch_list(list);
 	strbuf_release(&buf);
diff --git a/convert.c b/convert.c
index 94ff837649..030e9b81b9 100644
--- a/convert.c
+++ b/convert.c
@@ -1293,10 +1293,11 @@ struct conv_attrs {
 	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
 };
 
+static struct attr_check *check;
+
 static void convert_attrs(const struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
-	static struct attr_check *check;
 	struct attr_check_item *ccheck = NULL;
 
 	if (!check) {
@@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
 		ca->crlf_action = CRLF_AUTO_INPUT;
 }
 
+void reset_parsed_attributes(void)
+{
+	attr_check_free(check);
+	check = NULL;
+}
+
 int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
diff --git a/convert.h b/convert.h
index 831559f10d..3710969d43 100644
--- a/convert.h
+++ b/convert.h
@@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 int would_convert_to_git_filter_fd(const struct index_state *istate,
 				   const char *path);
 
+/*
+ * Reset the internal list of attributes used by convert_to_git and
+ * convert_to_working_tree.
+ */
+void reset_parsed_attributes(void);
+
 /*****************************************************************
  *
  * Streaming conversion support
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index 80b23fd326..062dc41df7 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -301,6 +301,29 @@ test_expect_success 'rebase --am and --show-current-patch' '
 	)
 '
 
+test_expect_success 'rebase --am and .gitattributes' '
+	test_create_repo attributes &&
+	(
+		cd attributes &&
+		test_commit init &&
+		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
+		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
+
+		test_commit second &&
+		git checkout -b test HEAD^ &&
+
+		echo "*.txt filter=test" >.gitattributes &&
+		git add .gitattributes &&
+		test_commit third &&
+
+		echo "This text is smudged." >a.txt &&
+		git add a.txt &&
+		test_commit fourth &&
+		git rebase master &&
+		grep "smudged" a.txt
+	)
+'
+
 test_expect_success 'rebase--merge.sh and --show-current-patch' '
 	test_create_repo conflict-merge &&
 	(

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-12 16:49         ` Junio C Hamano
  2019-08-12 22:40           ` brian m. carlson
@ 2019-08-13  6:36           ` SZEDER Gábor
  2019-08-13 16:42             ` Junio C Hamano
  1 sibling, 1 reply; 41+ messages in thread
From: SZEDER Gábor @ 2019-08-13  6:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git, Taylor Blau, Jeff King

On Mon, Aug 12, 2019 at 09:49:20AM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> >> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> >> 
> >> Perhaps
> >> 
> >>     static ssize_t last_path_component_offset(const char *path, const char *name)
> >> 
> >> I am tempted to also call the second parameter to this function
> >> "basename", as we know from the proposed log message that you wish
> >> "basename" were usable for this purpose, but basename itself has
> >> another confusing factor (i.e. "are we stripping ".exe" extension?",
> >> to which the answer is no in the context of these functions).
> >> 
> >> If we agree with the "last path component" phrasing, has_path_suffix()
> >> would become something like:
> >> 
> >>     int last_path_component_equals(const char *path, const char *name);
> >
> > Except this is not necessarily the last path component. It could match
> > one or more path components with the way the function is written.
> 
> That's fair.  Is the feature that allows the function called
> ends_with_component*S* like Szeder suggests designed one, i.e. with
> an explicit purpose of supporting callers that pass "foo/bar" as the
> "suffix" to it, or is it merely that the implementation happens to
> work that way, even though the expected use that is supported is to
> pass only one level component but the implementation did not even
> bother asserting that the "suffix" does not have a slash in it?

The plural in the suggested function name was intentional on my part,
even though in this callsite in question we are only interested in the
filename, i.e. only a single path component.

I was hoping that the names of these related functions will be kept in
sync, and all will somehow contain the "path_components" substring,
e.g. strip_path_suffix() becomes strip_suffix_path_components() or
something.  And that function must be able to handle multiple path
components, becase there is this callsite:

  exec-cmd.c:         !(prefix = strip_path_suffix(executable_dirname, GIT_EXEC_PATH)) &&

and the build sets '-DGIT_EXEC_PATH="libexec/git-core"' by default.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-12 22:40           ` brian m. carlson
  2019-08-13  1:13             ` Jeff King
@ 2019-08-13 16:40             ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-08-13 16:40 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Well, I split it out from a function that handles multiple path
> components, mostly so that I could leverage existing work (and not have
> to worry about getting it wrong). It wasn't explicitly intended that it
> support multiple components, since I don't require that for my
> implementation, but I could see future users taking advantage of that.
>
> I think "ends_with_path_components" might be the way forward, unless
> you think something else would be better.

Good; thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/2] path: add a function to check for path suffix
  2019-08-13  6:36           ` SZEDER Gábor
@ 2019-08-13 16:42             ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-08-13 16:42 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: brian m. carlson, git, Taylor Blau, Jeff King

SZEDER Gábor <szeder.dev@gmail.com> writes:

> ...  And that function must be able to handle multiple path
> components, becase there is this callsite:
>
>   exec-cmd.c:         !(prefix = strip_path_suffix(executable_dirname, GIT_EXEC_PATH)) &&
>
> and the build sets '-DGIT_EXEC_PATH="libexec/git-core"' by default.

OK, that answers my earlier question.  We do want to support such a
caller with one or more components at the end.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/2] apply: reload .gitattributes after patching it
  2019-08-13  2:43   ` [PATCH v3 2/2] apply: reload .gitattributes after patching it brian m. carlson
@ 2019-08-13 18:08     ` Junio C Hamano
  2019-08-15 22:10       ` Junio C Hamano
  2019-08-13 22:37     ` Junio C Hamano
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-08-13 18:08 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> When applying multiple patches with git am, or when rebasing using the
> am backend, it's possible that one of our patches has updated a
> gitattributes file. Currently, we cache this information, so if a
> file in a subsequent patch has attributes applied, the file will be
> written out with the attributes in place as of the time we started the
> rebase or am operation, not with the attributes applied by the previous
> patch. This problem does not occur when using the -m or -i flags to
> rebase.

Back when "am" was a script that repeatedly calls "apply --index &&
commit-tree", caching the original contents of the attributes file
and using it throughout the series would have been impossible to
begin with, so this must be a regression when we rewrote it in C and
moved to do everything in a single process without clearly the state
between the steps, I guess.

"rebase -m" and "rebase -i" are not repeated run_command() calls
that invoke "git cherry-pick" or "git merge" these days, either, so
I am somewhat curious how they avoid fallilng into the same trap.

Thanks for the fix.  Will queue.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/2] apply: reload .gitattributes after patching it
  2019-08-13  2:43   ` [PATCH v3 2/2] apply: reload .gitattributes after patching it brian m. carlson
  2019-08-13 18:08     ` Junio C Hamano
@ 2019-08-13 22:37     ` Junio C Hamano
  2019-08-13 22:48       ` brian m. carlson
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-08-13 22:37 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> +test_expect_success 'rebase --am and .gitattributes' '
> +	test_create_repo attributes &&
> +	(
> +		cd attributes &&
> +		test_commit init &&
> +		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
> +		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&

These eventually invokes test-when-finished for cleaning things up,
that cannot be called inside a subshell.  The "attributes" test
repository is a throw-away reopsitory, so we should be able to just
use "git config" to set the variables locally in it, no?

> +		test_commit second &&
> +		git checkout -b test HEAD^ &&
> +
> +		echo "*.txt filter=test" >.gitattributes &&
> +		git add .gitattributes &&
> +		test_commit third &&
> +
> +		echo "This text is smudged." >a.txt &&
> +		git add a.txt &&
> +		test_commit fourth &&
> +		git rebase master &&
> +		grep "smudged" a.txt
> +	)
> +'
> +
>  test_expect_success 'rebase--merge.sh and --show-current-patch' '
>  	test_create_repo conflict-merge &&
>  	(

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/2] apply: reload .gitattributes after patching it
  2019-08-13 22:37     ` Junio C Hamano
@ 2019-08-13 22:48       ` brian m. carlson
  0 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-13 22:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 825 bytes --]

On 2019-08-13 at 22:37:56, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > +test_expect_success 'rebase --am and .gitattributes' '
> > +	test_create_repo attributes &&
> > +	(
> > +		cd attributes &&
> > +		test_commit init &&
> > +		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
> > +		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
> 
> These eventually invokes test-when-finished for cleaning things up,
> that cannot be called inside a subshell.  The "attributes" test
> repository is a throw-away reopsitory, so we should be able to just
> use "git config" to set the variables locally in it, no?

Yup, that's safe. I'll reroll with that.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/2] apply: reload .gitattributes after patching it
  2019-08-13 18:08     ` Junio C Hamano
@ 2019-08-15 22:10       ` Junio C Hamano
  2019-08-16  0:22         ` brian m. carlson
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-08-15 22:10 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> When applying multiple patches with git am, or when rebasing using the
>> am backend, it's possible that one of our patches has updated a
>> gitattributes file. Currently, we cache this information, so if a
>> file in a subsequent patch has attributes applied, the file will be
>> written out with the attributes in place as of the time we started the
>> rebase or am operation, not with the attributes applied by the previous
>> patch. This problem does not occur when using the -m or -i flags to
>> rebase.
> ...
> "rebase -m" and "rebase -i" are not repeated run_command() calls
> that invoke "git cherry-pick" or "git merge" these days, either, so
> I am somewhat curious how they avoid fallilng into the same trap.
>
> Thanks for the fix.  Will queue.

Actually there still is one more thing I wasn't clear about the
change.

> To ensure we write the correct data into the working tree, expire the
> cache after each patch that touches a path ending in ".gitattributes".
> ...
> +			if (!flush_attributes && patch->new_name &&
> +			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
> +				flush_attributes = 1;

When an attribute file is removed by a patch, we should forget what
we read earlier from the file before it got removed.  Would such a
case, where patch->new_name would be NULL, be handled correctly?

The call to ends_with_path_components() is almost no cost, and I
would suspect that this call is easier to reason about without the
"!flush_attributes &&" in the conditional part, by the way.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/2] apply: reload .gitattributes after patching it
  2019-08-15 22:10       ` Junio C Hamano
@ 2019-08-16  0:22         ` brian m. carlson
  0 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-16  0:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 2722 bytes --]

On 2019-08-15 at 22:10:29, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> >> When applying multiple patches with git am, or when rebasing using the
> >> am backend, it's possible that one of our patches has updated a
> >> gitattributes file. Currently, we cache this information, so if a
> >> file in a subsequent patch has attributes applied, the file will be
> >> written out with the attributes in place as of the time we started the
> >> rebase or am operation, not with the attributes applied by the previous
> >> patch. This problem does not occur when using the -m or -i flags to
> >> rebase.
> > ...
> > "rebase -m" and "rebase -i" are not repeated run_command() calls
> > that invoke "git cherry-pick" or "git merge" these days, either, so
> > I am somewhat curious how they avoid fallilng into the same trap.
> >
> > Thanks for the fix.  Will queue.
> 
> Actually there still is one more thing I wasn't clear about the
> change.
> 
> > To ensure we write the correct data into the working tree, expire the
> > cache after each patch that touches a path ending in ".gitattributes".
> > ...
> > +			if (!flush_attributes && patch->new_name &&
> > +			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
> > +				flush_attributes = 1;
> 
> When an attribute file is removed by a patch, we should forget what
> we read earlier from the file before it got removed.  Would such a
> case, where patch->new_name would be NULL, be handled correctly?

That's a good question. I don't think we handle that case, so I'll
include that (and an appropriate test) with my reroll.

I suspect we probably want something like:

  if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) ||
      (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE)))

This should cause at most one flush per patch, and I think this should
cover both cases, as well as any we haven't thought of. It's also
possible that we could get a copy or rename that causes a false
positive, but considering the contents of a .gitattributes file, that
seems unlikely enough in practice that it's probably not worth worrying
about for now.

> The call to ends_with_path_components() is almost no cost, and I
> would suspect that this call is easier to reason about without the
> "!flush_attributes &&" in the conditional part, by the way.

Yeah, it probably is. It was added to avoid the allocation, but now that
we don't have one, it shouldn't be a problem.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 0/2]  Honor .gitattributes with rebase --am
  2019-08-09 10:02 [PATCH] apply: reload .gitattributes after patching it brian m. carlson
                   ` (2 preceding siblings ...)
  2019-08-13  2:43 ` [PATCH v3 0/2] Honor .gitattributes with rebase --am brian m. carlson
@ 2019-08-18 18:44 ` brian m. carlson
  2019-08-18 18:44   ` [PATCH v4 1/2] path: add a function to check for path suffix brian m. carlson
  2019-08-18 18:44   ` [PATCH v4 2/2] apply: reload .gitattributes after patching it brian m. carlson
  3 siblings, 2 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-18 18:44 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

This series makes rebase --am honor the .gitattributes file for
subsequent patches when a patch changes it.

Changes from v3:
* Check for both addition and removal of .gitattributes files.
* Switch from "test_config" to "git config".

Changes from v2:
* Rename has_path_suffix to ends_with_path_components.

Changes from v1:
* Add has_path_suffix in a separate commit.

brian m. carlson (2):
  path: add a function to check for path suffix
  apply: reload .gitattributes after patching it

 apply.c           |  7 +++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 path.c            | 39 ++++++++++++++++++++++++++++++---------
 path.h            |  3 +++
 t/t3400-rebase.sh | 36 ++++++++++++++++++++++++++++++++++++
 6 files changed, 90 insertions(+), 10 deletions(-)

Range-diff against v3:
1:  5f4402b38d ! 1:  fa825e4b40 apply: reload .gitattributes after patching it
    @@ apply.c: static int apply_patch(struct apply_state *state,
      			*listp = patch;
      			listp = &patch->next;
     +
    -+			if (!flush_attributes && patch->new_name &&
    -+			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
    ++			if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) ||
    ++			    (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE)))
     +				flush_attributes = 1;
      		}
      		else {
    @@ t/t3400-rebase.sh: test_expect_success 'rebase --am and --show-current-patch' '
     +	(
     +		cd attributes &&
     +		test_commit init &&
    -+		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
    -+		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
    ++		git config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
    ++		git config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
     +
     +		test_commit second &&
     +		git checkout -b test HEAD^ &&
    @@ t/t3400-rebase.sh: test_expect_success 'rebase --am and --show-current-patch' '
     +		echo "This text is smudged." >a.txt &&
     +		git add a.txt &&
     +		test_commit fourth &&
    ++
    ++		git checkout -b removal HEAD^ &&
    ++		git rm .gitattributes &&
    ++		git add -u &&
    ++		test_commit fifth &&
    ++		git cherry-pick test &&
    ++
    ++		git checkout test &&
     +		git rebase master &&
    -+		grep "smudged" a.txt
    ++		grep "smudged" a.txt &&
    ++
    ++		git checkout removal &&
    ++		git reset --hard &&
    ++		git rebase master &&
    ++		grep "clean" a.txt
     +	)
     +'
     +

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 1/2] path: add a function to check for path suffix
  2019-08-18 18:44 ` [PATCH v4 0/2] Honor .gitattributes with rebase --am brian m. carlson
@ 2019-08-18 18:44   ` brian m. carlson
  2019-08-18 18:44   ` [PATCH v4 2/2] apply: reload .gitattributes after patching it brian m. carlson
  1 sibling, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-08-18 18:44 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

We have a function to strip the path suffix from a commit, but we don't
have one to check for a path suffix. For a plain filename, we can use
basename, but that requires an allocation, since POSIX allows it to
modify its argument. Refactor strip_path_suffix into a helper function
and a new function, ends_with_path_components, to meet this need.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 path.c | 39 ++++++++++++++++++++++++++++++---------
 path.h |  3 +++
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/path.c b/path.c
index 25e97b8c3f..e3da1f3c4e 100644
--- a/path.c
+++ b/path.c
@@ -1221,31 +1221,52 @@ static inline int chomp_trailing_dir_sep(const char *path, int len)
 }
 
 /*
- * If path ends with suffix (complete path components), returns the
- * part before suffix (sans trailing directory separators).
- * Otherwise returns NULL.
+ * If path ends with suffix (complete path components), returns the offset of
+ * the last character in the path before the suffix (sans trailing directory
+ * separators), and -1 otherwise.
  */
-char *strip_path_suffix(const char *path, const char *suffix)
+static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
 {
 	int path_len = strlen(path), suffix_len = strlen(suffix);
 
 	while (suffix_len) {
 		if (!path_len)
-			return NULL;
+			return -1;
 
 		if (is_dir_sep(path[path_len - 1])) {
 			if (!is_dir_sep(suffix[suffix_len - 1]))
-				return NULL;
+				return -1;
 			path_len = chomp_trailing_dir_sep(path, path_len);
 			suffix_len = chomp_trailing_dir_sep(suffix, suffix_len);
 		}
 		else if (path[--path_len] != suffix[--suffix_len])
-			return NULL;
+			return -1;
 	}
 
 	if (path_len && !is_dir_sep(path[path_len - 1]))
-		return NULL;
-	return xstrndup(path, chomp_trailing_dir_sep(path, path_len));
+		return -1;
+	return chomp_trailing_dir_sep(path, path_len);
+}
+
+/*
+ * Returns true if the path ends with components, considering only complete path
+ * components, and false otherwise.
+ */
+int ends_with_path_components(const char *path, const char *components)
+{
+	return stripped_path_suffix_offset(path, components) != -1;
+}
+
+/*
+ * If path ends with suffix (complete path components), returns the
+ * part before suffix (sans trailing directory separators).
+ * Otherwise returns NULL.
+ */
+char *strip_path_suffix(const char *path, const char *suffix)
+{
+	ssize_t offset = stripped_path_suffix_offset(path, suffix);
+
+	return offset == -1 ? NULL : xstrndup(path, offset);
 }
 
 int daemon_avoid_alias(const char *p)
diff --git a/path.h b/path.h
index 2ba6ca58c8..14d6dcad16 100644
--- a/path.h
+++ b/path.h
@@ -193,4 +193,7 @@ const char *git_path_merge_head(struct repository *r);
 const char *git_path_fetch_head(struct repository *r);
 const char *git_path_shallow(struct repository *r);
 
+
+int ends_with_path_components(const char *path, const char *components);
+
 #endif /* PATH_H */

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-18 18:44 ` [PATCH v4 0/2] Honor .gitattributes with rebase --am brian m. carlson
  2019-08-18 18:44   ` [PATCH v4 1/2] path: add a function to check for path suffix brian m. carlson
@ 2019-08-18 18:44   ` brian m. carlson
  2019-08-19  9:41     ` Phillip Wood
  1 sibling, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-08-18 18:44 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

When applying multiple patches with git am, or when rebasing using the
am backend, it's possible that one of our patches has updated a
gitattributes file. Currently, we cache this information, so if a
file in a subsequent patch has attributes applied, the file will be
written out with the attributes in place as of the time we started the
rebase or am operation, not with the attributes applied by the previous
patch. This problem does not occur when using the -m or -i flags to
rebase.

To ensure we write the correct data into the working tree, expire the
cache after each patch that touches a path ending in ".gitattributes".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 apply.c           |  7 +++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 t/t3400-rebase.sh | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index cde95369bb..d57bc635e4 100644
--- a/apply.c
+++ b/apply.c
@@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
 	struct patch *list = NULL, **listp = &list;
 	int skipped_patch = 0;
 	int res = 0;
+	int flush_attributes = 0;
 
 	state->patch_input_file = filename;
 	if (read_patch_file(&buf, fd) < 0)
@@ -4670,6 +4671,10 @@ static int apply_patch(struct apply_state *state,
 			patch_stats(state, patch);
 			*listp = patch;
 			listp = &patch->next;
+
+			if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) ||
+			    (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE)))
+				flush_attributes = 1;
 		}
 		else {
 			if (state->apply_verbosity > verbosity_normal)
@@ -4746,6 +4751,8 @@ static int apply_patch(struct apply_state *state,
 	if (state->summary && state->apply_verbosity > verbosity_silent)
 		summary_patch_list(list);
 
+	if (flush_attributes)
+		reset_parsed_attributes();
 end:
 	free_patch_list(list);
 	strbuf_release(&buf);
diff --git a/convert.c b/convert.c
index 94ff837649..030e9b81b9 100644
--- a/convert.c
+++ b/convert.c
@@ -1293,10 +1293,11 @@ struct conv_attrs {
 	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
 };
 
+static struct attr_check *check;
+
 static void convert_attrs(const struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
-	static struct attr_check *check;
 	struct attr_check_item *ccheck = NULL;
 
 	if (!check) {
@@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
 		ca->crlf_action = CRLF_AUTO_INPUT;
 }
 
+void reset_parsed_attributes(void)
+{
+	attr_check_free(check);
+	check = NULL;
+}
+
 int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
diff --git a/convert.h b/convert.h
index 831559f10d..3710969d43 100644
--- a/convert.h
+++ b/convert.h
@@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 int would_convert_to_git_filter_fd(const struct index_state *istate,
 				   const char *path);
 
+/*
+ * Reset the internal list of attributes used by convert_to_git and
+ * convert_to_working_tree.
+ */
+void reset_parsed_attributes(void);
+
 /*****************************************************************
  *
  * Streaming conversion support
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index 80b23fd326..23469cc789 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -301,6 +301,42 @@ test_expect_success 'rebase --am and --show-current-patch' '
 	)
 '
 
+test_expect_success 'rebase --am and .gitattributes' '
+	test_create_repo attributes &&
+	(
+		cd attributes &&
+		test_commit init &&
+		git config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
+		git config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
+
+		test_commit second &&
+		git checkout -b test HEAD^ &&
+
+		echo "*.txt filter=test" >.gitattributes &&
+		git add .gitattributes &&
+		test_commit third &&
+
+		echo "This text is smudged." >a.txt &&
+		git add a.txt &&
+		test_commit fourth &&
+
+		git checkout -b removal HEAD^ &&
+		git rm .gitattributes &&
+		git add -u &&
+		test_commit fifth &&
+		git cherry-pick test &&
+
+		git checkout test &&
+		git rebase master &&
+		grep "smudged" a.txt &&
+
+		git checkout removal &&
+		git reset --hard &&
+		git rebase master &&
+		grep "clean" a.txt
+	)
+'
+
 test_expect_success 'rebase--merge.sh and --show-current-patch' '
 	test_create_repo conflict-merge &&
 	(

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-18 18:44   ` [PATCH v4 2/2] apply: reload .gitattributes after patching it brian m. carlson
@ 2019-08-19  9:41     ` Phillip Wood
  2019-08-19  9:55       ` Phillip Wood
  2019-08-20  2:45       ` brian m. carlson
  0 siblings, 2 replies; 41+ messages in thread
From: Phillip Wood @ 2019-08-19  9:41 UTC (permalink / raw)
  To: brian m. carlson, git
  Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

Hi Brian

On 18/08/2019 19:44, brian m. carlson wrote:
> When applying multiple patches with git am, or when rebasing using the
> am backend, it's possible that one of our patches has updated a
> gitattributes file. Currently, we cache this information, so if a
> file in a subsequent patch has attributes applied, the file will be
> written out with the attributes in place as of the time we started the
> rebase or am operation, not with the attributes applied by the previous
> patch. This problem does not occur when using the -m or -i flags to
> rebase.

Do you know why -m and -i aren't affected?

> To ensure we write the correct data into the working tree, expire the
> cache after each patch that touches a path ending in ".gitattributes".
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>   apply.c           |  7 +++++++
>   convert.c         |  9 ++++++++-
>   convert.h         |  6 ++++++
>   t/t3400-rebase.sh | 36 ++++++++++++++++++++++++++++++++++++
>   4 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/apply.c b/apply.c
> index cde95369bb..d57bc635e4 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
>   	struct patch *list = NULL, **listp = &list;
>   	int skipped_patch = 0;
>   	int res = 0;
> +	int flush_attributes = 0;
>   
>   	state->patch_input_file = filename;
>   	if (read_patch_file(&buf, fd) < 0)
> @@ -4670,6 +4671,10 @@ static int apply_patch(struct apply_state *state,
>   			patch_stats(state, patch);
>   			*listp = patch;
>   			listp = &patch->next;
> +
> +			if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) ||
> +			    (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE)))
> +				flush_attributes = 1;

style nit - these lines are very long compared to 80 characters

>   		}
>   		else {
>   			if (state->apply_verbosity > verbosity_normal)
> @@ -4746,6 +4751,8 @@ static int apply_patch(struct apply_state *state,
>   	if (state->summary && state->apply_verbosity > verbosity_silent)
>   		summary_patch_list(list);
>   
> +	if (flush_attributes)
> +		reset_parsed_attributes();
>   end:
>   	free_patch_list(list);
>   	strbuf_release(&buf);
> diff --git a/convert.c b/convert.c
> index 94ff837649..030e9b81b9 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -1293,10 +1293,11 @@ struct conv_attrs {
>   	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
>   };
>   
> +static struct attr_check *check;

I was concerned about the impact adding a file global if we ever want to 
multi-thread this for submodules, but looking through the file there are 
a couple of others already so this isn't creating a new problem.

Best Wishes

Phillip

> +
>   static void convert_attrs(const struct index_state *istate,
>   			  struct conv_attrs *ca, const char *path)
>   {
> -	static struct attr_check *check;
>   	struct attr_check_item *ccheck = NULL;
>   
>   	if (!check) {
> @@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
>   		ca->crlf_action = CRLF_AUTO_INPUT;
>   }
>   
> +void reset_parsed_attributes(void)
> +{
> +	attr_check_free(check);
> +	check = NULL;
> +}
> +
>   int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
>   {
>   	struct conv_attrs ca;
> diff --git a/convert.h b/convert.h
> index 831559f10d..3710969d43 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct index_state *istate,
>   int would_convert_to_git_filter_fd(const struct index_state *istate,
>   				   const char *path);
>   
> +/*
> + * Reset the internal list of attributes used by convert_to_git and
> + * convert_to_working_tree.
> + */
> +void reset_parsed_attributes(void);
> +
>   /*****************************************************************
>    *
>    * Streaming conversion support
> diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
> index 80b23fd326..23469cc789 100755
> --- a/t/t3400-rebase.sh
> +++ b/t/t3400-rebase.sh
> @@ -301,6 +301,42 @@ test_expect_success 'rebase --am and --show-current-patch' '
>   	)
>   '
>   
> +test_expect_success 'rebase --am and .gitattributes' '
> +	test_create_repo attributes &&
> +	(
> +		cd attributes &&
> +		test_commit init &&
> +		git config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
> +		git config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
> +
> +		test_commit second &&
> +		git checkout -b test HEAD^ &&
> +
> +		echo "*.txt filter=test" >.gitattributes &&
> +		git add .gitattributes &&
> +		test_commit third &&
> +
> +		echo "This text is smudged." >a.txt &&
> +		git add a.txt &&
> +		test_commit fourth &&
> +
> +		git checkout -b removal HEAD^ &&
> +		git rm .gitattributes &&
> +		git add -u &&
> +		test_commit fifth &&
> +		git cherry-pick test &&
> +
> +		git checkout test &&
> +		git rebase master &&
> +		grep "smudged" a.txt &&
> +
> +		git checkout removal &&
> +		git reset --hard &&
> +		git rebase master &&
> +		grep "clean" a.txt
> +	)
> +'
> +
>   test_expect_success 'rebase--merge.sh and --show-current-patch' '
>   	test_create_repo conflict-merge &&
>   	(
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-19  9:41     ` Phillip Wood
@ 2019-08-19  9:55       ` Phillip Wood
  2019-08-20  3:05         ` brian m. carlson
  2019-08-20  2:45       ` brian m. carlson
  1 sibling, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2019-08-19  9:55 UTC (permalink / raw)
  To: brian m. carlson, git
  Cc: SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

On 19/08/2019 10:41, Phillip Wood wrote:
> [...]
>> diff --git a/convert.c b/convert.c
>> index 94ff837649..030e9b81b9 100644
>> --- a/convert.c
>> +++ b/convert.c
>> @@ -1293,10 +1293,11 @@ struct conv_attrs {
>>       const char *working_tree_encoding; /* Supported encoding or 
>> default encoding if NULL */
>>   };
>> +static struct attr_check *check;
> 
> I was concerned about the impact adding a file global if we ever want to 
> multi-thread this for submodules, but looking through the file there are 
> a couple of others already so this isn't creating a new problem.

Doh, I've just realized it was static already - ignore that.

One thing did occur to me though - does this patch reset attributes like 
the merge marker length (they're less critical though if there is a 
conflict after an attribute change it would be nice to have the correct 
length) or just the ones for filtering files?

Best Wishes

Phillip

> 
> Best Wishes
> 
> Phillip
> 
>> +
>>   static void convert_attrs(const struct index_state *istate,
>>                 struct conv_attrs *ca, const char *path)
>>   {
>> -    static struct attr_check *check;
>>       struct attr_check_item *ccheck = NULL;
>>       if (!check) {
>> @@ -1339,6 +1340,12 @@ static void convert_attrs(const struct 
>> index_state *istate,
>>           ca->crlf_action = CRLF_AUTO_INPUT;
>>   }
>> +void reset_parsed_attributes(void)
>> +{
>> +    attr_check_free(check);
>> +    check = NULL;
>> +}
>> +
>>   int would_convert_to_git_filter_fd(const struct index_state *istate, 
>> const char *path)
>>   {
>>       struct conv_attrs ca;
>> diff --git a/convert.h b/convert.h
>> index 831559f10d..3710969d43 100644
>> --- a/convert.h
>> +++ b/convert.h
>> @@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct 
>> index_state *istate,
>>   int would_convert_to_git_filter_fd(const struct index_state *istate,
>>                      const char *path);
>> +/*
>> + * Reset the internal list of attributes used by convert_to_git and
>> + * convert_to_working_tree.
>> + */
>> +void reset_parsed_attributes(void);
>> +
>>   /*****************************************************************
>>    *
>>    * Streaming conversion support
>> diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
>> index 80b23fd326..23469cc789 100755
>> --- a/t/t3400-rebase.sh
>> +++ b/t/t3400-rebase.sh
>> @@ -301,6 +301,42 @@ test_expect_success 'rebase --am and 
>> --show-current-patch' '
>>       )
>>   '
>> +test_expect_success 'rebase --am and .gitattributes' '
>> +    test_create_repo attributes &&
>> +    (
>> +        cd attributes &&
>> +        test_commit init &&
>> +        git config filter.test.clean "sed -e 
>> '\''s/smudged/clean/g'\''" &&
>> +        git config filter.test.smudge "sed -e 
>> '\''s/clean/smudged/g'\''" &&
>> +
>> +        test_commit second &&
>> +        git checkout -b test HEAD^ &&
>> +
>> +        echo "*.txt filter=test" >.gitattributes &&
>> +        git add .gitattributes &&
>> +        test_commit third &&
>> +
>> +        echo "This text is smudged." >a.txt &&
>> +        git add a.txt &&
>> +        test_commit fourth &&
>> +
>> +        git checkout -b removal HEAD^ &&
>> +        git rm .gitattributes &&
>> +        git add -u &&
>> +        test_commit fifth &&
>> +        git cherry-pick test &&
>> +
>> +        git checkout test &&
>> +        git rebase master &&
>> +        grep "smudged" a.txt &&
>> +
>> +        git checkout removal &&
>> +        git reset --hard &&
>> +        git rebase master &&
>> +        grep "clean" a.txt
>> +    )
>> +'
>> +
>>   test_expect_success 'rebase--merge.sh and --show-current-patch' '
>>       test_create_repo conflict-merge &&
>>       (
>>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-19  9:41     ` Phillip Wood
  2019-08-19  9:55       ` Phillip Wood
@ 2019-08-20  2:45       ` brian m. carlson
  2019-08-20  8:52         ` Phillip Wood
  1 sibling, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-08-20  2:45 UTC (permalink / raw)
  To: phillip.wood
  Cc: git, SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3324 bytes --]

On 2019-08-19 at 09:41:42, Phillip Wood wrote:
> Hi Brian
> 
> On 18/08/2019 19:44, brian m. carlson wrote:
> > When applying multiple patches with git am, or when rebasing using the
> > am backend, it's possible that one of our patches has updated a
> > gitattributes file. Currently, we cache this information, so if a
> > file in a subsequent patch has attributes applied, the file will be
> > written out with the attributes in place as of the time we started the
> > rebase or am operation, not with the attributes applied by the previous
> > patch. This problem does not occur when using the -m or -i flags to
> > rebase.
> 
> Do you know why -m and -i aren't affected?

I had to look, but I believe the answer is because they use the
sequencer, and the sequencer calls git merge-recursive as a separate
process, and so the writing of the tree is only done in a subprocess,
which can't persist state.

Should we move the merge-recursive code into the main process, we'll
likely have the same problem there.

> > diff --git a/apply.c b/apply.c
> > index cde95369bb..d57bc635e4 100644
> > --- a/apply.c
> > +++ b/apply.c
> > @@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
> >   	struct patch *list = NULL, **listp = &list;
> >   	int skipped_patch = 0;
> >   	int res = 0;
> > +	int flush_attributes = 0;
> >   	state->patch_input_file = filename;
> >   	if (read_patch_file(&buf, fd) < 0)
> > @@ -4670,6 +4671,10 @@ static int apply_patch(struct apply_state *state,
> >   			patch_stats(state, patch);
> >   			*listp = patch;
> >   			listp = &patch->next;
> > +
> > +			if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) ||
> > +			    (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE)))
> > +				flush_attributes = 1;
> 
> style nit - these lines are very long compared to 80 characters

They are.  They aren't two different from other lines in the file, and I
thought that leaving them that way would preserve the similarities, but
I can also wrap them.  I'll send out a v5 with wrapped lines.

> > diff --git a/convert.c b/convert.c
> > index 94ff837649..030e9b81b9 100644
> > --- a/convert.c
> > +++ b/convert.c
> > @@ -1293,10 +1293,11 @@ struct conv_attrs {
> >   	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
> >   };
> > +static struct attr_check *check;
> 
> I was concerned about the impact adding a file global if we ever want to
> multi-thread this for submodules, but looking through the file there are a
> couple of others already so this isn't creating a new problem.
> > +
> >   static void convert_attrs(const struct index_state *istate,
> >   			  struct conv_attrs *ca, const char *path)
> >   {
> > -	static struct attr_check *check;
> >   	struct attr_check_item *ccheck = NULL;
> >   	if (!check) {
> > @@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
> >   		ca->crlf_action = CRLF_AUTO_INPUT;
> >   }

The portion I've included above demonstrates that this was already a
static variable, just one hidden in a function.  So yeah, this is no
worse than it already was.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-19  9:55       ` Phillip Wood
@ 2019-08-20  3:05         ` brian m. carlson
  2019-08-20  8:56           ` Phillip Wood
  0 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-08-20  3:05 UTC (permalink / raw)
  To: phillip.wood
  Cc: git, SZEDER Gábor, Taylor Blau, Jeff King, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On 2019-08-19 at 09:55:27, Phillip Wood wrote:
> On 19/08/2019 10:41, Phillip Wood wrote:
> > [...]
> > > diff --git a/convert.c b/convert.c
> > > index 94ff837649..030e9b81b9 100644
> > > --- a/convert.c
> > > +++ b/convert.c
> > > @@ -1293,10 +1293,11 @@ struct conv_attrs {
> > >       const char *working_tree_encoding; /* Supported encoding or
> > > default encoding if NULL */
> > >   };
> > > +static struct attr_check *check;
> > 
> > I was concerned about the impact adding a file global if we ever want to
> > multi-thread this for submodules, but looking through the file there are
> > a couple of others already so this isn't creating a new problem.
> 
> Doh, I've just realized it was static already - ignore that.

And I just realized that I didn't read the entire thread before
responding.  Sorry about that.

> One thing did occur to me though - does this patch reset attributes like the
> merge marker length (they're less critical though if there is a conflict
> after an attribute change it would be nice to have the correct length) or
> just the ones for filtering files?

It resets "crlf", "ident", "filter", "eol", "text", and
"working-tree-encoding".  Things it doesn't reset include "whitespace",
"export-ignore", "export-subst", "merge", and "conflict-marker-size".
Of these, I think only the latter two are relevant.

I'll update that in v5.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-20  2:45       ` brian m. carlson
@ 2019-08-20  8:52         ` Phillip Wood
  2019-08-20 18:24           ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2019-08-20  8:52 UTC (permalink / raw)
  To: brian m. carlson, phillip.wood, git, SZEDER Gábor,
	Taylor Blau, Jeff King, Junio C Hamano

Hi Brian

On 20/08/2019 03:45, brian m. carlson wrote:
> On 2019-08-19 at 09:41:42, Phillip Wood wrote:
>> Hi Brian
>>
>> On 18/08/2019 19:44, brian m. carlson wrote:
>>> When applying multiple patches with git am, or when rebasing using the
>>> am backend, it's possible that one of our patches has updated a
>>> gitattributes file. Currently, we cache this information, so if a
>>> file in a subsequent patch has attributes applied, the file will be
>>> written out with the attributes in place as of the time we started the
>>> rebase or am operation, not with the attributes applied by the previous
>>> patch. This problem does not occur when using the -m or -i flags to
>>> rebase.
>>
>> Do you know why -m and -i aren't affected?
> 
> I had to look, but I believe the answer is because they use the
> sequencer, and the sequencer calls git merge-recursive as a separate
> process, and so the writing of the tree is only done in a subprocess,
> which can't persist state.

The sequencer has been running in a single process for a while now. We 
do fork for 'git merge' sometimes when processing 'merge' commands but 
'pick' commands are all done in a single process by calling 
do_recursive_merge().

Best Wishes

Phillip

> Should we move the merge-recursive code into the main process, we'll
> likely have the same problem there.
> 
>>> diff --git a/apply.c b/apply.c
>>> index cde95369bb..d57bc635e4 100644
>>> --- a/apply.c
>>> +++ b/apply.c
>>> @@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state,
>>>    	struct patch *list = NULL, **listp = &list;
>>>    	int skipped_patch = 0;
>>>    	int res = 0;
>>> +	int flush_attributes = 0;
>>>    	state->patch_input_file = filename;
>>>    	if (read_patch_file(&buf, fd) < 0)
>>> @@ -4670,6 +4671,10 @@ static int apply_patch(struct apply_state *state,
>>>    			patch_stats(state, patch);
>>>    			*listp = patch;
>>>    			listp = &patch->next;
>>> +
>>> +			if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) ||
>>> +			    (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE)))
>>> +				flush_attributes = 1;
>>
>> style nit - these lines are very long compared to 80 characters
> 
> They are.  They aren't two different from other lines in the file, and I
> thought that leaving them that way would preserve the similarities, but
> I can also wrap them.  I'll send out a v5 with wrapped lines.
> 
>>> diff --git a/convert.c b/convert.c
>>> index 94ff837649..030e9b81b9 100644
>>> --- a/convert.c
>>> +++ b/convert.c
>>> @@ -1293,10 +1293,11 @@ struct conv_attrs {
>>>    	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
>>>    };
>>> +static struct attr_check *check;
>>
>> I was concerned about the impact adding a file global if we ever want to
>> multi-thread this for submodules, but looking through the file there are a
>> couple of others already so this isn't creating a new problem.
>>> +
>>>    static void convert_attrs(const struct index_state *istate,
>>>    			  struct conv_attrs *ca, const char *path)
>>>    {
>>> -	static struct attr_check *check;
>>>    	struct attr_check_item *ccheck = NULL;
>>>    	if (!check) {
>>> @@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate,
>>>    		ca->crlf_action = CRLF_AUTO_INPUT;
>>>    }
> 
> The portion I've included above demonstrates that this was already a
> static variable, just one hidden in a function.  So yeah, this is no
> worse than it already was.
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-20  3:05         ` brian m. carlson
@ 2019-08-20  8:56           ` Phillip Wood
  0 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2019-08-20  8:56 UTC (permalink / raw)
  To: brian m. carlson, phillip.wood, git, SZEDER Gábor,
	Taylor Blau, Jeff King, Junio C Hamano

On 20/08/2019 04:05, brian m. carlson wrote:
> On 2019-08-19 at 09:55:27, Phillip Wood wrote:
>> On 19/08/2019 10:41, Phillip Wood wrote:
>>> [...]
>>>> diff --git a/convert.c b/convert.c
>>>> index 94ff837649..030e9b81b9 100644
>>>> --- a/convert.c
>>>> +++ b/convert.c
>>>> @@ -1293,10 +1293,11 @@ struct conv_attrs {
>>>>        const char *working_tree_encoding; /* Supported encoding or
>>>> default encoding if NULL */
>>>>    };
>>>> +static struct attr_check *check;
>>>
>>> I was concerned about the impact adding a file global if we ever want to
>>> multi-thread this for submodules, but looking through the file there are
>>> a couple of others already so this isn't creating a new problem.
>>
>> Doh, I've just realized it was static already - ignore that.
> 
> And I just realized that I didn't read the entire thread before
> responding.  Sorry about that.
> 
>> One thing did occur to me though - does this patch reset attributes like the
>> merge marker length (they're less critical though if there is a conflict
>> after an attribute change it would be nice to have the correct length) or
>> just the ones for filtering files?
> 
> It resets "crlf", "ident", "filter", "eol", "text", and
> "working-tree-encoding".  Things it doesn't reset include "whitespace",
> "export-ignore", "export-subst", "merge", and "conflict-marker-size".
> Of these, I think only the latter two are relevant.
> 
> I'll update that in v5.

Thanks, one other thought I had was that this is really a bug in 'am' 
rather than 'rebase'. There has been some talk of switching the default 
rabase backend to the sequencer so maybe it would be better to test 'am' 
rather than 'rebase' to ensure we still check 'am' is working properly 
after any switch in the rebase backend. (perhaps we should be testing 
rebase and rebase --interactive separately as well to prevent any future 
regressions I'm not sure)

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-20  8:52         ` Phillip Wood
@ 2019-08-20 18:24           ` Junio C Hamano
  2019-08-20 18:32             ` Phillip Wood
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-08-20 18:24 UTC (permalink / raw)
  To: Phillip Wood
  Cc: brian m. carlson, phillip.wood, git, SZEDER Gábor,
	Taylor Blau, Jeff King

Phillip Wood <phillip.wood123@gmail.com> writes:

>>> Do you know why -m and -i aren't affected?
>>
>> I had to look, but I believe the answer is because they use the
>> sequencer, and the sequencer calls git merge-recursive as a separate
>> process, and so the writing of the tree is only done in a subprocess,
>> which can't persist state.
>
> The sequencer has been running in a single process for a while now. We
> do fork for 'git merge' sometimes when processing 'merge' commands but
> 'pick' commands are all done in a single process by calling
> do_recursive_merge().
>
> Best Wishes
>
> Phillip
>
>> Should we move the merge-recursive code into the main process, we'll
>> likely have the same problem there.

So we actually have the same issue already?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-20 18:24           ` Junio C Hamano
@ 2019-08-20 18:32             ` Phillip Wood
  2019-08-26 15:09               ` Phillip Wood
  0 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2019-08-20 18:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, phillip.wood, git, SZEDER Gábor,
	Taylor Blau, Jeff King

On 20/08/2019 19:24, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>>>> Do you know why -m and -i aren't affected?
>>>
>>> I had to look, but I believe the answer is because they use the
>>> sequencer, and the sequencer calls git merge-recursive as a separate
>>> process, and so the writing of the tree is only done in a subprocess,
>>> which can't persist state.
>>
>> The sequencer has been running in a single process for a while now. We
>> do fork for 'git merge' sometimes when processing 'merge' commands but
>> 'pick' commands are all done in a single process by calling
>> do_recursive_merge().
>>
>> Best Wishes
>>
>> Phillip
>>
>>> Should we move the merge-recursive code into the main process, we'll
>>> likely have the same problem there.
> 
> So we actually have the same issue already?

I don't think so, I modified Brian's test to call 'rebase -i' and it 
passes but no one seems to know why.

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-20 18:32             ` Phillip Wood
@ 2019-08-26 15:09               ` Phillip Wood
  2019-08-26 16:41                 ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2019-08-26 15:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, phillip.wood, git, SZEDER Gábor,
	Taylor Blau, Jeff King

On 20/08/2019 19:32, Phillip Wood wrote:
> On 20/08/2019 19:24, Junio C Hamano wrote:
>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>
>>>>> Do you know why -m and -i aren't affected?
>>>>
>>>> I had to look, but I believe the answer is because they use the
>>>> sequencer, and the sequencer calls git merge-recursive as a separate
>>>> process, and so the writing of the tree is only done in a subprocess,
>>>> which can't persist state.
>>>
>>> The sequencer has been running in a single process for a while now. We
>>> do fork for 'git merge' sometimes when processing 'merge' commands but
>>> 'pick' commands are all done in a single process by calling
>>> do_recursive_merge().
>>>
>>> Best Wishes
>>>
>>> Phillip
>>>
>>>> Should we move the merge-recursive code into the main process, we'll
>>>> likely have the same problem there.
>>
>> So we actually have the same issue already?
> 
> I don't think so, I modified Brian's test to call 'rebase -i' and it 
> passes but no one seems to know why.

I spent some time digging into this and the attributes are reloaded 
before each pick. This is because check_updates() called by 
unpack_trees() calls git_attr_set_direction(GIT_ATTR_CHECKOUT) at the 
start of the function and git_attr_set_direction(GIT_ATTR_CHECKIN) at 
the end of the function. This has the effect of dropping all loaded 
attributes as git_attr_set_direction() calls drop_all_attr_stacks() when 
the direction is changed.

Best Wishes

Phillip

> Best Wishes
> 
> Phillip

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/2] apply: reload .gitattributes after patching it
  2019-08-26 15:09               ` Phillip Wood
@ 2019-08-26 16:41                 ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-08-26 16:41 UTC (permalink / raw)
  To: Phillip Wood
  Cc: brian m. carlson, phillip.wood, git, SZEDER Gábor,
	Taylor Blau, Jeff King

Phillip Wood <phillip.wood123@gmail.com> writes:

> I spent some time digging into this and the attributes are reloaded
> before each pick. This is because check_updates() called by
> unpack_trees() calls git_attr_set_direction(GIT_ATTR_CHECKOUT) at the
> start of the function and git_attr_set_direction(GIT_ATTR_CHECKIN) at
> the end of the function. This has the effect of dropping all loaded
> attributes as git_attr_set_direction() calls drop_all_attr_stacks()
> when the direction is changed.

Ah, OK (and thanks), so it was working by accident ;-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-08-26 16:41 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-09 10:02 [PATCH] apply: reload .gitattributes after patching it brian m. carlson
2019-08-09 11:14 ` Taylor Blau
2019-08-09 11:25   ` brian m. carlson
2019-08-09 11:36     ` Jeff King
2019-08-09 11:47       ` brian m. carlson
2019-08-09 12:43       ` SZEDER Gábor
2019-08-09 13:51         ` Jeff King
2019-08-09 22:08           ` brian m. carlson
2019-08-11 17:47 ` [PATCH v2 0/2] Honor .gitattributes with rebase --am brian m. carlson
2019-08-11 17:47   ` [PATCH v2 1/2] path: add a function to check for path suffix brian m. carlson
2019-08-12  0:32     ` Junio C Hamano
2019-08-12  1:10       ` brian m. carlson
2019-08-12  4:36         ` SZEDER Gábor
2019-08-12 16:49         ` Junio C Hamano
2019-08-12 22:40           ` brian m. carlson
2019-08-13  1:13             ` Jeff King
2019-08-13 16:40             ` Junio C Hamano
2019-08-13  6:36           ` SZEDER Gábor
2019-08-13 16:42             ` Junio C Hamano
2019-08-11 17:47   ` [PATCH v2 2/2] apply: reload .gitattributes after patching it brian m. carlson
2019-08-13  2:43 ` [PATCH v3 0/2] Honor .gitattributes with rebase --am brian m. carlson
2019-08-13  2:43   ` [PATCH v3 1/2] path: add a function to check for path suffix brian m. carlson
2019-08-13  2:43   ` [PATCH v3 2/2] apply: reload .gitattributes after patching it brian m. carlson
2019-08-13 18:08     ` Junio C Hamano
2019-08-15 22:10       ` Junio C Hamano
2019-08-16  0:22         ` brian m. carlson
2019-08-13 22:37     ` Junio C Hamano
2019-08-13 22:48       ` brian m. carlson
2019-08-18 18:44 ` [PATCH v4 0/2] Honor .gitattributes with rebase --am brian m. carlson
2019-08-18 18:44   ` [PATCH v4 1/2] path: add a function to check for path suffix brian m. carlson
2019-08-18 18:44   ` [PATCH v4 2/2] apply: reload .gitattributes after patching it brian m. carlson
2019-08-19  9:41     ` Phillip Wood
2019-08-19  9:55       ` Phillip Wood
2019-08-20  3:05         ` brian m. carlson
2019-08-20  8:56           ` Phillip Wood
2019-08-20  2:45       ` brian m. carlson
2019-08-20  8:52         ` Phillip Wood
2019-08-20 18:24           ` Junio C Hamano
2019-08-20 18:32             ` Phillip Wood
2019-08-26 15:09               ` Phillip Wood
2019-08-26 16:41                 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).