[PATCH] git-apply doesn't handle same name patches well [V3]

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH] git-apply doesn't handle same name patches well [V3]
@ 2008-06-16 20:04 Don Zickus
  2008-06-16 20:27 ` Jakub Narebski
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Don Zickus @ 2008-06-16 20:04 UTC (permalink / raw)
  To: git; +Cc: Don Zickus

When working with a lot of people who backport patches all day long, every
once in a while I get a patch that modifies the same file more than once
inside the same patch.  git-apply either fails if the second change relies
on the first change or silently drops the first change if the second change
is independent.

The silent part is the scary scenario for us.  Also this behaviour is
different from the patch-utils.

I have modified git-apply to cache the filenames of files it modifies such
that if a later patch chunk modifies a file in the cache it will buffer the
previously changed file instead of reading the original file from disk.

Logic has been put in to handle creations/deletions/renames/copies.  All the
relevant tests of git-apply succeed.

A new test has been added to cover the cases I addressed.  However,
currently adding changes to renamed file inside the same patch doesn't work
correctly (it fails to find new file).  I didn't know how to fix this
correctly, so I have the test fail expectedly.

The fix is relatively straight-forward.  But I'm not sure if this new
behaviour is something the git community wants.

Changes since v2
================
- the updated patch not a v1 copy (doh!)

Changes since v1
================
- converted to path-list structs
- added testcases for renaming a patch and apply a new patch on top inside
the same patch file

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 builtin-apply.c          |   52 +++++++++++++++++++++++++++++++++++-
 t/t4127-apply-same-fn.sh |   67 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+), 1 deletions(-)
 create mode 100755 t/t4127-apply-same-fn.sh

diff --git a/builtin-apply.c b/builtin-apply.c
index c497889..9f76ce4 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -12,6 +12,7 @@
 #include "blob.h"
 #include "delta.h"
 #include "builtin.h"
+#include "path-list.h"
 
 /*
  *  --check turns on checking that the working tree matches the
@@ -185,6 +186,13 @@ struct image {
 	struct line *line;
 };
 
+/*
+ * Caches patch filenames to handle the case where a
+ * patch chunk reuses a filename
+ */
+
+struct path_list fn_cache = {NULL, 0, 0, 0};
+
 static uint32_t hash_line(const char *cp, size_t len)
 {
 	size_t i;
@@ -2176,6 +2184,38 @@ static int read_file_or_gitlink(struct cache_entry *ce, struct strbuf *buf)
 	return 0;
 }
 
+struct patch *in_fn_cache(char *name)
+{
+	struct path_list_item *item;
+
+	item = path_list_lookup(name, &fn_cache);
+	if (item != NULL)
+		return (struct patch *)item->util;
+
+	return NULL;
+}
+
+void add_to_fn_cache(char *name, struct patch *patch)
+{
+	struct path_list_item *item;
+
+	/* Always add new_name unless patch is a deletion */
+	if (name != NULL) {
+		item = path_list_insert(name, &fn_cache);
+		item->util = patch;
+	}
+
+	/* skip normal diffs, creations and copies */
+	/*
+	 * store a failure on rename/deletion cases because
+	 * later chunks shouldn't patch old names
+	 */
+	if ((name == NULL) || (patch->is_rename)) {
+		item = path_list_insert(patch->old_name, &fn_cache);
+		item->util = (struct patch *) -1;
+	}
+}
+
 static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *ce)
 {
 	struct strbuf buf;
@@ -2188,7 +2228,16 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
 		if (read_file_or_gitlink(ce, &buf))
 			return error("read of %s failed", patch->old_name);
 	} else if (patch->old_name) {
-		if (S_ISGITLINK(patch->old_mode)) {
+		struct patch *tpatch = in_fn_cache(patch->old_name);
+
+		if (tpatch != NULL) {
+			if (tpatch == (struct patch *) -1) {
+				return error("patch %s has been renamed/deleted",
+					patch->old_name);
+			}
+			/* We have a patched copy in memory use that */
+			strbuf_add(&buf, tpatch->result, tpatch->resultsize);
+		} else if (S_ISGITLINK(patch->old_mode)) {
 			if (ce) {
 				read_file_or_gitlink(ce, &buf);
 			} else {
@@ -2211,6 +2260,7 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
 		return -1; /* note with --reject this succeeds. */
 	patch->result = image.buf;
 	patch->resultsize = image.len;
+	add_to_fn_cache(patch->new_name, patch);
 	free(image.line_allocated);
 
 	if (0 < patch->is_delete && patch->resultsize)
diff --git a/t/t4127-apply-same-fn.sh b/t/t4127-apply-same-fn.sh
new file mode 100755
index 0000000..47b59d5
--- /dev/null
+++ b/t/t4127-apply-same-fn.sh
@@ -0,0 +1,67 @@
+#!/bin/sh
+
+test_description='apply same filename'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	for i in a b c d e f g h i j k l m
+	do
+		echo $i
+	done >same_fn &&
+	git add same_fn &&
+	git commit -m initial
+'
+test_expect_success 'apply same filename with independent changes' '
+	sed -i -e "s/^d/z/" same_fn &&
+	git diff > patch0 &&
+	git add same_fn &&
+	sed -i -e "s/^i/y/" same_fn &&
+	git diff >> patch0 &&
+	cp same_fn same_fn2 &&
+	git reset --hard &&
+	git-apply patch0 &&
+	diff same_fn same_fn2
+'
+
+test_expect_success 'apply same filename with overlapping changes' '
+	git reset --hard
+	sed -i -e "s/^d/z/" same_fn &&
+	git diff > patch0 &&
+	git add same_fn &&
+	sed -i -e "s/^e/y/" same_fn &&
+	git diff >> patch0 &&
+	cp same_fn same_fn2 &&
+	git reset --hard &&
+	git-apply patch0 &&
+	diff same_fn same_fn2
+'
+
+test_expect_failure 'apply same new filename after rename' '
+	git reset --hard
+	git mv same_fn new_fn
+	sed -i -e "s/^d/z/" new_fn &&
+	git add new_fn &&
+	git diff -M --cached > patch1 &&
+	sed -i -e "s/^e/y/" new_fn &&
+	git diff >> patch1 &&
+	cp new_fn new_fn2 &&
+	git reset --hard &&
+	git apply patch1 &&
+	diff new_fn new_fn2
+'
+
+test_expect_success 'apply same old filename after rename' '
+	git reset --hard
+	git mv same_fn new_fn
+	sed -i -e "s/^d/z/" new_fn &&
+	git add new_fn &&
+	git diff -M --cached > patch1 &&
+	git mv new_fn same_fn
+	sed -i -e "s/^e/y/" same_fn &&
+	git diff >> patch1 &&
+	git reset --hard &&
+	test_must_fail git apply patch1
+'
+
+test_done
-- 
1.5.6.rc2.48.g13da

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-apply doesn't handle same name patches well [V3]
  2008-06-16 20:04 [PATCH] git-apply doesn't handle same name patches well [V3] Don Zickus
@ 2008-06-16 20:27 ` Jakub Narebski
  2008-06-17  9:40 ` Johannes Schindelin
  2008-06-18  0:42 ` Junio C Hamano
  2 siblings, 0 replies; 6+ messages in thread
From: Jakub Narebski @ 2008-06-16 20:27 UTC (permalink / raw)
  To: Don Zickus; +Cc: git

Don Zickus <dzickus@redhat.com> writes:

> When working with a lot of people who backport patches all day long, every
> once in a while I get a patch that modifies the same file more than once
> inside the same patch.  git-apply either fails if the second change relies
> on the first change or silently drops the first change if the second change
> is independent.
> 
> The silent part is the scary scenario for us.  Also this behaviour is
> different from the patch-utils.
> 
> I have modified git-apply to cache the filenames of files it modifies such
> that if a later patch chunk modifies a file in the cache it will buffer the
> previously changed file instead of reading the original file from disk.
> 
> Logic has been put in to handle creations/deletions/renames/copies.  All the
> relevant tests of git-apply succeed.

Very nice, although most probably I would never use this.

+1

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-apply doesn't handle same name patches well [V3]
  2008-06-16 20:04 [PATCH] git-apply doesn't handle same name patches well [V3] Don Zickus
  2008-06-16 20:27 ` Jakub Narebski
@ 2008-06-17  9:40 ` Johannes Schindelin
  2008-06-18  0:42 ` Junio C Hamano
  2 siblings, 0 replies; 6+ messages in thread
From: Johannes Schindelin @ 2008-06-17  9:40 UTC (permalink / raw)
  To: Don Zickus; +Cc: git

Hi,

On Mon, 16 Jun 2008, Don Zickus wrote:

> Changes since v2
> ================
> - the updated patch not a v1 copy (doh!)

Ah.  If you would have reused the same mail thread, I would not have 
missed it when I responded to V2.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-apply doesn't handle same name patches well [V3]
  2008-06-16 20:04 [PATCH] git-apply doesn't handle same name patches well [V3] Don Zickus
  2008-06-16 20:27 ` Jakub Narebski
  2008-06-17  9:40 ` Johannes Schindelin
@ 2008-06-18  0:42 ` Junio C Hamano
  2008-06-19 21:33   ` Don Zickus
  2 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2008-06-18  0:42 UTC (permalink / raw)
  To: Don Zickus; +Cc: git

Don Zickus <dzickus@redhat.com> writes:

> When working with a lot of people who backport patches all day long, every
> once in a while I get a patch that modifies the same file more than once
> inside the same patch.  git-apply either fails if the second change relies
> on the first change or silently drops the first change if the second change
> is independent.

Good issue to tackle.

> A new test has been added to cover the cases I addressed.  However,
> currently adding changes to renamed file inside the same patch doesn't work
> correctly (it fails to find new file).  I didn't know how to fix this
> correctly, so I have the test fail expectedly.

Do you mean that the first patch rename-edits A to B, and then the second
patch edits B in place?  Because your fn_cache is keyed by postimage
filename (in this case B), I would imagine that the later lookup of B
should successfully find the patch result from the previous one.  Unless
the in_fn_cache() is somehow wrong...

or do you mean that the first patch rename-edits A to B, but the second
one still wants to edit A in place and you would want to pretend as if the
later one is for a patch to B?  I would think that is doable but asking
for too much magic, and a tool with too much magic is scary.

> The fix is relatively straight-forward.  But I'm not sure if this new
> behaviour is something the git community wants.

> Changes since v2
> ================
> - the updated patch not a v1 copy (doh!)
>
> Changes since v1
> ================
> - converted to path-list structs
> - added testcases for renaming a patch and apply a new patch on top inside
> the same patch file
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---

First, a minor style issue.  Because the final round after polishing on
the list is etched into the history without any of the earlier doh!
rounds, we do not want to have the above "Changes since..." in the commit
log message.

  Cf. http://article.gmane.org/gmane.comp.version-control.git/84308

>  builtin-apply.c          |   52 +++++++++++++++++++++++++++++++++++-
>  t/t4127-apply-same-fn.sh |   67 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 118 insertions(+), 1 deletions(-)
>  create mode 100755 t/t4127-apply-same-fn.sh
>
> diff --git a/builtin-apply.c b/builtin-apply.c
> index c497889..9f76ce4 100644
> --- a/builtin-apply.c
> +++ b/builtin-apply.c
> @@ -12,6 +12,7 @@
>  #include "blob.h"
>  #include "delta.h"
>  #include "builtin.h"
> +#include "path-list.h"
>  
>  /*
>   *  --check turns on checking that the working tree matches the
> @@ -185,6 +186,13 @@ struct image {
>  	struct line *line;
>  };
>  
> +/*
> + * Caches patch filenames to handle the case where a
> + * patch chunk reuses a filename
> + */
> +
> +struct path_list fn_cache = {NULL, 0, 0, 0};

"Reuses a filename"?  Do you mean touches the same file again?

This is not a "cache" in the sense that you can nuke it without changing
the behaviour except performance, but more about "Record the postimage
pathnames (and contents, indirectly by pointing at the patch structure)
each previous patch application would have created".

There is a case where a normal git patch contains two separate patches to
the same file.  A typechange patch is always expressed as a deletion of
the old path immediately followed by a creation of the same path.  I have
to wonder why that codepath for handing that particular special case is
not changed in this patch.  Surely the mechanism you are adding is a
generalization that can cover such a case as well, isn't it?

> @@ -2176,6 +2184,38 @@ static int read_file_or_gitlink(struct cache_entry *ce, struct strbuf *buf)
>  	return 0;
>  }
>  
> +struct patch *in_fn_cache(char *name)
> +{
> +	struct path_list_item *item;
> +
> +	item = path_list_lookup(name, &fn_cache);
> +	if (item != NULL)
> +		return (struct patch *)item->util;
> +
> +	return NULL;
> +}
> +
> +void add_to_fn_cache(char *name, struct patch *patch)
> +{
> +	struct path_list_item *item;
> +
> +	/* Always add new_name unless patch is a deletion */
> +	if (name != NULL) {
> +		item = path_list_insert(name, &fn_cache);
> +		item->util = patch;
> +	}
> +
> +	/* skip normal diffs, creations and copies */

This comment is a "Huh?".

> +	/*
> +	 * store a failure on rename/deletion cases because
> +	 * later chunks shouldn't patch old names
> +	 */
> +	if ((name == NULL) || (patch->is_rename)) {
> +		item = path_list_insert(patch->old_name, &fn_cache);
> +		item->util = (struct patch *) -1;

If you look at the patch->old_name _anyway_, why do you give a separate
name parameter to this function?  The function would be much easier to
read if you pass only patch, and use patch->new_name instead of the
separate name parameter.  Otherwise the reader needs to scroll down and
figure out that name is a new name by looking at the call site to
understand what is going on.

> +	}
> +}
> +
>  static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *ce)
>  {
>  	struct strbuf buf;
> @@ -2188,7 +2228,16 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
>  		if (read_file_or_gitlink(ce, &buf))
>  			return error("read of %s failed", patch->old_name);
>  	} else if (patch->old_name) {
> -		if (S_ISGITLINK(patch->old_mode)) {
> +		struct patch *tpatch = in_fn_cache(patch->old_name);
> +
> +		if (tpatch != NULL) {
> +			if (tpatch == (struct patch *) -1) {
> +				return error("patch %s has been renamed/deleted",
> +					patch->old_name);
> +			}
> +			/* We have a patched copy in memory use that */
> +			strbuf_add(&buf, tpatch->result, tpatch->resultsize);
> +		} else if (S_ISGITLINK(patch->old_mode)) {

Isn't this wrong?  Why can't this new enhancement be used while operating
only on the index?

>  			if (ce) {
>  				read_file_or_gitlink(ce, &buf);
>  			} else {
> @@ -2211,6 +2260,7 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
>  		return -1; /* note with --reject this succeeds. */
>  	patch->result = image.buf;
>  	patch->resultsize = image.len;
> +	add_to_fn_cache(patch->new_name, patch);
>  	free(image.line_allocated);

So after each patch application, the patch is remembered, keyed by the
pathname of the postimage, so by consulting fn_cache with a pathname, you
can know what the status of a path that was touched by previous patches
is.  That sounds quite straightforward.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-apply doesn't handle same name patches well [V3]
  2008-06-18  0:42 ` Junio C Hamano
@ 2008-06-19 21:33   ` Don Zickus
  2008-06-19 22:15     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Don Zickus @ 2008-06-19 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Jun 17, 2008 at 05:42:54PM -0700, Junio C Hamano wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> > When working with a lot of people who backport patches all day long, every
> > once in a while I get a patch that modifies the same file more than once
> > inside the same patch.  git-apply either fails if the second change relies
> > on the first change or silently drops the first change if the second change
> > is independent.
> 
> Good issue to tackle.
> 
> > A new test has been added to cover the cases I addressed.  However,
> > currently adding changes to renamed file inside the same patch doesn't work
> > correctly (it fails to find new file).  I didn't know how to fix this
> > correctly, so I have the test fail expectedly.
> 
> Do you mean that the first patch rename-edits A to B, and then the second
> patch edits B in place?  Because your fn_cache is keyed by postimage
> filename (in this case B), I would imagine that the later lookup of B
> should successfully find the patch result from the previous one.  Unless
> the in_fn_cache() is somehow wrong...

Yeah, I thought so too, but after debugging the problem, the 'lstat' in
check_preimage() fails to find the file on disk and exits with that error.

I was going to try to figure out a way to grab it from fn_cache but I
wasn't sure how much of the 'lstat' info is needed later.

> 
> or do you mean that the first patch rename-edits A to B, but the second
> one still wants to edit A in place and you would want to pretend as if the
> later one is for a patch to B?  I would think that is doable but asking
> for too much magic, and a tool with too much magic is scary.

Personally I think this case should be a failure.  I even attached a
testcase in my patch to make sure this failed.  I wasn't comfortable doing
this magic either.

> > +/*
> > + * Caches patch filenames to handle the case where a
> > + * patch chunk reuses a filename
> > + */
> > +
> > +struct path_list fn_cache = {NULL, 0, 0, 0};
> 
> "Reuses a filename"?  Do you mean touches the same file again?
> 
> This is not a "cache" in the sense that you can nuke it without changing
> the behaviour except performance, but more about "Record the postimage
> pathnames (and contents, indirectly by pointing at the patch structure)
> each previous patch application would have created".

Yes, poor choice of words.  I'll try to think of a something else.

> 
> There is a case where a normal git patch contains two separate patches to
> the same file.  A typechange patch is always expressed as a deletion of
> the old path immediately followed by a creation of the same path.  I have
> to wonder why that codepath for handing that particular special case is
> not changed in this patch.  Surely the mechanism you are adding is a
> generalization that can cover such a case as well, isn't it?

Heh.  Maybe, but I didn't know the code well enough to do that.  Pointers?
I'll try to poke around and see what I can cleanup, but I will have to
rely on the mailing-list to make sure I did it correctly.

> 
> > @@ -2176,6 +2184,38 @@ static int read_file_or_gitlink(struct cache_entry *ce, struct strbuf *buf)
> >  	return 0;
> >  }
> >  
> > +struct patch *in_fn_cache(char *name)
> > +{
> > +	struct path_list_item *item;
> > +
> > +	item = path_list_lookup(name, &fn_cache);
> > +	if (item != NULL)
> > +		return (struct patch *)item->util;
> > +
> > +	return NULL;
> > +}
> > +
> > +void add_to_fn_cache(char *name, struct patch *patch)
> > +{
> > +	struct path_list_item *item;
> > +
> > +	/* Always add new_name unless patch is a deletion */
> > +	if (name != NULL) {
> > +		item = path_list_insert(name, &fn_cache);
> > +		item->util = patch;
> > +	}
> > +
> > +	/* skip normal diffs, creations and copies */
> 
> This comment is a "Huh?".
I was just making a note about which cases I wanted to skip and which ones
I wanted to process.  I can expand on it.  For example, patches that
contain normal diffs, file creations and git copies or ignored as don't
cares.  Only file deletions and git renames were interesting to me in the
code below.

> 
> > +	/*
> > +	 * store a failure on rename/deletion cases because
> > +	 * later chunks shouldn't patch old names
> > +	 */
> > +	if ((name == NULL) || (patch->is_rename)) {
> > +		item = path_list_insert(patch->old_name, &fn_cache);
> > +		item->util = (struct patch *) -1;
> 
> If you look at the patch->old_name _anyway_, why do you give a separate
> name parameter to this function?  The function would be much easier to
> read if you pass only patch, and use patch->new_name instead of the
> separate name parameter.  Otherwise the reader needs to scroll down and
> figure out that name is a new name by looking at the call site to
> understand what is going on.

Yeah, leftover code that was added when I ran into rename and copy
problems.

> 
> > +	}
> > +}
> > +
> >  static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *ce)
> >  {
> >  	struct strbuf buf;
> > @@ -2188,7 +2228,16 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
> >  		if (read_file_or_gitlink(ce, &buf))
> >  			return error("read of %s failed", patch->old_name);
> >  	} else if (patch->old_name) {
> > -		if (S_ISGITLINK(patch->old_mode)) {
> > +		struct patch *tpatch = in_fn_cache(patch->old_name);
> > +
> > +		if (tpatch != NULL) {
> > +			if (tpatch == (struct patch *) -1) {
> > +				return error("patch %s has been renamed/deleted",
> > +					patch->old_name);
> > +			}
> > +			/* We have a patched copy in memory use that */
> > +			strbuf_add(&buf, tpatch->result, tpatch->resultsize);
> > +		} else if (S_ISGITLINK(patch->old_mode)) {
> 
> Isn't this wrong?  Why can't this new enhancement be used while operating
> only on the index?

I don't know, can it?  You tell me.  I wasn't sure on the whole index
thing worked.

I'll respin the patch when I get some free time tomorrow or next week.

Cheers,
Don

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-apply doesn't handle same name patches well [V3]
  2008-06-19 21:33   ` Don Zickus
@ 2008-06-19 22:15     ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2008-06-19 22:15 UTC (permalink / raw)
  To: Don Zickus; +Cc: git

Don Zickus <dzickus@redhat.com> writes:

> On Tue, Jun 17, 2008 at 05:42:54PM -0700, Junio C Hamano wrote:
>> Don Zickus <dzickus@redhat.com> writes:
> ...
> I was going to try to figure out a way to grab it from fn_cache but I
> wasn't sure how much of the 'lstat' info is needed later.

The usual case of one-diff-one-path patch application wants to make sure
that there is no discrepancy between the index and the work tree for the
path when working in --index mode.  When working in work-tree-only mode,
lstat just makes sure that the path to be patched actually exists (or
doesn't, if it is a creation patch).

When fn_cache is used, you pretend as if the resulting path exists and up
to date when found in fn_cache and the previous round succeeded, so you
can substitute the lstat (you cannot just lose it, but need to make sure
the path exists after applying the earlier fn_cache contents when handling
a later patch that wants to touch an existing file).  As you pretend that
the previous round succeeded, you do not have to check the up-to-dateness
between the index and work tree when dealing with a path that has previous
result in fn_cache, even when operating in --index mode.

>> or do you mean that the first patch rename-edits A to B, but the second
>> one still wants to edit A in place and you would want to pretend as if the
>> later one is for a patch to B?  I would think that is doable but asking
>> for too much magic, and a tool with too much magic is scary.
>
> Personally I think this case should be a failure.  I even attached a
> testcase in my patch to make sure this failed.  I wasn't comfortable doing
> this magic either.

Good.

>> There is a case where a normal git patch contains two separate patches to
>> the same file.  A typechange patch is always expressed as a deletion of
>> the old path immediately followed by a creation of the same path.  I have
>> to wonder why that codepath for handing that particular special case is
>> not changed in this patch.  Surely the mechanism you are adding is a
>> generalization that can cover such a case as well, isn't it?
>
> Heh.  Maybe, but I didn't know the code well enough to do that.  Pointers?

See the way "prev_patch" is used in check_patch.

>> > @@ -2176,6 +2184,38 @@ static int read_file_or_gitlink(struct cache_entry *ce, struct strbuf *buf)
>> >  	return 0;
>> >  }
>> >  
>> > +struct patch *in_fn_cache(char *name)
>> > +{
>> > +	struct path_list_item *item;
>> > +
>> > +	item = path_list_lookup(name, &fn_cache);
>> > +	if (item != NULL)
>> > +		return (struct patch *)item->util;
>> > +
>> > +	return NULL;
>> > +}
>> > +
>> > +void add_to_fn_cache(char *name, struct patch *patch)
>> > +{
>> > +	struct path_list_item *item;
>> > +
>> > +	/* Always add new_name unless patch is a deletion */
>> > +	if (name != NULL) {
>> > +		item = path_list_insert(name, &fn_cache);
>> > +		item->util = patch;
>> > +	}
>> > +
>> > +	/* skip normal diffs, creations and copies */
>> 
>> This comment is a "Huh?".
> I was just making a note about which cases I wanted to skip and which ones
> I wanted to process.  I can expand on it.  For example, patches that
> contain normal diffs, file creations and git copies or ignored as don't
> cares.  Only file deletions and git renames were interesting to me in the
> code below.

The function's purpose is to record what the expected state of the path
after the current patch has applied successfully, and

 * If it is not a deletion, you will record the postimage, so that a later
   patch can work from there, not from what is in the initial state;

 * If it is a deletion (or rename-away), you record that the path after
   this patch no longer exists, so that you can catch a later broken patch
   that tries to touch the path.

You have already handled "normal diff, creation and copy" in the first
part "if (name != NULL)", but you talk about it again here, which was the
"Huh?" inducing part.  It gave an impression that the part that follows
does something other than the above two.

>> > +	/*
>> > +	 * store a failure on rename/deletion cases because
>> > +	 * later chunks shouldn't patch old names
>> > +	 */
>> > +	if ((name == NULL) || (patch->is_rename)) {
>> > +		item = path_list_insert(patch->old_name, &fn_cache);
>> > +		item->util = (struct patch *) -1;

If you have a patch that does A->B (rename), C->A (rename), A->A (mod),
would your code handle that?

>> If you look at the patch->old_name _anyway_, why do you give a separate
>> name parameter to this function?  The function would be much easier to
>> read if you pass only patch, and use patch->new_name instead of the
>> separate name parameter.  Otherwise the reader needs to scroll down and
>> figure out that name is a new name by looking at the call site to
>> understand what is going on.
>
> Yeah, leftover code that was added when I ran into rename and copy
> problems.
>
>> 
>> > +	}
>> > +}
>> > +
>> >  static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *ce)
>> >  {
>> >  	struct strbuf buf;
>> > @@ -2188,7 +2228,16 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
>> >  		if (read_file_or_gitlink(ce, &buf))
>> >  			return error("read of %s failed", patch->old_name);
>> >  	} else if (patch->old_name) {
>> > -		if (S_ISGITLINK(patch->old_mode)) {
>> > +		struct patch *tpatch = in_fn_cache(patch->old_name);
>> > +
>> > +		if (tpatch != NULL) {
>> > +			if (tpatch == (struct patch *) -1) {
>> > +				return error("patch %s has been renamed/deleted",
>> > +					patch->old_name);
>> > +			}
>> > +			/* We have a patched copy in memory use that */
>> > +			strbuf_add(&buf, tpatch->result, tpatch->resultsize);
>> > +		} else if (S_ISGITLINK(patch->old_mode)) {
>> 
>> Isn't this wrong?  Why can't this new enhancement be used while operating
>> only on the index?
>
> I don't know, can it?  You tell me.  I wasn't sure on the whole index
> thing worked.

Perhaps a "git-apply" primer might help.  The program can work in three
modes of operation.

 * normal mode: look at and operate only on work tree files.

 * --index mode: work on both the index and the work tree.  IOW, patch the
   files and immediately do "git add -u" on that path, so that even
   addition and deletion are recorded in the index.  In this case, the
   paths involved must be up-to-date between the work tree and the index
   when you start "git apply"; otherwise you will lose your local
   changes.

 * --cached mode: work only on the index and never look at nor touch the
   work tree.

Now, if you have a patch that has A->A (mod), followed by another A->A(mod),
is there a good reason why you allow it in the first two modes and not the
last one?  I do not think so.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-06-19 22:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-16 20:04 [PATCH] git-apply doesn't handle same name patches well [V3] Don Zickus
2008-06-16 20:27 ` Jakub Narebski
2008-06-17  9:40 ` Johannes Schindelin
2008-06-18  0:42 ` Junio C Hamano
2008-06-19 21:33   ` Don Zickus
2008-06-19 22:15     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).