git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Demonstrate and partially work around a gitattributes problem
@ 2017-05-15 15:23 Johannes Schindelin
  2017-05-15 15:23 ` [PATCH 1/2] gitattributes: demonstrate that Git tries to read a bogus file Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Johannes Schindelin @ 2017-05-15 15:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Granted, it is a bit of a less common use case to call

	git diff <commit1>:<file1> <commit2>:<file2>

There are valid scenarios for such calls, of course.

And sometimes, one may want to compare even files that are stored in
subdirectories instead of the top-level directory. Example:

	git diff cafebabe:t/README bee7e55:t/README

Now, the surprising part is that Git tries to read a .gitattributes
files interpreting the *entire* prefix `cafebabe:t/` as a *directory
path*. I.e. it will try to read the *file* (not the blob)
`cafebabe:t/.gitattributes`.

A remarkable side effect prevents this from happening for files in the
top-level directory: there is no slash in the argument, therefore the
top-level .gitattributes (which have been read from the index/worktree
already) is used.

Unless, of course, one specifies the commit via a ref whose name
contains slashes.

As mentioned in the commit message of the patch demonstrating the
problem, I fear that this issue is *really* hard to fix. Certainly too
complicated for me alone.

Side note: I was really, really surprised, in a very positive way, that
Git handled the scenario gracefully where I *created* files with the
actual file paths <commit1>:<file1> and <commit2>:<file2>, i.e. where
`git diff <commit1>:<file1> <commit2>:<file2>` is ambiguous because it
could refer to two objects or to two files. In this case, Git warns
about the ambiguity (it is *slightly* misleading that it says to
separate *revisions* using `--`, as we do not want to compare
revisions... but it is definitely better than picking one side of the
ambiguity and running with it).

Git for Windows carries the second patch for ages already, and I would
have contributed it much earlier if I had not been busy with other
patches. The reason I submit it now is that it conflicts with Duy's
fopen_or_warn() patch series.


Johannes Schindelin (2):
  gitattributes: demonstrate that Git tries to read a bogus file
  mingw: Suppress warning that <commit>:.gitattributes does not exist

 attr.c                |  2 +-
 t/t0003-attributes.sh | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)


base-commit: b06d3643105c8758ed019125a4399cb7efdcce2c
Published-As: https://github.com/dscho/git/releases/tag/no-commit-colon-gitattributes-v1
Fetch-It-Via: git fetch https://github.com/dscho/git no-commit-colon-gitattributes-v1
-- 
2.13.0.windows.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/2] gitattributes: demonstrate that Git tries to read a bogus file
  2017-05-15 15:23 [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Johannes Schindelin
@ 2017-05-15 15:23 ` Johannes Schindelin
  2017-05-15 15:24 ` [PATCH 2/2] mingw: Suppress warning that <commit>:.gitattributes does not exist Johannes Schindelin
  2017-05-16  7:54 ` [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Jeff King
  2 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2017-05-15 15:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This problem has been reported originally in August 2015, as

	https://github.com/git-for-windows/git/issues/255

The symptom: when passing <commit>:<directory>/<file> style arguments to
`git diff`, Git tries to read the attributes from a file called
<commit>:<directory>/.gitattributes.

This symptom is more prominent on Windows because the colon in the file
name is illegal, and therefore reported to the user. On Linux, the colon
is legal, and it just so happens that that file typically does not
exist, and therefore there are no adverse consequences.

However, it is still a bug: Git should not even attempt to open that
file. Let's add a test case to demonstrate that problem, even on Linux
and MacOSX.

The underlying problem will be really tricky to fix: the run_diff*()
family of functions expects the path passed via the diff_filespec
structs to be the path as if it were in the worktree. However, when
processing the `git diff <blob1> <blob2>` invocation, Git uses
setup_revisions()'s parsing of the pending objects to fill in this
information, and setup_revisions() simply copies the command-line
argument, rather than reconstructing the actual *file* path.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t0003-attributes.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index f19ae4f8ccd..a7820022d8d 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -323,4 +323,25 @@ test_expect_success 'bare repository: test info/attributes' '
 	)
 '
 
+test_expect_failure 'a file <commit>:.gitattributes is ignored' '
+	git init bogus-file &&
+	(
+		cd bogus-file &&
+		mkdir sub &&
+		test_commit sub/file &&
+		test_commit sub/file2 &&
+		commit=$(git rev-parse HEAD) &&
+		if test_have_prereq !MINGW
+		then
+			# Windows does not support colons in filenames
+			mkdir $commit:sub &&
+			echo "* -inva/id" >$commit:sub/.gitattributes
+		fi &&
+		git diff $commit:sub/file.t..$commit:sub/file2.t >out 2>err &&
+		! grep "is not a valid attribute name" err &&
+		# On Windows, there will be a warning because of the colon
+		! grep "warning: unable to access .$commit:sub" err
+	)
+'
+
 test_done
-- 
2.13.0.windows.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] mingw: Suppress warning that <commit>:.gitattributes does not exist
  2017-05-15 15:23 [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Johannes Schindelin
  2017-05-15 15:23 ` [PATCH 1/2] gitattributes: demonstrate that Git tries to read a bogus file Johannes Schindelin
@ 2017-05-15 15:24 ` Johannes Schindelin
  2017-05-16  7:54 ` [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Jeff King
  2 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2017-05-15 15:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

On Windows, a file name containing a colon is illegal. We should
therefore expect the corresponding errno when `fopen()` is called for a
path of the form <commit>:.gitattributes.

This fixes the symptom reported in

	https://github.com/git-for-windows/git/issues/255

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 attr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 7e2134471cb..a0297088633 100644
--- a/attr.c
+++ b/attr.c
@@ -726,7 +726,7 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 	int lineno = 0;
 
 	if (!fp) {
-		if (errno != ENOENT && errno != ENOTDIR)
+		if (errno != ENOENT && errno != ENOTDIR && errno != EINVAL)
 			warn_on_inaccessible(path);
 		return NULL;
 	}
-- 
2.13.0.windows.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] Demonstrate and partially work around a gitattributes problem
  2017-05-15 15:23 [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Johannes Schindelin
  2017-05-15 15:23 ` [PATCH 1/2] gitattributes: demonstrate that Git tries to read a bogus file Johannes Schindelin
  2017-05-15 15:24 ` [PATCH 2/2] mingw: Suppress warning that <commit>:.gitattributes does not exist Johannes Schindelin
@ 2017-05-16  7:54 ` Jeff King
  2017-05-16  8:10   ` Jeff King
  2 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2017-05-16  7:54 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy

On Mon, May 15, 2017 at 05:23:44PM +0200, Johannes Schindelin wrote:

> Granted, it is a bit of a less common use case to call
> 
> 	git diff <commit1>:<file1> <commit2>:<file2>
> 
> There are valid scenarios for such calls, of course.
> 
> And sometimes, one may want to compare even files that are stored in
> subdirectories instead of the top-level directory. Example:
> 
> 	git diff cafebabe:t/README bee7e55:t/README
> 
> Now, the surprising part is that Git tries to read a .gitattributes
> files interpreting the *entire* prefix `cafebabe:t/` as a *directory
> path*. I.e. it will try to read the *file* (not the blob)
> `cafebabe:t/.gitattributes`.

You also get a very silly-looking diff header:

  $ git diff HEAD:Makefile HEAD~100:Makefile
  diff --git a/HEAD:Makefile b/HEAD~100:Makefile
  index e35542e63..9b36068ac 100644
  --- a/HEAD:Makefile
  +++ b/HEAD~100:Makefile
  [...]

I think the problem is that diff's blob-compare should be feeding the
actual path to the diff machinery, rather than the original name. True,
that carries less information, but it matches the other cases. For
instance:

  git diff HEAD~100:Makefile Makefile

will diff a blob and a file, and uses the file's path (so "HEAD~100"
does not appear at all in the output).

Likewise, doing:

  git diff HEAD~100 HEAD -- Makefile

will show you the diff over "Makefile", without bothering to mention
the commit endpoints.

So I think we should just be feeding "Makefile" to the diff code even
for the 2-blob case.

Something like the patch below almost gets us there, but it suffers from
the fact that the revision machinery doesn't record the path for the
blobs it parses. So:

  1. It's a little inefficient, because it has to re-resolve the names
     again.

  2. It works for "git diff $commit:file1 $commit:file2", because each
     of the blobs has a name we can re-resolve. But connecting them with
     "$commit:file1..$commit:file2" doesn't work, because that name is
     given to _both_ blobs, and can't be resolved as a single sha1. I'd
     argue that this is actually a bug in the revision code.

Both would be solved if handle_revision_arg used get_sha1_with_context
to record the path while resolving (struct object_entry already has a
slot for it, but the revision code never sets it).

-Peff

---
diff --git a/builtin/diff.c b/builtin/diff.c
index d184aafab..e8c541f00 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -406,10 +406,30 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 			obj->flags |= flags;
 			add_object_array(obj, name, &ent);
 		} else if (obj->type == OBJ_BLOB) {
+			struct object_context oc;
+
 			if (2 <= blobs)
 				die(_("more than two blobs given: '%s'"), name);
-			hashcpy(blob[blobs].oid.hash, obj->oid.hash);
-			blob[blobs].name = name;
+
+			/*
+			 * We already resolved this once as a blob via the
+			 * rev_info machinery, but it did not bother to collect
+			 * any path information. Let's re-resolve to get it.
+			 *
+			 * There are a few corner cases where there is no path
+			 * information (e.g., a tag pointing straight at a
+			 * blob). In that case we'll fall back to just using
+			 * the name the user gave us.
+			 */
+			if (!get_sha1_with_context(entry->name, 0, blob[blobs].oid.hash, &oc) &&
+			    !hashcmp(blob[blobs].oid.hash, obj->oid.hash) &&
+			    oc.path[0]) {
+				blob[blobs].name = xstrdup(oc.path);
+			} else {
+				blob[blobs].name = name;
+				hashcpy(blob[blobs].oid.hash, obj->oid.hash);
+			}
+
 			blob[blobs].mode = entry->mode;
 			blobs++;
 

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] Demonstrate and partially work around a gitattributes problem
  2017-05-16  7:54 ` [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Jeff King
@ 2017-05-16  8:10   ` Jeff King
  2017-05-17  1:38     ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2017-05-16  8:10 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy

On Tue, May 16, 2017 at 03:54:18AM -0400, Jeff King wrote:

> Something like the patch below almost gets us there, but it suffers from
> the fact that the revision machinery doesn't record the path for the
> blobs it parses. So:
> 
>   1. It's a little inefficient, because it has to re-resolve the names
>      again.
> 
>   2. It works for "git diff $commit:file1 $commit:file2", because each
>      of the blobs has a name we can re-resolve. But connecting them with
>      "$commit:file1..$commit:file2" doesn't work, because that name is
>      given to _both_ blobs, and can't be resolved as a single sha1. I'd
>      argue that this is actually a bug in the revision code.
> 
> Both would be solved if handle_revision_arg used get_sha1_with_context
> to record the path while resolving (struct object_entry already has a
> slot for it, but the revision code never sets it).

This turned out much easier than I feared. With the patch below:

  - your t0003 test passes

  - the diff headers are much more sensible (IMHO)

  - it fixes a semi-related bug in which "git diff $blob1 $blob2"
    compared the modes correctly, but "git diff $blob1..$blob2" did not
    (because the ".." code path did not correctly record the modes).

I'll stop here for now to get any comments/feedback. There are a few
changes I'd make to polish this up:

  - the mode fix should be its own preparatory patch

  - teaching revision.c to pass out path info should be its own patch

  - the "struct blobinfo" in builtin/diff.c should probably call it
    "path" instead of "name" for clarity. In fact, I kind of wonder if
    this should just be an object_array element, as that has all of
    the information.

  - the way that "struct object_context" handles the paths with a
    fixed-size buffer is badly designed (by me, long ago). I may see
    what it would take to refactor that to match the technique in
    sha1_object_info_extended(), which I think has worked well in
    practice.

    That's orthogonal to this series, but it's been bugging me for a
    long time, and obviously the caller here would have to adapt.

---
diff --git a/builtin/diff.c b/builtin/diff.c
index d184aafab..8ed1e99e3 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -409,7 +409,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 			if (2 <= blobs)
 				die(_("more than two blobs given: '%s'"), name);
 			hashcpy(blob[blobs].oid.hash, obj->oid.hash);
-			blob[blobs].name = name;
+			blob[blobs].name = entry->path ? entry->path : name;
 			blob[blobs].mode = entry->mode;
 			blobs++;
 
diff --git a/revision.c b/revision.c
index 8a8c1789c..72b9af7de 100644
--- a/revision.c
+++ b/revision.c
@@ -1431,7 +1431,7 @@ static void prepare_show_merge(struct rev_info *revs)
 
 int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsigned revarg_opt)
 {
-	struct object_context oc;
+	struct object_context oc, oc2;
 	char *dotdot;
 	struct object *object;
 	unsigned char sha1[20];
@@ -1470,8 +1470,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 				return -1;
 			}
 		}
-		if (!get_sha1_committish(this, from_sha1) &&
-		    !get_sha1_committish(next, sha1)) {
+		if (!get_sha1_with_context(this, GET_SHA1_COMMITTISH, from_sha1, &oc) &&
+		    !get_sha1_with_context(next, GET_SHA1_COMMITTISH, sha1, &oc2)) {
 			struct object *a_obj, *b_obj;
 
 			if (!cant_be_filename) {
@@ -1523,8 +1523,12 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 					REV_CMD_LEFT, a_flags);
 			add_rev_cmdline(revs, b_obj, next,
 					REV_CMD_RIGHT, flags);
-			add_pending_object(revs, a_obj, this);
-			add_pending_object(revs, b_obj, next);
+			add_pending_object_with_path(revs, a_obj, this,
+						     oc.mode,
+						     oc.path[0] ? oc.path : NULL);
+			add_pending_object_with_path(revs, b_obj, next,
+						     oc2.mode,
+						     oc2.path[0] ? oc2.path : NULL);
 			return 0;
 		}
 		*dotdot = '.';
@@ -1574,7 +1578,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, sha1, flags ^ local_flags);
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
-	add_pending_object_with_mode(revs, object, arg, oc.mode);
+	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] Demonstrate and partially work around a gitattributes problem
  2017-05-16  8:10   ` Jeff King
@ 2017-05-17  1:38     ` Junio C Hamano
  2017-05-17  2:05       ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2017-05-17  1:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> diff --git a/builtin/diff.c b/builtin/diff.c
> index d184aafab..8ed1e99e3 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -409,7 +409,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  			if (2 <= blobs)
>  				die(_("more than two blobs given: '%s'"), name);
>  			hashcpy(blob[blobs].oid.hash, obj->oid.hash);
> -			blob[blobs].name = name;
> +			blob[blobs].name = entry->path ? entry->path : name;
>  			blob[blobs].mode = entry->mode;
>  			blobs++;
>   ...
> -			add_pending_object(revs, a_obj, this);
> -			add_pending_object(revs, b_obj, next);
> +			add_pending_object_with_path(revs, a_obj, this,
> +						     oc.mode,
> +						     oc.path[0] ? oc.path : NULL);
> +			add_pending_object_with_path(revs, b_obj, next,
> +						     oc2.mode,
> +						     oc2.path[0] ? oc2.path : NULL);

The fix is surprisingly simple, and I think it definitely goes in
the right direction.

Somehow, it bothers me to be forced to view (a_obj, this, from_sha1,
oc) vs (b_obj, next, sha1, oc2) as a sensibly corresponding pair of
tuples, but that is not something your introduction of "oc2"
started, so I won't complain ;-).

>  			return 0;
>  		}
>  		*dotdot = '.';
> @@ -1574,7 +1578,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
>  		verify_non_filename(revs->prefix, arg);
>  	object = get_reference(revs, arg, sha1, flags ^ local_flags);
>  	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
> -	add_pending_object_with_mode(revs, object, arg, oc.mode);
> +	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
>  	return 0;
>  }
>  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] Demonstrate and partially work around a gitattributes problem
  2017-05-17  1:38     ` Junio C Hamano
@ 2017-05-17  2:05       ` Jeff King
  2017-05-18 19:23         ` Johannes Schindelin
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
  0 siblings, 2 replies; 33+ messages in thread
From: Jeff King @ 2017-05-17  2:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

On Wed, May 17, 2017 at 10:38:36AM +0900, Junio C Hamano wrote:

> > -			add_pending_object(revs, a_obj, this);
> > -			add_pending_object(revs, b_obj, next);
> > +			add_pending_object_with_path(revs, a_obj, this,
> > +						     oc.mode,
> > +						     oc.path[0] ? oc.path : NULL);
> > +			add_pending_object_with_path(revs, b_obj, next,
> > +						     oc2.mode,
> > +						     oc2.path[0] ? oc2.path : NULL);
> 
> The fix is surprisingly simple, and I think it definitely goes in
> the right direction.
> 
> Somehow, it bothers me to be forced to view (a_obj, this, from_sha1,
> oc) vs (b_obj, next, sha1, oc2) as a sensibly corresponding pair of
> tuples, but that is not something your introduction of "oc2"
> started, so I won't complain ;-).

Yes, in my polishing I switched this out at least "oc_a" and "oc_b" so
we could be sure they match correctly. I think this whole "dotdot"
conditional could be pulled out to its own function, and probably
consistently use "a" and "b" for the endpoints. I'll see what
refactoring I can do to make it more readable.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] Demonstrate and partially work around a gitattributes problem
  2017-05-17  2:05       ` Jeff King
@ 2017-05-18 19:23         ` Johannes Schindelin
  2017-05-19  0:00           ` Jeff King
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2017-05-18 19:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Nguyễn Thái Ngọc Duy

Hi Peff,

On Tue, 16 May 2017, Jeff King wrote:

> On Wed, May 17, 2017 at 10:38:36AM +0900, Junio C Hamano wrote:
> 
> > > -			add_pending_object(revs, a_obj, this);
> > > -			add_pending_object(revs, b_obj, next);
> > > +			add_pending_object_with_path(revs, a_obj, this,
> > > +						     oc.mode,
> > > +						     oc.path[0] ? oc.path : NULL);
> > > +			add_pending_object_with_path(revs, b_obj, next,
> > > +						     oc2.mode,
> > > +						     oc2.path[0] ? oc2.path : NULL);
> > 
> > The fix is surprisingly simple, and I think it definitely goes in
> > the right direction.
> > 
> > Somehow, it bothers me to be forced to view (a_obj, this, from_sha1,
> > oc) vs (b_obj, next, sha1, oc2) as a sensibly corresponding pair of
> > tuples, but that is not something your introduction of "oc2"
> > started, so I won't complain ;-).
> 
> Yes, in my polishing I switched this out at least "oc_a" and "oc_b" so
> we could be sure they match correctly. I think this whole "dotdot"
> conditional could be pulled out to its own function, and probably
> consistently use "a" and "b" for the endpoints. I'll see what
> refactoring I can do to make it more readable.

So you drive this forward? That would really help me a lot. Please make my
1/2 part of your patches to that end.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] Demonstrate and partially work around a gitattributes problem
  2017-05-18 19:23         ` Johannes Schindelin
@ 2017-05-19  0:00           ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19  0:00 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, git, Nguyễn Thái Ngọc Duy

On Thu, May 18, 2017 at 09:23:17PM +0200, Johannes Schindelin wrote:

> So you drive this forward? That would really help me a lot. Please make my
> 1/2 part of your patches to that end.

Yes, sorry, I haven't gotten a chance to polish the patches yet.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar
  2017-05-17  2:05       ` Jeff King
  2017-05-18 19:23         ` Johannes Schindelin
@ 2017-05-19 12:46         ` Jeff King
  2017-05-19 12:48           ` [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently Jeff King
                             ` (15 more replies)
  1 sibling, 16 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

On Tue, May 16, 2017 at 10:05:35PM -0400, Jeff King wrote:

> On Wed, May 17, 2017 at 10:38:36AM +0900, Junio C Hamano wrote:
> 
> > > -			add_pending_object(revs, a_obj, this);
> > > -			add_pending_object(revs, b_obj, next);
> > > +			add_pending_object_with_path(revs, a_obj, this,
> > > +						     oc.mode,
> > > +						     oc.path[0] ? oc.path : NULL);
> > > +			add_pending_object_with_path(revs, b_obj, next,
> > > +						     oc2.mode,
> > > +						     oc2.path[0] ? oc2.path : NULL);
> > 
> > The fix is surprisingly simple, and I think it definitely goes in
> > the right direction.
> > 
> > Somehow, it bothers me to be forced to view (a_obj, this, from_sha1,
> > oc) vs (b_obj, next, sha1, oc2) as a sensibly corresponding pair of
> > tuples, but that is not something your introduction of "oc2"
> > started, so I won't complain ;-).
> 
> Yes, in my polishing I switched this out at least "oc_a" and "oc_b" so
> we could be sure they match correctly. I think this whole "dotdot"
> conditional could be pulled out to its own function, and probably
> consistently use "a" and "b" for the endpoints. I'll see what
> refactoring I can do to make it more readable.

The cleanup ended up being a bit bigger than I expected, but I did find
an obscure bug while I was there. And I think the end result looks
pretty decent.

The patches are listed below. There are a few things I didn't do (and
don't plan to in the near future):

  - I didn't pick up Dscho's gitattributes test. The tests in patch 9
    cover that in a much more direct fashion by looking at the diff
    headers. We _could_ still check that they respect gitattributes, but
    there's no reason that they wouldn't; the bug was before we even hit
    the diff machinery at all.

  - There is an interesting related case with attributes that wasn't
    tested by that case. If you do:

       git diff $(git rev-parse HEAD:one) $(git rev-parse HEAD:two)

    then we'll feed those sha1s to the diff machinery as the path (we
    have enough information at that point to know they aren't actually
    paths, but we have to give _something_ to the diff code to display).

    That means we'll also look for .gitattributes files that match those
    names. It would be hard to fix (because the diff machinery doesn't
    have a notion of "this path is for display, but not for attributes).
    And I doubt anybody cares (AIUI, the motivation for the original was
    not that somebody was concerned with reading what is likely to be a
    non-existent attributes file, but rather that looking at names with
    colons produces a bogus warning on Windows).

  - I noticed that the dotdot parser jumps into the middle of the string
    with a strstr(), and never tries to find other dotdots. That means
    things like:

      HEAD@{2..days.ago}...HEAD@{3..days.ago}

    does not parse (we find the first "..", realize that it's not a
    range operator, and then give up looking for more range operators).
    That could be solved by either parsing left-to-right, or by trying
    each ".." in the string in turn. I doubt anybody cares overly much.

    I think we do get the related:

      git show HEAD:foo..bar

    correct, because we see that "HEAD:foo" is not a commit and bail to
    trying the whole thing as a single unit.

I think those are all minor problems that have likely been around for
10+ years, and would take a lot of digging and work to fix. Unless
somebody actually hits one in practice, I'm happy to punt for another
decade.

  [01/15]: handle_revision_arg: reset "dotdot" consistently
  [02/15]: handle_revision_arg: simplify commit reference lookups
  [03/15]: handle_revision_arg: stop using "dotdot" as a generic pointer
  [04/15]: handle_revision_arg: hoist ".." check out of range parsing
  [05/15]: handle_revision_arg: add handle_dotdot() helper
  [06/15]: sha1_name: consistently refer to object_context as "oc"
  [07/15]: get_sha1_with_context: always initialize oc->symlink_path
  [08/15]: get_sha1_with_context: dynamically allocate oc->path
  [09/15]: t4063: add tests of direct blob diffs
  [10/15]: handle_revision_arg: record modes for "a..b" endpoints
  [11/15]: handle_revision_arg: record paths for pending objects
  [12/15]: diff: pass whole pending entry in blobinfo
  [13/15]: diff: use the word "path" instead of "name" for blobs
  [14/15]: diff: use pending "path" if it is available
  [15/15]: diff: use blob path for blob/file diffs

 builtin/cat-file.c    |   4 +-
 builtin/diff.c        |  60 ++++++-------
 builtin/grep.c        |   4 +-
 builtin/log.c         |  10 ++-
 cache.h               |  10 ++-
 revision.c            | 243 +++++++++++++++++++++++++++++---------------------
 sha1_name.c           |  11 ++-
 t/t4063-diff-blobs.sh |  96 ++++++++++++++++++++
 t/t4202-log.sh        |   9 ++
 tree-walk.c           |   1 -
 10 files changed, 301 insertions(+), 147 deletions(-)
 create mode 100755 t/t4063-diff-blobs.sh

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
@ 2017-05-19 12:48           ` Jeff King
  2017-05-20 14:56             ` Philip Oakley
  2017-05-19 12:48           ` [PATCH 02/15] handle_revision_arg: simplify commit reference lookups Jeff King
                             ` (14 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

When we are parsing a range like "a..b", we write a
temporary NUL over the first ".", so that we can access the
names "a" and "b" as C strings. But our restoration of the
original "." is done at inconsistent times, which can lead
to confusing results.

For most calls, we restore the "." after we resolve the
names, but before we call verify_non_filename().  This means
that when we later call add_pending_object(), the name for
the left-hand "a" has been re-expanded to "a..b". You can
see this with:

  git log --source a...b

where "b" will be correctly marked with "b", but "a" will be
marked with "a...b". Likewise with "a..b" (though you need
to use --boundary to even see "a" at all in that case).

To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
flag is set, we skip the non-filename check, and leave the
NUL in place.

That means we do report the correct name for "a" in the
pending array. But some code paths try to show the whole
"a..b" name in error messages, and these erroneously show
only "a" instead of "a..b". E.g.:

  $ git cherry-pick HEAD:foo..HEAD:foo
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  fatal: Invalid symmetric difference expression HEAD:foo

(That last message should be "HEAD:foo...HEAD:foo"; I used
cherry-pick because it passes the CANNOT_BE_FILENAME flag).

As an interesting side note, cherry-pick actually looks at
and re-resolves the arguments from the pending->name fields.
So it would have been visibly broken by the first bug, but
the effect was canceled out by the second one.

This patch makes the whole function consistent by re-writing
the NUL immediately after calling verify_non_filename(), and
then restoring the "." as appropriate in some error-printing
and early-return code paths.

Signed-off-by: Jeff King <peff@peff.net>
---
I also considered just making a copy of the string rather than this
in-place munging (technically we get it as a pointer-to-const; it's only
the use of strstr() that lets us quietly drop the const). But it doesn't
really make the code any cleaner; now instead of restoring the dot you
have to remember to free() the string in each code path.

 revision.c     | 3 +++
 t/t4202-log.sh | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/revision.c b/revision.c
index 8a8c1789c..014bf52e3 100644
--- a/revision.c
+++ b/revision.c
@@ -1477,12 +1477,14 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 			if (!cant_be_filename) {
 				*dotdot = '.';
 				verify_non_filename(revs->prefix, arg);
+				*dotdot = '\0';
 			}
 
 			a_obj = parse_object(from_sha1);
 			b_obj = parse_object(sha1);
 			if (!a_obj || !b_obj) {
 			missing:
+				*dotdot = '.';
 				if (revs->ignore_missing)
 					return 0;
 				die(symmetric
@@ -1525,6 +1527,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 					REV_CMD_RIGHT, flags);
 			add_pending_object(revs, a_obj, this);
 			add_pending_object(revs, b_obj, next);
+			*dotdot = '.';
 			return 0;
 		}
 		*dotdot = '.';
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index f57799071..6da1bbe91 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1380,4 +1380,13 @@ test_expect_success 'log --source paints tag names' '
 	test_cmp expect actual
 '
 
+test_expect_success 'log --source paints symmetric ranges' '
+	cat >expect <<-\EOF &&
+	09e12a9	source-b three
+	8e393e1	source-a two
+	EOF
+	git log --oneline --source source-a...source-b >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 02/15] handle_revision_arg: simplify commit reference lookups
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
  2017-05-19 12:48           ` [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently Jeff King
@ 2017-05-19 12:48           ` Jeff King
  2017-05-19 12:50           ` [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer Jeff King
                             ` (13 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

The "dotdot" range parser avoids calling
lookup_commit_reference() if we are directly fed two
commits. But its casts are unnecessarily complex; that
function will just return a commit we pass into it.

Just calling the function all the time is much simpler, and
doesn't do any significant extra work (the object is already
parsed, and deref_tag() on a non-tag is a noop; we do incur
one extra lookup_object() call, but that's fairly trivial).

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/revision.c b/revision.c
index 014bf52e3..b6031fb35 100644
--- a/revision.c
+++ b/revision.c
@@ -1500,12 +1500,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 				struct commit *a, *b;
 				struct commit_list *exclude;
 
-				a = (a_obj->type == OBJ_COMMIT
-				     ? (struct commit *)a_obj
-				     : lookup_commit_reference(a_obj->oid.hash));
-				b = (b_obj->type == OBJ_COMMIT
-				     ? (struct commit *)b_obj
-				     : lookup_commit_reference(b_obj->oid.hash));
+				a = lookup_commit_reference(a_obj->oid.hash);
+				b = lookup_commit_reference(b_obj->oid.hash);
 				if (!a || !b)
 					goto missing;
 				exclude = get_merge_bases(a, b);
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
  2017-05-19 12:48           ` [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently Jeff King
  2017-05-19 12:48           ` [PATCH 02/15] handle_revision_arg: simplify commit reference lookups Jeff King
@ 2017-05-19 12:50           ` Jeff King
  2017-05-24  2:45             ` Junio C Hamano
  2017-05-19 12:51           ` [PATCH 04/15] handle_revision_arg: hoist ".." check out of range parsing Jeff King
                             ` (12 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

The handle_revision_arg() function has a "dotdot" variable
that it uses to find a ".." or "..." in the argument. If we
don't find one, we look for other marks, like "^!". But we
just keep re-using the "dotdot" variable, which is
confusing.

Let's introduce a separate "mark" variable that can be used
for these other marks. They still reuse the same variable,
but at least the name is no longer actively misleading.

Signed-off-by: Jeff King <peff@peff.net>
---
It may make sense to pull each of these into its own helper. I didn't
really look because they're so small, and because the return semantics
seemed confusing to me. Some of them return, and some of them keep
parsing. Some of them restore the NUL they overwrite, and some do not.

I didn't dig in to see if there are weird corner cases where they
misbehave.

 revision.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/revision.c b/revision.c
index b6031fb35..9c683874d 100644
--- a/revision.c
+++ b/revision.c
@@ -1433,6 +1433,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 {
 	struct object_context oc;
 	char *dotdot;
+	char *mark;
 	struct object *object;
 	unsigned char sha1[20];
 	int local_flags;
@@ -1529,33 +1530,33 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		*dotdot = '.';
 	}
 
-	dotdot = strstr(arg, "^@");
-	if (dotdot && !dotdot[2]) {
-		*dotdot = 0;
+	mark = strstr(arg, "^@");
+	if (mark && !mark[2]) {
+		*mark = 0;
 		if (add_parents_only(revs, arg, flags, 0))
 			return 0;
-		*dotdot = '^';
+		*mark = '^';
 	}
-	dotdot = strstr(arg, "^!");
-	if (dotdot && !dotdot[2]) {
-		*dotdot = 0;
+	mark = strstr(arg, "^!");
+	if (mark && !mark[2]) {
+		*mark = 0;
 		if (!add_parents_only(revs, arg, flags ^ (UNINTERESTING | BOTTOM), 0))
-			*dotdot = '^';
+			*mark = '^';
 	}
-	dotdot = strstr(arg, "^-");
-	if (dotdot) {
+	mark = strstr(arg, "^-");
+	if (mark) {
 		int exclude_parent = 1;
 
-		if (dotdot[2]) {
+		if (mark[2]) {
 			char *end;
-			exclude_parent = strtoul(dotdot + 2, &end, 10);
+			exclude_parent = strtoul(mark + 2, &end, 10);
 			if (*end != '\0' || !exclude_parent)
 				return -1;
 		}
 
-		*dotdot = 0;
+		*mark = 0;
 		if (!add_parents_only(revs, arg, flags ^ (UNINTERESTING | BOTTOM), exclude_parent))
-			*dotdot = '^';
+			*mark = '^';
 	}
 
 	local_flags = 0;
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 04/15] handle_revision_arg: hoist ".." check out of range parsing
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (2 preceding siblings ...)
  2017-05-19 12:50           ` [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer Jeff King
@ 2017-05-19 12:51           ` Jeff King
  2017-05-19 12:52           ` [PATCH 05/15] handle_revision_arg: add handle_dotdot() helper Jeff King
                             ` (11 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

Since 003c84f6d (specifying ranges: we did not mean to make
".." an empty set, 2011-05-02), we treat the argument ".."
specially. We detect it by noticing that both sides of the
range are empty, and that this is a non-symmetric two-dot
range.

While correct, this makes the code overly complicated. We
can just detect ".." up front before we try to do further
parsing. This avoids having to de-munge the NUL from dotdot,
and lets us eliminate an extra const array (which we needed
only to do direct pointer comparisons).

It also removes the one code path from the range-parsing
conditional that requires us to return -1. That will make it
simpler to pull the dotdot parsing out into its own
function.

Signed-off-by: Jeff King <peff@peff.net>
---
There's a similar call in rev-parse which could maybe take the same
cleanup. I didn't bother with it because my main goal was removing an
obstacle to factoring out the dotdot parsing (though in an ideal world,
rev-parse would actually use the same dotdot parsing code; I've no idea
how complicated that would be).

 revision.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/revision.c b/revision.c
index 9c683874d..dec06ff8b 100644
--- a/revision.c
+++ b/revision.c
@@ -1443,6 +1443,14 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 
 	flags = flags & UNINTERESTING ? flags | BOTTOM : flags & ~BOTTOM;
 
+	if (!cant_be_filename && !strcmp(arg, "..")) {
+		/*
+		 * Just ".."?  That is not a range but the
+		 * pathspec for the parent directory.
+		 */
+		return -1;
+	}
+
 	dotdot = strstr(arg, "..");
 	if (dotdot) {
 		unsigned char from_sha1[20];
@@ -1450,27 +1458,15 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		const char *this = arg;
 		int symmetric = *next == '.';
 		unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
-		static const char head_by_default[] = "HEAD";
 		unsigned int a_flags;
 
 		*dotdot = 0;
 		next += symmetric;
 
 		if (!*next)
-			next = head_by_default;
+			next = "HEAD";
 		if (dotdot == arg)
-			this = head_by_default;
-		if (this == head_by_default && next == head_by_default &&
-		    !symmetric) {
-			/*
-			 * Just ".."?  That is not a range but the
-			 * pathspec for the parent directory.
-			 */
-			if (!cant_be_filename) {
-				*dotdot = '.';
-				return -1;
-			}
-		}
+			this = "HEAD";
 		if (!get_sha1_committish(this, from_sha1) &&
 		    !get_sha1_committish(next, sha1)) {
 			struct object *a_obj, *b_obj;
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 05/15] handle_revision_arg: add handle_dotdot() helper
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (3 preceding siblings ...)
  2017-05-19 12:51           ` [PATCH 04/15] handle_revision_arg: hoist ".." check out of range parsing Jeff King
@ 2017-05-19 12:52           ` Jeff King
  2017-05-24  2:30             ` Junio C Hamano
  2017-05-19 12:52           ` [PATCH 06/15] sha1_name: consistently refer to object_context as "oc" Jeff King
                             ` (10 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

The handle_revision_arg function is rather long, and a big
chunk of it is handling the range operators. Let's pull that
out to a separate helper. While we're doing so, we can clean
up a few of the rough edges that made the flow hard to
follow:

  - instead of manually restoring *dotdot (that we overwrote
    with a NUL), do the real work in a sub-helper, which
    makes it clear that the munge/restore lines are a
    matched pair

  - eliminate a goto which wasn't actually used for control
    flow, but only to avoid duplicating a few lines
    (instead, those lines are pushed into another helper
    function)

  - use early returns instead of deep nesting

  - consistently name all variables for the left-hand side
    of the range as "a" (rather than "this" or "from") and
    the right-hand side as "b" (rather than "next", or using
    the unadorned "sha1" or "flags" from the main function).

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c | 177 +++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 102 insertions(+), 75 deletions(-)

diff --git a/revision.c b/revision.c
index dec06ff8b..eb45501fd 100644
--- a/revision.c
+++ b/revision.c
@@ -1429,10 +1429,109 @@ static void prepare_show_merge(struct rev_info *revs)
 	revs->limited = 1;
 }
 
+static int dotdot_missing(const char *arg, char *dotdot,
+			  struct rev_info *revs, int symmetric)
+{
+	if (revs->ignore_missing)
+		return 0;
+	/* de-munge so we report the full argument */
+	*dotdot = '.';
+	die(symmetric
+	    ? "Invalid symmetric difference expression %s"
+	    : "Invalid revision range %s", arg);
+}
+
+static int handle_dotdot_1(const char *arg, char *dotdot,
+			   struct rev_info *revs, int flags,
+			   int cant_be_filename)
+{
+	const char *a_name, *b_name;
+	struct object_id a_oid, b_oid;
+	struct object *a_obj, *b_obj;
+	unsigned int a_flags, b_flags;
+	int symmetric = 0;
+	unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
+
+	a_name = arg;
+	if (!*a_name)
+		a_name = "HEAD";
+
+	b_name = dotdot + 2;
+	if (*b_name == '.') {
+		symmetric = 1;
+		b_name++;
+	}
+	if (!*b_name)
+		b_name = "HEAD";
+
+	if (get_sha1_committish(a_name, a_oid.hash) ||
+	    get_sha1_committish(b_name, b_oid.hash))
+		return -1;
+
+	if (!cant_be_filename) {
+		*dotdot = '.';
+		verify_non_filename(revs->prefix, arg);
+		*dotdot = '\0';
+	}
+
+	a_obj = parse_object(a_oid.hash);
+	b_obj = parse_object(b_oid.hash);
+	if (!a_obj || !b_obj)
+		return dotdot_missing(arg, dotdot, revs, symmetric);
+
+	if (!symmetric) {
+		/* just A..B */
+		b_flags = flags;
+		a_flags = flags_exclude;
+	} else {
+		/* A...B -- find merge bases between the two */
+		struct commit *a, *b;
+		struct commit_list *exclude;
+
+		a = lookup_commit_reference(a_obj->oid.hash);
+		b = lookup_commit_reference(b_obj->oid.hash);
+		if (!a || !b)
+			return dotdot_missing(arg, dotdot, revs, symmetric);
+
+		exclude = get_merge_bases(a, b);
+		add_rev_cmdline_list(revs, exclude, REV_CMD_MERGE_BASE,
+				     flags_exclude);
+		add_pending_commit_list(revs, exclude, flags_exclude);
+		free_commit_list(exclude);
+
+		b_flags = flags;
+		a_flags = flags | SYMMETRIC_LEFT;
+	}
+
+	a_obj->flags |= a_flags;
+	b_obj->flags |= b_flags;
+	add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
+	add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
+	add_pending_object(revs, a_obj, a_name);
+	add_pending_object(revs, b_obj, b_name);
+	return 0;
+}
+
+static int handle_dotdot(const char *arg,
+			 struct rev_info *revs, int flags,
+			 int cant_be_filename)
+{
+	char *dotdot = strstr(arg, "..");
+	int ret;
+
+	if (!dotdot)
+		return -1;
+
+	*dotdot = '\0';
+	ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename);
+	*dotdot = '.';
+
+	return ret;
+}
+
 int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsigned revarg_opt)
 {
 	struct object_context oc;
-	char *dotdot;
 	char *mark;
 	struct object *object;
 	unsigned char sha1[20];
@@ -1451,80 +1550,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		return -1;
 	}
 
-	dotdot = strstr(arg, "..");
-	if (dotdot) {
-		unsigned char from_sha1[20];
-		const char *next = dotdot + 2;
-		const char *this = arg;
-		int symmetric = *next == '.';
-		unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
-		unsigned int a_flags;
-
-		*dotdot = 0;
-		next += symmetric;
-
-		if (!*next)
-			next = "HEAD";
-		if (dotdot == arg)
-			this = "HEAD";
-		if (!get_sha1_committish(this, from_sha1) &&
-		    !get_sha1_committish(next, sha1)) {
-			struct object *a_obj, *b_obj;
-
-			if (!cant_be_filename) {
-				*dotdot = '.';
-				verify_non_filename(revs->prefix, arg);
-				*dotdot = '\0';
-			}
-
-			a_obj = parse_object(from_sha1);
-			b_obj = parse_object(sha1);
-			if (!a_obj || !b_obj) {
-			missing:
-				*dotdot = '.';
-				if (revs->ignore_missing)
-					return 0;
-				die(symmetric
-				    ? "Invalid symmetric difference expression %s"
-				    : "Invalid revision range %s", arg);
-			}
-
-			if (!symmetric) {
-				/* just A..B */
-				a_flags = flags_exclude;
-			} else {
-				/* A...B -- find merge bases between the two */
-				struct commit *a, *b;
-				struct commit_list *exclude;
-
-				a = lookup_commit_reference(a_obj->oid.hash);
-				b = lookup_commit_reference(b_obj->oid.hash);
-				if (!a || !b)
-					goto missing;
-				exclude = get_merge_bases(a, b);
-				add_rev_cmdline_list(revs, exclude,
-						     REV_CMD_MERGE_BASE,
-						     flags_exclude);
-				add_pending_commit_list(revs, exclude,
-							flags_exclude);
-				free_commit_list(exclude);
-
-				a_flags = flags | SYMMETRIC_LEFT;
-			}
-
-			a_obj->flags |= a_flags;
-			b_obj->flags |= flags;
-			add_rev_cmdline(revs, a_obj, this,
-					REV_CMD_LEFT, a_flags);
-			add_rev_cmdline(revs, b_obj, next,
-					REV_CMD_RIGHT, flags);
-			add_pending_object(revs, a_obj, this);
-			add_pending_object(revs, b_obj, next);
-			*dotdot = '.';
-			return 0;
-		}
-		*dotdot = '.';
-	}
+	if (!handle_dotdot(arg, revs, flags, revarg_opt))
+		return 0;
 
 	mark = strstr(arg, "^@");
 	if (mark && !mark[2]) {
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/15] sha1_name: consistently refer to object_context as "oc"
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (4 preceding siblings ...)
  2017-05-19 12:52           ` [PATCH 05/15] handle_revision_arg: add handle_dotdot() helper Jeff King
@ 2017-05-19 12:52           ` Jeff King
  2017-05-19 12:52           ` [PATCH 07/15] get_sha1_with_context: always initialize oc->symlink_path Jeff King
                             ` (9 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

An early version of the patch to add object_context used the
name object_resolve_context. This was later shortened to
just object_context, but the "orc" variable name stuck in a
few places.  Let's use "oc", which is used elsewhere in the
code.

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     | 2 +-
 sha1_name.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 188811920..656341b8e 100644
--- a/cache.h
+++ b/cache.h
@@ -1363,7 +1363,7 @@ extern int get_sha1_tree(const char *str, unsigned char *sha1);
 extern int get_sha1_treeish(const char *str, unsigned char *sha1);
 extern int get_sha1_blob(const char *str, unsigned char *sha1);
 extern void maybe_die_on_misspelt_object_name(const char *name, const char *prefix);
-extern int get_sha1_with_context(const char *str, unsigned flags, unsigned char *sha1, struct object_context *orc);
+extern int get_sha1_with_context(const char *str, unsigned flags, unsigned char *sha1, struct object_context *oc);
 
 extern int get_oid(const char *str, struct object_id *oid);
 
diff --git a/sha1_name.c b/sha1_name.c
index 35c1e2a9e..a11d08dd8 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1638,9 +1638,9 @@ void maybe_die_on_misspelt_object_name(const char *name, const char *prefix)
 	get_sha1_with_context_1(name, GET_SHA1_ONLY_TO_DIE, prefix, sha1, &oc);
 }
 
-int get_sha1_with_context(const char *str, unsigned flags, unsigned char *sha1, struct object_context *orc)
+int get_sha1_with_context(const char *str, unsigned flags, unsigned char *sha1, struct object_context *oc)
 {
 	if (flags & GET_SHA1_FOLLOW_SYMLINKS && flags & GET_SHA1_ONLY_TO_DIE)
 		die("BUG: incompatible flags for get_sha1_with_context");
-	return get_sha1_with_context_1(str, flags, NULL, sha1, orc);
+	return get_sha1_with_context_1(str, flags, NULL, sha1, oc);
 }
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/15] get_sha1_with_context: always initialize oc->symlink_path
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (5 preceding siblings ...)
  2017-05-19 12:52           ` [PATCH 06/15] sha1_name: consistently refer to object_context as "oc" Jeff King
@ 2017-05-19 12:52           ` Jeff King
  2017-05-19 12:54           ` [PATCH 08/15] get_sha1_with_context: dynamically allocate oc->path Jeff King
                             ` (8 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

The get_sha1_with_context() function zeroes out the
oc->symlink_path strbuf, but doesn't use strbuf_init() to
set up the usual invariants (like pointing to the slopbuf).
We don't actually write to the oc->symlink_path strbuf
unless we call get_tree_entry_follow_symlinks(), and that
function does initialize it. However, readers may still look
at the zero'd strbuf.

In practice this isn't a triggerable bug. The only caller
that looks at it only does so when the mode we found is 0.
This doesn't happen for non-tree-entries (where we return
S_IFINVALID). A broken tree entry could have a mode of 0,
but canon_mode() quietly rewrites that into S_IFGITLINK.
So the "0" mode should only come up when we did indeed find
a symlink.

This is mostly just an accident of how the code happens to
work, though. Let's future-proof ourselves to make sure the
strbuf is properly initialized for all calls (it's only a
few struct member assignments, not a heap allocation).

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c | 1 +
 tree-walk.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index a11d08dd8..35b16efc6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1511,6 +1511,7 @@ static int get_sha1_with_context_1(const char *name,
 
 	memset(oc, 0, sizeof(*oc));
 	oc->mode = S_IFINVALID;
+	strbuf_init(&oc->symlink_path, 0);
 	ret = get_sha1_1(name, namelen, sha1, flags);
 	if (!ret)
 		return ret;
diff --git a/tree-walk.c b/tree-walk.c
index ff7760568..c7ecfc856 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -589,7 +589,6 @@ enum follow_symlinks_result get_tree_entry_follow_symlinks(unsigned char *tree_s
 	int i;
 
 	init_tree_desc(&t, NULL, 0UL);
-	strbuf_init(result_path, 0);
 	strbuf_addstr(&namebuf, name);
 	hashcpy(current_tree_sha1, tree_sha1);
 
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/15] get_sha1_with_context: dynamically allocate oc->path
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (6 preceding siblings ...)
  2017-05-19 12:52           ` [PATCH 07/15] get_sha1_with_context: always initialize oc->symlink_path Jeff King
@ 2017-05-19 12:54           ` Jeff King
  2017-05-19 12:54           ` [PATCH 09/15] t4063: add tests of direct blob diffs Jeff King
                             ` (7 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

When a sha1 lookup returns the tree path via "struct
object_context", it just copies it into a fixed-size buffer.
This means the result can be truncated, and it means our
"struct object_context" consumes a lot of stack space.

Instead, let's allocate a string on the heap. Because most
callers don't care about this information, we'll avoid doing
it by default (so they don't all have to start calling
free() on the result). There are basically two options for
the caller to signal to us that it's interested:

  1. By setting a pointer to storage in the object_context.

  2. By passing a flag in another parameter.

Doing (1) would match the way that sha1_object_info_extended()
works. But it would mean that every caller would have to
initialize the object_context, which they don't currently
have to do.

This patch does (2), and adds a new bit to the function's
flags field. All of the callers that look at the "path"
field are updated to pass the new flag.

Signed-off-by: Jeff King <peff@peff.net>
---
If there's a topic in flight that adds a new call without the flag, note
that it will stop filling the "path" field (which is what most calls
would want). And you won't get a compile error, because the pointer can
be used in largely the same way as the array.

I find it unlikely that there's a case that would care, but if we wanted
to be really paranoid, we could change the name of oc->path (at the cost
of making this diff much noisier).

 builtin/cat-file.c |  4 +++-
 builtin/grep.c     |  4 +++-
 builtin/log.c      | 10 +++++++---
 cache.h            |  8 +++++++-
 sha1_name.c        |  6 ++++--
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1890d7a63..421709517 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -61,7 +61,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	if (unknown_type)
 		flags |= LOOKUP_UNKNOWN_OBJECT;
 
-	if (get_sha1_with_context(obj_name, 0, oid.hash, &obj_context))
+	if (get_sha1_with_context(obj_name, GET_SHA1_RECORD_PATH,
+				  oid.hash, &obj_context))
 		die("Not a valid object name %s", obj_name);
 
 	if (!path)
@@ -165,6 +166,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		die("git cat-file %s: bad file", obj_name);
 
 	write_or_die(1, buf, size);
+	free(obj_context.path);
 	return 0;
 }
 
diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e8..254c1c784 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1190,7 +1190,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			break;
 		}
 
-		if (get_sha1_with_context(arg, 0, oid.hash, &oc)) {
+		if (get_sha1_with_context(arg, GET_SHA1_RECORD_PATH,
+					  oid.hash, &oc)) {
 			if (seen_dashdash)
 				die(_("unable to resolve revision: %s"), arg);
 			break;
@@ -1200,6 +1201,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		if (!seen_dashdash)
 			verify_non_filename(prefix, arg);
 		add_object_array_with_path(object, arg, &list, oc.mode, oc.path);
+		free(oc.path);
 	}
 
 	/*
diff --git a/builtin/log.c b/builtin/log.c
index fd3d10ec2..a0d5233dc 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -483,16 +483,20 @@ static int show_blob_object(const struct object_id *oid, struct rev_info *rev, c
 	    !DIFF_OPT_TST(&rev->diffopt, ALLOW_TEXTCONV))
 		return stream_blob_to_fd(1, oid, NULL, 0);
 
-	if (get_sha1_with_context(obj_name, 0, oidc.hash, &obj_context))
+	if (get_sha1_with_context(obj_name, GET_SHA1_RECORD_PATH,
+				  oidc.hash, &obj_context))
 		die(_("Not a valid object name %s"), obj_name);
-	if (!obj_context.path[0] ||
-	    !textconv_object(obj_context.path, obj_context.mode, &oidc, 1, &buf, &size))
+	if (!obj_context.path ||
+	    !textconv_object(obj_context.path, obj_context.mode, &oidc, 1, &buf, &size)) {
+		free(obj_context.path);
 		return stream_blob_to_fd(1, oid, NULL, 0);
+	}
 
 	if (!buf)
 		die(_("git show %s: bad file"), obj_name);
 
 	write_or_die(1, buf, size);
+	free(obj_context.path);
 	return 0;
 }
 
diff --git a/cache.h b/cache.h
index 656341b8e..e219c45ed 100644
--- a/cache.h
+++ b/cache.h
@@ -1333,13 +1333,18 @@ static inline int hex2chr(const char *s)
 
 struct object_context {
 	unsigned char tree[20];
-	char path[PATH_MAX];
 	unsigned mode;
 	/*
 	 * symlink_path is only used by get_tree_entry_follow_symlinks,
 	 * and only for symlinks that point outside the repository.
 	 */
 	struct strbuf symlink_path;
+	/*
+	 * If GET_SHA1_RECORD_PATH is set, this will record path (if any)
+	 * found when resolving the name. The caller is responsible for
+	 * releasing the memory.
+	 */
+	char *path;
 };
 
 #define GET_SHA1_QUIETLY           01
@@ -1349,6 +1354,7 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_RECORD_PATH     0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/sha1_name.c b/sha1_name.c
index 35b16efc6..70f096749 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1550,7 +1550,8 @@ static int get_sha1_with_context_1(const char *name,
 			namelen = strlen(cp);
 		}
 
-		strlcpy(oc->path, cp, sizeof(oc->path));
+		if (flags & GET_SHA1_RECORD_PATH)
+			oc->path = xstrdup(cp);
 
 		if (!active_cache)
 			read_cache();
@@ -1613,7 +1614,8 @@ static int get_sha1_with_context_1(const char *name,
 				}
 			}
 			hashcpy(oc->tree, tree_sha1);
-			strlcpy(oc->path, filename, sizeof(oc->path));
+			if (flags & GET_SHA1_RECORD_PATH)
+				oc->path = xstrdup(filename);
 
 			free(new_filename);
 			return ret;
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 09/15] t4063: add tests of direct blob diffs
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (7 preceding siblings ...)
  2017-05-19 12:54           ` [PATCH 08/15] get_sha1_with_context: dynamically allocate oc->path Jeff King
@ 2017-05-19 12:54           ` Jeff King
  2017-05-19 12:55           ` [PATCH 10/15] handle_revision_arg: record modes for "a..b" endpoints Jeff King
                             ` (6 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

The git-diff command can directly compare two blobs (or a
blob and a file), but we don't test this at all. Let's add
some basic tests that reveal a few problems.

There are basically four interesting inputs:

  1. sha1 against sha1 (where diff has no information beyond
     the contents)

  2. tree:path against tree:path (where it can get
     information via get_sha1_with_context)

  3. Same as (2), but using the ".." range syntax

  4. tree:path against a filename

And beyond generating a sane diff, we care about a few
little bits: which paths they show in the diff header, and
whether they correctly pick up a mode change.

They should all be able to show a mode except for (1),
though note that case (3) is currently broken.

For the headers, we would ideally show the path within the
tree if we have it, making:

  git diff a:path b:path

look the same as:

  git diff a b -- path

We can't always do that (e.g., in the direct sha1/sha1 diff,
we have no path to show), in which case we should fall back
to the name that resolved to the blob (which is nonsense
from the repository's perspective, but is the best we can
do).

Aside from the fallback case in (1), none of the cases get
this right. Cases (2) and (3) always show the full
tree:path, even though we should be able to know just the
"path" portion.

Case (4) picks up the filename path, but assigns it to
_both_ sides of the diff. So this works for:

  git diff tree:path path

but not for:

  git diff tree:other_path path

The appropriate tests are marked to expect failure.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t4063-diff-blobs.sh | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100755 t/t4063-diff-blobs.sh

diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
new file mode 100755
index 000000000..90c6f6b85
--- /dev/null
+++ b/t/t4063-diff-blobs.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='test direct comparison of blobs via git-diff'
+. ./test-lib.sh
+
+run_diff () {
+	# use full-index to make it easy to match the index line
+	git diff --full-index "$@" >diff
+}
+
+check_index () {
+	grep "^index $1\\.\\.$2" diff
+}
+
+check_mode () {
+	grep "^old mode $1" diff &&
+	grep "^new mode $2" diff
+}
+
+check_paths () {
+	grep "^diff --git a/$1 b/$2" diff
+}
+
+test_expect_success 'create some blobs' '
+	echo one >one &&
+	echo two >two &&
+	chmod +x two &&
+	git add . &&
+
+	# cover systems where modes are ignored
+	git update-index --chmod=+x two &&
+
+	git commit -m base &&
+
+	sha1_one=$(git rev-parse HEAD:one) &&
+	sha1_two=$(git rev-parse HEAD:two)
+'
+
+test_expect_success 'diff by sha1' '
+	run_diff $sha1_one $sha1_two
+'
+test_expect_success 'index of sha1 diff' '
+	check_index $sha1_one $sha1_two
+'
+test_expect_success 'sha1 diff uses arguments as paths' '
+	check_paths $sha1_one $sha1_two
+'
+test_expect_success 'sha1 diff has no mode change' '
+	! grep mode diff
+'
+
+test_expect_success 'diff by tree:path (run)' '
+	run_diff HEAD:one HEAD:two
+'
+test_expect_success 'index of tree:path diff' '
+	check_index $sha1_one $sha1_two
+'
+test_expect_failure 'tree:path diff uses filenames as paths' '
+	check_paths one two
+'
+test_expect_success 'tree:path diff shows mode change' '
+	check_mode 100644 100755
+'
+
+test_expect_success 'diff by ranged tree:path' '
+	run_diff HEAD:one..HEAD:two
+'
+test_expect_success 'index of ranged tree:path diff' '
+	check_index $sha1_one $sha1_two
+'
+test_expect_failure 'ranged tree:path diff uses filenames as paths' '
+	check_paths one two
+'
+test_expect_failure 'ranged tree:path diff shows mode change' '
+	check_mode 100644 100755
+'
+
+test_expect_success 'diff blob against file' '
+	run_diff HEAD:one two
+'
+test_expect_success 'index of blob-file diff' '
+	check_index $sha1_one $sha1_two
+'
+test_expect_failure 'blob-file diff uses filename as paths' '
+	check_paths one two
+'
+test_expect_success FILEMODE 'blob-file diff shows mode change' '
+	check_mode 100644 100755
+'
+
+test_done
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/15] handle_revision_arg: record modes for "a..b" endpoints
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (8 preceding siblings ...)
  2017-05-19 12:54           ` [PATCH 09/15] t4063: add tests of direct blob diffs Jeff King
@ 2017-05-19 12:55           ` Jeff King
  2017-05-19 12:55           ` [PATCH 11/15] handle_revision_arg: record paths for pending objects Jeff King
                             ` (5 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

The "a..b" revision syntax was designed to handle commits,
so it doesn't bother to record any mode we find while
traversing a "tree:path" endpoint. These days "git diff" can
diff blobs using either "a:path..b:path" (with dots) or
"a:path b:path" (without), but the two behave
inconsistently, as the with-dots version fails to notice the
mode.

Let's teach the dot-dot range parser to record modes; it
doesn't cost us anything, and it makes this case work.

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c            | 10 ++++++----
 t/t4063-diff-blobs.sh |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index eb45501fd..96427e3c2 100644
--- a/revision.c
+++ b/revision.c
@@ -1448,9 +1448,11 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
 	const char *a_name, *b_name;
 	struct object_id a_oid, b_oid;
 	struct object *a_obj, *b_obj;
+	struct object_context a_oc, b_oc;
 	unsigned int a_flags, b_flags;
 	int symmetric = 0;
 	unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
+	unsigned int oc_flags = GET_SHA1_COMMITTISH;
 
 	a_name = arg;
 	if (!*a_name)
@@ -1464,8 +1466,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
 	if (!*b_name)
 		b_name = "HEAD";
 
-	if (get_sha1_committish(a_name, a_oid.hash) ||
-	    get_sha1_committish(b_name, b_oid.hash))
+	if (get_sha1_with_context(a_name, oc_flags, a_oid.hash, &a_oc) ||
+	    get_sha1_with_context(b_name, oc_flags, b_oid.hash, &b_oc))
 		return -1;
 
 	if (!cant_be_filename) {
@@ -1507,8 +1509,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
 	b_obj->flags |= b_flags;
 	add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
 	add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
-	add_pending_object(revs, a_obj, a_name);
-	add_pending_object(revs, b_obj, b_name);
+	add_pending_object_with_mode(revs, a_obj, a_name, a_oc.mode);
+	add_pending_object_with_mode(revs, b_obj, b_name, b_oc.mode);
 	return 0;
 }
 
diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
index 90c6f6b85..df9c35b2d 100755
--- a/t/t4063-diff-blobs.sh
+++ b/t/t4063-diff-blobs.sh
@@ -71,7 +71,7 @@ test_expect_success 'index of ranged tree:path diff' '
 test_expect_failure 'ranged tree:path diff uses filenames as paths' '
 	check_paths one two
 '
-test_expect_failure 'ranged tree:path diff shows mode change' '
+test_expect_success 'ranged tree:path diff shows mode change' '
 	check_mode 100644 100755
 '
 
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/15] handle_revision_arg: record paths for pending objects
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (9 preceding siblings ...)
  2017-05-19 12:55           ` [PATCH 10/15] handle_revision_arg: record modes for "a..b" endpoints Jeff King
@ 2017-05-19 12:55           ` Jeff King
  2017-05-19 12:57           ` [PATCH 12/15] diff: pass whole pending entry in blobinfo Jeff King
                             ` (4 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

If the revision parser sees an argument like tree:path, we
parse it down to the correct blob (or tree), but throw away
the "path" portion. Let's ask get_sha1_with_context() to
record it, and pass it along in the pending array.

This will let programs like git-diff which rely on the
revision-parser show more accurate paths.

Note that the implementation is a little tricky; we have to
make sure we free oc.path in all code paths. For handle_dotdot(),
we can piggy-back on the existing cleanup-wrapper pattern.
The real work happens in handle_dotdot_1(), but the
handle_dotdot() wrapper makes sure that the path is freed no
matter how we exit the function (and for that reason we make
sure that the object_context struct is zero'd, so if we fail
to even get to the get_sha1_with_context() call, we just end
up calling free(NULL)).

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/revision.c b/revision.c
index 96427e3c2..a7b93dcc8 100644
--- a/revision.c
+++ b/revision.c
@@ -1443,16 +1443,17 @@ static int dotdot_missing(const char *arg, char *dotdot,
 
 static int handle_dotdot_1(const char *arg, char *dotdot,
 			   struct rev_info *revs, int flags,
-			   int cant_be_filename)
+			   int cant_be_filename,
+			   struct object_context *a_oc,
+			   struct object_context *b_oc)
 {
 	const char *a_name, *b_name;
 	struct object_id a_oid, b_oid;
 	struct object *a_obj, *b_obj;
-	struct object_context a_oc, b_oc;
 	unsigned int a_flags, b_flags;
 	int symmetric = 0;
 	unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
-	unsigned int oc_flags = GET_SHA1_COMMITTISH;
+	unsigned int oc_flags = GET_SHA1_COMMITTISH | GET_SHA1_RECORD_PATH;
 
 	a_name = arg;
 	if (!*a_name)
@@ -1466,8 +1467,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
 	if (!*b_name)
 		b_name = "HEAD";
 
-	if (get_sha1_with_context(a_name, oc_flags, a_oid.hash, &a_oc) ||
-	    get_sha1_with_context(b_name, oc_flags, b_oid.hash, &b_oc))
+	if (get_sha1_with_context(a_name, oc_flags, a_oid.hash, a_oc) ||
+	    get_sha1_with_context(b_name, oc_flags, b_oid.hash, b_oc))
 		return -1;
 
 	if (!cant_be_filename) {
@@ -1509,8 +1510,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
 	b_obj->flags |= b_flags;
 	add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
 	add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
-	add_pending_object_with_mode(revs, a_obj, a_name, a_oc.mode);
-	add_pending_object_with_mode(revs, b_obj, b_name, b_oc.mode);
+	add_pending_object_with_path(revs, a_obj, a_name, a_oc->mode, a_oc->path);
+	add_pending_object_with_path(revs, b_obj, b_name, b_oc->mode, b_oc->path);
 	return 0;
 }
 
@@ -1518,16 +1519,24 @@ static int handle_dotdot(const char *arg,
 			 struct rev_info *revs, int flags,
 			 int cant_be_filename)
 {
+	struct object_context a_oc, b_oc;
 	char *dotdot = strstr(arg, "..");
 	int ret;
 
 	if (!dotdot)
 		return -1;
 
+	memset(&a_oc, 0, sizeof(a_oc));
+	memset(&b_oc, 0, sizeof(b_oc));
+
 	*dotdot = '\0';
-	ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename);
+	ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename,
+			      &a_oc, &b_oc);
 	*dotdot = '.';
 
+	free(a_oc.path);
+	free(b_oc.path);
+
 	return ret;
 }
 
@@ -1540,7 +1549,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 	int local_flags;
 	const char *arg = arg_;
 	int cant_be_filename = revarg_opt & REVARG_CANNOT_BE_FILENAME;
-	unsigned get_sha1_flags = 0;
+	unsigned get_sha1_flags = GET_SHA1_RECORD_PATH;
 
 	flags = flags & UNINTERESTING ? flags | BOTTOM : flags & ~BOTTOM;
 
@@ -1591,7 +1600,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 	}
 
 	if (revarg_opt & REVARG_COMMITTISH)
-		get_sha1_flags = GET_SHA1_COMMITTISH;
+		get_sha1_flags |= GET_SHA1_COMMITTISH;
 
 	if (get_sha1_with_context(arg, get_sha1_flags, sha1, &oc))
 		return revs->ignore_missing ? 0 : -1;
@@ -1599,7 +1608,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, sha1, flags ^ local_flags);
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
-	add_pending_object_with_mode(revs, object, arg, oc.mode);
+	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
+	free(oc.path);
 	return 0;
 }
 
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 12/15] diff: pass whole pending entry in blobinfo
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (10 preceding siblings ...)
  2017-05-19 12:55           ` [PATCH 11/15] handle_revision_arg: record paths for pending objects Jeff King
@ 2017-05-19 12:57           ` Jeff King
  2017-05-19 12:58           ` [PATCH 13/15] diff: use the word "path" instead of "name" for blobs Jeff King
                             ` (3 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

When diffing blobs directly, git-diff picks the blobs out of
the rev_info's pending array and copies the relevant bits to
a custom "struct blobinfo". But the pending array entry
already has all of this information (and more, which we'll
use in future patches). Let's just pass the original entry
instead.

In practice, these two blobs are probably adjacent in the
revs->pending array, and we could just pass the whole array.
But the current code is careful to pick each blob out
separately and put it into another array, so we'll continue
to do so and make our own array-of-pointers.

Signed-off-by: Jeff King <peff@peff.net>
---
I looked into just passing "blob_a" and "blob_b", instead of passing an
array and letting the called functions decide how many elements
should be in it. But I don't know if we would ever want to handle
arbitrary numbers of blobs (e.g., n-way combined diffs). It would be a
step backwards in that case, so I decided not to.

 builtin/diff.c | 38 +++++++++++++++-----------------------
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index d184aafab..8b276ae57 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -20,12 +20,6 @@
 #define DIFF_NO_INDEX_EXPLICIT 1
 #define DIFF_NO_INDEX_IMPLICIT 2
 
-struct blobinfo {
-	struct object_id oid;
-	const char *name;
-	unsigned mode;
-};
-
 static const char builtin_diff_usage[] =
 "git diff [<options>] [<commit> [<commit>]] [--] [<path>...]";
 
@@ -65,7 +59,7 @@ static void stuff_change(struct diff_options *opt,
 
 static int builtin_diff_b_f(struct rev_info *revs,
 			    int argc, const char **argv,
-			    struct blobinfo *blob)
+			    struct object_array_entry **blob)
 {
 	/* Blob vs file in the working tree*/
 	struct stat st;
@@ -84,12 +78,12 @@ static int builtin_diff_b_f(struct rev_info *revs,
 
 	diff_set_mnemonic_prefix(&revs->diffopt, "o/", "w/");
 
-	if (blob[0].mode == S_IFINVALID)
-		blob[0].mode = canon_mode(st.st_mode);
+	if (blob[0]->mode == S_IFINVALID)
+		blob[0]->mode = canon_mode(st.st_mode);
 
 	stuff_change(&revs->diffopt,
-		     blob[0].mode, canon_mode(st.st_mode),
-		     &blob[0].oid, &null_oid,
+		     blob[0]->mode, canon_mode(st.st_mode),
+		     &blob[0]->item->oid, &null_oid,
 		     1, 0,
 		     path, path);
 	diffcore_std(&revs->diffopt);
@@ -99,24 +93,24 @@ static int builtin_diff_b_f(struct rev_info *revs,
 
 static int builtin_diff_blobs(struct rev_info *revs,
 			      int argc, const char **argv,
-			      struct blobinfo *blob)
+			      struct object_array_entry **blob)
 {
 	unsigned mode = canon_mode(S_IFREG | 0644);
 
 	if (argc > 1)
 		usage(builtin_diff_usage);
 
-	if (blob[0].mode == S_IFINVALID)
-		blob[0].mode = mode;
+	if (blob[0]->mode == S_IFINVALID)
+		blob[0]->mode = mode;
 
-	if (blob[1].mode == S_IFINVALID)
-		blob[1].mode = mode;
+	if (blob[1]->mode == S_IFINVALID)
+		blob[1]->mode = mode;
 
 	stuff_change(&revs->diffopt,
-		     blob[0].mode, blob[1].mode,
-		     &blob[0].oid, &blob[1].oid,
+		     blob[0]->mode, blob[1]->mode,
+		     &blob[0]->item->oid, &blob[1]->item->oid,
 		     1, 1,
-		     blob[0].name, blob[1].name);
+		     blob[0]->name, blob[1]->name);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
 	return 0;
@@ -259,7 +253,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 	struct rev_info rev;
 	struct object_array ent = OBJECT_ARRAY_INIT;
 	int blobs = 0, paths = 0;
-	struct blobinfo blob[2];
+	struct object_array_entry *blob[2];
 	int nongit = 0, no_index = 0;
 	int result = 0;
 
@@ -408,9 +402,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 		} else if (obj->type == OBJ_BLOB) {
 			if (2 <= blobs)
 				die(_("more than two blobs given: '%s'"), name);
-			hashcpy(blob[blobs].oid.hash, obj->oid.hash);
-			blob[blobs].name = name;
-			blob[blobs].mode = entry->mode;
+			blob[blobs] = entry;
 			blobs++;
 
 		} else {
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 13/15] diff: use the word "path" instead of "name" for blobs
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (11 preceding siblings ...)
  2017-05-19 12:57           ` [PATCH 12/15] diff: pass whole pending entry in blobinfo Jeff King
@ 2017-05-19 12:58           ` Jeff King
  2017-05-19 12:59           ` [PATCH 14/15] diff: use pending "path" if it is available Jeff King
                             ` (2 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

The stuff_change() function makes diff_filespecs out of
blobs. The term we generally use for filespecs is "path",
not "name", so let's be consistent here.  That will make
things less confusing when the next patch starts caring
about the path/name distinction inside the pending object
array.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/diff.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index 8b276ae57..4c0811d6f 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -29,8 +29,8 @@ static void stuff_change(struct diff_options *opt,
 			 const struct object_id *new_oid,
 			 int old_oid_valid,
 			 int new_oid_valid,
-			 const char *old_name,
-			 const char *new_name)
+			 const char *old_path,
+			 const char *new_path)
 {
 	struct diff_filespec *one, *two;
 
@@ -41,16 +41,16 @@ static void stuff_change(struct diff_options *opt,
 	if (DIFF_OPT_TST(opt, REVERSE_DIFF)) {
 		SWAP(old_mode, new_mode);
 		SWAP(old_oid, new_oid);
-		SWAP(old_name, new_name);
+		SWAP(old_path, new_path);
 	}
 
 	if (opt->prefix &&
-	    (strncmp(old_name, opt->prefix, opt->prefix_length) ||
-	     strncmp(new_name, opt->prefix, opt->prefix_length)))
+	    (strncmp(old_path, opt->prefix, opt->prefix_length) ||
+	     strncmp(new_path, opt->prefix, opt->prefix_length)))
 		return;
 
-	one = alloc_filespec(old_name);
-	two = alloc_filespec(new_name);
+	one = alloc_filespec(old_path);
+	two = alloc_filespec(new_path);
 	fill_filespec(one, old_oid->hash, old_oid_valid, old_mode);
 	fill_filespec(two, new_oid->hash, new_oid_valid, new_mode);
 
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 14/15] diff: use pending "path" if it is available
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (12 preceding siblings ...)
  2017-05-19 12:58           ` [PATCH 13/15] diff: use the word "path" instead of "name" for blobs Jeff King
@ 2017-05-19 12:59           ` Jeff King
  2017-05-19 12:59           ` [PATCH 15/15] diff: use blob path for blob/file diffs Jeff King
  2017-05-24  2:44           ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Junio C Hamano
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

There's a subtle distinction between "name" and "path" for a
blob that we resolve: the name is what the user told us on
the command line, and the path is what we traversed when
finding the blob within a tree (if we did so).

When we diff blobs directly, we use "name", but "path" is
more likely to be useful to the user (it will find the
correct .gitattributes, and give them a saner diff header).

We still have to fall back to using the name for some cases
(i.e., any blob reference that isn't of the form tree:path).
That's the best we can do in such a case.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/diff.c        | 7 ++++++-
 t/t4063-diff-blobs.sh | 4 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index 4c0811d6f..1a1149eed 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -23,6 +23,11 @@
 static const char builtin_diff_usage[] =
 "git diff [<options>] [<commit> [<commit>]] [--] [<path>...]";
 
+static const char *blob_path(struct object_array_entry *entry)
+{
+	return entry->path ? entry->path : entry->name;
+}
+
 static void stuff_change(struct diff_options *opt,
 			 unsigned old_mode, unsigned new_mode,
 			 const struct object_id *old_oid,
@@ -110,7 +115,7 @@ static int builtin_diff_blobs(struct rev_info *revs,
 		     blob[0]->mode, blob[1]->mode,
 		     &blob[0]->item->oid, &blob[1]->item->oid,
 		     1, 1,
-		     blob[0]->name, blob[1]->name);
+		     blob_path(blob[0]), blob_path(blob[1]));
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
 	return 0;
diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
index df9c35b2d..80ce033ab 100755
--- a/t/t4063-diff-blobs.sh
+++ b/t/t4063-diff-blobs.sh
@@ -55,7 +55,7 @@ test_expect_success 'diff by tree:path (run)' '
 test_expect_success 'index of tree:path diff' '
 	check_index $sha1_one $sha1_two
 '
-test_expect_failure 'tree:path diff uses filenames as paths' '
+test_expect_success 'tree:path diff uses filenames as paths' '
 	check_paths one two
 '
 test_expect_success 'tree:path diff shows mode change' '
@@ -68,7 +68,7 @@ test_expect_success 'diff by ranged tree:path' '
 test_expect_success 'index of ranged tree:path diff' '
 	check_index $sha1_one $sha1_two
 '
-test_expect_failure 'ranged tree:path diff uses filenames as paths' '
+test_expect_success 'ranged tree:path diff uses filenames as paths' '
 	check_paths one two
 '
 test_expect_success 'ranged tree:path diff shows mode change' '
-- 
2.13.0.219.g63f6bc368


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 15/15] diff: use blob path for blob/file diffs
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (13 preceding siblings ...)
  2017-05-19 12:59           ` [PATCH 14/15] diff: use pending "path" if it is available Jeff King
@ 2017-05-19 12:59           ` Jeff King
  2017-05-24  2:44           ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Junio C Hamano
  15 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-19 12:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

When we diff a blob against a working tree file like:

  git diff HEAD:Makefile Makefile

we always use the working tree filename for both sides of
the diff. In most cases that's fine, as the two would be the
same anyway, as above. And until recently, we used the
"name" for the blob, not the path, which would have the
messy "HEAD:" on the beginning.

But when they don't match, like:

  git diff HEAD:old_path new_path

it makes sense to show both names.

This patch uses the blob's path field if it's available, and
otherwise falls back to using the filename (in preference to
the blob's name, which is likely to be garbage like a raw
sha1).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/diff.c        | 3 ++-
 t/t4063-diff-blobs.sh | 7 ++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index 1a1149eed..5e7c6428c 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -90,7 +90,8 @@ static int builtin_diff_b_f(struct rev_info *revs,
 		     blob[0]->mode, canon_mode(st.st_mode),
 		     &blob[0]->item->oid, &null_oid,
 		     1, 0,
-		     path, path);
+		     blob[0]->path ? blob[0]->path : path,
+		     path);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
 	return 0;
diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
index 80ce033ab..bc69e26c5 100755
--- a/t/t4063-diff-blobs.sh
+++ b/t/t4063-diff-blobs.sh
@@ -81,11 +81,16 @@ test_expect_success 'diff blob against file' '
 test_expect_success 'index of blob-file diff' '
 	check_index $sha1_one $sha1_two
 '
-test_expect_failure 'blob-file diff uses filename as paths' '
+test_expect_success 'blob-file diff uses filename as paths' '
 	check_paths one two
 '
 test_expect_success FILEMODE 'blob-file diff shows mode change' '
 	check_mode 100644 100755
 '
 
+test_expect_success 'blob-file diff prefers filename to sha1' '
+	run_diff $sha1_one two &&
+	check_paths two two
+'
+
 test_done
-- 
2.13.0.219.g63f6bc368

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently
  2017-05-19 12:48           ` [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently Jeff King
@ 2017-05-20 14:56             ` Philip Oakley
  2017-05-23 19:51               ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Philip Oakley @ 2017-05-20 14:56 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

From: "Jeff King" <peff@peff.net>
> When we are parsing a range like "a..b", we write a
> temporary NUL over the first ".", so that we can access the
> names "a" and "b" as C strings. But our restoration of the
> original "." is done at inconsistent times, which can lead
> to confusing results.
>
> For most calls, we restore the "." after we resolve the
> names, but before we call verify_non_filename().  This means
> that when we later call add_pending_object(), the name for
> the left-hand "a" has been re-expanded to "a..b". You can
> see this with:
>
>  git log --source a...b
>
> where "b" will be correctly marked with "b", but "a" will be
> marked with "a...b". Likewise with "a..b" (though you need
> to use --boundary to even see "a" at all in that case).
>
> To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
> flag is set, we skip the non-filename check, and leave the
> NUL in place.
>
> That means we do report the correct name for "a" in the
> pending array. But some code paths try to show the whole
> "a..b" name in error messages, and these erroneously show
> only "a" instead of "a..b". E.g.:
>
>  $ git cherry-pick HEAD:foo..HEAD:foo

shouldn't this be three dots? Also the para above uses two dot examples in 
its description but the paras before that start by describing the three dot 
case.
--
Philip

>  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a 
> commit
>  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a 
> commit
>  fatal: Invalid symmetric difference expression HEAD:foo
>
> (That last message should be "HEAD:foo...HEAD:foo"; I used
> cherry-pick because it passes the CANNOT_BE_FILENAME flag).
>
> As an interesting side note, cherry-pick actually looks at
> and re-resolves the arguments from the pending->name fields.
> So it would have been visibly broken by the first bug, but
> the effect was canceled out by the second one.
>
> This patch makes the whole function consistent by re-writing
> the NUL immediately after calling verify_non_filename(), and
> then restoring the "." as appropriate in some error-printing
> and early-return code paths.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I also considered just making a copy of the string rather than this
> in-place munging (technically we get it as a pointer-to-const; it's only
> the use of strstr() that lets us quietly drop the const). But it doesn't
> really make the code any cleaner; now instead of restoring the dot you
> have to remember to free() the string in each code path.
>
> revision.c     | 3 +++
> t/t4202-log.sh | 9 +++++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/revision.c b/revision.c
> index 8a8c1789c..014bf52e3 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1477,12 +1477,14 @@ int handle_revision_arg(const char *arg_, struct 
> rev_info *revs, int flags, unsi
>  if (!cant_be_filename) {
>  *dotdot = '.';
>  verify_non_filename(revs->prefix, arg);
> + *dotdot = '\0';
>  }
>
>  a_obj = parse_object(from_sha1);
>  b_obj = parse_object(sha1);
>  if (!a_obj || !b_obj) {
>  missing:
> + *dotdot = '.';
>  if (revs->ignore_missing)
>  return 0;
>  die(symmetric
> @@ -1525,6 +1527,7 @@ int handle_revision_arg(const char *arg_, struct 
> rev_info *revs, int flags, unsi
>  REV_CMD_RIGHT, flags);
>  add_pending_object(revs, a_obj, this);
>  add_pending_object(revs, b_obj, next);
> + *dotdot = '.';
>  return 0;
>  }
>  *dotdot = '.';
> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index f57799071..6da1bbe91 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1380,4 +1380,13 @@ test_expect_success 'log --source paints tag names' 
> '
>  test_cmp expect actual
> '
>
> +test_expect_success 'log --source paints symmetric ranges' '
> + cat >expect <<-\EOF &&
> + 09e12a9 source-b three
> + 8e393e1 source-a two
> + EOF
> + git log --oneline --source source-a...source-b >actual &&
> + test_cmp expect actual
> +'
> +
> test_done
> -- 
> 2.13.0.219.g63f6bc368
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently
  2017-05-20 14:56             ` Philip Oakley
@ 2017-05-23 19:51               ` Jeff King
  2017-05-23 23:47                 ` Philip Oakley
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2017-05-23 19:51 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Junio C Hamano, Johannes Schindelin, git,
	Nguyễn Thái Ngọc Duy

On Sat, May 20, 2017 at 03:56:32PM +0100, Philip Oakley wrote:

> > That means we do report the correct name for "a" in the
> > pending array. But some code paths try to show the whole
> > "a..b" name in error messages, and these erroneously show
> > only "a" instead of "a..b". E.g.:
> > 
> >  $ git cherry-pick HEAD:foo..HEAD:foo
> 
> shouldn't this be three dots? Also the para above uses two dot examples in
> its description but the paras before that start by describing the three dot
> case.

Yeah, it should be. The problem happens with the two-dot case, too (it's
the same code) but to provoke cherry-pick to actually show the error in
question, you need to use three-dots. There's probably a way to provoke
a broken error message with the two-dot case, but I didn't dig further
after finding this one.

Thanks for being a careful reader. Here's the patch with a modified
commit message. I don't think this series otherwise needs a re-roll so
far.

-- >8 --
Subject: [PATCH] handle_revision_arg: reset "dotdot" consistently

When we are parsing a range like "a..b", we write a
temporary NUL over the first ".", so that we can access the
names "a" and "b" as C strings. But our restoration of the
original "." is done at inconsistent times, which can lead
to confusing results.

For most calls, we restore the "." after we resolve the
names, but before we call verify_non_filename().  This means
that when we later call add_pending_object(), the name for
the left-hand "a" has been re-expanded to "a..b". You can
see this with:

  git log --source a...b

where "b" will be correctly marked with "b", but "a" will be
marked with "a...b". Likewise with "a..b" (though you need
to use --boundary to even see "a" at all in that case).

To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
flag is set, we skip the non-filename check, and leave the
NUL in place.

That means we do report the correct name for "a" in the
pending array. But some code paths try to show the whole
"a...b" name in error messages, and these erroneously show
only "a" instead of "a...b". E.g.:

  $ git cherry-pick HEAD:foo...HEAD:foo
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  fatal: Invalid symmetric difference expression HEAD:foo

(That last message should be "HEAD:foo...HEAD:foo"; I used
cherry-pick because it passes the CANNOT_BE_FILENAME flag).

As an interesting side note, cherry-pick actually looks at
and re-resolves the arguments from the pending->name fields.
So it would have been visibly broken by the first bug, but
the effect was canceled out by the second one.

This patch makes the whole function consistent by re-writing
the NUL immediately after calling verify_non_filename(), and
then restoring the "." as appropriate in some error-printing
and early-return code paths.

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c     | 3 +++
 t/t4202-log.sh | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/revision.c b/revision.c
index 8a8c1789c..014bf52e3 100644
--- a/revision.c
+++ b/revision.c
@@ -1477,12 +1477,14 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 			if (!cant_be_filename) {
 				*dotdot = '.';
 				verify_non_filename(revs->prefix, arg);
+				*dotdot = '\0';
 			}
 
 			a_obj = parse_object(from_sha1);
 			b_obj = parse_object(sha1);
 			if (!a_obj || !b_obj) {
 			missing:
+				*dotdot = '.';
 				if (revs->ignore_missing)
 					return 0;
 				die(symmetric
@@ -1525,6 +1527,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 					REV_CMD_RIGHT, flags);
 			add_pending_object(revs, a_obj, this);
 			add_pending_object(revs, b_obj, next);
+			*dotdot = '.';
 			return 0;
 		}
 		*dotdot = '.';
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 1c7d6729c..76c511973 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1392,4 +1392,13 @@ test_expect_success 'log --source paints tag names' '
 	test_cmp expect actual
 '
 
+test_expect_success 'log --source paints symmetric ranges' '
+	cat >expect <<-\EOF &&
+	09e12a9	source-b three
+	8e393e1	source-a two
+	EOF
+	git log --oneline --source source-a...source-b >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.13.0.387.gec0afcebb.dirty


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently
  2017-05-23 19:51               ` Jeff King
@ 2017-05-23 23:47                 ` Philip Oakley
  0 siblings, 0 replies; 33+ messages in thread
From: Philip Oakley @ 2017-05-23 23:47 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Johannes Schindelin, git,
	Nguyễn Thái Ngọc Duy

From: "Jeff King" <peff@peff.net>
> On Sat, May 20, 2017 at 03:56:32PM +0100, Philip Oakley wrote:
>
>> > That means we do report the correct name for "a" in the
>> > pending array. But some code paths try to show the whole
>> > "a..b" name in error messages, and these erroneously show
>> > only "a" instead of "a..b". E.g.:
>> >
>> >  $ git cherry-pick HEAD:foo..HEAD:foo
>>
>> shouldn't this be three dots? Also the para above uses two dot examples 
>> in
>> its description but the paras before that start by describing the three 
>> dot
>> case.
>
> Yeah, it should be. The problem happens with the two-dot case, too (it's
> the same code) but to provoke cherry-pick to actually show the error in
> question, you need to use three-dots. There's probably a way to provoke
> a broken error message with the two-dot case, but I didn't dig further
> after finding this one.
>
> Thanks for being a careful reader. Here's the patch with a modified
> commit message. I don't think this series otherwise needs a re-roll so
> far.

Agreed, it's only a small point. It is difficult to cover both the two and 
three dot cases mid flow. Perhaps show the two options very early....
>
> -- >8 --
> Subject: [PATCH] handle_revision_arg: reset "dotdot" consistently
>
> When we are parsing a range like "a..b", we write a

Perhaps this early?   s/"a..b"/"a..b" or "a...b"/

> temporary NUL over the first ".", so that we can access the
> names "a" and "b" as C strings. But our restoration of the
> original "." is done at inconsistent times, which can lead
> to confusing results.
>
> For most calls, we restore the "." after we resolve the
> names, but before we call verify_non_filename().  This means
> that when we later call add_pending_object(), the name for
> the left-hand "a" has been re-expanded to "a..b". You can
> see this with:
>
This now looks 'correct' relative to the initial multi-dot options.

>  git log --source a...b
>
> where "b" will be correctly marked with "b", but "a" will be
> marked with "a...b". Likewise with "a..b" (though you need
> to use --boundary to even see "a" at all in that case).
>
> To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
> flag is set, we skip the non-filename check, and leave the
> NUL in place.
>
> That means we do report the correct name for "a" in the
> pending array. But some code paths try to show the whole
> "a...b" name in error messages, and these erroneously show
> only "a" instead of "a...b". E.g.:
>
>  $ git cherry-pick HEAD:foo...HEAD:foo
>  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a 
> commit
>  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a 
> commit
>  fatal: Invalid symmetric difference expression HEAD:foo
>
> (That last message should be "HEAD:foo...HEAD:foo"; I used
> cherry-pick because it passes the CANNOT_BE_FILENAME flag).
>
> As an interesting side note, cherry-pick actually looks at
> and re-resolves the arguments from the pending->name fields.
> So it would have been visibly broken by the first bug, but
> the effect was canceled out by the second one.
>
> This patch makes the whole function consistent by re-writing
> the NUL immediately after calling verify_non_filename(), and
> then restoring the "." as appropriate in some error-printing
> and early-return code paths.
>
> Signed-off-by: Jeff King <peff@peff.net>
--
Philip


> ---
> revision.c     | 3 +++
> t/t4202-log.sh | 9 +++++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/revision.c b/revision.c
> index 8a8c1789c..014bf52e3 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1477,12 +1477,14 @@ int handle_revision_arg(const char *arg_, struct 
> rev_info *revs, int flags, unsi
>  if (!cant_be_filename) {
>  *dotdot = '.';
>  verify_non_filename(revs->prefix, arg);
> + *dotdot = '\0';
>  }
>
>  a_obj = parse_object(from_sha1);
>  b_obj = parse_object(sha1);
>  if (!a_obj || !b_obj) {
>  missing:
> + *dotdot = '.';
>  if (revs->ignore_missing)
>  return 0;
>  die(symmetric
> @@ -1525,6 +1527,7 @@ int handle_revision_arg(const char *arg_, struct 
> rev_info *revs, int flags, unsi
>  REV_CMD_RIGHT, flags);
>  add_pending_object(revs, a_obj, this);
>  add_pending_object(revs, b_obj, next);
> + *dotdot = '.';
>  return 0;
>  }
>  *dotdot = '.';
> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 1c7d6729c..76c511973 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1392,4 +1392,13 @@ test_expect_success 'log --source paints tag names' 
> '
>  test_cmp expect actual
> '
>
> +test_expect_success 'log --source paints symmetric ranges' '
> + cat >expect <<-\EOF &&
> + 09e12a9 source-b three
> + 8e393e1 source-a two
> + EOF
> + git log --oneline --source source-a...source-b >actual &&
> + test_cmp expect actual
> +'
> +
> test_done
> -- 
> 2.13.0.387.gec0afcebb.dirty
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/15] handle_revision_arg: add handle_dotdot() helper
  2017-05-19 12:52           ` [PATCH 05/15] handle_revision_arg: add handle_dotdot() helper Jeff King
@ 2017-05-24  2:30             ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2017-05-24  2:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> The handle_revision_arg function is rather long, and a big
> chunk of it is handling the range operators. Let's pull that
> out to a separate helper. While we're doing so, we can clean
> up a few of the rough edges that made the flow hard to
> follow:
>
>   - instead of manually restoring *dotdot (that we overwrote
>     with a NUL), do the real work in a sub-helper, which
>     makes it clear that the munge/restore lines are a
>     matched pair
>
>   - eliminate a goto which wasn't actually used for control
>     flow, but only to avoid duplicating a few lines
>     (instead, those lines are pushed into another helper
>     function)
>
>   - use early returns instead of deep nesting
>
>   - consistently name all variables for the left-hand side
>     of the range as "a" (rather than "this" or "from") and
>     the right-hand side as "b" (rather than "next", or using
>     the unadorned "sha1" or "flags" from the main function).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  revision.c | 177 +++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 102 insertions(+), 75 deletions(-)

Nicely done.  Together with [PATCH 04/15] the resulting code wrt ".."
is much easier to follow.

> diff --git a/revision.c b/revision.c
> index dec06ff8b..eb45501fd 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1429,10 +1429,109 @@ static void prepare_show_merge(struct rev_info *revs)
>  	revs->limited = 1;
>  }
>  
> +static int dotdot_missing(const char *arg, char *dotdot,
> +			  struct rev_info *revs, int symmetric)
> +{
> +	if (revs->ignore_missing)
> +		return 0;
> +	/* de-munge so we report the full argument */
> +	*dotdot = '.';
> +	die(symmetric
> +	    ? "Invalid symmetric difference expression %s"
> +	    : "Invalid revision range %s", arg);
> +}
> +
> +static int handle_dotdot_1(const char *arg, char *dotdot,
> +			   struct rev_info *revs, int flags,
> +			   int cant_be_filename)
> +{
> +	const char *a_name, *b_name;
> +	struct object_id a_oid, b_oid;
> +	struct object *a_obj, *b_obj;
> +	unsigned int a_flags, b_flags;
> +	int symmetric = 0;
> +	unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
> +
> +	a_name = arg;
> +	if (!*a_name)
> +		a_name = "HEAD";
> +
> +	b_name = dotdot + 2;
> +	if (*b_name == '.') {
> +		symmetric = 1;
> +		b_name++;
> +	}
> +	if (!*b_name)
> +		b_name = "HEAD";
> +
> +	if (get_sha1_committish(a_name, a_oid.hash) ||
> +	    get_sha1_committish(b_name, b_oid.hash))
> +		return -1;
> +
> +	if (!cant_be_filename) {
> +		*dotdot = '.';
> +		verify_non_filename(revs->prefix, arg);
> +		*dotdot = '\0';
> +	}
> +
> +	a_obj = parse_object(a_oid.hash);
> +	b_obj = parse_object(b_oid.hash);
> +	if (!a_obj || !b_obj)
> +		return dotdot_missing(arg, dotdot, revs, symmetric);
> +
> +	if (!symmetric) {
> +		/* just A..B */
> +		b_flags = flags;
> +		a_flags = flags_exclude;
> +	} else {
> +		/* A...B -- find merge bases between the two */
> +		struct commit *a, *b;
> +		struct commit_list *exclude;
> +
> +		a = lookup_commit_reference(a_obj->oid.hash);
> +		b = lookup_commit_reference(b_obj->oid.hash);
> +		if (!a || !b)
> +			return dotdot_missing(arg, dotdot, revs, symmetric);
> +
> +		exclude = get_merge_bases(a, b);
> +		add_rev_cmdline_list(revs, exclude, REV_CMD_MERGE_BASE,
> +				     flags_exclude);
> +		add_pending_commit_list(revs, exclude, flags_exclude);
> +		free_commit_list(exclude);
> +
> +		b_flags = flags;
> +		a_flags = flags | SYMMETRIC_LEFT;
> +	}
> +
> +	a_obj->flags |= a_flags;
> +	b_obj->flags |= b_flags;
> +	add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
> +	add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
> +	add_pending_object(revs, a_obj, a_name);
> +	add_pending_object(revs, b_obj, b_name);
> +	return 0;
> +}
> +
> +static int handle_dotdot(const char *arg,
> +			 struct rev_info *revs, int flags,
> +			 int cant_be_filename)
> +{
> +	char *dotdot = strstr(arg, "..");
> +	int ret;
> +
> +	if (!dotdot)
> +		return -1;
> +
> +	*dotdot = '\0';
> +	ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename);
> +	*dotdot = '.';
> +
> +	return ret;
> +}
> +
>  int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsigned revarg_opt)
>  {
>  	struct object_context oc;
> -	char *dotdot;
>  	char *mark;
>  	struct object *object;
>  	unsigned char sha1[20];
> @@ -1451,80 +1550,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
>  		return -1;
>  	}
>  
> -	dotdot = strstr(arg, "..");
> -	if (dotdot) {
> -		unsigned char from_sha1[20];
> -		const char *next = dotdot + 2;
> -		const char *this = arg;
> -		int symmetric = *next == '.';
> -		unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
> -		unsigned int a_flags;
> -
> -		*dotdot = 0;
> -		next += symmetric;
> -
> -		if (!*next)
> -			next = "HEAD";
> -		if (dotdot == arg)
> -			this = "HEAD";
> -		if (!get_sha1_committish(this, from_sha1) &&
> -		    !get_sha1_committish(next, sha1)) {
> -			struct object *a_obj, *b_obj;
> -
> -			if (!cant_be_filename) {
> -				*dotdot = '.';
> -				verify_non_filename(revs->prefix, arg);
> -				*dotdot = '\0';
> -			}
> -
> -			a_obj = parse_object(from_sha1);
> -			b_obj = parse_object(sha1);
> -			if (!a_obj || !b_obj) {
> -			missing:
> -				*dotdot = '.';
> -				if (revs->ignore_missing)
> -					return 0;
> -				die(symmetric
> -				    ? "Invalid symmetric difference expression %s"
> -				    : "Invalid revision range %s", arg);
> -			}
> -
> -			if (!symmetric) {
> -				/* just A..B */
> -				a_flags = flags_exclude;
> -			} else {
> -				/* A...B -- find merge bases between the two */
> -				struct commit *a, *b;
> -				struct commit_list *exclude;
> -
> -				a = lookup_commit_reference(a_obj->oid.hash);
> -				b = lookup_commit_reference(b_obj->oid.hash);
> -				if (!a || !b)
> -					goto missing;
> -				exclude = get_merge_bases(a, b);
> -				add_rev_cmdline_list(revs, exclude,
> -						     REV_CMD_MERGE_BASE,
> -						     flags_exclude);
> -				add_pending_commit_list(revs, exclude,
> -							flags_exclude);
> -				free_commit_list(exclude);
> -
> -				a_flags = flags | SYMMETRIC_LEFT;
> -			}
> -
> -			a_obj->flags |= a_flags;
> -			b_obj->flags |= flags;
> -			add_rev_cmdline(revs, a_obj, this,
> -					REV_CMD_LEFT, a_flags);
> -			add_rev_cmdline(revs, b_obj, next,
> -					REV_CMD_RIGHT, flags);
> -			add_pending_object(revs, a_obj, this);
> -			add_pending_object(revs, b_obj, next);
> -			*dotdot = '.';
> -			return 0;
> -		}
> -		*dotdot = '.';
> -	}
> +	if (!handle_dotdot(arg, revs, flags, revarg_opt))
> +		return 0;
>  
>  	mark = strstr(arg, "^@");
>  	if (mark && !mark[2]) {

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar
  2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
                             ` (14 preceding siblings ...)
  2017-05-19 12:59           ` [PATCH 15/15] diff: use blob path for blob/file diffs Jeff King
@ 2017-05-24  2:44           ` Junio C Hamano
  2017-05-24  9:57             ` Jeff King
  15 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2017-05-24  2:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

Patches around [PATCH 06-08/15] made some unexpected (at least to
me) turns but the series told a coherent story, building on top of
what has been achieved in the previous steps.

Thanks for a pleasant read.  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer
  2017-05-19 12:50           ` [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer Jeff King
@ 2017-05-24  2:45             ` Junio C Hamano
  2017-05-24  9:55               ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2017-05-24  2:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> The handle_revision_arg() function has a "dotdot" variable
> that it uses to find a ".." or "..." in the argument. If we
> don't find one, we look for other marks, like "^!". But we
> just keep re-using the "dotdot" variable, which is
> confusing.
>
> Let's introduce a separate "mark" variable that can be used
> for these other marks. They still reuse the same variable,
> but at least the name is no longer actively misleading.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> It may make sense to pull each of these into its own helper. I didn't
> really look because they're so small, and because the return semantics
> seemed confusing to me. Some of them return, and some of them keep
> parsing. Some of them restore the NUL they overwrite, and some do not.
>
> I didn't dig in to see if there are weird corner cases where they
> misbehave.

I do not quite know what corner cases you meant, but I agree that
many places in the codepath we forget to restore "^" we temporarily
overwrite.  I suspect that none of them is deliberately leaving "^"
unrestored and they are just being careless (or they truly do not
care because they assume nobody will look at arg later).

And I think not restoring cannot be a correct thing to do.  After
all of these parsing, add_rev_cmdline() wants to see arg_ intact.

But let's keep reading; perhaps they are addressed in a later patch,
or they are left as-is, and either is OK.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer
  2017-05-24  2:45             ` Junio C Hamano
@ 2017-05-24  9:55               ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-24  9:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

On Wed, May 24, 2017 at 11:45:39AM +0900, Junio C Hamano wrote:

> > It may make sense to pull each of these into its own helper. I didn't
> > really look because they're so small, and because the return semantics
> > seemed confusing to me. Some of them return, and some of them keep
> > parsing. Some of them restore the NUL they overwrite, and some do not.
> >
> > I didn't dig in to see if there are weird corner cases where they
> > misbehave.
> 
> I do not quite know what corner cases you meant, but I agree that
> many places in the codepath we forget to restore "^" we temporarily
> overwrite.  I suspect that none of them is deliberately leaving "^"
> unrestored and they are just being careless (or they truly do not
> care because they assume nobody will look at arg later).
> 
> And I think not restoring cannot be a correct thing to do.  After
> all of these parsing, add_rev_cmdline() wants to see arg_ intact.
> 
> But let's keep reading; perhaps they are addressed in a later patch,
> or they are left as-is, and either is OK.

I don't really know what corner cases I meant, either. :) I just saw
that the code looked funny, but nobody noticed for the common cases, so
I presumed any misbehavior would be from uncommon ones.

As far as "maybe not restoring is intentional", I wondered if there are
cases where we might allow multiple marks. E.g., if we wanted to allow
"foo^@^!", then we might need to progressively pull items off the end,
shortening the string. But I don't think that can be correct:

  - these marks generally don't make sense to combine in the first place

  - even if we allowed combinations, since we make only a single pass
    through the function, we'd require the combinations to come in a
    particular order

  - even if it were intentional to do so, we'd still be adding weird
    stuff to add_rev_cmdline(), as you noted

But I had some other questions, too, about what's supposed to be in the
"name" field of "pending" in some of these cases. For instance, try:

  git tag foo 6a0bc7cf0efbefa5a949d958947e68e29534f04d
  git log --oneline --source foo^-

All of the commits are marked as coming from "foo". That's because of
two things:

  1. When we parse "^-", we turn the first character to NUL. So our call
     to add_parents_only() sees just "foo", with no tail.

  2. We never restore the "^". So later when we add "foo" itself, the
     arg name is still "foo".

So arguably that's correct (these all came from "foo", which is a
resolvable name, and I think that's what the name field of
add_pending_object is going for). But that means add_rev_cmdline() sees
the same munged string, which is probably wrong. We can't possibly get
that case right with munging since the add_rev_cmdline() and
add_pending_object() calls come in pairs. We'd have to actually copy the
pending name into a separate string instead.

So like I said, I was sufficiently confused about what was supposed to
happen that I didn't try fixing it.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar
  2017-05-24  2:44           ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Junio C Hamano
@ 2017-05-24  9:57             ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2017-05-24  9:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Nguyễn Thái Ngọc Duy

On Wed, May 24, 2017 at 11:44:46AM +0900, Junio C Hamano wrote:

> Patches around [PATCH 06-08/15] made some unexpected (at least to
> me) turns but the series told a coherent story, building on top of
> what has been achieved in the previous steps.

Yeah, those patches are not technically required for the rest of it. The
"strbuf" conversion for oc->path is just something I've wanted to fix
for a long time. It made sense to me to do it before teaching
handle_dotdot() to look at oc->path (because the interface to do so
changed a bit).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2017-05-24  9:57 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-15 15:23 [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Johannes Schindelin
2017-05-15 15:23 ` [PATCH 1/2] gitattributes: demonstrate that Git tries to read a bogus file Johannes Schindelin
2017-05-15 15:24 ` [PATCH 2/2] mingw: Suppress warning that <commit>:.gitattributes does not exist Johannes Schindelin
2017-05-16  7:54 ` [PATCH 0/2] Demonstrate and partially work around a gitattributes problem Jeff King
2017-05-16  8:10   ` Jeff King
2017-05-17  1:38     ` Junio C Hamano
2017-05-17  2:05       ` Jeff King
2017-05-18 19:23         ` Johannes Schindelin
2017-05-19  0:00           ` Jeff King
2017-05-19 12:46         ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Jeff King
2017-05-19 12:48           ` [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently Jeff King
2017-05-20 14:56             ` Philip Oakley
2017-05-23 19:51               ` Jeff King
2017-05-23 23:47                 ` Philip Oakley
2017-05-19 12:48           ` [PATCH 02/15] handle_revision_arg: simplify commit reference lookups Jeff King
2017-05-19 12:50           ` [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer Jeff King
2017-05-24  2:45             ` Junio C Hamano
2017-05-24  9:55               ` Jeff King
2017-05-19 12:51           ` [PATCH 04/15] handle_revision_arg: hoist ".." check out of range parsing Jeff King
2017-05-19 12:52           ` [PATCH 05/15] handle_revision_arg: add handle_dotdot() helper Jeff King
2017-05-24  2:30             ` Junio C Hamano
2017-05-19 12:52           ` [PATCH 06/15] sha1_name: consistently refer to object_context as "oc" Jeff King
2017-05-19 12:52           ` [PATCH 07/15] get_sha1_with_context: always initialize oc->symlink_path Jeff King
2017-05-19 12:54           ` [PATCH 08/15] get_sha1_with_context: dynamically allocate oc->path Jeff King
2017-05-19 12:54           ` [PATCH 09/15] t4063: add tests of direct blob diffs Jeff King
2017-05-19 12:55           ` [PATCH 10/15] handle_revision_arg: record modes for "a..b" endpoints Jeff King
2017-05-19 12:55           ` [PATCH 11/15] handle_revision_arg: record paths for pending objects Jeff King
2017-05-19 12:57           ` [PATCH 12/15] diff: pass whole pending entry in blobinfo Jeff King
2017-05-19 12:58           ` [PATCH 13/15] diff: use the word "path" instead of "name" for blobs Jeff King
2017-05-19 12:59           ` [PATCH 14/15] diff: use pending "path" if it is available Jeff King
2017-05-19 12:59           ` [PATCH 15/15] diff: use blob path for blob/file diffs Jeff King
2017-05-24  2:44           ` [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar Junio C Hamano
2017-05-24  9:57             ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).