git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
@ 2012-10-09  9:03 Johannes Sixt
  2012-10-09  9:38 ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-09  9:03 UTC (permalink / raw)
  To: Git Mailing List

Running 'git grep needle origin/master' on Windows gives numerous warnings
of the kind

warning: unable to access 'origin/master:Documentation/.gitattributes':
Invalid argument

It is worrysome that it is attempted to access a file whose name is
prefixed by a revision.

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-09  9:03 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree Johannes Sixt
@ 2012-10-09  9:38 ` Nguyen Thai Ngoc Duy
  2012-10-09 12:01   ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-09  9:38 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List

On Tue, Oct 9, 2012 at 4:03 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Running 'git grep needle origin/master' on Windows gives numerous warnings
> of the kind
>
> warning: unable to access 'origin/master:Documentation/.gitattributes':
> Invalid argument

strace confirms it. Stack trace

#0  read_attr_from_file (path=0x820e818
"HEAD:Documentation/.gitattributes", macro_ok=0) at attr.c:351
#1  0x080d378d in read_attr (path=0x820e818
"HEAD:Documentation/.gitattributes", macro_ok=0) at attr.c:436
#2  0x080d3bf1 in prepare_attr_stack (path=0x820e7f0
"HEAD:Documentation/.gitattributes") at attr.c:630
#3  0x080d3f68 in collect_all_attrs (path=0x820e7f0
"HEAD:Documentation/.gitattributes") at attr.c:747
#4  0x080d3ffd in git_check_attr (path=0x820e7f0
"HEAD:Documentation/.gitattributes", num=1, check=0xbfffd848) at
attr.c:761
#5  0x0815e736 in userdiff_find_by_path (path=0x820e7f0
"HEAD:Documentation/.gitattributes") at userdiff.c:278
#6  0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504
#7  0x08105907 in grep_source_is_binary (gs=0xbfffd978) at grep.c:1512
#8  0x08104eaa in grep_source_1 (opt=0xbfffe304, gs=0xbfffd978,
collect_hits=0) at grep.c:1180

Never touched userdiff.c before. Anybody?
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-09  9:38 ` Nguyen Thai Ngoc Duy
@ 2012-10-09 12:01   ` Nguyen Thai Ngoc Duy
  2012-10-09 12:41     ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-09 12:01 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List

On Tue, Oct 09, 2012 at 04:38:32PM +0700, Nguyen Thai Ngoc Duy wrote:
> #5  0x0815e736 in userdiff_find_by_path (path=0x820e7f0
> "HEAD:Documentation/.gitattributes") at userdiff.c:278
> #6  0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504

A bandage patch may look like this. But it does not solve the real
problem:

 - we should be able to look up in-tree .gitattributes, not just
   ignore like this patch does

 - gs->name seems to be prepared for human consumption, not for
   accessing files. grep_file() with opt->relative is true can put
   quotes in gs->name, for example.

I feel like we should make this function a callback and let git-grep
deal with driver loading itself.

-- 8< --
diff --git a/grep.c b/grep.c
index edc7776..c5ed067 100644
--- a/grep.c
+++ b/grep.c
@@ -1501,7 +1501,8 @@ void grep_source_load_driver(struct grep_source *gs)
 		return;
 
 	grep_attr_lock();
-	gs->driver = userdiff_find_by_path(gs->name);
+	if (gs->type == GREP_SOURCE_FILE)
+		gs->driver = userdiff_find_by_path(gs->name);
 	if (!gs->driver)
 		gs->driver = userdiff_find_by_name("default");
 	grep_attr_unlock();
-- 8< --

-- 
Duy

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-09 12:01   ` Nguyen Thai Ngoc Duy
@ 2012-10-09 12:41     ` Jeff King
  2012-10-09 18:59       ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2012-10-09 12:41 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Johannes Sixt, Git Mailing List

On Tue, Oct 09, 2012 at 07:01:44PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Tue, Oct 09, 2012 at 04:38:32PM +0700, Nguyen Thai Ngoc Duy wrote:
> > #5  0x0815e736 in userdiff_find_by_path (path=0x820e7f0
> > "HEAD:Documentation/.gitattributes") at userdiff.c:278
> > #6  0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504
> 
> A bandage patch may look like this. But it does not solve the real
> problem:
> 
>  - we should be able to look up in-tree .gitattributes, not just
>    ignore like this patch does
> 
>  - gs->name seems to be prepared for human consumption, not for
>    accessing files. grep_file() with opt->relative is true can put
>    quotes in gs->name, for example.

Right. For the second, you would probably want gs->identifier in the
case of GREP_SOURCE_FILE. But that identifier information is not
available at all for GREP_SOURCE_SHA1, which is what is breaking the
first point.

> I feel like we should make this function a callback and let git-grep
> deal with driver loading itself.

I think we just need to have callers of grep_source_init provide us with
the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
where the information is lost.

Like this incomplete and untested patch, which should fix the quoting
problem for GREP_SOURCE_FILE, but leave the sha1 bits broken (see the
in-code comment). I'm traveling this week, so I doubt I'll have time to
look for a few more days. If you want to work on it, please do.

diff --git a/builtin/grep.c b/builtin/grep.c
index 82530a6..be602dd 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -86,7 +86,7 @@ static void add_work(struct grep_opt *opt, enum grep_source_type type,
 static int skip_first_line;
 
 static void add_work(struct grep_opt *opt, enum grep_source_type type,
-		     const char *name, const void *id)
+		     const char *name, const char *path, const void *id)
 {
 	grep_lock();
 
@@ -94,7 +94,7 @@ static void add_work(struct grep_opt *opt, enum grep_source_type type,
 		pthread_cond_wait(&cond_write, &grep_mutex);
 	}
 
-	grep_source_init(&todo[todo_end].source, type, name, id);
+	grep_source_init(&todo[todo_end].source, type, name, path, id);
 	if (opt->binary != GREP_BINARY_TEXT)
 		grep_source_load_driver(&todo[todo_end].source);
 	todo[todo_end].done = 0;
@@ -378,14 +378,21 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, -1, &pathbuf,
 				    opt->prefix);
+		/* XXX Why do we insert here instead of just putting it in
+		 * first? */
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
 
+	/* XXX We seem to get all kinds of junk via the filename field here,
+	 * including partial filenames, sha1:path, etc. We could parse it
+	 * ourselves, but that is probably insanity. We should ask the
+	 * caller to break it down more for us. For now, just pass NULL. */
+
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, NULL, sha1);
 		strbuf_release(&pathbuf);
 		return 0;
 	} else
@@ -394,7 +401,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, NULL, sha1);
 		strbuf_release(&pathbuf);
 		hit = grep_source(opt, &gs);
 
@@ -414,7 +421,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename);
+		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		return 0;
 	} else
@@ -423,7 +430,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename);
+		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		hit = grep_source(opt, &gs);
 
diff --git a/grep.c b/grep.c
index edc7776..06bc1c6 100644
--- a/grep.c
+++ b/grep.c
@@ -1373,7 +1373,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	struct grep_source gs;
 	int r;
 
-	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL);
+	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
 	gs.buf = buf;
 	gs.size = size;
 
@@ -1384,10 +1384,12 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 }
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier)
+		      const char *name, const char *path,
+		      const void *identifier)
 {
 	gs->type = type;
 	gs->name = name ? xstrdup(name) : NULL;
+	gs->path = path ? xstrdup(path) : NULL;
 	gs->buf = NULL;
 	gs->size = 0;
 	gs->driver = NULL;
@@ -1409,6 +1411,8 @@ void grep_source_clear(struct grep_source *gs)
 {
 	free(gs->name);
 	gs->name = NULL;
+	free(gs->path);
+	gs->path = NULL;
 	free(gs->identifier);
 	gs->identifier = NULL;
 	grep_source_clear_data(gs);
diff --git a/grep.h b/grep.h
index c256ac6..c2cf57b 100644
--- a/grep.h
+++ b/grep.h
@@ -158,11 +158,13 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	char *buf;
 	unsigned long size;
 
+	char *path; /* for attribute lookups */
 	struct userdiff_driver *driver;
 };
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier);
+		      const char *name, const char *path,
+		      const void *identifier);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs);

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-09 12:41     ` Jeff King
@ 2012-10-09 18:59       ` Junio C Hamano
  2012-10-10  5:17         ` Nguyen Thai Ngoc Duy
  2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 43+ messages in thread
From: Junio C Hamano @ 2012-10-09 18:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Nguyen Thai Ngoc Duy, Johannes Sixt, Git Mailing List

Jeff King <peff@peff.net> writes:

> I think we just need to have callers of grep_source_init provide us with
> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
> where the information is lost.

Yes.  I agree that is the right approach.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-09 18:59       ` Junio C Hamano
@ 2012-10-10  5:17         ` Nguyen Thai Ngoc Duy
  2012-10-10  5:33           ` Junio C Hamano
  2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10  5:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> I think we just need to have callers of grep_source_init provide us with
>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
>> where the information is lost.
>
> Yes.  I agree that is the right approach.

Passing full path this way means prepare_attr_stack has to walk back
to top directory for every files (even in the same directory). If
.gitattributes are loaded while git-grep traverses the tree, then it
can preload attr once per directory. But Jeff's approach looks
simpler.
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-10  5:17         ` Nguyen Thai Ngoc Duy
@ 2012-10-10  5:33           ` Junio C Hamano
  2012-10-10  5:45             ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2012-10-10  5:33 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jeff King <peff@peff.net> writes:
>>
>>> I think we just need to have callers of grep_source_init provide us with
>>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
>>> where the information is lost.
>>
>> Yes.  I agree that is the right approach.
>
> Passing full path this way means prepare_attr_stack has to walk back
> to top directory for every files (even in the same directory). If
> .gitattributes are loaded while git-grep traverses the tree, then it
> can preload attr once per directory. But Jeff's approach looks
> simpler.

Why can't you do both?  That is, to build a full path as you
descend, and read per-directory .gitattributes as you go?

Confused...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-10  5:33           ` Junio C Hamano
@ 2012-10-10  5:45             ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10  5:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Wed, Oct 10, 2012 at 12:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Jeff King <peff@peff.net> writes:
>>>
>>>> I think we just need to have callers of grep_source_init provide us with
>>>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
>>>> where the information is lost.
>>>
>>> Yes.  I agree that is the right approach.
>>
>> Passing full path this way means prepare_attr_stack has to walk back
>> to top directory for every files (even in the same directory). If
>> .gitattributes are loaded while git-grep traverses the tree, then it
>> can preload attr once per directory. But Jeff's approach looks
>> simpler.
>
> Why can't you do both?  That is, to build a full path as you
> descend, and read per-directory .gitattributes as you go?

Hm... I need to check attr.c code but I think it means read
.gitattributes and prepare attr stack in builtin/grep.c (where we
traverse trees) and actually check the attribute in
grep_source_load_driver(), much further down the call stack. I'm not
sure how we can pass the prepared attr stack down to
grep_source_load_driver().
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-09 18:59       ` Junio C Hamano
  2012-10-10  5:17         ` Nguyen Thai Ngoc Duy
@ 2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
  2012-10-10 11:34           ` [PATCH 1/3] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
                             ` (3 more replies)
  1 sibling, 4 replies; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-10 11:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> I think we just need to have callers of grep_source_init provide us with
>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
>> where the information is lost.
>
> Yes.  I agree that is the right approach.

Here we go. No additional tests as I don't know how to write them. But
strace shows it's ok.

Currently we don't have many options to optimize attr access. We
cannot prepare a attr stack in advance for a specific directory, then
check attributes many times using the stack. We cannot prepare attr
stack from a tree object. As a result a normal "grep pattern --"
results in a lot of duplicate failed open(".../.gitattributes") calls.

Jeff King (1):
  grep: pass true path name to grep machinery

Nguyễn Thái Ngọc Duy (2):
  quote: let caller reset buffer for quote_path_relative()
  grep: stop looking at random places for .gitattributes

 Documentation/git-grep.txt |  7 +++++--
 builtin/clean.c            |  2 ++
 builtin/grep.c             | 19 ++++++++++++-------
 builtin/ls-files.c         |  1 +
 grep.c                     | 11 ++++++++---
 grep.h                     |  4 +++-
 quote.c                    |  1 -
 wt-status.c                |  1 +
 8 files changed, 32 insertions(+), 14 deletions(-)

-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] quote: let caller reset buffer for quote_path_relative()
  2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
@ 2012-10-10 11:34           ` Nguyễn Thái Ngọc Duy
  2012-10-10 21:13             ` Junio C Hamano
  2012-10-10 11:34           ` [PATCH 2/3] grep: pass true path name to grep machinery Nguyễn Thái Ngọc Duy
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-10 11:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

quote_path_relative() resetting output buffer is sometimes unnecessary
as the buffer has never been used, and some other times makes it
harder for the caller to use (see builtin/grep.c, the caller has to
insert a string after quote_path_relative)

Move the buffer reset back to call sites when necessary.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 The answer for Jeff's XXX in his patch, why strbuf_insert() instead
 of just adding in advance.

 builtin/clean.c    | 2 ++
 builtin/grep.c     | 2 +-
 builtin/ls-files.c | 1 +
 quote.c            | 1 -
 wt-status.c        | 1 +
 5 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 69c1cda..ff633cc 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -149,6 +149,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 		if (S_ISDIR(st.st_mode)) {
 			strbuf_addstr(&directory, ent->name);
+			strbuf_reset(&buf);
 			qname = quote_path_relative(directory.buf, directory.len, &buf, prefix);
 			if (show_only && (remove_directories ||
 			    (matches == MATCHED_EXACTLY))) {
@@ -171,6 +172,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		} else {
 			if (pathspec && !matches)
 				continue;
+			strbuf_reset(&buf);
 			qname = quote_path_relative(ent->name, -1, &buf, prefix);
 			if (show_only) {
 				printf(_("Would remove %s\n"), qname);
diff --git a/builtin/grep.c b/builtin/grep.c
index 82530a6..377c904 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -376,9 +376,9 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (opt->relative && opt->prefix_length) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
 		quote_path_relative(filename + tree_name_len, -1, &pathbuf,
 				    opt->prefix);
-		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b5434af..1e0cae9 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -395,6 +395,7 @@ int report_path_error(const char *ps_matched, const char **pathspec, const char
 		if (found_dup)
 			continue;
 
+		strbuf_reset(&sb);
 		name = quote_path_relative(pathspec[num], -1, &sb, prefix);
 		error("pathspec '%s' did not match any file(s) known to git.",
 		      name);
diff --git a/quote.c b/quote.c
index 911229f..7e23ba9 100644
--- a/quote.c
+++ b/quote.c
@@ -381,7 +381,6 @@ char *quote_path_relative(const char *in, int len,
 {
 	struct strbuf sb = STRBUF_INIT;
 	const char *rel = path_relative(in, len, &sb, prefix, -1);
-	strbuf_reset(out);
 	quote_c_style_counted(rel, strlen(rel), out, NULL, 0);
 	strbuf_release(&sb);
 
diff --git a/wt-status.c b/wt-status.c
index 2a9658b..be8b600 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -690,6 +690,7 @@ static void wt_status_print_other(struct wt_status *s,
 		struct string_list_item *it;
 		const char *path;
 		it = &(l->items[i]);
+		strbuf_reset(&buf);
 		path = quote_path(it->string, strlen(it->string),
 				  &buf, s->prefix);
 		if (column_active(s->colopts)) {
-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/3] grep: pass true path name to grep machinery
  2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
  2012-10-10 11:34           ` [PATCH 1/3] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
@ 2012-10-10 11:34           ` Nguyễn Thái Ngọc Duy
  2012-10-10 11:34           ` [PATCH 3/3] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
  2012-10-10 13:59           ` [PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree Nguyễn Thái Ngọc Duy
  3 siblings, 0 replies; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-10 11:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

From: Jeff King <peff@peff.net>

Having path name (if available) may be useful. For example, if grep
machinery needs to look up git attributes, it'll need a good path
name. grep_source->name is not good because it's for display purpose
only.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/grep.c | 17 +++++++++++------
 grep.c         |  8 ++++++--
 grep.h         |  4 +++-
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 377c904..0211e35 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -86,7 +86,7 @@ static pthread_cond_t cond_result;
 static int skip_first_line;
 
 static void add_work(struct grep_opt *opt, enum grep_source_type type,
-		     const char *name, const void *id)
+		     const char *name, const char *path, const void *id)
 {
 	grep_lock();
 
@@ -94,7 +94,7 @@ static void add_work(struct grep_opt *opt, enum grep_source_type type,
 		pthread_cond_wait(&cond_write, &grep_mutex);
 	}
 
-	grep_source_init(&todo[todo_end].source, type, name, id);
+	grep_source_init(&todo[todo_end].source, type, name, path, id);
 	if (opt->binary != GREP_BINARY_TEXT)
 		grep_source_load_driver(&todo[todo_end].source);
 	todo[todo_end].done = 0;
@@ -383,9 +383,14 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 		strbuf_addstr(&pathbuf, filename);
 	}
 
+	/* XXX We seem to get all kinds of junk via the filename field here,
+	 * including partial filenames, sha1:path, etc. We could parse it
+	 * ourselves, but that is probably insanity. We should ask the
+	 * caller to break it down more for us. For now, just pass NULL. */
+
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, NULL, sha1);
 		strbuf_release(&pathbuf);
 		return 0;
 	} else
@@ -394,7 +399,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, NULL, sha1);
 		strbuf_release(&pathbuf);
 		hit = grep_source(opt, &gs);
 
@@ -414,7 +419,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename);
+		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		return 0;
 	} else
@@ -423,7 +428,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename);
+		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		hit = grep_source(opt, &gs);
 
diff --git a/grep.c b/grep.c
index edc7776..06bc1c6 100644
--- a/grep.c
+++ b/grep.c
@@ -1373,7 +1373,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	struct grep_source gs;
 	int r;
 
-	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL);
+	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
 	gs.buf = buf;
 	gs.size = size;
 
@@ -1384,10 +1384,12 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 }
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier)
+		      const char *name, const char *path,
+		      const void *identifier)
 {
 	gs->type = type;
 	gs->name = name ? xstrdup(name) : NULL;
+	gs->path = path ? xstrdup(path) : NULL;
 	gs->buf = NULL;
 	gs->size = 0;
 	gs->driver = NULL;
@@ -1409,6 +1411,8 @@ void grep_source_clear(struct grep_source *gs)
 {
 	free(gs->name);
 	gs->name = NULL;
+	free(gs->path);
+	gs->path = NULL;
 	free(gs->identifier);
 	gs->identifier = NULL;
 	grep_source_clear_data(gs);
diff --git a/grep.h b/grep.h
index c256ac6..c2cf57b 100644
--- a/grep.h
+++ b/grep.h
@@ -158,11 +158,13 @@ struct grep_source {
 	char *buf;
 	unsigned long size;
 
+	char *path; /* for attribute lookups */
 	struct userdiff_driver *driver;
 };
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier);
+		      const char *name, const char *path,
+		      const void *identifier);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs);
-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
  2012-10-10 11:34           ` [PATCH 1/3] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
  2012-10-10 11:34           ` [PATCH 2/3] grep: pass true path name to grep machinery Nguyễn Thái Ngọc Duy
@ 2012-10-10 11:34           ` Nguyễn Thái Ngọc Duy
  2012-10-10 11:51             ` Johannes Sixt
  2012-10-10 13:59           ` [PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree Nguyễn Thái Ngọc Duy
  3 siblings, 1 reply; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-10 11:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

grep searches for .gitattributes using "name" field in struct
grep_source but that field is not real on-disk path name. For example,
"grep pattern rev" fills the field with "rev:path", which is
non-existent usually until somebody exploits it to drive git away.

attr does not support looking up .gitattributes from a tree object.
Making "git grep pattern <rev>" support .gitattributes could be a big
work. Just note in document what we support for now. The document
changes in this patch are to be reverted once support is in place.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-grep.txt | 7 +++++--
 grep.c                     | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index cfecf84..a4c66ee 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -86,7 +86,9 @@ OPTIONS
 	files.
 
 -I::
-	Don't match the pattern in binary files.
+	Don't match the pattern in binary files. Note that binary
+	detection via .gitattributes only works with searching files
+	in working directory.
 
 --max-depth <depth>::
 	For each <pathspec> given on command line, descend at most <depth>
@@ -189,7 +191,8 @@ OPTIONS
 	the match, unless the matching line is a function name itself.
 	The name is determined in the same way as 'git diff' works out
 	patch hunk headers (see 'Defining a custom hunk-header' in
-	linkgit:gitattributes[5]).
+	linkgit:gitattributes[5]). Note that .gitattributes are only
+	support for searching files in working directory.
 
 -<num>::
 -C <num>::
diff --git a/grep.c b/grep.c
index 06bc1c6..e36c01b 100644
--- a/grep.c
+++ b/grep.c
@@ -1505,7 +1505,8 @@ void grep_source_load_driver(struct grep_source *gs)
 		return;
 
 	grep_attr_lock();
-	gs->driver = userdiff_find_by_path(gs->name);
+	if (gs->path)
+		gs->driver = userdiff_find_by_path(gs->path);
 	if (!gs->driver)
 		gs->driver = userdiff_find_by_name("default");
 	grep_attr_unlock();
-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 11:34           ` [PATCH 3/3] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
@ 2012-10-10 11:51             ` Johannes Sixt
  2012-10-10 12:03               ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-10 11:51 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano, Jeff King

Thanks for working on this issue!

Am 10/10/2012 13:34, schrieb Nguyễn Thái Ngọc Duy:
> +	linkgit:gitattributes[5]). Note that .gitattributes are only
> +	support for searching files in working directory.

Does this mean it does not work for 'git grep --cached'? That would be a
real loss.

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 11:51             ` Johannes Sixt
@ 2012-10-10 12:03               ` Nguyen Thai Ngoc Duy
  2012-10-10 12:12                 ` Johannes Sixt
  0 siblings, 1 reply; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10 12:03 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, Jeff King

On Wed, Oct 10, 2012 at 6:51 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Thanks for working on this issue!
>
> Am 10/10/2012 13:34, schrieb Nguyễn Thái Ngọc Duy:
>> +     linkgit:gitattributes[5]). Note that .gitattributes are only
>> +     support for searching files in working directory.
>
> Does this mean it does not work for 'git grep --cached'? That would be a
> real loss.

I missed this case. I don't think it works correctly before as it
reads worktree's .gitattributes instead of the index version. But at
least we support reading .gitattributes from index. Patches coming
soon.
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 12:03               ` Nguyen Thai Ngoc Duy
@ 2012-10-10 12:12                 ` Johannes Sixt
  2012-10-10 12:32                   ` Nguyen Thai Ngoc Duy
  2012-10-10 19:44                   ` Junio C Hamano
  0 siblings, 2 replies; 43+ messages in thread
From: Johannes Sixt @ 2012-10-10 12:12 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Junio C Hamano, Jeff King

Am 10/10/2012 14:03, schrieb Nguyen Thai Ngoc Duy:
> On Wed, Oct 10, 2012 at 6:51 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>> Thanks for working on this issue!
>>
>> Am 10/10/2012 13:34, schrieb Nguyễn Thái Ngọc Duy:
>>> +     linkgit:gitattributes[5]). Note that .gitattributes are only
>>> +     support for searching files in working directory.
>>
>> Does this mean it does not work for 'git grep --cached'? That would be a
>> real loss.
> 
> I missed this case. I don't think it works correctly before as it
> reads worktree's .gitattributes instead of the index version. But at
> least we support reading .gitattributes from index. Patches coming
> soon.

Is there already an established definition which the "correct"
.gitattributes are? IIRC, everywhere else we are looking at the
.gitattributes in the worktree, regardless of whether the object at the
path in question is in the worktree, the index, or in an old commit.

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 12:12                 ` Johannes Sixt
@ 2012-10-10 12:32                   ` Nguyen Thai Ngoc Duy
  2012-10-10 12:43                     ` Johannes Sixt
  2012-10-10 19:44                   ` Junio C Hamano
  1 sibling, 1 reply; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10 12:32 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, Jeff King

On Wed, Oct 10, 2012 at 7:12 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Is there already an established definition which the "correct"
> .gitattributes are?

If I ask to grep the index then to me it should read only the index.
Although other people can counter that they may want different
attributes than the one stored in index, which either comes from
worktree or $GIT_DIR.

> IIRC, everywhere else we are looking at the
> .gitattributes in the worktree, regardless of whether the object at the
> path in question is in the worktree, the index, or in an old commit.

git-archive has --worktree-attributes to specify where attributes come
from. Sparse checkout can choose to read index version first then
worktree's or the other way around. All normal operations read
worktree version, if not found index version.
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 12:32                   ` Nguyen Thai Ngoc Duy
@ 2012-10-10 12:43                     ` Johannes Sixt
  2012-10-10 12:51                       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-10 12:43 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Junio C Hamano, Jeff King

Am 10/10/2012 14:32, schrieb Nguyen Thai Ngoc Duy:
> git-archive has --worktree-attributes to specify where attributes come
> from. Sparse checkout can choose to read index version first then
> worktree's or the other way around. All normal operations read
> worktree version, if not found index version.

So, even if I run 'git diff master~2000 master~1999', it uses the
attributes from the worktree, and if not found from the index? (I would
not mind, BTW.)

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 12:43                     ` Johannes Sixt
@ 2012-10-10 12:51                       ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10 12:51 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, Jeff King

On Wed, Oct 10, 2012 at 7:43 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Am 10/10/2012 14:32, schrieb Nguyen Thai Ngoc Duy:
>> git-archive has --worktree-attributes to specify where attributes come
>> from. Sparse checkout can choose to read index version first then
>> worktree's or the other way around. All normal operations read
>> worktree version, if not found index version.
>
> So, even if I run 'git diff master~2000 master~1999', it uses the
> attributes from the worktree, and if not found from the index? (I would
> not mind, BTW.)

Yes. Tested and verified.
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
  2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
                             ` (2 preceding siblings ...)
  2012-10-10 11:34           ` [PATCH 3/3] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
@ 2012-10-10 13:59           ` Nguyễn Thái Ngọc Duy
  2012-10-10 13:59             ` [PATCH v2 1/2] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
  2012-10-10 13:59             ` [PATCH v2 2/2] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
  3 siblings, 2 replies; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-10 13:59 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

Round too. .gitattributes should be respected in all cases except blob
grepping. As Johannes pointed elsewhere in this thread, if "git
diff <rev1> <rev2>" looks up worktree's .gitattributes (and none in
rev1/rev2), there's no reason "git grep" should behave differently.

Nguyễn Thái Ngọc Duy (2):
  quote: let caller reset buffer for quote_path_relative()
  grep: stop looking at random places for .gitattributes

 builtin/clean.c        |  2 ++
 builtin/grep.c         | 24 +++++++++++++-----------
 builtin/ls-files.c     |  1 +
 grep.c                 | 11 ++++++++---
 grep.h                 |  4 +++-
 quote.c                |  1 -
 t/t7008-grep-binary.sh | 22 ++++++++++++++++++++++
 wt-status.c            |  1 +
 8 files changed, 50 insertions(+), 16 deletions(-)

-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 1/2] quote: let caller reset buffer for quote_path_relative()
  2012-10-10 13:59           ` [PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree Nguyễn Thái Ngọc Duy
@ 2012-10-10 13:59             ` Nguyễn Thái Ngọc Duy
  2012-10-10 13:59             ` [PATCH v2 2/2] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
  1 sibling, 0 replies; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-10 13:59 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

quote_path_relative() resetting output buffer is sometimes unnecessary
as the buffer has never been used, and some other times makes it
harder for the caller to use (see builtin/grep.c, the caller has to
insert a string after quote_path_relative)

Move the buffer reset back to call sites when necessary.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clean.c    | 2 ++
 builtin/grep.c     | 2 +-
 builtin/ls-files.c | 1 +
 quote.c            | 1 -
 wt-status.c        | 1 +
 5 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 69c1cda..ff633cc 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -149,6 +149,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 		if (S_ISDIR(st.st_mode)) {
 			strbuf_addstr(&directory, ent->name);
+			strbuf_reset(&buf);
 			qname = quote_path_relative(directory.buf, directory.len, &buf, prefix);
 			if (show_only && (remove_directories ||
 			    (matches == MATCHED_EXACTLY))) {
@@ -171,6 +172,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		} else {
 			if (pathspec && !matches)
 				continue;
+			strbuf_reset(&buf);
 			qname = quote_path_relative(ent->name, -1, &buf, prefix);
 			if (show_only) {
 				printf(_("Would remove %s\n"), qname);
diff --git a/builtin/grep.c b/builtin/grep.c
index 82530a6..377c904 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -376,9 +376,9 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (opt->relative && opt->prefix_length) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
 		quote_path_relative(filename + tree_name_len, -1, &pathbuf,
 				    opt->prefix);
-		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b5434af..1e0cae9 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -395,6 +395,7 @@ int report_path_error(const char *ps_matched, const char **pathspec, const char
 		if (found_dup)
 			continue;
 
+		strbuf_reset(&sb);
 		name = quote_path_relative(pathspec[num], -1, &sb, prefix);
 		error("pathspec '%s' did not match any file(s) known to git.",
 		      name);
diff --git a/quote.c b/quote.c
index 911229f..7e23ba9 100644
--- a/quote.c
+++ b/quote.c
@@ -381,7 +381,6 @@ char *quote_path_relative(const char *in, int len,
 {
 	struct strbuf sb = STRBUF_INIT;
 	const char *rel = path_relative(in, len, &sb, prefix, -1);
-	strbuf_reset(out);
 	quote_c_style_counted(rel, strlen(rel), out, NULL, 0);
 	strbuf_release(&sb);
 
diff --git a/wt-status.c b/wt-status.c
index 2a9658b..be8b600 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -690,6 +690,7 @@ static void wt_status_print_other(struct wt_status *s,
 		struct string_list_item *it;
 		const char *path;
 		it = &(l->items[i]);
+		strbuf_reset(&buf);
 		path = quote_path(it->string, strlen(it->string),
 				  &buf, s->prefix);
 		if (column_active(s->colopts)) {
-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-10 13:59           ` [PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree Nguyễn Thái Ngọc Duy
  2012-10-10 13:59             ` [PATCH v2 1/2] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
@ 2012-10-10 13:59             ` Nguyễn Thái Ngọc Duy
  2012-10-10 14:21               ` Johannes Sixt
  2012-10-12 10:49               ` [PATCH v3] " Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-10 13:59 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

grep searches for .gitattributes using "name" field in struct
grep_source but that field is not real on-disk path name. For example,
"grep pattern rev" fills the field with "rev:path", which is
non-existent usually until somebody exploits it to drive git away.

This patch passes real paths down to grep_source_load_driver(). Except
grepping a blob, all other cases should have right paths down to
grep_source_load_driver(). In other words, .gitattributes are still
respected.

Initial-work-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/grep.c         | 22 ++++++++++++----------
 grep.c                 | 11 ++++++++---
 grep.h                 |  4 +++-
 t/t7008-grep-binary.sh | 22 ++++++++++++++++++++++
 4 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 377c904..f6c5ba2 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -86,7 +86,7 @@ static pthread_cond_t cond_result;
 static int skip_first_line;
 
 static void add_work(struct grep_opt *opt, enum grep_source_type type,
-		     const char *name, const void *id)
+		     const char *name, const char *path, const void *id)
 {
 	grep_lock();
 
@@ -94,7 +94,7 @@ static void add_work(struct grep_opt *opt, enum grep_source_type type,
 		pthread_cond_wait(&cond_write, &grep_mutex);
 	}
 
-	grep_source_init(&todo[todo_end].source, type, name, id);
+	grep_source_init(&todo[todo_end].source, type, name, path, id);
 	if (opt->binary != GREP_BINARY_TEXT)
 		grep_source_load_driver(&todo[todo_end].source);
 	todo[todo_end].done = 0;
@@ -371,7 +371,8 @@ static void *lock_and_read_sha1_file(const unsigned char *sha1, enum object_type
 }
 
 static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
-		     const char *filename, int tree_name_len)
+		     const char *filename, int tree_name_len,
+		     const char *path)
 {
 	struct strbuf pathbuf = STRBUF_INIT;
 
@@ -385,7 +386,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, sha1);
 		strbuf_release(&pathbuf);
 		return 0;
 	} else
@@ -394,7 +395,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, path, sha1);
 		strbuf_release(&pathbuf);
 		hit = grep_source(opt, &gs);
 
@@ -414,7 +415,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename);
+		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		return 0;
 	} else
@@ -423,7 +424,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename);
+		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		hit = grep_source(opt, &gs);
 
@@ -479,7 +480,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
 			if (ce_stage(ce))
 				continue;
-			hit |= grep_sha1(opt, ce->sha1, ce->name, 0);
+			hit |= grep_sha1(opt, ce->sha1, ce->name, 0, ce->name);
 		}
 		else
 			hit |= grep_file(opt, ce->name);
@@ -518,7 +519,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		strbuf_add(base, entry.path, te_len);
 
 		if (S_ISREG(entry.mode)) {
-			hit |= grep_sha1(opt, entry.sha1, base->buf, tn_len);
+			hit |= grep_sha1(opt, entry.sha1, base->buf, tn_len,
+					 base->buf + tn_len);
 		}
 		else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
@@ -548,7 +550,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		       struct object *obj, const char *name)
 {
 	if (obj->type == OBJ_BLOB)
-		return grep_sha1(opt, obj->sha1, name, 0);
+		return grep_sha1(opt, obj->sha1, name, 0, NULL);
 	if (obj->type == OBJ_COMMIT || obj->type == OBJ_TREE) {
 		struct tree_desc tree;
 		void *data;
diff --git a/grep.c b/grep.c
index edc7776..e36c01b 100644
--- a/grep.c
+++ b/grep.c
@@ -1373,7 +1373,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	struct grep_source gs;
 	int r;
 
-	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL);
+	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
 	gs.buf = buf;
 	gs.size = size;
 
@@ -1384,10 +1384,12 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 }
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier)
+		      const char *name, const char *path,
+		      const void *identifier)
 {
 	gs->type = type;
 	gs->name = name ? xstrdup(name) : NULL;
+	gs->path = path ? xstrdup(path) : NULL;
 	gs->buf = NULL;
 	gs->size = 0;
 	gs->driver = NULL;
@@ -1409,6 +1411,8 @@ void grep_source_clear(struct grep_source *gs)
 {
 	free(gs->name);
 	gs->name = NULL;
+	free(gs->path);
+	gs->path = NULL;
 	free(gs->identifier);
 	gs->identifier = NULL;
 	grep_source_clear_data(gs);
@@ -1501,7 +1505,8 @@ void grep_source_load_driver(struct grep_source *gs)
 		return;
 
 	grep_attr_lock();
-	gs->driver = userdiff_find_by_path(gs->name);
+	if (gs->path)
+		gs->driver = userdiff_find_by_path(gs->path);
 	if (!gs->driver)
 		gs->driver = userdiff_find_by_name("default");
 	grep_attr_unlock();
diff --git a/grep.h b/grep.h
index c256ac6..c2cf57b 100644
--- a/grep.h
+++ b/grep.h
@@ -158,11 +158,13 @@ struct grep_source {
 	char *buf;
 	unsigned long size;
 
+	char *path; /* for attribute lookups */
 	struct userdiff_driver *driver;
 };
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier);
+		      const char *name, const char *path,
+		      const void *identifier);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs);
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index fd6410f..2c85a30 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -111,6 +111,28 @@ test_expect_success 'grep respects binary diff attribute' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep --cached respects binary diff attribute' '
+	git grep --cached text t >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --cached respects binary diff attribute (2)' '
+	git add .gitattributes &&
+	rm .gitattributes &&
+	git grep --cached text t >actual &&
+	test_cmp expect actual &&
+	git checkout .gitattributes &&
+	git rm --cached .gitattributes
+'
+
+test_expect_success 'grep tree respects binary diff attribute' '
+	git commit -m new &&
+	echo "Binary file HEAD:t matches" >expect &&
+	git grep text HEAD -- t >actual &&
+	test_cmp expect actual &&
+	git reset HEAD^
+'
+
 test_expect_success 'grep respects not-binary diff attribute' '
 	echo binQary | q_to_nul >b &&
 	git add b &&
-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-10 13:59             ` [PATCH v2 2/2] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
@ 2012-10-10 14:21               ` Johannes Sixt
  2012-10-10 19:56                 ` Junio C Hamano
  2012-10-11  1:49                 ` Nguyen Thai Ngoc Duy
  2012-10-12 10:49               ` [PATCH v3] " Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 43+ messages in thread
From: Johannes Sixt @ 2012-10-10 14:21 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano, Jeff King

Am 10/10/2012 15:59, schrieb Nguyễn Thái Ngọc Duy:
> This patch passes real paths down to grep_source_load_driver(). Except
> grepping a blob, all other cases should have right paths down to

... grepping a blob or tree object...

> grep_source_load_driver(). In other words, .gitattributes are still
> respected.
...
> +test_expect_success 'grep --cached respects binary diff attribute (2)' '
> +	git add .gitattributes &&
> +	rm .gitattributes &&
> +	git grep --cached text t >actual &&
> +	test_cmp expect actual &&
> +	git checkout .gitattributes &&
> +	git rm --cached .gitattributes
> +'

This should perhaps be test_when_finished "git rm --cached .gitattributes".

> +
> +test_expect_success 'grep tree respects binary diff attribute' '

I was confused by the word "tree" here. Isn't "pathspec" more correct?

> +	git commit -m new &&
> +	echo "Binary file HEAD:t matches" >expect &&
> +	git grep text HEAD -- t >actual &&
> +	test_cmp expect actual &&
> +	git reset HEAD^
> +'

And in yet another test, should

	git grep text HEAD:t

/not/ respect the binary attribute?

At any rate, I don't observe the warnings anymore with this series.

Thanks,
-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 12:12                 ` Johannes Sixt
  2012-10-10 12:32                   ` Nguyen Thai Ngoc Duy
@ 2012-10-10 19:44                   ` Junio C Hamano
  2012-10-11  5:55                     ` Johannes Sixt
  1 sibling, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2012-10-10 19:44 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Nguyen Thai Ngoc Duy, git, Jeff King

Johannes Sixt <j.sixt@viscovery.net> writes:

> Is there already an established definition which the "correct"
> .gitattributes are? IIRC, everywhere else we are looking at the
> .gitattributes in the worktree, regardless of whether the object at the
> path in question is in the worktree, the index, or in an old commit.

No, and it is deliberately kept vague while waiting for us to come
up with a clear definition of what is "correct".

We could declare, from a purist's point of view, that the attribute
should be taken from the same place as the path in question is taken
from.  When running "git add foo.c", we grab the contents of "foo.c"
from the working tree, so ".gitignore" from the working tree should
be applied when dealing with "foo.c".  Similarly, the contents of
blob "foo.c" that "git checkout foo.c" reads from the index would
get attributes from ".gitignore" in the index (to find what its
smudging semantics is) before it gets written out to the working
tree.  "git diff A B" may give the attributes from tree A to the
preimage side while using the attributes from tree B to the
postimage side.

But the last example has some practical issues.  Very often, people
retroactively define attributes to correct earlier mistakes.  If an
older tree A forgot to declare that a path mybank.gnucash is a
GnuCash ledger file, while a newer tree B (and the current checkout
that is even newer) does [*1*], it is more useful to apply the newer
definition from .gitattributes to both trees in practice (and in
practice, you are much less likely to have a check-out of ancient
tree while running "git diff A B" to compare two trees that are
newer than the current check-out).  Using the file from the working
tree is the best approximation of "we want to use the newer one",
both from the semantics (i.e. you are likely to have fresher tree
checked out) and the performance (i.e. reading from files in the
working tree is far more trivial than reading from historical trees)
point of view.

So it is not so cut-and-dried that "take the attributes from the
same place" is a good and "correct" definition [*2*].


[Footnote]

*1* GnuCash writes, by default, a gzip compressed xml file, so I
have in my .gitattributes file

	*.gnucash	filter=gnucash

and then in my .git/config

	[filter "gnucash"]
        	clean = gzip -dc
                smudge = gzip -c

This allows "git diff" to work reasonably well (if you do not mind
reading diff between two versions of xml files, that is) and also
helps delta compression when packing the repository.


*2* Besides, the attributes are primarily used to define the
semantics about the contents in question.  If one file is of
"gnucash" kind (i.e. has "filter=gnucash" attribute in the previous
example) in one tree, and the path is of a different kind
(e.g. "filter=ooo" that says "this is an Ooo file"), it is very
likely that it does not even make sense, with or without content
filtering, to compare them with "git diff", so "take the attributes
from the same place" would have to imply "if the attributes do not
match, say something similar to 'Binary files differ'", which is
just as useless as applying one attribute taken from a convenient
but random place (i.e. the working tree).

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-10 14:21               ` Johannes Sixt
@ 2012-10-10 19:56                 ` Junio C Hamano
  2012-10-11  5:45                   ` Johannes Sixt
  2012-10-11  1:49                 ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2012-10-10 19:56 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Johannes Sixt <j.sixt@viscovery.net> writes:

> At any rate, I don't observe the warnings anymore with this series.

What kind of warnings have you been getting?  Earlier we had a bug
in the jk/config-warn-on-inaccessible-paths series that made it warn
when we tried to open a .gitattribute file and open() returned an
error other than ENOENT.  The bug was that we saw unnecessary errors
when a directory that used to exist no longer exists in the working
tree; we would instead get ENOTDIR in such a case that needs to be
ignored.

The problem was supposed to be "fixed" by 8e950da (attr: failure to
open a ".gitattributes" file is OK with ENOTDIR, 2012-09-13); if you
are still seeing the error, that error still may need to be
addressed, regardless of Nguyễn's patch.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] quote: let caller reset buffer for quote_path_relative()
  2012-10-10 11:34           ` [PATCH 1/3] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
@ 2012-10-10 21:13             ` Junio C Hamano
  2012-10-11 13:04               ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2012-10-10 21:13 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Johannes Sixt

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> quote_path_relative() resetting output buffer is sometimes unnecessary
> as the buffer has never been used, and some other times makes it
> harder for the caller to use (see builtin/grep.c, the caller has to
> insert a string after quote_path_relative)
>
> Move the buffer reset back to call sites when necessary.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  The answer for Jeff's XXX in his patch, why strbuf_insert() instead
>  of just adding in advance.

This sounds a lot stronger than "let" to me.  All existing callers
that assumed that buf to be emptied by the function now has to clear
it.  "quote: stop resetting output buffer of quote_path_relative()"
may better describe what this really does.

How should this interact with the logic in the called function that
used to say "if we ended up returning an empty string because the
path is the same as the base, we should give ./ back", and what
should the return value of this function be?

To answer these questions, you must define the meaning of the string
in the output buffer that already exists when the function is
called.  If the caller did this:

	strbuf_addstr(&out, "The path relative to your HOME is: ");
        quote_path_relative(path, pathlen, &out, "/home/pclouds/");

then the answers are "We still need to add ./ but !out->len is no
longer a good test to decide" and "It should point at the first byte
of what we added, not out->buf".

But if the caller did this instead:

	srcdir = "/src/";
	strbuf_addstr(&dst, "/dst/");
        quote_path_relative(path, pathlen, &dst, srcdir);
        printf("cp '%s' '%s'\n", path, dst->buf);

then it is nonsensical to add "./" when out->len is not zero when
the function returns.

So what does it mean to have an existing string in the output buffer
when calling the function?  Is it supposed to be a path to a
directory, or just a general uninterpreted string (e.g. a message)?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-10 14:21               ` Johannes Sixt
  2012-10-10 19:56                 ` Junio C Hamano
@ 2012-10-11  1:49                 ` Nguyen Thai Ngoc Duy
  2012-10-11  3:15                   ` Junio C Hamano
  1 sibling, 1 reply; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-11  1:49 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, Jeff King

On Wed, Oct 10, 2012 at 9:21 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>> +     git commit -m new &&
>> +     echo "Binary file HEAD:t matches" >expect &&
>> +     git grep text HEAD -- t >actual &&
>> +     test_cmp expect actual &&
>> +     git reset HEAD^
>> +'
>
> And in yet another test, should
>
>         git grep text HEAD:t
>
> /not/ respect the binary attribute?

Gray area. Is it ok to do that without documenting it (i.e. common
sense)? I have something in mind that could do that, but it also makes
"git grep text HEAD^{tree}" not respect attributes.
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-11  1:49                 ` Nguyen Thai Ngoc Duy
@ 2012-10-11  3:15                   ` Junio C Hamano
  0 siblings, 0 replies; 43+ messages in thread
From: Junio C Hamano @ 2012-10-11  3:15 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Johannes Sixt, git, Jeff King

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Wed, Oct 10, 2012 at 9:21 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>>> +     git commit -m new &&
>>> +     echo "Binary file HEAD:t matches" >expect &&
>>> +     git grep text HEAD -- t >actual &&
>>> +     test_cmp expect actual &&
>>> +     git reset HEAD^
>>> +'
>>
>> And in yet another test, should
>>
>>         git grep text HEAD:t
>>
>> /not/ respect the binary attribute?
>
> Gray area. Is it ok to do that without documenting it (i.e. common
> sense)? I have something in mind that could do that, but it also makes
> "git grep text HEAD^{tree}" not respect attributes.

Personally, I do not think HEAD:t is worth worrying about.

We could use the get_sha1_with_context() to get "t" out of "HEAD:t",
and we could even enhance get_sha1_with_context() to also preserve
the value of what came before the colon, but that would mean that
these three

    git grep text HEAD:t/t0200
    git grep text $(git rev-parse HEAD:t/t0200)
    git grep text $(git rev-parse HEAD:t):t0200

would behave differently; only the first one has any chance of
applying the correct set of ".gitattributes".  All of them would be
able to use the ".gitattributes" file contained in the tree object
that corresponds to t/t0200 (if we updated attr.c to read from tree
objects, that is), but the latter two would skip ".gitattributes"
files from the top-level and "t" directories, still using the final
fallback definition from $GIT_DIR/info/attributes file.

If we have to draw a line somewhere, the saner place to draw it is
to stop at

    git grep text HEAD -- t/t0200

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-10 19:56                 ` Junio C Hamano
@ 2012-10-11  5:45                   ` Johannes Sixt
  2012-10-11 15:51                     ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-11  5:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Am 10/10/2012 21:56, schrieb Junio C Hamano:
> Johannes Sixt <j.sixt@viscovery.net> writes:
> 
>> At any rate, I don't observe the warnings anymore with this series.
> 
> What kind of warnings have you been getting?  Earlier we had a bug
> in the jk/config-warn-on-inaccessible-paths series that made it warn
> when we tried to open a .gitattribute file and open() returned an
> error other than ENOENT.  The bug was that we saw unnecessary errors
> when a directory that used to exist no longer exists in the working
> tree; we would instead get ENOTDIR in such a case that needs to be
> ignored.

I saw EINVAL errors when 'git grep pattern rev' was run on Windows. The
reason is that the code attempted to access "rev:dir/.gitattributes" in
the worktree, which is an invalid path on Windows due to the colon. The
lack of this warning indicates that the attempts to access these files are
eliminated.

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-10 19:44                   ` Junio C Hamano
@ 2012-10-11  5:55                     ` Johannes Sixt
  2012-10-11  7:04                       ` Michael Haggerty
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-11  5:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git, Jeff King

Am 10/10/2012 21:44, schrieb Junio C Hamano:
> Johannes Sixt <j.sixt@viscovery.net> writes:
> 
>> Is there already an established definition which the "correct"
>> .gitattributes are?
> 
> No, and it is deliberately kept vague while waiting for us to come
> up with a clear definition of what is "correct".
...
> Very often, people
> retroactively define attributes to correct earlier mistakes.

Absolutely. I have Windows resource files that are Shift-JIS encoded
checked in long ago, and I want to retoactively declare them with
"encoding=Shift-JIS" because I prefer to see Japanese script in gitk
rather than gibberish.

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-11  5:55                     ` Johannes Sixt
@ 2012-10-11  7:04                       ` Michael Haggerty
  2012-10-11  8:17                         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 43+ messages in thread
From: Michael Haggerty @ 2012-10-11  7:04 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, git, Jeff King

On 10/11/2012 07:55 AM, Johannes Sixt wrote:
> Am 10/10/2012 21:44, schrieb Junio C Hamano:
>> Johannes Sixt <j.sixt@viscovery.net> writes:
>>
>>> Is there already an established definition which the "correct"
>>> .gitattributes are?
>>
>> No, and it is deliberately kept vague while waiting for us to come
>> up with a clear definition of what is "correct".
> ...
>> Very often, people
>> retroactively define attributes to correct earlier mistakes.
> 
> Absolutely. I have Windows resource files that are Shift-JIS encoded
> checked in long ago, and I want to retoactively declare them with
> "encoding=Shift-JIS" because I prefer to see Japanese script in gitk
> rather than gibberish.

Maybe I'm being too much of a purist, but I don't think that git should
retroactively reinterpret history on its own initiative in a way that
might not be correct (e.g., maybe your encoding changed from ASCII to
Shift-JIS sometime in the past).  It would be more appropriate for this
to happen only if explicitly requested by the user.  For example, why
don't you override the incorrect historical attributes via
.git/info/attributes?

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes
  2012-10-11  7:04                       ` Michael Haggerty
@ 2012-10-11  8:17                         ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-11  8:17 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Johannes Sixt, Junio C Hamano, git, Jeff King

On Thu, Oct 11, 2012 at 2:04 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Maybe I'm being too much of a purist, but I don't think that git should
> retroactively reinterpret history on its own initiative in a way that
> might not be correct (e.g., maybe your encoding changed from ASCII to
> Shift-JIS sometime in the past).  It would be more appropriate for this
> to happen only if explicitly requested by the user.  For example, why
> don't you override the incorrect historical attributes via
> .git/info/attributes?

I think git-notes is a more appropriate place to correct these things.
If the incorrect commits are pruned, their notes can also be pruned.
No poluttion in $GIT_DIR/info/attr.. And i'm your side, no checking
worktree without user's permission.
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] quote: let caller reset buffer for quote_path_relative()
  2012-10-10 21:13             ` Junio C Hamano
@ 2012-10-11 13:04               ` Nguyen Thai Ngoc Duy
  2012-10-11 16:42                 ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-11 13:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Johannes Sixt

On Thu, Oct 11, 2012 at 4:13 AM, Junio C Hamano <gitster@pobox.com> wrote:
> This sounds a lot stronger than "let" to me.  All existing callers
> that assumed that buf to be emptied by the function now has to clear
> it.  "quote: stop resetting output buffer of quote_path_relative()"
> may better describe what this really does.
>
> How should this interact with the logic in the called function that
> used to say "if we ended up returning an empty string because the
> path is the same as the base, we should give ./ back", and what
> should the return value of this function be?
> ...
> So what does it mean to have an existing string in the output buffer
> when calling the function?  Is it supposed to be a path to a
> directory, or just a general uninterpreted string (e.g. a message)?

I prefer the KISS principle in this case: do not interpret existing string.
-- 
Duy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-11  5:45                   ` Johannes Sixt
@ 2012-10-11 15:51                     ` Junio C Hamano
  2012-10-12  7:33                       ` Johannes Sixt
  0 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2012-10-11 15:51 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Johannes Sixt <j.sixt@viscovery.net> writes:

> I saw EINVAL errors when 'git grep pattern rev' was run on Windows. The
> reason is that the code attempted to access "rev:dir/.gitattributes" in
> the worktree, which is an invalid path on Windows due to the colon. The
> lack of this warning indicates that the attempts to access these files are
> eliminated.

It means that whenever we ask for attributes for a tracked file that
is inside a directory whose name contains a colon, we would get the
same error on Windows, no?  Perhaps Windows may not let you create
such a directory in the first place, but you may still get a
repository of a cross platform project that contains such a path.

What I am wondering is if we should do something similar to 8e950da
(attr: failure to open a .gitattributes file is OK with ENOTDIR,
2012-09-13), at least on Windows, for EINVAL.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] quote: let caller reset buffer for quote_path_relative()
  2012-10-11 13:04               ` Nguyen Thai Ngoc Duy
@ 2012-10-11 16:42                 ` Junio C Hamano
  0 siblings, 0 replies; 43+ messages in thread
From: Junio C Hamano @ 2012-10-11 16:42 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Jeff King, Johannes Sixt

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Thu, Oct 11, 2012 at 4:13 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> This sounds a lot stronger than "let" to me.  All existing callers
>> that assumed that buf to be emptied by the function now has to clear
>> it.  "quote: stop resetting output buffer of quote_path_relative()"
>> may better describe what this really does.
>>
>> How should this interact with the logic in the called function that
>> used to say "if we ended up returning an empty string because the
>> path is the same as the base, we should give ./ back", and what
>> should the return value of this function be?
>> ...
>> So what does it mean to have an existing string in the output buffer
>> when calling the function?  Is it supposed to be a path to a
>> directory, or just a general uninterpreted string (e.g. a message)?
>
> I prefer the KISS principle in this case: do not interpret existing string.

Sorry, I do not quite understand what you mean.

The original function is about turning one path into a path that is
relative to the other path (e.g. make the result usable for chdir()
when your cwd() is the other path to name that path), and because
you do not want to end up with an empty path, it adds "./" to the
result when the two paths are equal.

That is simple and stupid, in the sense that you can explain in
simple terms what it does even to a stupid person and make him
understand.

If a patch changes one aspect of the implementation of the function,
but keeps other parts intact without thinking the ramifications of
doing so through, and ends up giving the end result unnecessarily
complex semantics, the _patch_ might be simple and stupid, but the
end result is no longer simple.

That is why I asked what it means to "append" and asked you to write
it down for people who may need to decide if this function is an
appropriate thing to call for their new code in the future.

You didn't answer that question, so I have to ask again.  What is
the existing string this function appends its result to?

 - If it is a leading path (in other words, you are transplanting a
   hierarchy to somewhere else --- see and (re)read "But if the
   caller did this instead" part from the message you are responding
   to), "because you do not want to end up with an empty path, it
   adds "./" to the result when the two paths are equal" does not
   make sense at all.

 - If it is a message that describes the resulting relative path
   (the use case above that "transplant" example in the same
   message), it needs to add "./" to the result, but the logic to
   determine it now needs to take the length of what was in the
   buffer when the function was called into account.

 - Is there a third possibility?

Whatever your answer is, please make sure the next person who wants
to call this function from his code does not have to ask "Why does
it append "./" when out->len is zero?  Shouldn't the condition be
when *rel is an empty string?"

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-11 15:51                     ` Junio C Hamano
@ 2012-10-12  7:33                       ` Johannes Sixt
  2012-10-14  4:29                         ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-12  7:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Am 10/11/2012 17:51, schrieb Junio C Hamano:
> Johannes Sixt <j.sixt@viscovery.net> writes:
> 
>> I saw EINVAL errors when 'git grep pattern rev' was run on Windows. The
>> reason is that the code attempted to access "rev:dir/.gitattributes" in
>> the worktree, which is an invalid path on Windows due to the colon. The
>> lack of this warning indicates that the attempts to access these files are
>> eliminated.
> 
> It means that whenever we ask for attributes for a tracked file that
> is inside a directory whose name contains a colon, we would get the
> same error on Windows, no?  Perhaps Windows may not let you create
> such a directory in the first place, but you may still get a
> repository of a cross platform project that contains such a path.

Your assessment is correct.

> What I am wondering is if we should do something similar to 8e950da
> (attr: failure to open a .gitattributes file is OK with ENOTDIR,
> 2012-09-13), at least on Windows, for EINVAL.

It might be worth it. We already have a similar special case in
write_or_die.c:maybe_flush_or_die() for Windows, although it is not about
a colon in a path name.

Perhaps like this.

--- 8< ---
From: Johannes Sixt <j6t@kdbg.org>
Subject: [PATCH] attr: do not warn on path with colon on Windows

In the same spirit as commit 8e950dab (attr: failure to open a
.gitattributes file is OK with ENOTDIR), ignore EINVAL errors. On
Windows, a path that contains a colon that is not the one after the
drive letter is not allowed and is reported with errno set to
EINVAL.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 attr.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 8010429..ac945ad 100644
--- a/attr.c
+++ b/attr.c
@@ -354,7 +354,15 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 	int lineno = 0;
 
 	if (!fp) {
-		if (errno != ENOENT && errno != ENOTDIR)
+		/*
+		 * If path does not exist, it is not worth mentioning,
+		 * but I/O errors and permission errors are.
+		 *
+		 * On Windows, EINVAL is reported if path contains a colon
+		 * that is not the driver letter separator. Such a path
+		 * cannot exist in the file system and is also uninteresting.
+		 */
+		if (errno != ENOENT && errno != ENOTDIR && errno != EINVAL)
 			warn_on_inaccessible(path);
 		return NULL;
 	}
-- 
1.8.0.rc2.1237.g5522246

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3] grep: stop looking at random places for .gitattributes
  2012-10-10 13:59             ` [PATCH v2 2/2] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
  2012-10-10 14:21               ` Johannes Sixt
@ 2012-10-12 10:49               ` Nguyễn Thái Ngọc Duy
  2012-10-12 16:47                 ` Junio C Hamano
  1 sibling, 1 reply; 43+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-12 10:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Nguyễn Thái Ngọc Duy

grep searches for .gitattributes using "name" field in struct
grep_source but that field is not real on-disk path name. For example,
"grep pattern rev" fills the field with "rev:path", and Git looks for
.gitattributes in the (non-existent but exploitable) path "rev:path"
instead of "path".

This patch passes real paths down to grep_source_load_driver() when:

 - grep on work tree
 - grep on the index
 - grep a commit (or a tag if it points to a commit)

so that these cases look up .gitattributes at proper paths.
.gitattributes lookup is disabled in all other cases.

Initial-work-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 This fixes t7008 as Johannes commented and makes "git grep foo HEAD:t"
 not look up worktree's .gitattributes (in fact it does not look up
 anywhere).

 The quote_path_relative() patch is not required, it's an independent
 design issue.

 builtin/grep.c         | 31 ++++++++++++++++++-------------
 grep.c                 | 11 ++++++++---
 grep.h                 |  4 +++-
 t/t7008-grep-binary.sh | 22 ++++++++++++++++++++++
 4 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 82530a6..38a17eb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -86,7 +86,7 @@ static pthread_cond_t cond_result;
 static int skip_first_line;
 
 static void add_work(struct grep_opt *opt, enum grep_source_type type,
-		     const char *name, const void *id)
+		     const char *name, const char *path, const void *id)
 {
 	grep_lock();
 
@@ -94,7 +94,7 @@ static void add_work(struct grep_opt *opt, enum grep_source_type type,
 		pthread_cond_wait(&cond_write, &grep_mutex);
 	}
 
-	grep_source_init(&todo[todo_end].source, type, name, id);
+	grep_source_init(&todo[todo_end].source, type, name, path, id);
 	if (opt->binary != GREP_BINARY_TEXT)
 		grep_source_load_driver(&todo[todo_end].source);
 	todo[todo_end].done = 0;
@@ -371,7 +371,8 @@ static void *lock_and_read_sha1_file(const unsigned char *sha1, enum object_type
 }
 
 static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
-		     const char *filename, int tree_name_len)
+		     const char *filename, int tree_name_len,
+		     const char *path)
 {
 	struct strbuf pathbuf = STRBUF_INIT;
 
@@ -385,7 +386,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, sha1);
 		strbuf_release(&pathbuf);
 		return 0;
 	} else
@@ -394,7 +395,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, path, sha1);
 		strbuf_release(&pathbuf);
 		hit = grep_source(opt, &gs);
 
@@ -414,7 +415,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename);
+		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		return 0;
 	} else
@@ -423,7 +424,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 		struct grep_source gs;
 		int hit;
 
-		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename);
+		grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		hit = grep_source(opt, &gs);
 
@@ -479,7 +480,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
 			if (ce_stage(ce))
 				continue;
-			hit |= grep_sha1(opt, ce->sha1, ce->name, 0);
+			hit |= grep_sha1(opt, ce->sha1, ce->name, 0, ce->name);
 		}
 		else
 			hit |= grep_file(opt, ce->name);
@@ -497,7 +498,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 }
 
 static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
-		     struct tree_desc *tree, struct strbuf *base, int tn_len)
+		     struct tree_desc *tree, struct strbuf *base, int tn_len,
+		     int check_attr)
 {
 	int hit = 0;
 	enum interesting match = entry_not_interesting;
@@ -518,7 +520,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		strbuf_add(base, entry.path, te_len);
 
 		if (S_ISREG(entry.mode)) {
-			hit |= grep_sha1(opt, entry.sha1, base->buf, tn_len);
+			hit |= grep_sha1(opt, entry.sha1, base->buf, tn_len,
+					 check_attr ? base->buf + tn_len : NULL);
 		}
 		else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
@@ -533,7 +536,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 
 			strbuf_addch(base, '/');
 			init_tree_desc(&sub, data, size);
-			hit |= grep_tree(opt, pathspec, &sub, base, tn_len);
+			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
+					 check_attr);
 			free(data);
 		}
 		strbuf_setlen(base, old_baselen);
@@ -548,7 +552,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		       struct object *obj, const char *name)
 {
 	if (obj->type == OBJ_BLOB)
-		return grep_sha1(opt, obj->sha1, name, 0);
+		return grep_sha1(opt, obj->sha1, name, 0, NULL);
 	if (obj->type == OBJ_COMMIT || obj->type == OBJ_TREE) {
 		struct tree_desc tree;
 		void *data;
@@ -571,7 +575,8 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 			strbuf_addch(&base, ':');
 		}
 		init_tree_desc(&tree, data, size);
-		hit = grep_tree(opt, pathspec, &tree, &base, base.len);
+		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
+				obj->type == OBJ_COMMIT);
 		strbuf_release(&base);
 		free(data);
 		return hit;
diff --git a/grep.c b/grep.c
index edc7776..e36c01b 100644
--- a/grep.c
+++ b/grep.c
@@ -1373,7 +1373,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	struct grep_source gs;
 	int r;
 
-	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL);
+	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
 	gs.buf = buf;
 	gs.size = size;
 
@@ -1384,10 +1384,12 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 }
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier)
+		      const char *name, const char *path,
+		      const void *identifier)
 {
 	gs->type = type;
 	gs->name = name ? xstrdup(name) : NULL;
+	gs->path = path ? xstrdup(path) : NULL;
 	gs->buf = NULL;
 	gs->size = 0;
 	gs->driver = NULL;
@@ -1409,6 +1411,8 @@ void grep_source_clear(struct grep_source *gs)
 {
 	free(gs->name);
 	gs->name = NULL;
+	free(gs->path);
+	gs->path = NULL;
 	free(gs->identifier);
 	gs->identifier = NULL;
 	grep_source_clear_data(gs);
@@ -1501,7 +1505,8 @@ void grep_source_load_driver(struct grep_source *gs)
 		return;
 
 	grep_attr_lock();
-	gs->driver = userdiff_find_by_path(gs->name);
+	if (gs->path)
+		gs->driver = userdiff_find_by_path(gs->path);
 	if (!gs->driver)
 		gs->driver = userdiff_find_by_name("default");
 	grep_attr_unlock();
diff --git a/grep.h b/grep.h
index c256ac6..c2cf57b 100644
--- a/grep.h
+++ b/grep.h
@@ -158,11 +158,13 @@ struct grep_source {
 	char *buf;
 	unsigned long size;
 
+	char *path; /* for attribute lookups */
 	struct userdiff_driver *driver;
 };
 
 void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const void *identifier);
+		      const char *name, const char *path,
+		      const void *identifier);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs);
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index fd6410f..26f8319 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -111,6 +111,28 @@ test_expect_success 'grep respects binary diff attribute' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep --cached respects binary diff attribute' '
+	git grep --cached text t >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --cached respects binary diff attribute (2)' '
+	git add .gitattributes &&
+	rm .gitattributes &&
+	git grep --cached text t >actual &&
+	test_when_finished "git rm --cached .gitattributes" &&
+	test_when_finished "git checkout .gitattributes" &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep revision respects binary diff attribute' '
+	git commit -m new &&
+	echo "Binary file HEAD:t matches" >expect &&
+	git grep text HEAD -- t >actual &&
+	test_when_finished "git reset HEAD^" &&
+	test_cmp expect actual
+'
+
 test_expect_success 'grep respects not-binary diff attribute' '
 	echo binQary | q_to_nul >b &&
 	git add b &&
-- 
1.7.12.1.406.g6ab07c4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v3] grep: stop looking at random places for .gitattributes
  2012-10-12 10:49               ` [PATCH v3] " Nguyễn Thái Ngọc Duy
@ 2012-10-12 16:47                 ` Junio C Hamano
  0 siblings, 0 replies; 43+ messages in thread
From: Junio C Hamano @ 2012-10-12 16:47 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Johannes Sixt

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> grep searches for .gitattributes using "name" field in struct
> grep_source but that field is not real on-disk path name. For example,
> "grep pattern rev" fills the field with "rev:path", and Git looks for
> .gitattributes in the (non-existent but exploitable) path "rev:path"
> instead of "path".
>
> This patch passes real paths down to grep_source_load_driver() when:
>
>  - grep on work tree
>  - grep on the index
>  - grep a commit (or a tag if it points to a commit)
>
> so that these cases look up .gitattributes at proper paths.
> .gitattributes lookup is disabled in all other cases.
>
> Initial-work-by: Jeff King <peff@peff.net>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

Looks sensible and straightforward.  Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-12  7:33                       ` Johannes Sixt
@ 2012-10-14  4:29                         ` Junio C Hamano
  2012-10-15  6:02                           ` Johannes Sixt
  0 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2012-10-14  4:29 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Johannes Sixt <j.sixt@viscovery.net> writes:

> It might be worth it. We already have a similar special case in
> write_or_die.c:maybe_flush_or_die() for Windows, although it is not about
> a colon in a path name.
>
> Perhaps like this.

Hrm, the "we already have" one b2f5e26 (Windows: Work around an
oddity when a pipe with no reader is written to., 2007-08-17) was
what you added while I was looking the other way ;-) as a part of
Windows specific pull.

That change, and this patch, seem to cover the cases to be ignored
with a bit too wide a net to my taste.  On other systems, and even
on Windows if the path does not have any colon, EINVAL is something
we would want to notice and report, as a potential problem, no?

> --- 8< ---
> From: Johannes Sixt <j6t@kdbg.org>
> Subject: [PATCH] attr: do not warn on path with colon on Windows
>
> In the same spirit as commit 8e950dab (attr: failure to open a
> .gitattributes file is OK with ENOTDIR), ignore EINVAL errors. On
> Windows, a path that contains a colon that is not the one after the
> drive letter is not allowed and is reported with errno set to
> EINVAL.
>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  attr.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/attr.c b/attr.c
> index 8010429..ac945ad 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -354,7 +354,15 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
>  	int lineno = 0;
>  
>  	if (!fp) {
> -		if (errno != ENOENT && errno != ENOTDIR)
> +		/*
> +		 * If path does not exist, it is not worth mentioning,
> +		 * but I/O errors and permission errors are.
> +		 *
> +		 * On Windows, EINVAL is reported if path contains a colon
> +		 * that is not the driver letter separator. Such a path
> +		 * cannot exist in the file system and is also uninteresting.
> +		 */
> +		if (errno != ENOENT && errno != ENOTDIR && errno != EINVAL)
>  			warn_on_inaccessible(path);
>  		return NULL;
>  	}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-14  4:29                         ` Junio C Hamano
@ 2012-10-15  6:02                           ` Johannes Sixt
  2012-10-15 16:54                             ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-15  6:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Am 10/14/2012 6:29, schrieb Junio C Hamano:
> Johannes Sixt <j.sixt@viscovery.net> writes:
> 
>> It might be worth it. We already have a similar special case in
>> write_or_die.c:maybe_flush_or_die() for Windows, although it is not about
>> a colon in a path name.
>>
>> Perhaps like this.
> 
> Hrm, the "we already have" one b2f5e26 (Windows: Work around an
> oddity when a pipe with no reader is written to., 2007-08-17) was
> what you added while I was looking the other way ;-) as a part of
> Windows specific pull.
> 
> That change, and this patch, seem to cover the cases to be ignored
> with a bit too wide a net to my taste.  On other systems, and even
> on Windows if the path does not have any colon, EINVAL is something
> we would want to noticbbe and report, as a potential problem, no?

For fopen(), EINVAL should occur only if the mode argument is wrong, which
it isn't. For fflush() (as in write_or_die.c), EINVAL is not even listed
as possible error code. Therefore, catching EINVAL wholesale should not be
a problem, IMO, at least not "on other systems".

On Windows, it is more problematic because there is a table of "customary"
Windows API error codes, which are mapped to errno values, and EINVAL is
used for all other Windows error codes (and for a few listed ones), and
ignoring EINVAL might indeed miss something worth to be reported.

Sooo... I don't mind if you do not pick up this patch because it handles a
rather theoretic case, i.e., where a project with strange paths somehow
ended up on a Windows drive.

But reverting the EINVAL check from write_or_die.c is out of question,
because that handles a real case.

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-15  6:02                           ` Johannes Sixt
@ 2012-10-15 16:54                             ` Junio C Hamano
  2012-10-16  6:39                               ` Johannes Sixt
  0 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2012-10-15 16:54 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Johannes Sixt <j.sixt@viscovery.net> writes:

> For fflush() (as in write_or_die.c), EINVAL is not even listed
> as possible error code. Therefore, catching EINVAL wholesale should not be
> a problem, IMO, at least not "on other systems".

I see a disturbing gap between "EINVAL is not supposed to be even
possible" and "therefore it is safe to ignore".  Our usual attitude
towards catching errors is to ignore only specific things that are
expected to happen and known to be safe, e.g. the original code
before that patch which special cased to ignore EPIPE with

	if (fflush(f)) {
		if (errno == EPIPE)
                	exit(0);
		die_errno("write failure on '%s'", desc);
	}

where we know we are often downstream of something else and
diagnosing EPIPE as an error is actively wrong.

Unless you assume that *no* other platform has the same glitch as
Windows has with respect to fflush(f) returning EINVAL, or any other
platform that may return EINVAL from fflush(f) has the exactly same
glitch as Windows, ignoring EINVAL unconditionally everywhere is
wrong.  Perhaps the next broken platform may return EINVAL when it
should return EIO or something.  Ideally, that earlier workaround
should have done a logica equivalent of:


	if (fflush(f)) {
#ifdef WINDOWS		
		/*
		 * On Windows, EPIPE is returned only by the first write()
		 * after the reading end has closed its handle; subsequent
		 * write()s return EINVAL.
		 */
                if (errno == EINVAL && this is after a second write)
			errno = EPIPE;
#endif
		if (errno == EPIPE)
                	exit(0);
		die_errno("write failure on '%s'", desc);
	}

and did so not in-line at the calling site but in a compat/ wrapper
for fflush() to eliminate the need for the ifdef.

> But reverting the EINVAL check from write_or_die.c is out of question,
> because that handles a real case.

I am not saying we should not deal with EINVAL on Windows at all.  I
am just saying ignoring EINVAL on other platforms is wrong, and
these two shouldn't have been mutually incompatible goals.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-15 16:54                             ` Junio C Hamano
@ 2012-10-16  6:39                               ` Johannes Sixt
  2012-10-17  7:05                                 ` Johannes Sixt
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-16  6:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Am 10/15/2012 18:54, schrieb Junio C Hamano:
> Ideally, that earlier workaround
> should have done a logica equivalent of:
> ...
> and did so not in-line at the calling site but in a compat/ wrapper
> for fflush() to eliminate the need for the ifdef.

Fair enough.

>> But reverting the EINVAL check from write_or_die.c is out of question,
>> because that handles a real case.

Correction: I can't reproduce the error messages that this was working
around anymore in a brief test. I'll revert the check locally and see what
happens.

-- Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-16  6:39                               ` Johannes Sixt
@ 2012-10-17  7:05                                 ` Johannes Sixt
  2012-10-17  7:33                                   ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Sixt @ 2012-10-17  7:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Am 10/16/2012 8:39, schrieb Johannes Sixt:
> Am 10/15/2012 18:54, schrieb Junio C Hamano:
>> Ideally, that earlier workaround
>> should have done a logica equivalent of:
>> ...
>> and did so not in-line at the calling site but in a compat/ wrapper
>> for fflush() to eliminate the need for the ifdef.
> 
> Fair enough.
> 
>>> But reverting the EINVAL check from write_or_die.c is out of question,
>>> because that handles a real case.
> 
> Correction: I can't reproduce the error messages that this was working
> around anymore in a brief test. I'll revert the check locally and see what
> happens.

The error is reproducible with this command on Windows:

$ git rev-list HEAD | sed 1q
42b333d8bb8709dfc5783dd4c44bdb6012c2c17d
fatal: write failure on 'stdout': Invalid argument

Let's do as you suggested.

--- 8< ---
From: Johannes Sixt <j6t@kdbg.org>
Subject: [PATCH] maybe_flush_or_die: move a too-loose Windows specific error
 check to compat

Commit b2f5e268 (Windows: Work around an oddity when a pipe with no reader
is written to) introduced a check for EINVAL after fflush() to fight
spurious "Invalid argument" errors on Windows when a pipe was broken. But
this check may hide real errors on systems that do not have the this odd
behavior. Introduce an fflush wrapper in compat/mingw.* so that the treatment
is only applied on Windows.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 compat/mingw.c | 22 ++++++++++++++++++++++
 compat/mingw.h |  3 +++
 write_or_die.c |  7 +------
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index afc892d..4e63838 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -335,6 +335,28 @@ FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream)
 	return freopen(filename, otype, stream);
 }
 
+#undef fflush
+int mingw_fflush(FILE *stream)
+{
+	int ret = fflush(stream);
+
+	/*
+	 * write() is used behind the scenes of stdio output functions.
+	 * Since git code does not check for errors after each stdio write
+	 * operation, it can happen that write() is called by a later
+	 * stdio function even if an earlier write() call failed. In the
+	 * case of a pipe whose readable end was closed, only the first
+	 * call to write() reports EPIPE on Windows. Subsequent write()
+	 * calls report EINVAL. It is impossible to notice whether this
+	 * fflush invocation triggered such a case, therefore, we have to
+	 * catch all EINVAL errors whole-sale.
+	 */
+	if (ret && errno == EINVAL)
+		errno = EPIPE;
+
+	return ret;
+}
+
 /*
  * The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
  * Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
diff --git a/compat/mingw.h b/compat/mingw.h
index 61a6521..eeb08d1 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -185,6 +185,9 @@ FILE *mingw_fopen (const char *filename, const char *otype);
 FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream);
 #define freopen mingw_freopen
 
+int mingw_fflush(FILE *stream);
+#define fflush mingw_fflush
+
 char *mingw_getcwd(char *pointer, int len);
 #define getcwd mingw_getcwd
 
diff --git a/write_or_die.c b/write_or_die.c
index d45b536..960f448 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -34,12 +34,7 @@ void maybe_flush_or_die(FILE *f, const char *desc)
 			return;
 	}
 	if (fflush(f)) {
-		/*
-		 * On Windows, EPIPE is returned only by the first write()
-		 * after the reading end has closed its handle; subsequent
-		 * write()s return EINVAL.
-		 */
-		if (errno == EPIPE || errno == EINVAL)
+		if (errno == EPIPE)
 			exit(0);
 		die_errno("write failure on '%s'", desc);
 	}
-- 
1.8.0.rc2.1248.g3f5b13f

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/2] grep: stop looking at random places for .gitattributes
  2012-10-17  7:05                                 ` Johannes Sixt
@ 2012-10-17  7:33                                   ` Junio C Hamano
  0 siblings, 0 replies; 43+ messages in thread
From: Junio C Hamano @ 2012-10-17  7:33 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Nguyễn Thái Ngọc Duy, git, Jeff King

Johannes Sixt <j.sixt@viscovery.net> writes:

> diff --git a/compat/mingw.c b/compat/mingw.c
> index afc892d..4e63838 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -335,6 +335,28 @@ FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream)
>  	return freopen(filename, otype, stream);
>  }
>  
> +#undef fflush
> +int mingw_fflush(FILE *stream)
> +{
> +	int ret = fflush(stream);

The "#undef" above is a bit unfortunate.

Whenever I see this construct I start to wonder "I know this is to
disable our own #define we have elsewhere that renames fflush() to
mingw_fflush(), but what happens if the system include implements
fflush() as a macro?"

A better organization might be

 - make "int mingw_fflush(FILE *);" declaration available to all the
   callers and to this part of the file; and

 - make "#define fflush(x) mingw_fflush(x)" macro visible when
   compiling the rest of the system, but make it invisible to the
   implementation of the emulation function.

The latter implies that a function in the emulation layer, if it
needs to fflush(), would explicitly call mingw_fflush().

I know you did this knowing that it is not an issue on your
platform, and this file is only used on your platform anyway, so I
do not think we should address such a reorganization right now, but
it is something we may want to keep an eye on, as other people may
later try to stub away a real macro imitating this part of the code.

Thanks for following through.

Sometimes discussions on our list result in participant feeling
satisified with the conclusion without completing the last mile of
producing and applying the patch, which I find only after a few
month when I'm trawling the list archive for anything we missed.

Now I'll have to do my part and queue this to my tree ;-)

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2012-10-17  7:33 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-09  9:03 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree Johannes Sixt
2012-10-09  9:38 ` Nguyen Thai Ngoc Duy
2012-10-09 12:01   ` Nguyen Thai Ngoc Duy
2012-10-09 12:41     ` Jeff King
2012-10-09 18:59       ` Junio C Hamano
2012-10-10  5:17         ` Nguyen Thai Ngoc Duy
2012-10-10  5:33           ` Junio C Hamano
2012-10-10  5:45             ` Nguyen Thai Ngoc Duy
2012-10-10 11:34         ` Nguyễn Thái Ngọc Duy
2012-10-10 11:34           ` [PATCH 1/3] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
2012-10-10 21:13             ` Junio C Hamano
2012-10-11 13:04               ` Nguyen Thai Ngoc Duy
2012-10-11 16:42                 ` Junio C Hamano
2012-10-10 11:34           ` [PATCH 2/3] grep: pass true path name to grep machinery Nguyễn Thái Ngọc Duy
2012-10-10 11:34           ` [PATCH 3/3] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
2012-10-10 11:51             ` Johannes Sixt
2012-10-10 12:03               ` Nguyen Thai Ngoc Duy
2012-10-10 12:12                 ` Johannes Sixt
2012-10-10 12:32                   ` Nguyen Thai Ngoc Duy
2012-10-10 12:43                     ` Johannes Sixt
2012-10-10 12:51                       ` Nguyen Thai Ngoc Duy
2012-10-10 19:44                   ` Junio C Hamano
2012-10-11  5:55                     ` Johannes Sixt
2012-10-11  7:04                       ` Michael Haggerty
2012-10-11  8:17                         ` Nguyen Thai Ngoc Duy
2012-10-10 13:59           ` [PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree Nguyễn Thái Ngọc Duy
2012-10-10 13:59             ` [PATCH v2 1/2] quote: let caller reset buffer for quote_path_relative() Nguyễn Thái Ngọc Duy
2012-10-10 13:59             ` [PATCH v2 2/2] grep: stop looking at random places for .gitattributes Nguyễn Thái Ngọc Duy
2012-10-10 14:21               ` Johannes Sixt
2012-10-10 19:56                 ` Junio C Hamano
2012-10-11  5:45                   ` Johannes Sixt
2012-10-11 15:51                     ` Junio C Hamano
2012-10-12  7:33                       ` Johannes Sixt
2012-10-14  4:29                         ` Junio C Hamano
2012-10-15  6:02                           ` Johannes Sixt
2012-10-15 16:54                             ` Junio C Hamano
2012-10-16  6:39                               ` Johannes Sixt
2012-10-17  7:05                                 ` Johannes Sixt
2012-10-17  7:33                                   ` Junio C Hamano
2012-10-11  1:49                 ` Nguyen Thai Ngoc Duy
2012-10-11  3:15                   ` Junio C Hamano
2012-10-12 10:49               ` [PATCH v3] " Nguyễn Thái Ngọc Duy
2012-10-12 16:47                 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).