git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* double occurrence of filenames on command lines
@ 2007-08-29  8:11 martin f krafft
  2007-08-29 19:44 ` [PATCH] Remove duplicate pathspecs from ls-files command line Alex Riesen
  0 siblings, 1 reply; 9+ messages in thread
From: martin f krafft @ 2007-08-29  8:11 UTC (permalink / raw
  To: git discussion list; +Cc: 439992-quiet

[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]

Dear list,

While I can say something like

  git add foo bar foo
  (note the doubled foo)

when using git-add from a script, the following fails:

  $ git commit -m. foo foo
  error: pathspec 'foo' did not match any file(s) known to git.
  Did you forget to 'git add'?

I am bringing this up in the context of
http://bugs.debian.org/439992, where debcommit.pl would duplicate
a file argument under certain conditions. It's since been fixed, but
I wonder whether git-commit could be made more robust in the
presence of duplicate arguments? Or is this behaviour by choice?

PS: please keep 439992-quiet@bugs.debian.org on Cc.

-- 
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
 
an egg has the shortest sex-life of all: if gets laid once; it gets
eaten once. it also has to come in a box with 11 others, and the
only person who will sit on its face is its mother.
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-29  8:11 double occurrence of filenames on command lines martin f krafft
@ 2007-08-29 19:44 ` Alex Riesen
  2007-08-29 20:44   ` Junio C Hamano
  2007-08-29 20:57   ` martin f krafft
  0 siblings, 2 replies; 9+ messages in thread
From: Alex Riesen @ 2007-08-29 19:44 UTC (permalink / raw
  To: martin f krafft; +Cc: git discussion list, 439992-quiet, Junio C Hamano

The first entry wins, all the subsequent entries will be discarded.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
---

martin f krafft, Wed, Aug 29, 2007 10:11:22 +0200:
> when using git-add from a script, the following fails:
> 
>   $ git commit -m. foo foo
>   error: pathspec 'foo' did not match any file(s) known to git.
>   Did you forget to 'git add'?
> 
> I am bringing this up in the context of
> http://bugs.debian.org/439992, where debcommit.pl would duplicate
> a file argument under certain conditions. It's since been fixed, but
> I wonder whether git-commit could be made more robust in the
> presence of duplicate arguments? Or is this behaviour by choice?
> 

Don't think so. Looks like accident. The patch below fixes it,
by introducing a costly argument duplication check. Shouldn't
be a problem for a normal use (git-ls-files expects globs, not
pathnames).

 setup.c |   21 +++++++++++++++++----
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/setup.c b/setup.c
index 06004f1..b13b628 100644
--- a/setup.c
+++ b/setup.c
@@ -111,10 +111,19 @@ void verify_non_filename(const char *prefix, const char *arg)
 		die("'%s': %s", arg, strerror(errno));
 }
 
+static const char **has_pathspec(const char **start, const char **end, const char *spec)
+{
+	const char **p;
+	for (p = start; p != end; ++p)
+		if (!strcmp(*p, spec))
+			return p;
+	return NULL;
+}
+
 const char **get_pathspec(const char *prefix, const char **pathspec)
 {
 	const char *entry = *pathspec;
-	const char **p;
+	const char **in, **out;
 	int prefixlen;
 
 	if (!prefix && !entry)
@@ -128,11 +137,15 @@ const char **get_pathspec(const char *prefix, const char **pathspec)
 	}
 
 	/* Otherwise we have to re-write the entries.. */
-	p = pathspec;
+	in = out = pathspec;
 	prefixlen = prefix ? strlen(prefix) : 0;
 	do {
-		*p = prefix_path(prefix, prefixlen, entry);
-	} while ((entry = *++p) != NULL);
+		const char *spec = prefix_path(prefix, prefixlen, entry);
+		if (!has_pathspec(pathspec, out, spec))
+			*out++ = spec;
+	} while ((entry = *++in) != NULL);
+	if (in != out)
+		*out = NULL;
 	return (const char **) pathspec;
 }
 
-- 
1.5.3.rc7.24.g0e57

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-29 19:44 ` [PATCH] Remove duplicate pathspecs from ls-files command line Alex Riesen
@ 2007-08-29 20:44   ` Junio C Hamano
  2007-08-29 21:04     ` martin f krafft
  2007-08-29 21:15     ` Alex Riesen
  2007-08-29 20:57   ` martin f krafft
  1 sibling, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2007-08-29 20:44 UTC (permalink / raw
  To: Alex Riesen; +Cc: martin f krafft, git discussion list, 439992-quiet

Alex Riesen <raa.lkml@gmail.com> writes:

> The first entry wins, all the subsequent entries will be discarded.
>
> Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
> ---
>
> martin f krafft, Wed, Aug 29, 2007 10:11:22 +0200:
>> when using git-add from a script, the following fails:
>> 
>>   $ git commit -m. foo foo
>>   error: pathspec 'foo' did not match any file(s) known to git.
>>   Did you forget to 'git add'?
>> 
>> I am bringing this up in the context of
>> http://bugs.debian.org/439992, where debcommit.pl would duplicate
>> a file argument under certain conditions. It's since been fixed, but
>> I wonder whether git-commit could be made more robust in the
>> presence of duplicate arguments? Or is this behaviour by choice?
>
> Don't think so. Looks like accident. The patch below fixes it,
> by introducing a costly argument duplication check. Shouldn't
> be a problem for a normal use (git-ls-files expects globs, not
> pathnames).

Thanks both for your attention to the detail.  It was to catch

	git commit Makefiel

and did not mean to warn about listing the same thing twice (it
is still a mistaken usage in the sense that it is unnecessary to
list things twice, not in the sense that it instructs the
command to commit the same file twice).

The patch is not wrong per-se from correctness standpoint, but I
must say that it is a horrible thing to do from both performance
and principle point of view.

That loop is plain old O(n^2) that penalizes everybody.

Please do not penalize sane callers when you try to improve
support of mistaken usage.  Move expensive error recovery in the
error path when possible, and have _only_ mistaken users pay the
price.

Like this perhaps.

---
 builtin-ls-files.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index d36181a..cce17b5 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -511,8 +511,28 @@ int cmd_ls_files(int argc, const char **argv, const char *prefix)
 		 */
 		int num, errors = 0;
 		for (num = 0; pathspec[num]; num++) {
+			int other, found_dup;
+
 			if (ps_matched[num])
 				continue;
+			/*
+			 * The caller might have fed identical pathspec
+			 * twice.  Do not barf on such a mistake.
+			 */
+			for (found_dup = other = 0;
+			     !found_dup && pathspec[other];
+			     other++) {
+				if (other == num || !ps_matched[other])
+					continue;
+				if (!strcmp(pathspec[other], pathspec[num]))
+					/*
+					 * Ok, we have a match already.
+					 */
+					found_dup = 1;
+			}
+			if (found_dup)
+				continue;
+
 			error("pathspec '%s' did not match any file(s) known to git.",
 			      pathspec[num] + prefix_offset);
 			errors++;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-29 19:44 ` [PATCH] Remove duplicate pathspecs from ls-files command line Alex Riesen
  2007-08-29 20:44   ` Junio C Hamano
@ 2007-08-29 20:57   ` martin f krafft
  1 sibling, 0 replies; 9+ messages in thread
From: martin f krafft @ 2007-08-29 20:57 UTC (permalink / raw
  To: git discussion list; +Cc: Alex Riesen, 439992-quiet, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

also sprach Alex Riesen <raa.lkml@gmail.com> [2007.08.29.2144 +0200]:
> Signed-off-by: Alex Riesen <raa.lkml@gmail.com>

Signed-off-by: martin f krafft <madduck@madduck.net>

I confirm that Alex' patch does what it should:

$ git init
Initialized empty Git repository in .git/
$ date > a; git add a; git commit -m.
Created initial commit afb6bca: .
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 a
$ date >> a
$ git commit -m. a a
Created commit f41d10a: .
 1 files changed, 1 insertions(+), 0 deletions(-)

-- 
 .''`.   martin f. krafft <madduck@debian.org>
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems
 
"although occasionally there is something to be said for solitude."
                                          -- special agent dale cooper

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-29 20:44   ` Junio C Hamano
@ 2007-08-29 21:04     ` martin f krafft
  2007-08-29 21:15     ` Alex Riesen
  1 sibling, 0 replies; 9+ messages in thread
From: martin f krafft @ 2007-08-29 21:04 UTC (permalink / raw
  To: git discussion list; +Cc: Junio C Hamano, Alex Riesen, 439992-quiet

[-- Attachment #1: Type: text/plain, Size: 503 bytes --]

also sprach Junio C Hamano <gitster@pobox.com> [2007.08.29.2244 +0200]:
> Like this perhaps.

This also works as expected. Thanks!

-- 
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
 
"cs class at 8:30am. ugly. if you can wake up early enough to get
 good grades here, you need to develop hacker habits..."
                                     -- jeff bailey on #debian-devel
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-29 20:44   ` Junio C Hamano
  2007-08-29 21:04     ` martin f krafft
@ 2007-08-29 21:15     ` Alex Riesen
  2007-08-29 21:36       ` David Kastrup
  1 sibling, 1 reply; 9+ messages in thread
From: Alex Riesen @ 2007-08-29 21:15 UTC (permalink / raw
  To: Junio C Hamano; +Cc: martin f krafft, git discussion list, 439992-quiet

Junio C Hamano, Wed, Aug 29, 2007 22:44:22 +0200:
> 
> That loop is plain old O(n^2) that penalizes everybody.
> 

Maybe something in pathspec matching code could be reused to notice
the duplications? It has to go through all of them anyway...

> Please do not penalize sane callers when you try to improve
> support of mistaken usage.  Move expensive error recovery in the
> error path when possible, and have _only_ mistaken users pay the
> price.
> 
> Like this perhaps.
> 

I just would write it shorter (except for that ugly label before
closing brace).

diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index d36181a..258868e 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -511,11 +511,28 @@ int cmd_ls_files(int argc, const char **argv, const char *prefix)
 		 */
 		int num, errors = 0;
 		for (num = 0; pathspec[num]; num++) {
+			int other;
+
 			if (ps_matched[num])
 				continue;
+			/*
+			 * The caller might have fed identical pathspec
+			 * twice.  Do not barf on such a mistake.
+			 */
+			for (other = 0; pathspec[other]; other++) {
+				if (other == num || !ps_matched[other])
+					continue;
+				if (!strcmp(pathspec[other], pathspec[num]))
+					/*
+					 * Ok, we have a match already.
+					 */
+					goto found_dup;
+			}
+
 			error("pathspec '%s' did not match any file(s) known to git.",
 			      pathspec[num] + prefix_offset);
 			errors++;
+		found_dup:;
 		}
 
 		if (errors)
-- 
1.5.3.rc7.26.g5f7e4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-29 21:15     ` Alex Riesen
@ 2007-08-29 21:36       ` David Kastrup
  2007-08-30  1:25         ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: David Kastrup @ 2007-08-29 21:36 UTC (permalink / raw
  To: Alex Riesen
  Cc: Junio C Hamano, martin f krafft, git discussion list,
	439992-quiet

Alex Riesen <raa.lkml@gmail.com> writes:

> Junio C Hamano, Wed, Aug 29, 2007 22:44:22 +0200:
>> 
>> That loop is plain old O(n^2) that penalizes everybody.
>> 
>
> Maybe something in pathspec matching code could be reused to notice
> the duplications? It has to go through all of them anyway...

Does anything speak against sorting the pathspecs?  That is an O(n log
n) operation, and dup detection is trivial afterwards.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-29 21:36       ` David Kastrup
@ 2007-08-30  1:25         ` Junio C Hamano
  2007-08-30  5:52           ` David Kastrup
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2007-08-30  1:25 UTC (permalink / raw
  To: David Kastrup
  Cc: Alex Riesen, martin f krafft, git discussion list, 439992-quiet

David Kastrup <dak@gnu.org> writes:

> Does anything speak against sorting the pathspecs?  That is an O(n log
> n) operation,

Not sorting is O(0) operation without losing cycles for the
normal case.  I think you can sort first in that error handling
path to avoid O(n^2) but sorting upfront to remove duplicates
for every case is unnecessary bloat that penalizes sane callers.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Remove duplicate pathspecs from ls-files command line
  2007-08-30  1:25         ` Junio C Hamano
@ 2007-08-30  5:52           ` David Kastrup
  0 siblings, 0 replies; 9+ messages in thread
From: David Kastrup @ 2007-08-30  5:52 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Alex Riesen, martin f krafft, git discussion list, 439992-quiet

Junio C Hamano <gitster@pobox.com> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> Does anything speak against sorting the pathspecs?  That is an O(n log
>> n) operation,
>
> Not sorting is O(0) operation without losing cycles for the
> normal case.  I think you can sort first in that error handling
> path to avoid O(n^2) but sorting upfront to remove duplicates
> for every case is unnecessary bloat that penalizes sane callers.

Not if you need to sort the file list anyway in order to merge it with
the index.  Of course, sorting patterns will not in all cases
establish the order of their expansions.

In any case, uniqueness should be establishable when matching to the
index, regardless of whether this matching is done by sort&merge
(probably the most efficient variant) or by binary search as it is
done now if I understand correctly.  I am just not happy with the
ensuing binary _insertion_ as this is an O(nm) operation when
inserting n elements into an m element data structure.  A list merge
is O(n+m), in contrast, and we can access the index sequentially.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-08-30  5:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-29  8:11 double occurrence of filenames on command lines martin f krafft
2007-08-29 19:44 ` [PATCH] Remove duplicate pathspecs from ls-files command line Alex Riesen
2007-08-29 20:44   ` Junio C Hamano
2007-08-29 21:04     ` martin f krafft
2007-08-29 21:15     ` Alex Riesen
2007-08-29 21:36       ` David Kastrup
2007-08-30  1:25         ` Junio C Hamano
2007-08-30  5:52           ` David Kastrup
2007-08-29 20:57   ` martin f krafft

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).