git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* bug: unexpected output for "git st" + suggestion
@ 2010-11-23 12:23 Tarek Ziadé
  2010-11-23 12:40 ` Nguyen Thai Ngoc Duy
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Tarek Ziadé @ 2010-11-23 12:23 UTC (permalink / raw)
  To: git

Hello,

I am new to Git and I tried to run "git st"

I have found one small bug: "status" is not listed in the help screen
Git displays in that case.

$ git st
git: 'st' is not a git command. See 'git --help'.

Did you mean one of these?
	reset
	stage
	stash


I also have a suggestion: I was looking for the way to report that bug
by visiting http://git-scm.com/ and looking for the bug tracker.
Someone eventually explained to me on the IRC channel that I had to
post a mail here. I would suggest making it clear on how to report
bugs on the project's website. Maybe under  "Got a question" /
"Email".

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 12:23 bug: unexpected output for "git st" + suggestion Tarek Ziadé
@ 2010-11-23 12:40 ` Nguyen Thai Ngoc Duy
  2010-11-23 12:49   ` Tarek Ziadé
  2010-11-23 12:43 ` Sylvain Rabot
  2010-11-23 13:22 ` Erik Faye-Lund
  2 siblings, 1 reply; 27+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-11-23 12:40 UTC (permalink / raw)
  To: Tarek Ziadé; +Cc: git

On Tue, Nov 23, 2010 at 7:23 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
> Hello,
>
> I am new to Git and I tried to run "git st"
>
> I have found one small bug: "status" is not listed in the help screen
> Git displays in that case.
>
> $ git st
> git: 'st' is not a git command. See 'git --help'.
>
> Did you mean one of these?
>        reset
>        stage
>        stash

It's heuristics, based on the assumption that you mistype a command by
a letter or two. It gives helpful suggestions most of the time, but
you can't expect it to be always right, especially when "st" is not a
mistyping. "git --help" does show "status" though so I guess it's ok.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 12:23 bug: unexpected output for "git st" + suggestion Tarek Ziadé
  2010-11-23 12:40 ` Nguyen Thai Ngoc Duy
@ 2010-11-23 12:43 ` Sylvain Rabot
  2010-11-23 13:22 ` Erik Faye-Lund
  2 siblings, 0 replies; 27+ messages in thread
From: Sylvain Rabot @ 2010-11-23 12:43 UTC (permalink / raw)
  To: git

Hi,

"st" does not exist in git. You have to create an alias yourself by
doing this :

$ git config --global --add alias.st status

Regards

On Tue, 2010-11-23 at 13:23 +0100, Tarek Ziadé wrote:
> Hello,
> 
> I am new to Git and I tried to run "git st"
> 
> I have found one small bug: "status" is not listed in the help screen
> Git displays in that case.
> 
> $ git st
> git: 'st' is not a git command. See 'git --help'.
> 
> Did you mean one of these?
> 	reset
> 	stage
> 	stash
> 
> 
> I also have a suggestion: I was looking for the way to report that bug
> by visiting http://git-scm.com/ and looking for the bug tracker.
> Someone eventually explained to me on the IRC channel that I had to
> post a mail here. I would suggest making it clear on how to report
> bugs on the project's website. Maybe under  "Got a question" /
> "Email".
> 
> Cheers
> Tarek
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 12:40 ` Nguyen Thai Ngoc Duy
@ 2010-11-23 12:49   ` Tarek Ziadé
  2010-11-23 13:08     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 27+ messages in thread
From: Tarek Ziadé @ 2010-11-23 12:49 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Tue, Nov 23, 2010 at 1:40 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Tue, Nov 23, 2010 at 7:23 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
>> Hello,
>>
>> I am new to Git and I tried to run "git st"
>>
>> I have found one small bug: "status" is not listed in the help screen
>> Git displays in that case.
>>
>> $ git st
>> git: 'st' is not a git command. See 'git --help'.
>>
>> Did you mean one of these?
>>        reset
>>        stage
>>        stash
>
> It's heuristics, based on the assumption that you mistype a command by
> a letter or two.
> It gives helpful suggestions most of the time, but
> you can't expect it to be always right, especially when "st" is not a
> mistyping. "git --help" does show "status" though so I guess it's ok.

Yes, I understood this, but given the list of base commands git comes
with, if "st" gives "stage" and "stash", it would find it logical to
give also "status", by listing commands that starts with 'st'

st
stage
stash
status

That's what the tab completion does:

$ git st<tab>
stage    stash    status


Cheers
Tarek

> --
> Duy
>



-- 
Tarek Ziadé | http://ziade.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 12:49   ` Tarek Ziadé
@ 2010-11-23 13:08     ` Nguyen Thai Ngoc Duy
  2010-11-23 13:18       ` Tarek Ziadé
  0 siblings, 1 reply; 27+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-11-23 13:08 UTC (permalink / raw)
  To: Tarek Ziadé; +Cc: git

On Tue, Nov 23, 2010 at 7:49 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
> On Tue, Nov 23, 2010 at 1:40 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> On Tue, Nov 23, 2010 at 7:23 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
>>> Hello,
>>>
>>> I am new to Git and I tried to run "git st"
>>>
>>> I have found one small bug: "status" is not listed in the help screen
>>> Git displays in that case.
>>>
>>> $ git st
>>> git: 'st' is not a git command. See 'git --help'.
>>>
>>> Did you mean one of these?
>>>        reset
>>>        stage
>>>        stash
>>
>> It's heuristics, based on the assumption that you mistype a command by
>> a letter or two.
>> It gives helpful suggestions most of the time, but
>> you can't expect it to be always right, especially when "st" is not a
>> mistyping. "git --help" does show "status" though so I guess it's ok.
>
> Yes, I understood this, but given the list of base commands git comes
> with, if "st" gives "stage" and "stash", it would find it logical to
> give also "status", by listing commands that starts with 'st'
>
> st
> stage
> stash
> status

There's another command that starts with "st": stripspace (it's a
lowlevel command by the way). It's going to be a lot more if you type
"git m" and expect all commands starting with 'm'. Personally I would
do "git help -a|grep st" in that case. Hmm.. "git apropos" could be a
good idea.

> That's what the tab completion does:
>
> $ git st<tab>
> stage    stash    status

And it does for tab _completion_ (notice stripspace is missing,
git-completion.sh only lists high level commands). The above case is
to help mistyping.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 13:08     ` Nguyen Thai Ngoc Duy
@ 2010-11-23 13:18       ` Tarek Ziadé
  2010-11-23 13:26         ` Nguyen Thai Ngoc Duy
  2010-11-23 20:38         ` Andreas Schwab
  0 siblings, 2 replies; 27+ messages in thread
From: Tarek Ziadé @ 2010-11-23 13:18 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Tue, Nov 23, 2010 at 2:08 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
...
>
> There's another command that starts with "st": stripspace (it's a
> lowlevel command by the way). It's going to be a lot more if you type
> "git m" and expect all commands starting with 'm'.

Maybe so yeah. That's how Mercurial does.

$  hg s
hg: command 's' is ambiguous:
    serve showconfig status strip summary


> Personally I would
> do "git help -a|grep st" in that case. Hmm.. "git apropos" could be a
> good idea.

I guess.

>> That's what the tab completion does:
>>
>> $ git st<tab>
>> stage    stash    status
>
> And it does for tab _completion_ (notice stripspace is missing,
> git-completion.sh only lists high level commands). The above case is
> to help mistyping.

Right.  Overall, I guess it's just a cultural thing. Using mainly
Mercurial, "st" means for me "status" so I was surprised not to find
it as a suggestion.

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 12:23 bug: unexpected output for "git st" + suggestion Tarek Ziadé
  2010-11-23 12:40 ` Nguyen Thai Ngoc Duy
  2010-11-23 12:43 ` Sylvain Rabot
@ 2010-11-23 13:22 ` Erik Faye-Lund
  2010-11-23 13:47   ` Tarek Ziadé
  2 siblings, 1 reply; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-23 13:22 UTC (permalink / raw)
  To: Tarek Ziadé; +Cc: git

On Tue, Nov 23, 2010 at 1:23 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
> Hello,
>
> I am new to Git and I tried to run "git st"
>
> I have found one small bug: "status" is not listed in the help screen
> Git displays in that case.
>
> $ git st
> git: 'st' is not a git command. See 'git --help'.
>
> Did you mean one of these?
>        reset
>        stage
>        stash
>

This isn't strictly speaking a bug. Git uses Levenshtein distance
(http://en.wikipedia.org/wiki/Levenshtein_distance) to figure out what
to suggest. If any command has a sLevenshtein distance of less than 6
(given our coefficients), then all commands with that distance is
suggested. But perhaps we should do something different

But perhaps we could do better. We have some commands that are
considered more "important", ie the ones listed when doing "git help"
without "--all". "status" is one of these. Perhaps these commands
should always be included if they are below the Levenshtein distance
threshold or something?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 13:18       ` Tarek Ziadé
@ 2010-11-23 13:26         ` Nguyen Thai Ngoc Duy
  2010-11-23 20:38         ` Andreas Schwab
  1 sibling, 0 replies; 27+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-11-23 13:26 UTC (permalink / raw)
  To: Tarek Ziadé; +Cc: git

On Tue, Nov 23, 2010 at 8:18 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
> Right.  Overall, I guess it's just a cultural thing. Using mainly
> Mercurial, "st" means for me "status" so I was surprised not to find
> it as a suggestion.

Git does not have a standard set of aliases like others,
unfortunately. You may want to set default aliases in ~/.gitconfig
like st, ci, co, di, br.. Also look around for fancy aliases like
git-lol [1]

[1] http://blog.kfish.org/2010/04/git-lola.html
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 13:22 ` Erik Faye-Lund
@ 2010-11-23 13:47   ` Tarek Ziadé
  2010-11-23 13:56     ` Erik Faye-Lund
  0 siblings, 1 reply; 27+ messages in thread
From: Tarek Ziadé @ 2010-11-23 13:47 UTC (permalink / raw)
  To: kusmabite; +Cc: git

On Tue, Nov 23, 2010 at 2:22 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Tue, Nov 23, 2010 at 1:23 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
>> Hello,
>>
>> I am new to Git and I tried to run "git st"
>>
>> I have found one small bug: "status" is not listed in the help screen
>> Git displays in that case.
>>
>> $ git st
>> git: 'st' is not a git command. See 'git --help'.
>>
>> Did you mean one of these?
>>        reset
>>        stage
>>        stash
>>
>
> This isn't strictly speaking a bug. Git uses Levenshtein distance
> (http://en.wikipedia.org/wiki/Levenshtein_distance) to figure out what
> to suggest. If any command has a sLevenshtein distance of less than 6
> (given our coefficients), then all commands with that distance is
> suggested. But perhaps we should do something different
>
> But perhaps we could do better. We have some commands that are
> considered more "important", ie the ones listed when doing "git help"
> without "--all". "status" is one of these. Perhaps these commands
> should always be included if they are below the Levenshtein distance
> threshold or something?
>

Oh, interesting ! Levenshtein is great for typos but highly depends on
the fact that the word I am entering has about the same length as the
command I am looking for.

When I typed "st" I was thinking about an alias/shortcut. So the
question would be: is "st" a common alias in the git community for the
"status" command ?

If the answer is yes, and if there are other common aliases used out
there, I would suggest keeping the Levenshtein distance as it is now,
but complete the list of suggestions by using a "common aliases
mapper."

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 13:47   ` Tarek Ziadé
@ 2010-11-23 13:56     ` Erik Faye-Lund
  2010-11-23 13:58       ` Tarek Ziadé
  0 siblings, 1 reply; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-23 13:56 UTC (permalink / raw)
  To: Tarek Ziadé; +Cc: git

On Tue, Nov 23, 2010 at 2:47 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
> On Tue, Nov 23, 2010 at 2:22 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Tue, Nov 23, 2010 at 1:23 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
>>> Hello,
>>>
>>> I am new to Git and I tried to run "git st"
>>>
>>> I have found one small bug: "status" is not listed in the help screen
>>> Git displays in that case.
>>>
>>> $ git st
>>> git: 'st' is not a git command. See 'git --help'.
>>>
>>> Did you mean one of these?
>>>        reset
>>>        stage
>>>        stash
>>>
>>
>> This isn't strictly speaking a bug. Git uses Levenshtein distance
>> (http://en.wikipedia.org/wiki/Levenshtein_distance) to figure out what
>> to suggest. If any command has a sLevenshtein distance of less than 6
>> (given our coefficients), then all commands with that distance is
>> suggested. But perhaps we should do something different
>>
>> But perhaps we could do better. We have some commands that are
>> considered more "important", ie the ones listed when doing "git help"
>> without "--all". "status" is one of these. Perhaps these commands
>> should always be included if they are below the Levenshtein distance
>> threshold or something?
>>
>
> Oh, interesting ! Levenshtein is great for typos but highly depends on
> the fact that the word I am entering has about the same length as the
> command I am looking for.
>
> When I typed "st" I was thinking about an alias/shortcut. So the
> question would be: is "st" a common alias in the git community for the
> "status" command ?
>
> If the answer is yes, and if there are other common aliases used out
> there, I would suggest keeping the Levenshtein distance as it is now,
> but complete the list of suggestions by using a "common aliases
> mapper."
>

I experimented a bit around, and the last idea I played around with
was to keep the Levenshtein-suggestions as-is, but to add all common
commands that had the entered command as a prefix. That's a bit more
generic than what you suggested, but also not as flexible as it would
have to be a strict prefix.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 13:56     ` Erik Faye-Lund
@ 2010-11-23 13:58       ` Tarek Ziadé
  2010-11-23 19:11         ` [PATCH] help: always suggest common-cmds if prefix of cmd Erik Faye-Lund
  0 siblings, 1 reply; 27+ messages in thread
From: Tarek Ziadé @ 2010-11-23 13:58 UTC (permalink / raw)
  To: kusmabite; +Cc: git

On Tue, Nov 23, 2010 at 2:56 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
..
> I experimented a bit around, and the last idea I played around with
> was to keep the Levenshtein-suggestions as-is, but to add all common
> commands that had the entered command as a prefix. That's a bit more
> generic than what you suggested, but also not as flexible as it would
> have to be a strict prefix.

+1. That would improve it a lot

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] help: always suggest common-cmds if prefix of cmd
  2010-11-23 13:58       ` Tarek Ziadé
@ 2010-11-23 19:11         ` Erik Faye-Lund
  2010-11-24 19:49           ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-23 19:11 UTC (permalink / raw)
  To: git; +Cc: ziade.tarek, Erik Faye-Lund

If someone runs "git st", the command "git status" is not suggested
because it's not one of the closest levenshtein-neighbour.

Reserve the distance of 0 for common commands where the entered command
is a prefixe, as these are often more likely to be what the user meant.

This way, "git status" is the first suggestion, while a list of possible
typos are still suggested as well.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---

I guess something like this should do the trick. Thoughts?

 Makefile |    2 ++
 help.c   |   23 +++++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 1f1ce04..d6ba349 100644
--- a/Makefile
+++ b/Makefile
@@ -1611,6 +1611,8 @@ git$X: git.o $(BUILTIN_OBJS) $(GITLIBS)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ git.o \
 		$(BUILTIN_OBJS) $(ALL_LDFLAGS) $(LIBS)
 
+help.o: common-cmds.h
+
 builtin/help.o: common-cmds.h
 builtin/help.s builtin/help.o: EXTRA_CPPFLAGS = \
 	'-DGIT_HTML_PATH="$(htmldir_SQ)"' \
diff --git a/help.c b/help.c
index 7f4928e..dc76a62 100644
--- a/help.c
+++ b/help.c
@@ -3,6 +3,7 @@
 #include "exec_cmd.h"
 #include "levenshtein.h"
 #include "help.h"
+#include "common-cmds.h"
 
 /* most GUI terminals set COLUMNS (although some don't export it) */
 static int term_columns(void)
@@ -298,7 +299,7 @@ static void add_cmd_list(struct cmdnames *cmds, struct cmdnames *old)
 }
 
 /* An empirically derived magic number */
-#define SIMILAR_ENOUGH(x) ((x) < 6)
+#define SIMILAR_ENOUGH(x) ((x) < 7)
 
 const char *help_unknown_cmd(const char *cmd)
 {
@@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
 	uniq(&main_cmds);
 
 	/* This reuses cmdname->len for similarity index */
-	for (i = 0; i < main_cmds.cnt; ++i)
-		main_cmds.names[i]->len =
+	for (i = 0; i < main_cmds.cnt; ++i) {
+		main_cmds.names[i]->len = 1 +
 			levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
+		for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
+			if (!strcmp(main_cmds.names[i]->name,
+			    common_cmds[n].name) &&
+			    !prefixcmp(main_cmds.names[i]->name, cmd))
+				main_cmds.names[i]->len = 0;
+		}
+	}
 
 	qsort(main_cmds.names, main_cmds.cnt,
 	      sizeof(*main_cmds.names), levenshtein_compare);
@@ -330,9 +338,12 @@ const char *help_unknown_cmd(const char *cmd)
 	if (!main_cmds.cnt)
 		die ("Uh oh. Your system reports no Git commands at all.");
 
-	best_similarity = main_cmds.names[0]->len;
-	n = 1;
-	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
+	n = 0;
+	do {
+		best_similarity = main_cmds.names[n++]->len;
+	} while (!best_similarity);
+	n++;
+	while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len)
 		++n;
 	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
 		const char *assumed = main_cmds.names[0]->name;
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: bug: unexpected output for "git st" + suggestion
  2010-11-23 13:18       ` Tarek Ziadé
  2010-11-23 13:26         ` Nguyen Thai Ngoc Duy
@ 2010-11-23 20:38         ` Andreas Schwab
  1 sibling, 0 replies; 27+ messages in thread
From: Andreas Schwab @ 2010-11-23 20:38 UTC (permalink / raw)
  To: Tarek Ziadé; +Cc: Nguyen Thai Ngoc Duy, git

Tarek Ziadé <ziade.tarek@gmail.com> writes:

> On Tue, Nov 23, 2010 at 2:08 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> ...
>>
>> There's another command that starts with "st": stripspace (it's a
>> lowlevel command by the way). It's going to be a lot more if you type
>> "git m" and expect all commands starting with 'm'.
>
> Maybe so yeah. That's how Mercurial does.
>
> $  hg s
> hg: command 's' is ambiguous:
>     serve showconfig status strip summary

Mercurial allows unique abbreviations.  Git doesn't.  Thus a git command
is never ambiguous.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] help: always suggest common-cmds if prefix of cmd
  2010-11-23 19:11         ` [PATCH] help: always suggest common-cmds if prefix of cmd Erik Faye-Lund
@ 2010-11-24 19:49           ` Junio C Hamano
  2010-11-24 20:20             ` Erik Faye-Lund
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2010-11-24 19:49 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, ziade.tarek

Erik Faye-Lund <kusmabite@gmail.com> writes:

> @@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
>  	uniq(&main_cmds);
>  
>  	/* This reuses cmdname->len for similarity index */
> +	for (i = 0; i < main_cmds.cnt; ++i) {
> +		main_cmds.names[i]->len = 1 +
>  			levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
> +		for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
> +			if (!strcmp(main_cmds.names[i]->name,
> +			    common_cmds[n].name) &&
> +			    !prefixcmp(main_cmds.names[i]->name, cmd))
> +				main_cmds.names[i]->len = 0;
> +		}
> +	}

So main_cmds.names[]->len (which is not "len" anymore at this point but is
just a "score") gets levenshtein distance (i.e. a smaller number indicates
cmd is more likely to be a typo of it), and in addition ->len == 0 is "it
is prefix".  Overall, the smaller the score, the likelier the match.

> @@ -330,9 +338,12 @@ const char *help_unknown_cmd(const char *cmd)
>  	if (!main_cmds.cnt)
>  		die ("Uh oh. Your system reports no Git commands at all.");
>  
> -	best_similarity = main_cmds.names[0]->len;
> -	n = 1;
> -	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
> +	n = 0;
> +	do {
> +		best_similarity = main_cmds.names[n++]->len;
> +	} while (!best_similarity);

At this point, main_cmds.names[] is sorted by the above score (smaller to
larger), and first you skip all the "prefix" ones that score 0.

This relies on the fact that there is at least one entry with non-zero
score, which in practice is true, but without even a comment?  I feel
dirty.

The score of the first non-prefix entry is in best_similarity and that
entry is at main_cmds.names[n-1] at this point.  You haven't checked
main_cmds.names[n] yet...

> +	n++;

... but you increment n to skip that entry without even looking, and then
go on to ...

> +	while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len)
>  		++n;

You skip the entries with the same similarity as the closest typo,
presumably to point n to the first entry that is irrelevant (i.e. 0 thru n
but not including n are candidates).

Your rewrite of the loop makes it very hard to read and spot bugs, I
think.

>  	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
>  		const char *assumed = main_cmds.names[0]->name;
> -- 
> 1.7.3.2

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] help: always suggest common-cmds if prefix of cmd
  2010-11-24 19:49           ` Junio C Hamano
@ 2010-11-24 20:20             ` Erik Faye-Lund
  2010-11-24 23:53               ` Erik Faye-Lund
  2010-11-25  4:49               ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-24 20:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ziade.tarek

On Wed, Nov 24, 2010 at 8:49 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> writes:
>
>> @@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
>>       uniq(&main_cmds);
>>
>>       /* This reuses cmdname->len for similarity index */
>> +     for (i = 0; i < main_cmds.cnt; ++i) {
>> +             main_cmds.names[i]->len = 1 +
>>                       levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
>> +             for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
>> +                     if (!strcmp(main_cmds.names[i]->name,
>> +                         common_cmds[n].name) &&
>> +                         !prefixcmp(main_cmds.names[i]->name, cmd))
>> +                             main_cmds.names[i]->len = 0;
>> +             }
>> +     }
>
> So main_cmds.names[]->len (which is not "len" anymore at this point but is
> just a "score") gets levenshtein distance (i.e. a smaller number indicates
> cmd is more likely to be a typo of it), and in addition ->len == 0 is "it
> is prefix".  Overall, the smaller the score, the likelier the match.
>

Correct. This was already the case, though. I just reserved len = 0 as
"this is a prefix of the entered command".

>> @@ -330,9 +338,12 @@ const char *help_unknown_cmd(const char *cmd)
>>       if (!main_cmds.cnt)
>>               die ("Uh oh. Your system reports no Git commands at all.");
>>
>> -     best_similarity = main_cmds.names[0]->len;
>> -     n = 1;
>> -     while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
>> +     n = 0;
>> +     do {
>> +             best_similarity = main_cmds.names[n++]->len;
>> +     } while (!best_similarity);
>
> At this point, main_cmds.names[] is sorted by the above score (smaller to
> larger), and first you skip all the "prefix" ones that score 0.
>
> This relies on the fact that there is at least one entry with non-zero
> score, which in practice is true, but without even a comment?  I feel
> dirty.
>

Ah, yes. This needs clearer code badly. I'll see what I can cook up.

> The score of the first non-prefix entry is in best_similarity and that
> entry is at main_cmds.names[n-1] at this point.  You haven't checked
> main_cmds.names[n] yet...
>
>> +     n++;
>
> ... but you increment n to skip that entry without even looking, and then
> go on to ...
>

This is a bug, thanks for spotting it. It was intended to be the same
as "i = 1" in the old version, but I didn't think about it being
increased in the previous loop as well, silly me.

>> +     while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len)
>>               ++n;
>
> You skip the entries with the same similarity as the closest typo,
> presumably to point n to the first entry that is irrelevant (i.e. 0 thru n
> but not including n are candidates).
>
> Your rewrite of the loop makes it very hard to read and spot bugs, I
> think.
>

Indeed. What about this intra-diff? Hopefully it's a bit clearer, as
it's closer to the original, just reusing the same logic for the new
similar loop... Also makes the final diff smaller, which is nice.

diff --git a/help.c b/help.c
index dc76a62..d02a019 100644
--- a/help.c
+++ b/help.c
@@ -339,11 +339,10 @@ const char *help_unknown_cmd(const char *cmd)
 		die ("Uh oh. Your system reports no Git commands at all.");

 	n = 0;
-	do {
-		best_similarity = main_cmds.names[n++]->len;
-	} while (!best_similarity);
-	n++;
-	while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len)
+	while (n < main_cmds.cnt && !main_cmds.names[n]->len)
+		++n;
+	best_similarity = main_cmds.names[n++]->len;
+	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
 		++n;
 	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
 		const char *assumed = main_cmds.names[0]->name;

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH] help: always suggest common-cmds if prefix of cmd
  2010-11-24 20:20             ` Erik Faye-Lund
@ 2010-11-24 23:53               ` Erik Faye-Lund
  2010-11-25  4:49               ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-24 23:53 UTC (permalink / raw)
  To: git; +Cc: gitster, ziade.tarek, Erik Faye-Lund

If someone runs "git st", the command "git status" is not suggested
because it's not one of the closest levenshtein-neighbour.

Reserve the distance of 0 for common commands where the entered command
is a prefixe, as these are often more likely to be what the user meant.

This way, "git status" is the first suggestion, while a list of possible
typos are still suggested as well.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---

Here's an updated version, with the issues Junio fixed.

I hope this is a bit clearer.

 Makefile |    2 ++
 help.c   |   26 ++++++++++++++++++++------
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 1f1ce04..d6ba349 100644
--- a/Makefile
+++ b/Makefile
@@ -1611,6 +1611,8 @@ git$X: git.o $(BUILTIN_OBJS) $(GITLIBS)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ git.o \
 		$(BUILTIN_OBJS) $(ALL_LDFLAGS) $(LIBS)
 
+help.o: common-cmds.h
+
 builtin/help.o: common-cmds.h
 builtin/help.s builtin/help.o: EXTRA_CPPFLAGS = \
 	'-DGIT_HTML_PATH="$(htmldir_SQ)"' \
diff --git a/help.c b/help.c
index 7f4928e..dff5d6b 100644
--- a/help.c
+++ b/help.c
@@ -3,6 +3,7 @@
 #include "exec_cmd.h"
 #include "levenshtein.h"
 #include "help.h"
+#include "common-cmds.h"
 
 /* most GUI terminals set COLUMNS (although some don't export it) */
 static int term_columns(void)
@@ -298,7 +299,7 @@ static void add_cmd_list(struct cmdnames *cmds, struct cmdnames *old)
 }
 
 /* An empirically derived magic number */
-#define SIMILAR_ENOUGH(x) ((x) < 6)
+#define SIMILAR_ENOUGH(x) ((x) < 7)
 
 const char *help_unknown_cmd(const char *cmd)
 {
@@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
 	uniq(&main_cmds);
 
 	/* This reuses cmdname->len for similarity index */
-	for (i = 0; i < main_cmds.cnt; ++i)
-		main_cmds.names[i]->len =
+	for (i = 0; i < main_cmds.cnt; ++i) {
+		main_cmds.names[i]->len = 1 +
 			levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
+		for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
+			if (!strcmp(main_cmds.names[i]->name,
+			    common_cmds[n].name) &&
+			    !prefixcmp(main_cmds.names[i]->name, cmd))
+				main_cmds.names[i]->len = 0;
+		}
+	}
 
 	qsort(main_cmds.names, main_cmds.cnt,
 	      sizeof(*main_cmds.names), levenshtein_compare);
@@ -330,10 +338,16 @@ const char *help_unknown_cmd(const char *cmd)
 	if (!main_cmds.cnt)
 		die ("Uh oh. Your system reports no Git commands at all.");
 
-	best_similarity = main_cmds.names[0]->len;
-	n = 1;
-	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
+	n = 0;
+	while (n < main_cmds.cnt && !main_cmds.names[n]->len)
 		++n;
+	if (n < main_cmds.cnt) {
+		best_similarity = main_cmds.names[n++]->len;
+		while (n < main_cmds.cnt &&
+		       best_similarity == main_cmds.names[n]->len)
+			++n;
+	} else
+		best_similarity = 0;
 	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
 		const char *assumed = main_cmds.names[0]->name;
 		main_cmds.names[0] = NULL;
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] help: always suggest common-cmds if prefix of cmd
  2010-11-24 20:20             ` Erik Faye-Lund
  2010-11-24 23:53               ` Erik Faye-Lund
@ 2010-11-25  4:49               ` Junio C Hamano
  2010-11-25 10:39                 ` Erik Faye-Lund
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2010-11-25  4:49 UTC (permalink / raw)
  To: kusmabite; +Cc: Junio C Hamano, git, ziade.tarek

Erik Faye-Lund <kusmabite@gmail.com> writes:

> Indeed. What about this intra-diff? Hopefully it's a bit clearer, as
> it's closer to the original, just reusing the same logic for the new
> similar loop... Also makes the final diff smaller, which is nice.
>
> diff --git a/help.c b/help.c
> index dc76a62..d02a019 100644
> --- a/help.c
> +++ b/help.c
> @@ -339,11 +339,10 @@ const char *help_unknown_cmd(const char *cmd)
>  		die ("Uh oh. Your system reports no Git commands at all.");
>
>  	n = 0;
> -	do {
> -		best_similarity = main_cmds.names[n++]->len;
> -	} while (!best_similarity);
> -	n++;
> -	while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len)
> +	while (n < main_cmds.cnt && !main_cmds.names[n]->len)
> +		++n;
> +	best_similarity = main_cmds.names[n++]->len;
> +	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
>  		++n;

Perhaps, but it is probably more conventional to write this kind of loop with:

	for (n = 0; ...; n++)
		...

no?

>  	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
>  		const char *assumed = main_cmds.names[0]->name;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] help: always suggest common-cmds if prefix of cmd
  2010-11-25  4:49               ` Junio C Hamano
@ 2010-11-25 10:39                 ` Erik Faye-Lund
  2010-11-26 16:00                   ` [PATCH v3] " Erik Faye-Lund
  0 siblings, 1 reply; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-25 10:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ziade.tarek

On Thu, Nov 25, 2010 at 5:49 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> writes:
>
>> Indeed. What about this intra-diff? Hopefully it's a bit clearer, as
>> it's closer to the original, just reusing the same logic for the new
>> similar loop... Also makes the final diff smaller, which is nice.
>>
>> diff --git a/help.c b/help.c
>> index dc76a62..d02a019 100644
>> --- a/help.c
>> +++ b/help.c
>> @@ -339,11 +339,10 @@ const char *help_unknown_cmd(const char *cmd)
>>               die ("Uh oh. Your system reports no Git commands at all.");
>>
>>       n = 0;
>> -     do {
>> -             best_similarity = main_cmds.names[n++]->len;
>> -     } while (!best_similarity);
>> -     n++;
>> -     while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len)
>> +     while (n < main_cmds.cnt && !main_cmds.names[n]->len)
>> +             ++n;
>> +     best_similarity = main_cmds.names[n++]->len;
>> +     while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
>>               ++n;
>
> Perhaps, but it is probably more conventional to write this kind of loop with:
>
>        for (n = 0; ...; n++)
>                ...
>
> no?
>
>>       if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
>>               const char *assumed = main_cmds.names[0]->name;
>

Sure. I was just trying to match the existing code. But sure, I can
change that if you prefer. I think it makes the end-result slightly
nicer.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3] help: always suggest common-cmds if prefix of cmd
  2010-11-25 10:39                 ` Erik Faye-Lund
@ 2010-11-26 16:00                   ` Erik Faye-Lund
  2010-11-27  0:18                     ` Junio C Hamano
  2018-11-19 20:35                     ` help.autoCorrect prefix selection considered a bit dangerous Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-26 16:00 UTC (permalink / raw)
  To: git; +Cc: gitster, ziade.tarek

If someone runs "git st", the command "git status" is not suggested
because it's not one of the closest levenshtein-neighbour.

Reserve the distance of 0 for common commands where the entered command
is a prefixe, as these are often more likely to be what the user meant.

This way, "git status" is the first suggestion, while a list of possible
typos are still suggested as well.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---
 Makefile |    2 ++
 help.c   |   27 ++++++++++++++++++++-------
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 1f1ce04..d6ba349 100644
--- a/Makefile
+++ b/Makefile
@@ -1611,6 +1611,8 @@ git$X: git.o $(BUILTIN_OBJS) $(GITLIBS)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ git.o \
 		$(BUILTIN_OBJS) $(ALL_LDFLAGS) $(LIBS)
 
+help.o: common-cmds.h
+
 builtin/help.o: common-cmds.h
 builtin/help.s builtin/help.o: EXTRA_CPPFLAGS = \
 	'-DGIT_HTML_PATH="$(htmldir_SQ)"' \
diff --git a/help.c b/help.c
index 7f4928e..0d76a82 100644
--- a/help.c
+++ b/help.c
@@ -3,6 +3,7 @@
 #include "exec_cmd.h"
 #include "levenshtein.h"
 #include "help.h"
+#include "common-cmds.h"
 
 /* most GUI terminals set COLUMNS (although some don't export it) */
 static int term_columns(void)
@@ -298,7 +299,7 @@ static void add_cmd_list(struct cmdnames *cmds, struct cmdnames *old)
 }
 
 /* An empirically derived magic number */
-#define SIMILAR_ENOUGH(x) ((x) < 6)
+#define SIMILAR_ENOUGH(x) ((x) < 7)
 
 const char *help_unknown_cmd(const char *cmd)
 {
@@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
 	uniq(&main_cmds);
 
 	/* This reuses cmdname->len for similarity index */
-	for (i = 0; i < main_cmds.cnt; ++i)
-		main_cmds.names[i]->len =
+	for (i = 0; i < main_cmds.cnt; ++i) {
+		main_cmds.names[i]->len = 1 +
 			levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
+		for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
+			if (!strcmp(main_cmds.names[i]->name,
+			    common_cmds[n].name) &&
+			    !prefixcmp(main_cmds.names[i]->name, cmd))
+				main_cmds.names[i]->len = 0;
+		}
+	}
 
 	qsort(main_cmds.names, main_cmds.cnt,
 	      sizeof(*main_cmds.names), levenshtein_compare);
@@ -330,10 +338,15 @@ const char *help_unknown_cmd(const char *cmd)
 	if (!main_cmds.cnt)
 		die ("Uh oh. Your system reports no Git commands at all.");
 
-	best_similarity = main_cmds.names[0]->len;
-	n = 1;
-	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
-		++n;
+	for (n = 0; n < main_cmds.cnt && !main_cmds.names[n]->len; ++n)
+		; /* nothing */
+	if (n < main_cmds.cnt) {
+		best_similarity = main_cmds.names[n++]->len;
+		while (n < main_cmds.cnt &&
+		       best_similarity == main_cmds.names[n]->len)
+			++n;
+	} else
+		best_similarity = 0;
 	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
 		const char *assumed = main_cmds.names[0]->name;
 		main_cmds.names[0] = NULL;
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] help: always suggest common-cmds if prefix of cmd
  2010-11-26 16:00                   ` [PATCH v3] " Erik Faye-Lund
@ 2010-11-27  0:18                     ` Junio C Hamano
  2010-11-29 11:20                       ` Erik Faye-Lund
  2018-11-19 20:35                     ` help.autoCorrect prefix selection considered a bit dangerous Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2010-11-27  0:18 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, gitster, ziade.tarek

Erik Faye-Lund <kusmabite@gmail.com> writes:

> @@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
>  	uniq(&main_cmds);
>  
>  	/* This reuses cmdname->len for similarity index */
> -	for (i = 0; i < main_cmds.cnt; ++i)
> -		main_cmds.names[i]->len =
> +	for (i = 0; i < main_cmds.cnt; ++i) {
> +		main_cmds.names[i]->len = 1 +
>  			levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
> +		for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
> +			if (!strcmp(main_cmds.names[i]->name,
> +			    common_cmds[n].name) &&
> +			    !prefixcmp(main_cmds.names[i]->name, cmd))
> +				main_cmds.names[i]->len = 0;
> +		}
> +	}

This is an error codepath so performance would not matter much, but this
is doing it in an unnecessarily slow way, no?  At this point, both arrays
are sorted the same way, so we should be able to walk common_cmds[]
alongside the main_cmds.names[] (see below).

> +	if (n < main_cmds.cnt) {
> +		best_similarity = main_cmds.names[n++]->len;
> +		while (n < main_cmds.cnt &&
> +		       best_similarity == main_cmds.names[n]->len)
> +			++n;
> +	} else
> +		best_similarity = 0;

Think about what does this case _means_... The end user input was so
ambiguous that it prefix matched all the common commands!  Is it really
similar enough?

Note that most of the time main_cmds[] has more than what common_cmds[]
has, and because prefix match is done only against common_cmds[],
"everything is a prefix-match" never happens.  You might want to mark it
as a BUG(), but someday we may change the rules to give 0 to non common
commands with prefix match under some condition, so thinking these rare
corner cases through would defend ourselves from future gotchas.

How about doing it this way instead?  Isn't it more readable?

diff --git a/help.c b/help.c
index 7f4928e..7654f1b 100644
--- a/help.c
+++ b/help.c
@@ -3,6 +3,7 @@
 #include "exec_cmd.h"
 #include "levenshtein.h"
 #include "help.h"
+#include "common-cmds.h"
 
 /* most GUI terminals set COLUMNS (although some don't export it) */
 static int term_columns(void)
@@ -298,7 +299,8 @@ static void add_cmd_list(struct cmdnames *cmds, struct cmdnames *old)
 }
 
 /* An empirically derived magic number */
-#define SIMILAR_ENOUGH(x) ((x) < 6)
+#define SIMILARITY_FLOOR 7
+#define SIMILAR_ENOUGH(x) ((x) < SIMILARITY_FLOOR)
 
 const char *help_unknown_cmd(const char *cmd)
 {
@@ -319,10 +321,28 @@ const char *help_unknown_cmd(const char *cmd)
 	      sizeof(main_cmds.names), cmdname_compare);
 	uniq(&main_cmds);
 
-	/* This reuses cmdname->len for similarity index */
-	for (i = 0; i < main_cmds.cnt; ++i)
+	/* This abuses cmdname->len for levenshtein distance */
+	for (i = 0, n = 0; i < main_cmds.cnt; i++) {
+		int cmp = 0; /* avoid compiler stupidity */
+		const char *candidate = main_cmds.names[i]->name;
+
+		/* Does the candidate appear in common_cmds list? */
+		while (n < ARRAY_SIZE(common_cmds) &&
+		       (cmp = strcmp(common_cmds[n].name, candidate)) < 0)
+			n++;
+		if ((n < ARRAY_SIZE(common_cmds)) && !cmp) {
+			/* Yes, this is one of the common commands */
+			n++; /* use the entry from common_cmds[] */
+			if (!prefixcmp(candidate, cmd)) {
+				/* Give prefix match a very good score */
+				main_cmds.names[i]->len = 0;
+				continue;
+			}
+		}
+
 		main_cmds.names[i]->len =
-			levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
+			levenshtein(cmd, candidate, 0, 2, 1, 4) + 1;
+	}
 
 	qsort(main_cmds.names, main_cmds.cnt,
 	      sizeof(*main_cmds.names), levenshtein_compare);
@@ -330,10 +350,21 @@ const char *help_unknown_cmd(const char *cmd)
 	if (!main_cmds.cnt)
 		die ("Uh oh. Your system reports no Git commands at all.");
 
-	best_similarity = main_cmds.names[0]->len;
-	n = 1;
-	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
-		++n;
+	/* skip and count prefix matches */
+	for (n = 0; n < main_cmds.cnt && !main_cmds.names[n]->len; n++)
+		; /* still counting */
+
+	if (main_cmds.cnt <= n) {
+		/* prefix matches with everything? that is too ambiguous */
+		best_similarity = SIMILARITY_FLOOR + 1;
+	} else {
+		/* count all the most similar ones */
+		for (best_similarity = main_cmds.names[n++]->len;
+		     (n < main_cmds.cnt &&
+		      best_similarity == main_cmds.names[n]->len);
+		     n++)
+			; /* still counting */
+	}
 	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
 		const char *assumed = main_cmds.names[0]->name;
 		main_cmds.names[0] = NULL;

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] help: always suggest common-cmds if prefix of cmd
  2010-11-27  0:18                     ` Junio C Hamano
@ 2010-11-29 11:20                       ` Erik Faye-Lund
  2010-11-29 16:40                         ` Jonathan Nieder
  2010-11-29 17:44                         ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-29 11:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ziade.tarek

Sorry for the late reply, I've been out sick.

On Sat, Nov 27, 2010 at 1:18 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> writes:
>
>> @@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
>>       uniq(&main_cmds);
>>
>>       /* This reuses cmdname->len for similarity index */
>> -     for (i = 0; i < main_cmds.cnt; ++i)
>> -             main_cmds.names[i]->len =
>> +     for (i = 0; i < main_cmds.cnt; ++i) {
>> +             main_cmds.names[i]->len = 1 +
>>                       levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
>> +             for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
>> +                     if (!strcmp(main_cmds.names[i]->name,
>> +                         common_cmds[n].name) &&
>> +                         !prefixcmp(main_cmds.names[i]->name, cmd))
>> +                             main_cmds.names[i]->len = 0;
>> +             }
>> +     }
>
> This is an error codepath so performance would not matter much, but this
> is doing it in an unnecessarily slow way, no?  At this point, both arrays
> are sorted the same way, so we should be able to walk common_cmds[]
> alongside the main_cmds.names[] (see below).
>

I like it, thanks!

>> +     if (n < main_cmds.cnt) {
>> +             best_similarity = main_cmds.names[n++]->len;
>> +             while (n < main_cmds.cnt &&
>> +                    best_similarity == main_cmds.names[n]->len)
>> +                     ++n;
>> +     } else
>> +             best_similarity = 0;
>
> Think about what does this case _means_... The end user input was so
> ambiguous that it prefix matched all the common commands!  Is it really
> similar enough?
>
> Note that most of the time main_cmds[] has more than what common_cmds[]
> has, and because prefix match is done only against common_cmds[],
> "everything is a prefix-match" never happens.  You might want to mark it
> as a BUG(), but someday we may change the rules to give 0 to non common
> commands with prefix match under some condition, so thinking these rare
> corner cases through would defend ourselves from future gotchas.
>
> How about doing it this way instead?  Isn't it more readable?
>

Yes, this is better. But:

> diff --git a/help.c b/help.c
> index 7f4928e..7654f1b 100644
> --- a/help.c
> +++ b/help.c
> @@ -3,6 +3,7 @@
>  #include "exec_cmd.h"
>  #include "levenshtein.h"
>  #include "help.h"
> +#include "common-cmds.h"
>
>  /* most GUI terminals set COLUMNS (although some don't export it) */
>  static int term_columns(void)
> @@ -298,7 +299,8 @@ static void add_cmd_list(struct cmdnames *cmds, struct cmdnames *old)
>  }
>
>  /* An empirically derived magic number */
> -#define SIMILAR_ENOUGH(x) ((x) < 6)
> +#define SIMILARITY_FLOOR 7
> +#define SIMILAR_ENOUGH(x) ((x) < SIMILARITY_FLOOR)
>
>  const char *help_unknown_cmd(const char *cmd)
>  {
> @@ -319,10 +321,28 @@ const char *help_unknown_cmd(const char *cmd)
>              sizeof(main_cmds.names), cmdname_compare);
>        uniq(&main_cmds);
>
> -       /* This reuses cmdname->len for similarity index */
> -       for (i = 0; i < main_cmds.cnt; ++i)
> +       /* This abuses cmdname->len for levenshtein distance */
> +       for (i = 0, n = 0; i < main_cmds.cnt; i++) {
> +               int cmp = 0; /* avoid compiler stupidity */
> +               const char *candidate = main_cmds.names[i]->name;
> +
> +               /* Does the candidate appear in common_cmds list? */
> +               while (n < ARRAY_SIZE(common_cmds) &&
> +                      (cmp = strcmp(common_cmds[n].name, candidate)) < 0)
> +                       n++;
> +               if ((n < ARRAY_SIZE(common_cmds)) && !cmp) {
> +                       /* Yes, this is one of the common commands */
> +                       n++; /* use the entry from common_cmds[] */
> +                       if (!prefixcmp(candidate, cmd)) {
> +                               /* Give prefix match a very good score */
> +                               main_cmds.names[i]->len = 0;
> +                               continue;
> +                       }
> +               }
> +
>                main_cmds.names[i]->len =
> -                       levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
> +                       levenshtein(cmd, candidate, 0, 2, 1, 4) + 1;
> +       }
>
>        qsort(main_cmds.names, main_cmds.cnt,
>              sizeof(*main_cmds.names), levenshtein_compare);
> @@ -330,10 +350,21 @@ const char *help_unknown_cmd(const char *cmd)
>        if (!main_cmds.cnt)
>                die ("Uh oh. Your system reports no Git commands at all.");
>
> -       best_similarity = main_cmds.names[0]->len;
> -       n = 1;
> -       while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
> -               ++n;
> +       /* skip and count prefix matches */
> +       for (n = 0; n < main_cmds.cnt && !main_cmds.names[n]->len; n++)
> +               ; /* still counting */
> +
> +       if (main_cmds.cnt <= n) {
> +               /* prefix matches with everything? that is too ambiguous */
> +               best_similarity = SIMILARITY_FLOOR + 1;

For this code-path to trigger we would have to be able to prefix-match
every common command AND every "main command" must be included in
common commands. At the same time. The only possible way to
prefix-match all commands is if they all start with the same letter.
Do you really think this is a situation we could ever end up in? Every
git command being a common-command, starting with the same letter?

This is basically unreachable code. Perhaps it'd be even clearer just to die:

if (main_cmds.cnt <= n)
	die("Prefix-matched everyting, what's going on?");


> +       } else {
> +               /* count all the most similar ones */
> +               for (best_similarity = main_cmds.names[n++]->len;
> +                    (n < main_cmds.cnt &&
> +                     best_similarity == main_cmds.names[n]->len);
> +                    n++)
> +                       ; /* still counting */
> +       }
>        if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
>                const char *assumed = main_cmds.names[0]->name;
>                main_cmds.names[0] = NULL;
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] help: always suggest common-cmds if prefix of cmd
  2010-11-29 11:20                       ` Erik Faye-Lund
@ 2010-11-29 16:40                         ` Jonathan Nieder
  2010-11-29 16:53                           ` Erik Faye-Lund
  2010-11-29 17:44                         ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Jonathan Nieder @ 2010-11-29 16:40 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Junio C Hamano, git, ziade.tarek

Erik Faye-Lund wrote:

> For this code-path to trigger we would have to be able to prefix-match
> every common command AND every "main command" must be included in
> common commands. At the same time. The only possible way to
> prefix-match all commands is if they all start with the same letter.
> Do you really think this is a situation we could ever end up in? Every
> git command being a common-command, starting with the same letter?
> 
> This is basically unreachable code. Perhaps it'd be even clearer just to die:
> 
> if (main_cmds.cnt <= n)
> 	die("Prefix-matched everyting, what's going on?");

(I haven't checked.)  Maybe

	$ git ""

?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] help: always suggest common-cmds if prefix of cmd
  2010-11-29 16:40                         ` Jonathan Nieder
@ 2010-11-29 16:53                           ` Erik Faye-Lund
  0 siblings, 0 replies; 27+ messages in thread
From: Erik Faye-Lund @ 2010-11-29 16:53 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git, ziade.tarek

On Mon, Nov 29, 2010 at 5:40 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Erik Faye-Lund wrote:
>
>> For this code-path to trigger we would have to be able to prefix-match
>> every common command AND every "main command" must be included in
>> common commands. At the same time. The only possible way to
>> prefix-match all commands is if they all start with the same letter.
>> Do you really think this is a situation we could ever end up in? Every
>> git command being a common-command, starting with the same letter?
>>
>> This is basically unreachable code. Perhaps it'd be even clearer just to die:
>>
>> if (main_cmds.cnt <= n)
>>       die("Prefix-matched everyting, what's going on?");
>
> (I haven't checked.)  Maybe
>
>        $ git ""
>
> ?
>

Ah, yes. This does indeed work, both on Linux and Windows, and it does
weaken my point about the unlikeliness of the code ever reaching it.

But I must say that I think the most sane thing to do in this case
would be to just display the normal help-page (like "git" does).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] help: always suggest common-cmds if prefix of cmd
  2010-11-29 11:20                       ` Erik Faye-Lund
  2010-11-29 16:40                         ` Jonathan Nieder
@ 2010-11-29 17:44                         ` Junio C Hamano
  2010-12-01 14:33                           ` Erik Faye-Lund
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2010-11-29 17:44 UTC (permalink / raw)
  To: kusmabite; +Cc: git, ziade.tarek

Erik Faye-Lund <kusmabite@gmail.com> writes:

> For this code-path to trigger we would have to be able to prefix-match
> every common command AND every "main command" must be included in
> common commands. At the same time. The only possible way to
> prefix-match all commands is if they all start with the same letter.
> Do you really think this is a situation we could ever end up in? Every
> git command being a common-command, starting with the same letter?
>
> This is basically unreachable code. Perhaps it'd be even clearer just to die:
>
> if (main_cmds.cnt <= n)
> 	die("Prefix-matched everyting, what's going on?");

Well, the same letter can be an empty string:

	$ git ''

Didn't I already suggest BUG() there?  Also, saying "too ambiguous" would
make the codepath give "... See 'git --help'" message, I think.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] help: always suggest common-cmds if prefix of cmd
  2010-11-29 17:44                         ` Junio C Hamano
@ 2010-12-01 14:33                           ` Erik Faye-Lund
  0 siblings, 0 replies; 27+ messages in thread
From: Erik Faye-Lund @ 2010-12-01 14:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ziade.tarek

On Mon, Nov 29, 2010 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> writes:
>
>> For this code-path to trigger we would have to be able to prefix-match
>> every common command AND every "main command" must be included in
>> common commands. At the same time. The only possible way to
>> prefix-match all commands is if they all start with the same letter.
>> Do you really think this is a situation we could ever end up in? Every
>> git command being a common-command, starting with the same letter?
>>
>> This is basically unreachable code. Perhaps it'd be even clearer just to die:
>>
>> if (main_cmds.cnt <= n)
>>       die("Prefix-matched everyting, what's going on?");
>
> Well, the same letter can be an empty string:
>
>        $ git ''
>

Indeed, my bad.

> Didn't I already suggest BUG() there?  Also, saying "too ambiguous" would
> make the codepath give "... See 'git --help'" message, I think.
>

Sure. I was considering just taking your version verbatim. But let's
see what happens, perhaps I get inspired ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* help.autoCorrect prefix selection considered a bit dangerous
  2010-11-26 16:00                   ` [PATCH v3] " Erik Faye-Lund
  2010-11-27  0:18                     ` Junio C Hamano
@ 2018-11-19 20:35                     ` Ævar Arnfjörð Bjarmason
  2018-11-20  3:23                       ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-19 20:35 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, ziade.tarek

Replying to this blast from the past:
https://public-inbox.org/git/1290787239-4508-1-git-send-email-kusmabite@gmail.com/

I apparently like to live dangerously and have help.autoCorrect
enabled. I just had:

    git puss

Auto-corrected to:

    git push

When I meant:

    git pull

(For those wondering how I could have mistyped that, "l" and "s" are
right next to each other on a Dvorak layout).

As seen in the E-Mail from 2010 this intentional, i.e. "pull" is pruned
since the "pu" prefix isn't matched, but "pus" is. This was meant to
correct e.g. "git st" to "git status".

I don't have time to poke at this now, but wonder if:

 1) The correction facility shouldn't at least have a list of "this does
    stuff over the wire" commands and would then use a more conservative
    estimate.

 2) Whether we can do better with typo detection. E.g. add commands like
    "pull" to the list if we have a long enough prefix for them, and if
    the number of characters entered matches the number of characters in
    another command.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: help.autoCorrect prefix selection considered a bit dangerous
  2018-11-19 20:35                     ` help.autoCorrect prefix selection considered a bit dangerous Ævar Arnfjörð Bjarmason
@ 2018-11-20  3:23                       ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2018-11-20  3:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Erik Faye-Lund, git, ziade.tarek

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I don't have time to poke at this now, but wonder if:
>
>  1) The correction facility shouldn't at least have a list of "this does
>     stuff over the wire" commands and would then use a more conservative
>     estimate.

Not limited to 'over the wire' but 'can have consequences that might
cause regret' would be a reasonable list to have.

On a similar topic, it would be a disaster for "git reset --h<RET>"
to complete to "--hard" instead of "--help", for example.  Perhaps
parse-options API also needs to learn a list of possibly regrettable
options.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-11-20  3:24 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-23 12:23 bug: unexpected output for "git st" + suggestion Tarek Ziadé
2010-11-23 12:40 ` Nguyen Thai Ngoc Duy
2010-11-23 12:49   ` Tarek Ziadé
2010-11-23 13:08     ` Nguyen Thai Ngoc Duy
2010-11-23 13:18       ` Tarek Ziadé
2010-11-23 13:26         ` Nguyen Thai Ngoc Duy
2010-11-23 20:38         ` Andreas Schwab
2010-11-23 12:43 ` Sylvain Rabot
2010-11-23 13:22 ` Erik Faye-Lund
2010-11-23 13:47   ` Tarek Ziadé
2010-11-23 13:56     ` Erik Faye-Lund
2010-11-23 13:58       ` Tarek Ziadé
2010-11-23 19:11         ` [PATCH] help: always suggest common-cmds if prefix of cmd Erik Faye-Lund
2010-11-24 19:49           ` Junio C Hamano
2010-11-24 20:20             ` Erik Faye-Lund
2010-11-24 23:53               ` Erik Faye-Lund
2010-11-25  4:49               ` Junio C Hamano
2010-11-25 10:39                 ` Erik Faye-Lund
2010-11-26 16:00                   ` [PATCH v3] " Erik Faye-Lund
2010-11-27  0:18                     ` Junio C Hamano
2010-11-29 11:20                       ` Erik Faye-Lund
2010-11-29 16:40                         ` Jonathan Nieder
2010-11-29 16:53                           ` Erik Faye-Lund
2010-11-29 17:44                         ` Junio C Hamano
2010-12-01 14:33                           ` Erik Faye-Lund
2018-11-19 20:35                     ` help.autoCorrect prefix selection considered a bit dangerous Ævar Arnfjörð Bjarmason
2018-11-20  3:23                       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).