git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Clemens Buchacher" <drizzd@gmx.net>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Manlio Perillo" <manlio.perillo@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH 10/11] completion: improve handling quoted paths in 'git ls-files's output
Date: Tue, 17 Apr 2018 00:42:35 +0200	[thread overview]
Message-ID: <20180416224236.17078-1-szeder.dev@gmail.com> (raw)
In-Reply-To: <20180416224113.16993-1-szeder.dev@gmail.com>

If any pathname contains backslash, double quote, tab, newline, or any
control characters, 'git ls-files' and 'git diff-index' will enclose
that pathname in double quotes and escape those special characters
using C-style one-character escape sequences or \nnn octal values.
This prevents those files from being listed during git-aware path
completion, because due to the quoting they will never match the
current word to be completed.

Extend __git_index_files()'s 'awk' script to remove all that quoting
and escaping from unique path components, so even paths containing
(almost all) such special characters can be completed.

Paths containing newline characters are still an issue, though.  We
use newlines as separator character when filling the COMPREPLY array,
so a path with one or more newline will end up split to two or more
elements in COMPREPLY, basically breaking completion.  There is
nothing we can do about it without a significant performance hit, so
let's just ignore such paths for now.  As far as paths with newlines
are concerned, this isn't any different from the previous behavior,
because those paths were always omitted, though in the past they were
omitted because due to the quoting they didn't match the current word
to be completed.  Anyway, Bash's own filename completion (Meta-/) can
complete even those paths, if need be.

Note:

  - We don't dequote path components right away as they are coming in,
    because then we would have to dequote each directory name
    repeatedly, as many times as it appears in the input, i.e. as many
    times as the number of listed paths it contains.  Instead, we
    dequote them at the end, as we print unique path components.

  - Even when a directory name itself does not contain any special
    characters, it will still be quoted if any of its trailing path
    components do.  If a directory contains paths both with and
    without special characters, then the name of that directory will
    appear both quoted and unquoted in the output of 'git ls-files'
    and 'git diff-index'.  Consequently, we will add such a directory
    name to the deduplicating associative array twice: once quoted and
    once unquoted.

    This means that we have to be careful after dequoting a directory
    name, and only print it if we haven't seen the same directory name
    unquoted.

  - It would be wonderful if we could just pass '-z' to those git
    commands to output \0-separated unquoted paths, and use \0 as
    record separator in the 'awk' script processing their output...
    this patch would be so much simpler, almost trivial even.
    Unfortunately, however, POSIX and most 'awk' implementations don't
    support \0 as record separator (GNU awk does support it).

  - This patch makes the earlier change to list paths with
    'core.quotePath=false' basically redundant, because this could
    decode any \nnn-escaped non-ASCII character just fine, as well.
    However, I suspect that 'git ls-files' can deal with those
    non-ASCII characters faster than this updated 'awk' script; just
    in case someone is burdened with tons of pathnames containing
    non-ASCII characters.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/completion/git-completion.bash | 66 +++++++++++++++++++++++++-
 t/t9902-completion.sh                  | 17 ++++++-
 2 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 70bc75dfc7..e97d57024b 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -459,8 +459,70 @@ __git_index_files ()
 		paths[$1] = 1
 	}
 	END {
-		for (p in paths)
-			print p
+		for (p in paths) {
+			if (substr(p, 1, 1) != "\"") {
+				# No special characters, easy!
+				print p
+				continue
+			}
+
+			# The path is quoted.
+			p = dequote(p)
+			if (p == "")
+				continue
+
+			# Even when a directory name itself does not contain
+			# any special characters, it will still be quoted if
+			# any of its (stripped) trailing path components do.
+			# Because of this we may have seen the same direcory
+			# both quoted and unquoted.
+			if (p in paths)
+				# We have seen the same directory unquoted,
+				# skip it.
+				continue
+			else
+				print p
+		}
+	}
+	function dequote(p,    bs_idx, out, esc, esc_idx, dec) {
+		# Skip opening double quote.
+		p = substr(p, 2)
+
+		# Interpret backslash escape sequences.
+		while ((bs_idx = index(p, "\\")) != 0) {
+			out = out substr(p, 1, bs_idx - 1)
+			esc = substr(p, bs_idx + 1, 1)
+			p = substr(p, bs_idx + 2)
+
+			if ((esc_idx = index("abtvfr\"\\", esc)) != 0) {
+				# C-style one-character escape sequence.
+				out = out substr("\a\b\t\v\f\r\"\\",
+						 esc_idx, 1)
+			} else if (esc == "n") {
+				# Uh-oh, a newline character.
+				# We cant reliably put a pathname
+				# containing a newline into COMPREPLY,
+				# and the newline would create a mess.
+				# Skip this path.
+				return ""
+			} else {
+				# Must be a \nnn octal value, then.
+				dec = esc             * 64 + \
+				      substr(p, 1, 1) * 8  + \
+				      substr(p, 2, 1)
+				out = out sprintf("%c", dec)
+				p = substr(p, 3)
+			}
+		}
+		# Drop closing double quote, if there is one.
+		# (There isnt any if this is a directory, as it was
+		# already stripped with the trailing path components.)
+		if (substr(p, length(p), 1) == "\"")
+			out = out substr(p, 1, length(p) - 1)
+		else
+			out = out p
+
+		return out
 	}'
 }
 
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index a747998d6c..b9879576ab 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -1527,7 +1527,7 @@ else
 	say "Your filesystem does not allow \\ and \" in filenames."
 	rm -rf 'New\Dir'
 fi
-test_expect_failure FUNNYNAMES_BS_DQ \
+test_expect_success FUNNYNAMES_BS_DQ \
     'complete files - C-style escapes in ls-files output' '
 	test_when_finished "rm -r \"New\\\\Dir\"" &&
 
@@ -1548,7 +1548,7 @@ else
 	say 'Your filesystem does not allow special separator characters (FS, GS, RS, US) in filenames.'
 	rm -rf $'New\034Special\035Dir'
 fi
-test_expect_failure FUNNYNAMES_SEPARATORS \
+test_expect_success FUNNYNAMES_SEPARATORS \
     'complete files - \nnn-escaped control characters in ls-files output' '
 	test_when_finished "rm -r '$'New\034Special\035Dir''" &&
 
@@ -1562,6 +1562,19 @@ test_expect_failure FUNNYNAMES_SEPARATORS \
 			"'$'New\034Special\035Dir/New\036Special\037File''"
 '
 
+test_expect_success FUNNYNAMES_BS_DQ \
+    'complete files - removing repeated quoted path components' '
+	test_when_finished rm -rf NewDir &&
+	mkdir NewDir &&    # A dirname which in itself would not be quoted ...
+	>NewDir/0-file &&
+	>NewDir/1\"file && # ... but here the file makes the dirname quoted ...
+	>NewDir/2-file &&
+	>NewDir/3\"file && # ... and here, too.
+
+	# Still, we should only list it once.
+	test_completion "git test-path-comp New" "NewDir"
+'
+
 
 test_expect_success "completion uses <cmd> completion for alias: !sh -c 'git <cmd> ...'" '
 	test_config alias.co "!sh -c '"'"'git checkout ...'"'"'" &&
-- 
2.17.0.366.gbe216a3084


  parent reply	other threads:[~2018-04-16 22:42 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-17  8:17 [PATCH 1/2] completion: improve ls-files filter performance Clemens Buchacher
2018-03-17  8:17 ` [PATCH 2/2] completion: simplify ls-files filter Clemens Buchacher
2018-03-18  0:16   ` Junio C Hamano
2018-03-18  1:26   ` SZEDER Gábor
2018-03-18  5:28     ` Junio C Hamano
2018-04-04  7:46     ` [PATCH] completion: improve ls-files filter performance Clemens Buchacher
2018-04-04 16:16       ` Johannes Schindelin
2018-04-16 22:41     ` [PATCH 00/11] completion: path completion improvements: speedup and quoted paths SZEDER Gábor
2018-04-16 22:41       ` [PATCH 01/11] t9902-completion: add tests demonstrating issues with quoted pathnames SZEDER Gábor
2018-04-17  3:48         ` Junio C Hamano
2018-04-17 23:32           ` SZEDER Gábor
2018-04-17 23:41             ` SZEDER Gábor
2018-04-18  1:22             ` Junio C Hamano
2018-04-26  0:25               ` SZEDER Gábor
2018-04-26  2:11                 ` Junio C Hamano
2018-05-18 14:17                   ` [PATCH 0/2] Test improvements for 'sg/complete-paths' SZEDER Gábor
2018-05-18 14:17                     ` [PATCH 1/2] completion: don't return with error from __gitcomp_file_direct() SZEDER Gábor
2018-05-18 14:17                     ` [PATCH 2/2] t9902-completion: exercise __git_complete_index_file() directly SZEDER Gábor
2018-05-18 19:25                       ` Eric Sunshine
2018-05-21 12:14                       ` Johannes Schindelin
2018-05-21 11:35                     ` [PATCH 0/2] Test improvements for 'sg/complete-paths' Johannes Schindelin
2018-05-21 12:17                       ` Johannes Schindelin
2018-04-18 12:31         ` [PATCH 01/11] t9902-completion: add tests demonstrating issues with quoted pathnames Johannes Schindelin
2018-04-19 19:08           ` SZEDER Gábor
2018-04-16 22:41       ` [PATCH 02/11] completion: move __git_complete_index_file() next to its helpers SZEDER Gábor
2018-04-16 22:41       ` [PATCH 03/11] completion: simplify prefix path component handling during path completion SZEDER Gábor
2018-04-16 22:41       ` [PATCH 04/11] completion: support completing non-ASCII pathnames SZEDER Gábor
2018-04-16 22:41       ` [PATCH 05/11] completion: improve handling quoted paths on the command line SZEDER Gábor
2018-04-16 22:41       ` [PATCH 06/11] completion: let 'ls-files' and 'diff-index' filter matching paths SZEDER Gábor
2018-04-16 22:41       ` [PATCH 07/11] completion: use 'awk' to strip trailing path components SZEDER Gábor
2018-04-16 22:41       ` [PATCH 08/11] t9902-completion: ignore COMPREPLY element order in some tests SZEDER Gábor
2018-04-16 22:41       ` [PATCH 09/11] completion: remove repeated dirnames with 'awk' during path completion SZEDER Gábor
2018-04-16 22:42       ` SZEDER Gábor [this message]
2018-04-16 22:42         ` [PATCH 11/11] completion: fill COMPREPLY directly when completing paths SZEDER Gábor
2018-03-18  0:13 ` [PATCH 1/2] completion: improve ls-files filter performance Junio C Hamano
2018-03-19 17:12 ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180416224236.17078-1-szeder.dev@gmail.com \
    --to=szeder.dev@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=drizzd@gmx.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=manlio.perillo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).