git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Clemens Buchacher" <drizzd@gmx.net>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Manlio Perillo" <manlio.perillo@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH 09/11] completion: remove repeated dirnames with 'awk' during path completion
Date: Tue, 17 Apr 2018 00:41:13 +0200	[thread overview]
Message-ID: <20180416224113.16993-10-szeder.dev@gmail.com> (raw)
In-Reply-To: <20180416224113.16993-1-szeder.dev@gmail.com>

During git-aware path completion, after all the trailing path
components have been removed from the output of 'git ls-files' and
'git diff-index' (see previous patch), each directory name is repeated
as many times as the number of listed paths it contains.  This can be
a lot of repetitions, especially when invoking path completion close
to the root of a big worktree, which would cause a considerable
overhead downstream of __git_index_files(), in particular in the shell
loop that fills the COMPREPLY array.  To reduce this overhead,
__git_index_files() runs the classic '... |sort |uniq' pattern to
remove those repetitions from the function's output.

While removing repeated directory names is effective in reducing the
number of iterations in that shell loop, it still imposes the overhead
of fork()+exec()ing two external processes, and two additional stages
in the pipeline, where potentially relatively large amount of data can
be passed between two subsequent pipeline stages.

Extend __git_index_files()'s 'awk' script to remove repeated path
components by first creating and filling an associative array indexed
by all encountered path components (after the trailing path components
have been removed), and then iterating over this array and printing
the indices, i.e. unique path components.  This way we can remove the
'|sort |uniq' pipeline stages, and their eliminated overhead results
in faster path completion.

Listing all tracked files (12) and directories (23) at the top of the
worktree in linux.git (over 62k files), i.e. what's doing all the hard
work behind 'git rm <TAB>':

  Before this patch, best of five, using GNU awk on Linux:

    real    0m0.069s
    user    0m0.089s
    sys     0m0.026s

  After:

    real    0m0.052s
    user    0m0.072s
    sys     0m0.014s

  Difference: -24.6%

Note that this changes order of elements in __git_index_files()'s
output.  This is not an issue, because this function was only ever
intended to feed paths into the COMPREPLY array, and Bash will sort
its elements (according to the users locale) anyway.

Note also that using 'awk' to remove repeated path components is also
beneficial for the performance of the next two patches:

  - The first will extend this 'awk' script to dequote quoted paths in
    the output of 'git ls-files' and 'git diff-index'.  With this
    patch it will only have to dequote unique path components, not
    all.

  - The second will, among other things, extend this 'awk' script to
    prepend prefix path components from the command line to the
    currently completed path component.  Consequently, each line in
    'awk's output will grow longer.  Without this patch that '|sort
    |uniq' would have to exchange and process that much more data.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/completion/git-completion.bash | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 0abba88462..70bc75dfc7 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -456,8 +456,12 @@ __git_index_files ()
 
 	__git_ls_files_helper "$root" "$1" "$match" |
 	awk -F / '{
-		print $1
-	}' | sort | uniq
+		paths[$1] = 1
+	}
+	END {
+		for (p in paths)
+			print p
+	}'
 }
 
 # __git_complete_index_file requires 1 argument:
-- 
2.17.0.366.gbe216a3084


  parent reply	other threads:[~2018-04-16 22:41 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-17  8:17 [PATCH 1/2] completion: improve ls-files filter performance Clemens Buchacher
2018-03-17  8:17 ` [PATCH 2/2] completion: simplify ls-files filter Clemens Buchacher
2018-03-18  0:16   ` Junio C Hamano
2018-03-18  1:26   ` SZEDER Gábor
2018-03-18  5:28     ` Junio C Hamano
2018-04-04  7:46     ` [PATCH] completion: improve ls-files filter performance Clemens Buchacher
2018-04-04 16:16       ` Johannes Schindelin
2018-04-16 22:41     ` [PATCH 00/11] completion: path completion improvements: speedup and quoted paths SZEDER Gábor
2018-04-16 22:41       ` [PATCH 01/11] t9902-completion: add tests demonstrating issues with quoted pathnames SZEDER Gábor
2018-04-17  3:48         ` Junio C Hamano
2018-04-17 23:32           ` SZEDER Gábor
2018-04-17 23:41             ` SZEDER Gábor
2018-04-18  1:22             ` Junio C Hamano
2018-04-26  0:25               ` SZEDER Gábor
2018-04-26  2:11                 ` Junio C Hamano
2018-05-18 14:17                   ` [PATCH 0/2] Test improvements for 'sg/complete-paths' SZEDER Gábor
2018-05-18 14:17                     ` [PATCH 1/2] completion: don't return with error from __gitcomp_file_direct() SZEDER Gábor
2018-05-18 14:17                     ` [PATCH 2/2] t9902-completion: exercise __git_complete_index_file() directly SZEDER Gábor
2018-05-18 19:25                       ` Eric Sunshine
2018-05-21 12:14                       ` Johannes Schindelin
2018-05-21 11:35                     ` [PATCH 0/2] Test improvements for 'sg/complete-paths' Johannes Schindelin
2018-05-21 12:17                       ` Johannes Schindelin
2018-04-18 12:31         ` [PATCH 01/11] t9902-completion: add tests demonstrating issues with quoted pathnames Johannes Schindelin
2018-04-19 19:08           ` SZEDER Gábor
2018-04-16 22:41       ` [PATCH 02/11] completion: move __git_complete_index_file() next to its helpers SZEDER Gábor
2018-04-16 22:41       ` [PATCH 03/11] completion: simplify prefix path component handling during path completion SZEDER Gábor
2018-04-16 22:41       ` [PATCH 04/11] completion: support completing non-ASCII pathnames SZEDER Gábor
2018-04-16 22:41       ` [PATCH 05/11] completion: improve handling quoted paths on the command line SZEDER Gábor
2018-04-16 22:41       ` [PATCH 06/11] completion: let 'ls-files' and 'diff-index' filter matching paths SZEDER Gábor
2018-04-16 22:41       ` [PATCH 07/11] completion: use 'awk' to strip trailing path components SZEDER Gábor
2018-04-16 22:41       ` [PATCH 08/11] t9902-completion: ignore COMPREPLY element order in some tests SZEDER Gábor
2018-04-16 22:41       ` SZEDER Gábor [this message]
2018-04-16 22:42       ` [PATCH 10/11] completion: improve handling quoted paths in 'git ls-files's output SZEDER Gábor
2018-04-16 22:42         ` [PATCH 11/11] completion: fill COMPREPLY directly when completing paths SZEDER Gábor
2018-03-18  0:13 ` [PATCH 1/2] completion: improve ls-files filter performance Junio C Hamano
2018-03-19 17:12 ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180416224113.16993-10-szeder.dev@gmail.com \
    --to=szeder.dev@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=drizzd@gmx.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=manlio.perillo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).