* Index files autocompletion too slow in big repositories (w / suggestion for improvement)
@ 2017-04-14 20:06 Carlos Pita
2017-04-14 22:08 ` Carlos Pita
0 siblings, 1 reply; 10+ messages in thread
From: Carlos Pita @ 2017-04-14 20:06 UTC (permalink / raw)
To: git
Hi all,
I'm currently using git annex to manage my entire file collection
(including tons of music and books) and I noticed how slow
autocompletion has become for files in the index (say for git add).
The main offender is a while-read-case-echo bash loop in
__git_index_files that can be readily substituted with a much faster
sed invocation, although I guess you didn't want the sed dependency in
the first place. Anyway, here is my benchmark:

__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | while read -r file; do
            case "$file" in
                ?*/*)
                    echo "${file%%/*}"
                ;;
                *)
                    echo "$file"
                ;;
            esac;
        done | sort | uniq;
    fi
}

time __git_index_files > /dev/null

__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | \
            sed -r 's@^"?([^/]+)/.*$@\1@' | sort | uniq
    fi
}

time __git_index_files > /dev/null

real 0m0.830s
user 0m0.597s
sys 0m0.310s

real 0m0.345s
user 0m0.357s
sys 0m0.000s

Notice I'm also excluding the beginning double quote that appears in
escaped path names.
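
A small sketch of what the sed expression above does to the two kinds of lines it can see: plain paths, and the double-quoted form used for escaped path names. The sample paths and the helper name first_components are made up for illustration, and -E is used here as the more widely accepted spelling of GNU sed's -r.

```shell
# Hypothetical helper: the sed expression from the benchmark, reading
# paths on stdin instead of calling __git_ls_files_helper.
first_components () {
	sed -E 's@^"?([^/]+)/.*$@\1@' | sort | uniq
}

# Made-up sample listing; the quoted entries imitate escaped path names.
printf '%s\n' \
	'music/a.mp3' \
	'music/b.mp3' \
	'README' \
	'"books/caf\303\251.pdf"' \
	'"na\303\257ve.txt"' |
first_components
```

The quoted path that contains a slash comes out as books, without its leading quote. The quoted top-level file has no slash for the expression to match, so it passes through unchanged, quotes included.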
Best regards
--
Carlos
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-14 20:06 Index files autocompletion too slow in big repositories (w / suggestion for improvement) Carlos Pita
@ 2017-04-14 22:08 ` Carlos Pita
2017-04-14 22:33 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 10+ messages in thread
From: Carlos Pita @ 2017-04-14 22:08 UTC (permalink / raw)
To: “git@vger.kernel.org”
This is much faster (below 0.1s):

__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | \
            sed -r 's@/.*@@' | uniq | sort | uniq
    fi
}

time __git_index_files

real 0m0.075s
user 0m0.083s
sys 0m0.010s

Most of the improvement is due to the simpler, non-grouping, regex.
Since I expect most of the common prefixes to arrive consecutively,
running uniq before sort also improves things a bit. I'm not removing
leading double quotes anymore (this isn't being done by the current
version, anyway) but this doesn't seem to hurt.
Despite the dependence on sed, this is ten times faster than the
original, so an option to enable fast index completion or something
like that might be desirable.
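
To illustrate the uniq-before-sort point, here is a minimal sketch on a made-up listing. The names sample_paths and top_level are hypothetical, and the listing is assumed to be mostly, but not entirely, grouped by directory.

```shell
# Made-up listing standing in for the helper's output: mostly grouped by
# directory, with one late straggler so that sort still has work to do.
sample_paths () {
	printf '%s\n' a/1 a/2 a/3 b/1 b/2 c top a/late
}

# The pipeline from the function above, reading paths on stdin.
top_level () {
	sed 's@/.*@@' | uniq | sort | uniq
}

# Number of lines reaching sort without and with the early uniq.
sample_paths | sed 's@/.*@@' | grep -c .
sample_paths | sed 's@/.*@@' | uniq | grep -c .

# Final result: each first path component once.
sample_paths | top_level
```

Here the early uniq cuts the eight lines down to five before sort sees them. The final uniq is still needed because the straggler puts a second "a" into the stream.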
Best regards
--
Carlos
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-14 22:08 ` Carlos Pita
@ 2017-04-14 22:33 ` Ævar Arnfjörð Bjarmason
2017-04-15 1:37 ` Jacob Keller
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-14 22:33 UTC (permalink / raw)
To: Carlos Pita; +Cc: “git@vger.kernel.org”
On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@gmail.com> wrote:
> This is much faster (below 0.1s):
>
> __git_index_files ()
> {
> local dir="$(__gitdir)" root="${2-.}" file;
> if [ -d "$dir" ]; then
> __git_ls_files_helper "$root" "$1" | \
> sed -r 's@/.*@@' | uniq | sort | uniq
> fi
> }
>
> time __git_index_files
>
> real 0m0.075s
> user 0m0.083s
> sys 0m0.010s
>
> Most of the improvement is due to the simpler, non-grouping, regex.
> Since I expect most of the common prefixes to arrive consecutively,
> running uniq before sort also improves things a bit. I'm not removing
> leading double quotes anymore (this isn't being done by the current
> version, anyway) but this doesn't seem to hurt.
>
> Despite the dependence on sed this is ten times faster than the
> original, maybe an option to enable fast index completion or something
> like that might be desirable.
>
> Best regards
It's fine to depend on sed: these shell scripts are POSIX compatible,
and so is sed. We use sed in a lot of the built-in shell scripts.
I think you should submit this as a patch, see Documentation/SubmittingPatches.
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-14 22:33 ` Ævar Arnfjörð Bjarmason
@ 2017-04-15 1:37 ` Jacob Keller
2017-04-15 7:52 ` Junio C Hamano
2017-04-15 11:59 ` Johannes Sixt
2017-04-15 12:30 ` Johannes Sixt
2 siblings, 1 reply; 10+ messages in thread
From: Jacob Keller @ 2017-04-15 1:37 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Carlos Pita, “git@vger.kernel.org”
On Fri, Apr 14, 2017 at 3:33 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>> This is much faster (below 0.1s):
>>
>> __git_index_files ()
>> {
>> local dir="$(__gitdir)" root="${2-.}" file;
>> if [ -d "$dir" ]; then
>> __git_ls_files_helper "$root" "$1" | \
>> sed -r 's@/.*@@' | uniq | sort | uniq
>> fi
>> }
>>
>> time __git_index_files
>>
>> real 0m0.075s
>> user 0m0.083s
>> sys 0m0.010s
>>
>> Most of the improvement is due to the simpler, non-grouping, regex.
>> Since I expect most of the common prefixes to arrive consecutively,
>> running uniq before sort also improves things a bit. I'm not removing
>> leading double quotes anymore (this isn't being done by the current
>> version, anyway) but this doesn't seem to hurt.
>>
>> Despite the dependence on sed this is ten times faster than the
>> original, maybe an option to enable fast index completion or something
>> like that might be desirable.
>>
>> Best regards
>
> It's fine to depend on sed, these shell-scripts are POSIX compatible,
> and so is sed, we use sed in a lot of the built-in shellscripts.
>
> I think you should submit this as a patch, see Documentation/SubmittingPatches.
Yeah, it should be fine to use sed.
Thanks,
Jake
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-15 1:37 ` Jacob Keller
@ 2017-04-15 7:52 ` Junio C Hamano
0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2017-04-15 7:52 UTC (permalink / raw)
To: Jacob Keller
Cc: Ævar Arnfjörð Bjarmason, Carlos Pita,
“git@vger.kernel.org”
Jacob Keller <jacob.keller@gmail.com> writes:
> On Fri, Apr 14, 2017 at 3:33 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>> local dir="$(__gitdir)" root="${2-.}" file;
>>> if [ -d "$dir" ]; then
>>> __git_ls_files_helper "$root" "$1" | \
>>> sed -r 's@/.*@@' | uniq | sort | uniq
>>> fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real 0m0.075s
>>> user 0m0.083s
>>> sys 0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original, maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>>
>>> Best regards
>>
>> It's fine to depend on sed, these shell-scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shellscripts.
>>
>> I think you should submit this as a patch, see Documentation/SubmittingPatches.
>
> Yea it should be fine to use sed.
As long as the use of "sed" is in line with POSIX.1; I do not think
you need the non-portable "-r" merely to strip out everything that
follows the first slash, so perhaps "s|-r|-e|" with the above (and do
not write a backslash after the pipe at the end of the line---the shell
knows you haven't finished talking to it yet if you end a line with a
pipe, and there is no need for a backslash), and you'd be golden.
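
Applying both suggestions to the quoted function might look like the sketch below. It is not a submitted patch; demo_gitdir and demo_ls_files_helper are hypothetical stand-ins for __gitdir and __git_ls_files_helper so that it runs outside the completion script, and the sample paths are made up.

```shell
# Hypothetical stand-ins so the sketch runs outside the completion script.
demo_gitdir () { echo .; }
demo_ls_files_helper () { printf '%s\n' src/a.c src/b.c doc/x.txt build.sh; }

demo_index_files ()
{
	local dir="$(demo_gitdir)" root="${2-.}"
	if [ -d "$dir" ]; then
		# A line ending in a pipe continues on the next line, so no
		# backslash is needed; -e replaces the non-portable -r.
		demo_ls_files_helper "$root" "$1" |
		sed -e 's@/.*@@' | uniq | sort | uniq
	fi
}

# The stub ignores its arguments; they are only here to mirror a real call.
demo_index_files --cached .
```

The expression 's@/.*@@' uses no extended-regex features, which is why dropping -r does not change what it matches.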
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-14 22:33 ` Ævar Arnfjörð Bjarmason
2017-04-15 1:37 ` Jacob Keller
@ 2017-04-15 11:59 ` Johannes Sixt
2017-04-16 0:31 ` Jacob Keller
2017-04-17 4:05 ` Junio C Hamano
2017-04-15 12:30 ` Johannes Sixt
2 siblings, 2 replies; 10+ messages in thread
From: Johannes Sixt @ 2017-04-15 11:59 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, Carlos Pita
Cc: “git@vger.kernel.org”, SZEDER Gábor
Cc Gábor.
Am 15.04.2017 um 00:33 schrieb Ævar Arnfjörð Bjarmason:
> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>> This is much faster (below 0.1s):
>>
>> __git_index_files ()
>> {
>> local dir="$(__gitdir)" root="${2-.}" file;
>> if [ -d "$dir" ]; then
>> __git_ls_files_helper "$root" "$1" | \
>> sed -r 's@/.*@@' | uniq | sort | uniq
>> fi
>> }
>>
>> time __git_index_files
>>
>> real 0m0.075s
>> user 0m0.083s
>> sys 0m0.010s
>>
>> Most of the improvement is due to the simpler, non-grouping, regex.
>> Since I expect most of the common prefixes to arrive consecutively,
>> running uniq before sort also improves things a bit. I'm not removing
>> leading double quotes anymore (this isn't being done by the current
>> version, anyway) but this doesn't seem to hurt.
>>
>> Despite the dependence on sed this is ten times faster than the
>> original, maybe an option to enable fast index completion or something
>> like that might be desirable.
>
> It's fine to depend on sed, these shell-scripts are POSIX compatible,
> and so is sed, we use sed in a lot of the built-in shellscripts.
This is about command line completion. We go a long way to avoid forking
processes there. What is 10x faster on Linux despite forking a
process may not be so on Windows.
(I'm not using bash command line completion on Windows, so I can't tell
what the effect of your suggested change is on Windows. I hope Gábor can
comment on it.)
-- Hannes
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-14 22:33 ` Ævar Arnfjörð Bjarmason
2017-04-15 1:37 ` Jacob Keller
2017-04-15 11:59 ` Johannes Sixt
@ 2017-04-15 12:30 ` Johannes Sixt
2 siblings, 0 replies; 10+ messages in thread
From: Johannes Sixt @ 2017-04-15 12:30 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, Carlos Pita
Cc: “git@vger.kernel.org”, SZEDER Gábor
Cc Gábor, resent with working email (hopefully); please follow-up on
this mail.
Am 15.04.2017 um 00:33 schrieb Ævar Arnfjörð Bjarmason:
> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>> This is much faster (below 0.1s):
>>
>> __git_index_files ()
>> {
>> local dir="$(__gitdir)" root="${2-.}" file;
>> if [ -d "$dir" ]; then
>> __git_ls_files_helper "$root" "$1" | \
>> sed -r 's@/.*@@' | uniq | sort | uniq
>> fi
>> }
>>
>> time __git_index_files
>>
>> real 0m0.075s
>> user 0m0.083s
>> sys 0m0.010s
>>
>> Most of the improvement is due to the simpler, non-grouping, regex.
>> Since I expect most of the common prefixes to arrive consecutively,
>> running uniq before sort also improves things a bit. I'm not removing
>> leading double quotes anymore (this isn't being done by the current
>> version, anyway) but this doesn't seem to hurt.
>>
>> Despite the dependence on sed this is ten times faster than the
>> original, maybe an option to enable fast index completion or something
>> like that might be desirable.
>
> It's fine to depend on sed, these shell-scripts are POSIX compatible,
> and so is sed, we use sed in a lot of the built-in shellscripts.
This is about command line completion. We go a long way to avoid forking
processes there. What is 10x faster on Linux despite forking a
process may not be so on Windows.
(I'm not using bash command line completion on Windows, so I can't tell
what the effect of your suggested change is on Windows. I hope Gábor can
comment on it.)
-- Hannes
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-15 11:59 ` Johannes Sixt
@ 2017-04-16 0:31 ` Jacob Keller
2017-04-17 4:05 ` Junio C Hamano
1 sibling, 0 replies; 10+ messages in thread
From: Jacob Keller @ 2017-04-16 0:31 UTC (permalink / raw)
To: Johannes Sixt
Cc: Ævar Arnfjörð Bjarmason, Carlos Pita,
“git@vger.kernel.org”, SZEDER Gábor
On Sat, Apr 15, 2017 at 4:59 AM, Johannes Sixt <j6t@kdbg.org> wrote:
> Cc Gábor.
>
> Am 15.04.2017 um 00:33 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@gmail.com>
>> wrote:
>>>
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>> local dir="$(__gitdir)" root="${2-.}" file;
>>> if [ -d "$dir" ]; then
>>> __git_ls_files_helper "$root" "$1" | \
>>> sed -r 's@/.*@@' | uniq | sort | uniq
>>> fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real 0m0.075s
>>> user 0m0.083s
>>> sys 0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original, maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>
>>
>> It's fine to depend on sed, these shell-scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shellscripts.
>
>
> This is about command line completion. We go a long way to avoid forking
> processes there. What is 10x faster on Linux despite of forking a process
> may not be so on Windows.
>
> (I'm not using bash command line completion on Windows, so I can't tell what
> the effect of your suggested change is on Windows. I hope Gábor can comment
> on it.)
>
> -- Hannes
>
In cases like this, might it be worth somehow splitting it so Linux
can use the best thing, and Windows can continue using what's best for
it, since it is a pretty significant advantage on Linux?
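
One way such a split could be sketched, purely as an illustration: nobody in the thread proposes this mechanism. It assumes bash sets OSTYPE to a value starting with msys or cygwin on the common Windows ports, and demo_first_components is a hypothetical name.

```shell
# Assumption: OSTYPE starts with msys or cygwin on the common Windows
# ports of bash. Checking a variable costs no extra process.
case "${OSTYPE-}" in
msys*|cygwin*)
	# Keep the fork-free loop where process creation is expensive.
	demo_first_components () {
		local file
		while read -r file; do
			case "$file" in
			?*/*) echo "${file%%/*}" ;;
			*) echo "$file" ;;
			esac
		done
	}
	;;
*)
	# Elsewhere, hand the whole stream to a single sed.
	demo_first_components () {
		sed -e 's@/.*@@'
	}
	;;
esac

printf '%s\n' dir/a dir/b top | demo_first_components | sort | uniq
```

The choice is made once, when the file is sourced, so each completion pays only for the implementation that was picked.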
Thanks,
Jake
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-15 11:59 ` Johannes Sixt
2017-04-16 0:31 ` Jacob Keller
@ 2017-04-17 4:05 ` Junio C Hamano
2017-04-17 8:03 ` Johannes Sixt
1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2017-04-17 4:05 UTC (permalink / raw)
To: Johannes Sixt
Cc: Ævar Arnfjörð Bjarmason, Carlos Pita,
“git@vger.kernel.org”, SZEDER Gábor
Johannes Sixt <j6t@kdbg.org> writes:
> Cc Gábor.
>
> Am 15.04.2017 um 00:33 schrieb Ævar Arnfjörð Bjarmason:
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>> local dir="$(__gitdir)" root="${2-.}" file;
>>> if [ -d "$dir" ]; then
>>> __git_ls_files_helper "$root" "$1" | \
>>> sed -r 's@/.*@@' | uniq | sort | uniq
>>> fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real 0m0.075s
>>> user 0m0.083s
>>> sys 0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original, maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>
>> It's fine to depend on sed, these shell-scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shellscripts.
>
> This is about command line completion. We go a long way to avoid
> forking processes there. What is 10x faster on Linux despite of
> forking a process may not be so on Windows.
Doesn't this depend on how many paths there are? If there are only
a few paths, the loop in shell would beat a pipe into sed even on
Linux, I suspect, and if there are tons of paths, at some number,
loop in shell would become slower than a single spawning of sed on
platforms with slower fork, no?
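
A rough way to look for that crossover on a given platform is sketched below. gen_paths, loop_version and sed_version are hypothetical names, the generated paths are made up, and no timings are claimed here.

```shell
# Hypothetical generator: N made-up paths, ten files per directory.
gen_paths () {
	local n=$1 i=0
	while [ "$i" -lt "$n" ]; do
		echo "dir$((i / 10))/file$i"
		i=$((i + 1))
	done
}

# The original approach: no sed process, one loop iteration per path.
loop_version () {
	local file
	while read -r file; do
		case "$file" in
		?*/*) echo "${file%%/*}" ;;
		*) echo "$file" ;;
		esac
	done | sort | uniq
}

# The proposed approach: one extra sed process, whatever the path count.
sed_version () {
	sed -e 's@/.*@@' | uniq | sort | uniq
}

# Sanity check that both agree before timing them.
[ "$(gen_paths 200 | loop_version)" = "$(gen_paths 200 | sed_version)" ] &&
	echo "outputs match"

# In bash, compare for example:
#   gen_paths 100000 >paths.txt
#   time loop_version <paths.txt >/dev/null
#   time sed_version <paths.txt >/dev/null
# and repeat with a handful of paths instead of 100000.
```

Writing the paths to a file first keeps the cost of generating them out of the measurement.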
* Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
2017-04-17 4:05 ` Junio C Hamano
@ 2017-04-17 8:03 ` Johannes Sixt
0 siblings, 0 replies; 10+ messages in thread
From: Johannes Sixt @ 2017-04-17 8:03 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Carlos Pita,
“git@vger.kernel.org”, SZEDER Gábor
Am 17.04.2017 um 06:05 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
>> This is about command line completion. We go a long way to avoid
>> forking processes there. What is 10x faster on Linux despite of
>> forking a process may not be so on Windows.
>
> Doesn't this depend on how many paths there are? If there are only
> a few paths, the loop in shell would beat a pipe into sed even on
> Linux, I suspect, and if there are tons of paths, at some number,
> loop in shell would become slower than a single spawning of sed on
> platforms with slower fork, no?
Absolutely. I just want to make sure a suggested change takes into
account the situation on Windows, not only the "YESSSS!" and "VERY
WELL!" votes of Linux users ;)
-- Hannes