From: Jeff King <peff@peff.net>
To: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, John Bito <jwbito@gmail.com>,
git <git@vger.kernel.org>
Subject: Re: git diff looping?
Date: Wed, 17 Jun 2009 06:23:33 -0400
Message-ID: <20090617102332.GA32353@coredump.intra.peff.net>
In-Reply-To: <4A38AD5D.6010404@gmail.com>
On Wed, Jun 17, 2009 at 10:46:21AM +0200, Paolo Bonzini wrote:
> 2) make sure that at least one space/tab is eaten on all but the last
> occurrence of the repeated subexpression. To this end the LHS of {2,} is
> duplicated, once with [ \t]+ and once with [ \t]*. The repetition itself
> becomes a + since the last occurrence is now separately handled:
>
> ^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\([^;]*)$
Thanks, I can confirm that this is _much_ faster. Here are some timings
from my Solaris 8 box for the "git diff v0.4.0" case using the system
and compat engines, and using three regexes: the original that git is
using now, an updated one with your regex above[1] replacing the second
line of the stock pattern, and a baseline regex of "." which should take
virtually no time at all.
system, orig: infinite
system, paolo: 2.5s
system, ".": 0.6s
compat, orig: 288.0s
compat, paolo: 1.5s
compat, ".": 0.6s
So it goes from infinite to 2.5s, which still means spending about three
times as long matching funcname regexes (roughly 1.9s over the 0.6s
baseline) as actually calculating the diff. The compat library is a
little better, but still chokes pretty badly on the original regex.
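These numbers come from running `git diff` itself, but the relative cost of a pattern can also be checked in isolation: compile it once, time `regexec()` over a batch of lines, and compare against "." as the near-free baseline. This is a hypothetical sketch: `time_pattern` and its parameters are made up for illustration, it measures CPU time with `clock()` rather than wall-clock time, and it does not reproduce how git chooses which lines to test when looking for a hunk header.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical benchmark helper: compile PATTERN once as an extended
 * regex, run it over NLINES lines REPEAT times, and store the CPU time
 * spent in *SECONDS. Returns the number of matching lines in one pass,
 * or -1 if the pattern fails to compile.
 */
static long time_pattern(const char *pattern, const char *const *lines,
			 size_t nlines, int repeat, double *seconds)
{
	regex_t re;
	long matches = 0;
	clock_t start;
	size_t i;
	int r;

	assert(repeat > 0 && seconds != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	start = clock();
	for (r = 0; r < repeat; r++) {
		long pass = 0;
		for (i = 0; i < nlines; i++)
			if (regexec(&re, lines[i], 0, NULL, 0) == 0)
				pass++;
		matches = pass; /* every pass sees the same lines */
	}
	*seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

	regfree(&re);
	return matches;
}
```

Running it over the same lines (say, those of a large Java file) with the original pattern, with Paolo's, and with ".", then subtracting the "." time, gives a rough comparison of matching cost in the same spirit as the table above.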
Let's compare compat to the glibc implementation on my Debian box:
system, orig: 0.22s
system, paolo: 0.22s
system, ".": 0.15s
compat, orig: 150.88s
compat, paolo: 0.43s
compat, ".": 0.15s
Besides the exponential behavior on the original regex, compat is still
about twice as slow as the system library on Paolo's regex (0.43s vs. 0.22s).
So I think there are three possible optimizations worth considering:
1. Replace the builtin diff.java.xfuncname pattern with what Paolo
suggested (though I haven't verified its correctness beyond a
cursory look at the results). This is easy to do, and will help
people with crappy system regex libraries and people on
compat/regex/ (right now just mingw) a _lot_. The downside is that
it's a little harder to read the regex, but not terribly so.
2. Recommend NO_REGEX for people with slow system regex libraries.
This is also easy to do, and will help people even if we do (1) for
two reasons:
a. we also run user-supplied regexes, both diff.*.xfuncname
patterns and "git grep" patterns, so NO_REGEX would protect
against poor performance when users give us a complex regex
b. even on more reasonable regexps like Paolo's, we seem to get a
2:1 speedup over the Solaris system library
3. Replace compat/regex with something faster. It still produces
exponential behavior in complex cases where glibc does not, and it
seems to be about 1/3 as fast on Paolo's regex.
I haven't looked at how large or how portable the glibc
implementation is. Another alternative would be to keep a
simple compat/ as now and add better support for linking against
an external library like pcre when it is available.
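On the correctness question in item 1, one cheap spot-check is to compile Paolo's pattern with POSIX `regcomp()` and feed it a few Java-looking lines. This is a sketch under assumptions: it compiles the pattern as an extended regex (`REG_EXTENDED`), which is assumed to match how xfuncname patterns are read; the helper `matches_funcname` and the sample lines are hypothetical; and it only exercises the single line substituted into the stock pattern, not the other line of that pattern.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Paolo's regex as a C string literal. "\\(" leaves "\(" in memory,
 * i.e. a literal parenthesis; "\t" is a C escape, so the bracket
 * expressions contain a real tab character.
 */
static const char java_funcname[] =
	"^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*"
	"[ \t]*\\([^;]*)$";

/*
 * Hypothetical helper: returns 1 if LINE matches PATTERN, 0 if it does
 * not, and -1 if PATTERN fails to compile as an extended regex.
 */
static int matches_funcname(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && line != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

With this, "\tpublic static void main(String[] args) {" and "\tprivate int size()" match, while "\treturn foo(bar);" (a ';' after the parenthesis) and "\tfoo(bar)" (only one word before the parenthesis) do not.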
-Peff
[1] Note if you are cutting and pasting Paolo's regex into the C code,
the "\(" needs to be "\\(", which I screwed up in my initial
timings. :)
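To make the footnote concrete: in C source, `\(` is not a valid escape sequence (compilers typically warn and drop the backslash), so the regex engine would see a bare `(`, which in an extended regex opens a group. The sketch below, assuming the pattern is compiled with `REG_EXTENDED`, checks that the correctly escaped literal compiles while a copy with the backslash missing (written out explicitly as "(" to avoid the compiler warning) leaves an unmatched parenthesis and is rejected. The names `escaped`, `unescaped` and `compile_status` are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Correct: "\\(" in C source puts a backslash before '(' in memory. */
static const char escaped[] =
	"^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*"
	"[ \t]*\\([^;]*)$";

/*
 * What the engine effectively sees if the backslash is lost: the
 * literal '(' becomes a group opener and the groups no longer balance.
 */
static const char unescaped[] =
	"^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*"
	"[ \t]*([^;]*)$";

/* Returns 0 if PATTERN compiles as an extended regex, else the
 * nonzero error code from regcomp(). */
static int compile_status(const char *pattern)
{
	regex_t re;
	int rc;

	assert(pattern != NULL);
	rc = regcomp(&re, pattern, REG_EXTENDED);
	if (rc == 0)
		regfree(&re);
	return rc;
}
```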