git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: John Bito <jwbito@gmail.com>, git <git@vger.kernel.org>
Subject: Re: git diff looping?
Date: Tue, 16 Jun 2009 13:15:31 -0400	[thread overview]
Message-ID: <20090616171531.GA17538@coredump.intra.peff.net> (raw)
In-Reply-To: <7v3aa0dsvn.fsf@alter.siamese.dyndns.org>

On Tue, Jun 16, 2009 at 09:51:24AM -0700, Junio C Hamano wrote:

> > I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to
> > be caused by a horribly slow system regex implementation; it really
> > chokes on the regex we use to find the "funcname" line for java files.
> 
> Hmm.  Is running under LC_ALL=C LANG=C _with_ the slow system regex help?

No, it remains extremely slow (it is possible that it _is_ faster,
though, but I never managed to run either case to completion; they are
both clearly orders of magnitude off of acceptable).

> In this particular case it is clear that a good way to fix the problem is
> to replace Solaris's dumb regex implemention with what comes in compat/,
> but I at the same time have to wonder if that funcname pattern for java
> can somehow be simplified, so that it does not to require so sophisticated
> implementation of regexp?

That may be a possibility. The default pattern is actually two regexes
(one is a "do not match this" and the other is "match this"). The
problematic one seems to be (and that is a space and a tab between the
brackets):

  ^[      ]*(([   ]*[A-Za-z_][A-Za-z_0-9]*){2,}[  ]*\([^;]*)$

which I determined by setting diff.java.xfuncname just to that (and it
remains slow). Whereas setting it to:

  ^[     ]*(catch|do|for|if|instanceof|new|return|switch|throw|while)

completes in about 5 seconds of CPU time (in the actual pattern it is
negated, but that shouldn't matter as we do the negation ourselves).

Now that being said, 5 seconds is still embarrassingly bad. Watch this
(with the solaris system regex):

  $ git config diff.java.xfuncname '^[ 	]*(catch|do|for|if|instanceof|new|return|switch|throw|while)'
  $ time git diff v0.4.0 >/dev/null
  real    0m5.869s
  user    0m4.720s
  sys     0m0.200s

  $ git config diff.java.xfuncname foo
  $ time git diff v0.4.0 >/dev/null
  real    0m1.895s
  user    0m0.980s
  sys     0m0.210s

So besides learning that this machine is horribly slow, we can see that
running that relatively simple regex takes almost 4 seconds, compared to
a little over 1 second to do the entire rest of the diff. I am inclined
to say that regex performance like that is so bad that we shouldn't care
about optimizing for it, and just use something else.

Bear in mind that the same engine will be used for "grep", too. So you
aren't really doing "git grep" users any favors by linking against such
an awful library.

Really, that performance is so bad that I'm beginning to wonder if I am
somehow measuring something wrong. How could they ship something so
crappy through so many versions?

-Peff

  reply	other threads:[~2009-06-16 17:16 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-16  1:37 git diff looping? John Bito
2009-06-16  2:44 ` Jeff Epler
2009-06-16  2:53   ` John Bito
2009-06-16 11:47 ` Jeff King
2009-06-16 12:07   ` Jeff King
2009-06-16 12:11     ` [PATCH 1/2] Makefile: refactor regex compat support Jeff King
2009-06-16 18:47       ` Johannes Sixt
2009-06-16 19:05         ` Jeff King
2009-06-16 19:07           ` [PATCH v2 " Jeff King
2009-06-16 19:08           ` [PATCH v2 2/2] Makefile: use compat regex on Solaris Jeff King
2009-06-16 20:07             ` Brandon Casey
2009-06-17 13:15             ` Mike Ralphson
2009-06-17 13:55               ` Mike Ralphson
2009-06-16 12:14     ` [PATCH " Jeff King
2009-06-16 15:48   ` git diff looping? John Bito
2009-06-16 16:51   ` Junio C Hamano
2009-06-16 17:15     ` Jeff King [this message]
2009-06-16 17:35       ` Brandon Casey
2009-06-16 17:39         ` John Bito
2009-06-16 17:41           ` Jeff King
2009-06-16 20:22         ` Brandon Casey
2009-06-17  8:46       ` Paolo Bonzini
2009-06-17 10:23         ` Jeff King
2009-06-17 11:02           ` Paolo Bonzini
2009-06-17 11:31           ` Andreas Ericsson
2009-06-17 13:08             ` Paolo Bonzini
2009-06-17 13:16               ` Andreas Ericsson
2009-06-17 13:58                 ` Paolo Bonzini
2009-06-17 14:26           ` [PATCH] avoid exponential regex match for java and objc function names Paolo Bonzini
2009-06-17 15:46             ` demerphq
2009-06-17 15:56               ` Jeff King
2009-06-17 16:00                 ` demerphq
2009-06-17 16:04                   ` Paolo Bonzini
2009-06-17 16:42             ` Junio C Hamano
2009-06-18  6:45               ` Paolo Bonzini
2009-06-16 17:16     ` git diff looping? John Bito
2009-06-16 17:24       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090616171531.GA17538@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jwbito@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).