git@vger.kernel.org mailing list mirror (one of many)
From: Paolo Bonzini <paolo.bonzini@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, John Bito <jwbito@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: git diff looping?
Date: Wed, 17 Jun 2009 10:46:21 +0200
Message-ID: <4A38AD5D.6010404@gmail.com>
In-Reply-To: <20090616171531.GA17538@coredump.intra.peff.net>


> Really, that performance is so bad that I'm beginning to wonder if I am
> somehow measuring something wrong. How could they ship something so
> crappy through so many versions?

Because without some care in the matcher, the regex can take exponential
time.  This happens because the matcher can backtrack arbitrarily from
[A-Za-z_0-9]* into [A-Za-z_].  Ironically, the same ambiguity also keeps
the regex from working as intended: for example, "catch(" can match the
complex part of the regex (e.g. the first repetition can be "c" and the
second can be "atch").
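
To make the ambiguity concrete, here is a small sketch in Python.  It is
only an illustration: Python's re module stands in for the POSIX matcher
under discussion, and count_splits is a hypothetical helper that counts
the ways one identifier can be cut into two or more pieces, each starting
with [A-Za-z_].  That count is an upper bound on what a naive
backtracking matcher might try, not a measurement of any real library.

```python
import re

# Original pattern from this message, tried with Python's re as a stand-in.
ORIGINAL = re.compile(r"^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$")


def count_splits(identifier):
    """Count ways to cut one identifier into two or more pieces that each
    start with [A-Za-z_], i.e. candidate readings of the repeated group."""
    cut_points = [i for i in range(1, len(identifier))
                  if re.match(r"[A-Za-z_]", identifier[i])]
    # Every non-empty subset of cut points gives a distinct split.
    return 2 ** len(cut_points) - 1


# A lone keyword followed by "(" is accepted although there is no second word.
catch_matches = ORIGINAL.match("catch(Exception e) {") is not None

# The number of candidate splits roughly doubles with each extra letter.
split_counts = {n: count_splits("a" * n) for n in (5, 10, 20)}
```

With these definitions, count_splits("catch") is 15, and a 20-letter
identifier already has over half a million candidate splits.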

We can make it faster and more correct at the expense of additional 
complication.

Starting from:

^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$

we have to:

1) move [ \t]* to the end of the repeated subexpression, which removes
the need for the separate [ \t]* after it:

^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]*){2,}\([^;]*)$

2) make sure that at least one space/tab is eaten on all but the last 
occurrence of the repeated subexpression.  To this end the LHS of {2,} 
is duplicated, once with [ \t]+ and once with [ \t]*.  The repetition 
itself becomes a + since the last occurrence is now separately handled:

^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\([^;]*)$
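
As a rough check of the two steps, the three patterns can be compared
side by side.  This is a sketch under assumptions: Python's re is used
in place of the POSIX matcher, the names STEP1, STEP2 and matches are
mine, and the sample lines are made up for illustration.

```python
import re

# The three patterns from this message, in order of appearance.
ORIGINAL = re.compile(r"^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$")
STEP1 = re.compile(r"^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]*){2,}\([^;]*)$")
STEP2 = re.compile(
    r"^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*"
    r"[ \t]*\([^;]*)$"
)

# Hypothetical sample lines, chosen only for illustration.
SAMPLES = [
    "public void run() {",
    "\tstatic int foo(int x)",
    "catch(Exception e) {",
    "int foo(int x);",
]


def matches(pattern, line):
    """Return True if the whole line is accepted by the pattern."""
    return pattern.match(line) is not None


# Map each sample to (original, step 1, step 2) match results.
results = {
    line: (matches(ORIGINAL, line), matches(STEP1, line), matches(STEP2, line))
    for line in SAMPLES
}
```

On these samples the first two patterns agree everywhere, including on
the unwanted "catch(" line, while the last pattern rejects that line and
still accepts the two real declarations.  All three reject the line that
contains a ";".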

Paolo
