git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Fredrik Kuivinen" <frekui@gmail.com>,
	"Brandon Williams" <bmwill@google.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH/RFC 0/6] Speed up git-grep by using PCRE v2 as a backend
Date: Thu, 11 May 2017 17:51:09 +0000	[thread overview]
Message-ID: <20170511175115.648-1-avarab@gmail.com> (raw)

Thought I'd send this to the list too. This is first of my WIP
post-PCRE v2 inclusion series's.

In addition to the huge speed improvements for grep -P noted in the
culmination of that series[1], this speeds up all other types of grep
invocations (fixed string & POSIX basic/extended) by using an
experimental PCRE API to translate those patterns to PCRE syntax.

Fixed string grep is sped up by ~15-50%, and any greps containing
regexes by 40-70%, with 50% seeming to be the average for most normal
patterns.

It isn't ready for the reasons noted in the last patch in the series,
and currently brings most of PCRE into git in compat/pcre2 since it
uses an experimental API.

The 5/6 patch is pretty much ready though and works on stock PCRE, it
fixes TODO tests for patterns that contain a \0, and enables regex
metacharacters in such patterns (right now they're all implicitly
fixed strings), see the discussion in that patch for some of the
caveats.

That patch will most likely be dropped by the list, it can be
retrieved from https://github.com/avar/git
avar/grep-and-pcre-and-more, or the whole series viewed at
https://github.com/git/git/compare/master...avar:avar/grep-and-pcre-and-more.

1. <20170511170142.15934-8-avarab@gmail.com>
   (https://public-inbox.org/git/20170511170142.15934-8-avarab@gmail.com/)

Ævar Arnfjörð Bjarmason (6):
  Makefile & compat/pcre2: add ability to build an embedded PCRE
  Makefile & compat/pcre2: add dependency on pcre2_convert.c
  compat/pcre2: import pcre2 from svn trunk
  test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites
  grep: support regex patterns containing \0 via PCRE v2
  grep: use PCRE v2 under the hood for -G & -E for amazing speedup

 Makefile                                           |    53 +
 compat/pcre2/get-pcre2.sh                          |    68 +
 compat/pcre2/src/pcre2.h                           |   832 ++
 compat/pcre2/src/pcre2_auto_possess.c              |  1291 ++
 compat/pcre2/src/pcre2_chartables.c                |     1 +
 compat/pcre2/src/pcre2_chartables.c.dist           |   198 +
 compat/pcre2/src/pcre2_compile.c                   |  9626 +++++++++++++++
 compat/pcre2/src/pcre2_config.c                    |   222 +
 compat/pcre2/src/pcre2_context.c                   |   450 +
 compat/pcre2/src/pcre2_convert.c                   |   724 ++
 compat/pcre2/src/pcre2_error.c                     |   327 +
 compat/pcre2/src/pcre2_find_bracket.c              |   218 +
 compat/pcre2/src/pcre2_internal.h                  |  1967 +++
 compat/pcre2/src/pcre2_intmodedep.h                |   884 ++
 compat/pcre2/src/pcre2_jit_compile.c               | 12307 +++++++++++++++++++
 compat/pcre2/src/pcre2_jit_match.c                 |   189 +
 compat/pcre2/src/pcre2_jit_misc.c                  |   227 +
 compat/pcre2/src/pcre2_maketables.c                |   157 +
 compat/pcre2/src/pcre2_match.c                     |  6826 ++++++++++
 compat/pcre2/src/pcre2_match_data.c                |   147 +
 compat/pcre2/src/pcre2_newline.c                   |   243 +
 compat/pcre2/src/pcre2_ord2utf.c                   |   120 +
 compat/pcre2/src/pcre2_string_utils.c              |   201 +
 compat/pcre2/src/pcre2_study.c                     |  1624 +++
 compat/pcre2/src/pcre2_tables.c                    |   765 ++
 compat/pcre2/src/pcre2_ucd.c                       |  3761 ++++++
 compat/pcre2/src/pcre2_ucp.h                       |   268 +
 compat/pcre2/src/pcre2_valid_utf.c                 |   398 +
 compat/pcre2/src/pcre2_xclass.c                    |   271 +
 compat/pcre2/src/sljit/sljitConfig.h               |   145 +
 compat/pcre2/src/sljit/sljitConfigInternal.h       |   725 ++
 compat/pcre2/src/sljit/sljitExecAllocator.c        |   312 +
 compat/pcre2/src/sljit/sljitLir.c                  |  2224 ++++
 compat/pcre2/src/sljit/sljitLir.h                  |  1392 +++
 compat/pcre2/src/sljit/sljitNativeARM_32.c         |  2326 ++++
 compat/pcre2/src/sljit/sljitNativeARM_64.c         |  2104 ++++
 compat/pcre2/src/sljit/sljitNativeARM_T2_32.c      |  1987 +++
 compat/pcre2/src/sljit/sljitNativeMIPS_32.c        |   437 +
 compat/pcre2/src/sljit/sljitNativeMIPS_64.c        |   539 +
 compat/pcre2/src/sljit/sljitNativeMIPS_common.c    |  2110 ++++
 compat/pcre2/src/sljit/sljitNativePPC_32.c         |   276 +
 compat/pcre2/src/sljit/sljitNativePPC_64.c         |   447 +
 compat/pcre2/src/sljit/sljitNativePPC_common.c     |  2421 ++++
 compat/pcre2/src/sljit/sljitNativeSPARC_32.c       |   165 +
 compat/pcre2/src/sljit/sljitNativeSPARC_common.c   |  1471 +++
 compat/pcre2/src/sljit/sljitNativeTILEGX-encoder.c | 10159 +++++++++++++++
 compat/pcre2/src/sljit/sljitNativeTILEGX_64.c      |  2555 ++++
 compat/pcre2/src/sljit/sljitNativeX86_32.c         |   602 +
 compat/pcre2/src/sljit/sljitNativeX86_64.c         |   742 ++
 compat/pcre2/src/sljit/sljitNativeX86_common.c     |  2921 +++++
 compat/pcre2/src/sljit/sljitProtExecAllocator.c    |   421 +
 compat/pcre2/src/sljit/sljitUtils.c                |   334 +
 grep.c                                             |    73 +-
 grep.h                                             |     5 +
 t/README                                           |    18 +
 t/t7008-grep-binary.sh                             |    87 +-
 t/test-lib.sh                                      |     3 +
 57 files changed, 81335 insertions(+), 31 deletions(-)
 create mode 100755 compat/pcre2/get-pcre2.sh
 create mode 100644 compat/pcre2/src/pcre2.h
 create mode 100644 compat/pcre2/src/pcre2_auto_possess.c
 create mode 120000 compat/pcre2/src/pcre2_chartables.c
 create mode 100644 compat/pcre2/src/pcre2_chartables.c.dist
 create mode 100644 compat/pcre2/src/pcre2_compile.c
 create mode 100644 compat/pcre2/src/pcre2_config.c
 create mode 100644 compat/pcre2/src/pcre2_context.c
 create mode 100644 compat/pcre2/src/pcre2_convert.c
 create mode 100644 compat/pcre2/src/pcre2_error.c
 create mode 100644 compat/pcre2/src/pcre2_find_bracket.c
 create mode 100644 compat/pcre2/src/pcre2_internal.h
 create mode 100644 compat/pcre2/src/pcre2_intmodedep.h
 create mode 100644 compat/pcre2/src/pcre2_jit_compile.c
 create mode 100644 compat/pcre2/src/pcre2_jit_match.c
 create mode 100644 compat/pcre2/src/pcre2_jit_misc.c
 create mode 100644 compat/pcre2/src/pcre2_maketables.c
 create mode 100644 compat/pcre2/src/pcre2_match.c
 create mode 100644 compat/pcre2/src/pcre2_match_data.c
 create mode 100644 compat/pcre2/src/pcre2_newline.c
 create mode 100644 compat/pcre2/src/pcre2_ord2utf.c
 create mode 100644 compat/pcre2/src/pcre2_string_utils.c
 create mode 100644 compat/pcre2/src/pcre2_study.c
 create mode 100644 compat/pcre2/src/pcre2_tables.c
 create mode 100644 compat/pcre2/src/pcre2_ucd.c
 create mode 100644 compat/pcre2/src/pcre2_ucp.h
 create mode 100644 compat/pcre2/src/pcre2_valid_utf.c
 create mode 100644 compat/pcre2/src/pcre2_xclass.c
 create mode 100644 compat/pcre2/src/sljit/sljitConfig.h
 create mode 100644 compat/pcre2/src/sljit/sljitConfigInternal.h
 create mode 100644 compat/pcre2/src/sljit/sljitExecAllocator.c
 create mode 100644 compat/pcre2/src/sljit/sljitLir.c
 create mode 100644 compat/pcre2/src/sljit/sljitLir.h
 create mode 100644 compat/pcre2/src/sljit/sljitNativeARM_32.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeARM_64.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeARM_T2_32.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeMIPS_32.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeMIPS_64.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeMIPS_common.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativePPC_32.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativePPC_64.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativePPC_common.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeSPARC_32.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeSPARC_common.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeTILEGX-encoder.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeTILEGX_64.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeX86_32.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeX86_64.c
 create mode 100644 compat/pcre2/src/sljit/sljitNativeX86_common.c
 create mode 100644 compat/pcre2/src/sljit/sljitProtExecAllocator.c
 create mode 100644 compat/pcre2/src/sljit/sljitUtils.c

-- 
2.11.0


             reply	other threads:[~2017-05-11 17:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-11 17:51 Ævar Arnfjörð Bjarmason [this message]
2017-05-11 17:51 ` [PATCH/RFC 1/6] Makefile & compat/pcre2: add ability to build an embedded PCRE Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` [PATCH/RFC 2/6] Makefile & compat/pcre2: add dependency on pcre2_convert.c Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` [PATCH/RFC 4/6] test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` [PATCH/RFC 5/6] grep: support regex patterns containing \0 via PCRE v2 Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` [PATCH/RFC 6/6] grep: use PCRE v2 under the hood for -G & -E for amazing speedup Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170511175115.648-1-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=dark.panda@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michal.kiedrowicz@gmail.com \
    --cc=noloader@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=vleschuk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).