git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Fredrik Kuivinen" <frekui@gmail.com>,
	"Brandon Williams" <bmwill@google.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH/RFC 1/6] Makefile & compat/pcre2: add ability to build an embedded PCRE
Date: Thu, 11 May 2017 17:51:10 +0000	[thread overview]
Message-ID: <20170511175115.648-2-avarab@gmail.com> (raw)
In-Reply-To: <20170511175115.648-1-avarab@gmail.com>

Add a USE_LIBPCRE2_BUNDLED=YesIHaveNoPackagedVersion flag to the
Makefile which'll use the PCRE v2 shipped in compat/pcre2 instead of
trying to find it via -lpcre2-8 on the installed system.

As covered in a previous commits ("grep: add support for PCRE v2",
2017-04-08) there are major benefits to using a bleeding edge PCRE v2,
but more importantly I'd like to experiment with making PCRE a
mandatory dependency to power various internal features of grep/log
without the user being aware that they're using the library under the
hood, similar to how we use kwset now for fixed-string searches.

Imposing that hard dependency on everyone using git would bother a lot
of people, whereas if git itself ships PCRE it's no more bothersome
than the code using kwset, i.e. it can be invisible to the builder &
user, and allow git to target newer PCRE APIs without worrying about
versioning.

See [1] for a mostly one-sided pcre-dev mailing list thread discussing
how embed the library.

Implementation details:

 * I configured PCRE v2 with ./configure --enable-jit --enable-utf

 * It sets a lot of -DHAVE_* but these are used by the subset of the
   files I copied, many are either unused or only used by pcre2test.c
   which isn't brought in by the script.

 * These -DHAVE_* flags are something we have already by default &
   assume in other git.git code, so it should be fine to define it.

 * -DPCRE2_CODE_UNIT_WIDTH=8 only compiles the functions linking to
   -lpcre2-8 would have gotten us.

 * All the limits / sizes are the PCRE defaults, the
   MATCH_LIMIT_RECURSION define is a synonym for MATCH_LIMIT_DEPTH in
   older versions, it allows building against older (currently
   release) versions of the library.

 * -DNEWLINE_DEFAULT=2 means only \n is recognized as a newline. This
    corresponds to the --enable-newline-is-lf option. It's also
    possible to set this to CR, CRLF, any of CR, LF, or CRLF, or any
    Unicode newline character being recognized as \n.

    This *might* have to be customized on Windows, but I think the
    grep machinery always splits on newlines for us already, so this
    probably works on Windows as-is, but needs testing.

1. https://lists.exim.org/lurker/thread/20170507.223619.fbee8f00.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile                  | 52 ++++++++++++++++++++++++++++++++++++
 compat/pcre2/get-pcre2.sh | 67 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+)
 create mode 100755 compat/pcre2/get-pcre2.sh

diff --git a/Makefile b/Makefile
index d77ca4c1a5..b18867196e 100644
--- a/Makefile
+++ b/Makefile
@@ -34,6 +34,11 @@ all::
 # library. The USE_LIBPCRE flag will likely be changed to mean v2 by
 # default in future releases.
 #
+# Define USE_LIBPCRE2_BUNDLED=YesIHaveNoPackagedVersion in addition to
+# USE_LIBPCRE2=YesPlease if you'd like to use a copy of PCRE version 2
+# bunded with Git. This is for setups where getting a hold of a
+# packaged PCRE is inconvenient.
+#
 # Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
@@ -1105,8 +1110,10 @@ endif
 
 ifdef USE_LIBPCRE2
 	BASIC_CFLAGS += -DUSE_LIBPCRE2
+ifndef USE_LIBPCRE2_BUNDLED
 	EXTLIBS += -lpcre2-8
 endif
+endif
 
 ifdef LIBPCREDIR
 	BASIC_CFLAGS += -I$(LIBPCREDIR)/include
@@ -1505,6 +1512,50 @@ ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
 endif
+ifdef USE_LIBPCRE2_BUNDLED
+ifndef USE_LIBPCRE2
+$(error please set USE_LIBPCRE2=YesPlease when setting \
+USE_LIBPCRE2_BUNDLED=$(USE_LIBPCRE2_BUNDLED))
+endif
+	COMPAT_CFLAGS += \
+		-Icompat/pcre2/src \
+		-DHAVE_BCOPY=1 \
+		-DHAVE_INTTYPES_H=1 \
+		-DHAVE_MEMMOVE=1 \
+		-DHAVE_STDINT_H=1 \
+		-DPCRE2_CODE_UNIT_WIDTH=8 \
+		-DLINK_SIZE=2 \
+		-DHEAP_LIMIT=20000000 \
+		-DMATCH_LIMIT=10000000 \
+		-DMATCH_LIMIT_DEPTH=10000000 \
+		-DMATCH_LIMIT_RECURSION=10000000 \
+		-DMAX_NAME_COUNT=10000 \
+		-DMAX_NAME_SIZE=32 \
+		-DPARENS_NEST_LIMIT=250 \
+		-DNEWLINE_DEFAULT=2 \
+		-DSUPPORT_JIT \
+		-DSUPPORT_UNICODE
+	COMPAT_OBJS += \
+		compat/pcre2/src/pcre2_auto_possess.o \
+		compat/pcre2/src/pcre2_chartables.o \
+		compat/pcre2/src/pcre2_compile.o \
+		compat/pcre2/src/pcre2_config.o \
+		compat/pcre2/src/pcre2_context.o \
+		compat/pcre2/src/pcre2_error.o \
+		compat/pcre2/src/pcre2_find_bracket.o \
+		compat/pcre2/src/pcre2_jit_compile.o \
+		compat/pcre2/src/pcre2_maketables.o \
+		compat/pcre2/src/pcre2_match.o \
+		compat/pcre2/src/pcre2_match_data.o \
+		compat/pcre2/src/pcre2_newline.o \
+		compat/pcre2/src/pcre2_ord2utf.o \
+		compat/pcre2/src/pcre2_string_utils.o \
+		compat/pcre2/src/pcre2_study.o \
+		compat/pcre2/src/pcre2_tables.o \
+		compat/pcre2/src/pcre2_ucd.o \
+		compat/pcre2/src/pcre2_valid_utf.o \
+		compat/pcre2/src/pcre2_xclass.o
+endif
 ifdef NATIVE_CRLF
 	BASIC_CFLAGS += -DNATIVE_CRLF
 endif
@@ -2259,6 +2310,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
 	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
 	@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
+	@echo USE_LIBPCRE2_BUNDLED=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2_BUNDLED)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
diff --git a/compat/pcre2/get-pcre2.sh b/compat/pcre2/get-pcre2.sh
new file mode 100755
index 0000000000..f1796cb518
--- /dev/null
+++ b/compat/pcre2/get-pcre2.sh
@@ -0,0 +1,67 @@
+#!/bin/sh -e
+
+# Usage:
+# ./get-pcre2.sh '' 'trunk'
+# ./get-pcre2.sh '' 'tags/pcre2-10.23'
+# ./get-pcre2.sh ~/g/pcre2 ''
+
+srcdir=$1
+version=$2
+if test -z "$version"
+then
+	version="tags/pcre2-10.23"
+fi
+
+echo Getting PCRE v2 version $version
+rm -rfv src
+mkdir src src/sljit
+
+for srcfile in \
+	pcre2.h \
+	pcre2_internal.h \
+	pcre2_intmodedep.h \
+	pcre2_ucp.h \
+	pcre2_auto_possess.c \
+	pcre2_chartables.c.dist \
+	pcre2_compile.c \
+	pcre2_config.c \
+	pcre2_context.c \
+	pcre2_error.c \
+	pcre2_find_bracket.c \
+	pcre2_jit_compile.c \
+	pcre2_jit_match.c \
+	pcre2_jit_misc.c \
+	pcre2_maketables.c \
+	pcre2_match.c \
+	pcre2_match_data.c \
+	pcre2_newline.c \
+	pcre2_ord2utf.c \
+	pcre2_string_utils.c \
+	pcre2_study.c \
+	pcre2_tables.c \
+	pcre2_ucd.c \
+	pcre2_valid_utf.c \
+	pcre2_xclass.c
+do
+	if test -z "$srcdir"
+	then
+		svn cat svn://vcs.exim.org/pcre2/code/$version/src/$srcfile >src/$srcfile
+	else
+		cp "$srcdir/src/$srcfile" src/$srcfile
+	fi
+	wc -l src/$srcfile
+done
+
+(cd src && ln -sf pcre2_chartables.c.dist pcre2_chartables.c)
+
+if test -z "$srcdir"
+then
+	for srcfile in $(svn ls svn://vcs.exim.org/pcre2/code/tags/pcre2-10.23/src/sljit)
+	do
+		svn cat svn://vcs.exim.org/pcre2/code/$version/src/sljit/$srcfile >src/sljit/$srcfile
+		wc -l src/sljit/$srcfile
+	done
+else
+	cp -R "$srcdir/src/sljit" src/
+	wc -l src/sljit/*
+fi
-- 
2.11.0


  reply	other threads:[~2017-05-11 17:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-11 17:51 [PATCH/RFC 0/6] Speed up git-grep by using PCRE v2 as a backend Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` Ævar Arnfjörð Bjarmason [this message]
2017-05-11 17:51 ` [PATCH/RFC 2/6] Makefile & compat/pcre2: add dependency on pcre2_convert.c Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` [PATCH/RFC 4/6] test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` [PATCH/RFC 5/6] grep: support regex patterns containing \0 via PCRE v2 Ævar Arnfjörð Bjarmason
2017-05-11 17:51 ` [PATCH/RFC 6/6] grep: use PCRE v2 under the hood for -G & -E for amazing speedup Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170511175115.648-2-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=dark.panda@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michal.kiedrowicz@gmail.com \
    --cc=noloader@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=vleschuk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).