git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/4] Compile-time extensions for list-object-filter
@ 2021-09-05 23:51 Andrew Olsen via GitGitGadget
  2021-09-05 23:51 ` [PATCH 1/4] " Andrew Olsen via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Andrew Olsen via GitGitGadget @ 2021-09-05 23:51 UTC (permalink / raw)
  To: git; +Cc: Andrew Olsen

Adds an extension:<custom-filter> option to list-object-filter. Filter
extensions are implemented as static libraries that must be compiled into
Git. The FILTER_EXTENSIONS Makefile variable makes it easier to compile
these extensions into a custom build of Git. When no filter extensions are
supplied, Git works as normal.
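
To illustrate the last point, here is a minimal standalone sketch of a
name lookup in a NULL-terminated registry, which is the shape the series
uses for its filter_extensions array. The demo_* names are hypothetical
stand-ins and not part of the patches; only the extension name is
modelled. With no extensions compiled in, the array holds just the
terminator and every lookup misses.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the series' struct filter_extension; only the name is modelled. */
struct demo_extension {
	const char *name;
};

const struct demo_extension demo_rand = { "rand" };

/* A build with one extension: the array is terminated by NULL. */
const struct demo_extension *demo_with_rand[] = { &demo_rand, NULL };

/* A build with no extensions: only the terminator remains. */
const struct demo_extension *demo_empty[] = { NULL };

/* Return the extension called `name`, or NULL if none was compiled in. */
const struct demo_extension *demo_find(
	const struct demo_extension **registry, const char *name)
{
	size_t i;

	for (i = 0; registry[i]; i++)
		if (!strcmp(registry[i]->name, name))
			return registry[i];
	return NULL;
}
```

In the series itself a miss is fatal (the init function calls die());
the sketch returns NULL so the caller can decide.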

Andrew Olsen (4):
  Compile-time extensions for list-object-filter
  Makefile for list-object-filter extensions
  Sample list-object-filter extensions
  Documentation for list-object-filter extensions

 .gitignore                                    |   1 +
 Documentation/config/uploadpack.txt           |   7 +-
 Documentation/rev-list-options.txt            |   4 +
 Makefile                                      |  35 +++-
 compat/vcbuild/README                         |   5 +-
 config.mak.uname                              |   6 +-
 contrib/buildsystems/CMakeLists.txt           |   7 +
 contrib/filter-extensions/README.txt          | 153 ++++++++++++++++++
 contrib/filter-extensions/rand/.gitignore     |   2 +
 contrib/filter-extensions/rand/Makefile       |  28 ++++
 contrib/filter-extensions/rand/rand.c         | 103 ++++++++++++
 contrib/filter-extensions/rand_cpp/.gitignore |   2 +
 contrib/filter-extensions/rand_cpp/Makefile   |  34 ++++
 .../rand_cpp/adapter_functions.c              |   6 +
 .../rand_cpp/adapter_functions.h              |  10 ++
 contrib/filter-extensions/rand_cpp/rand.cpp   | 103 ++++++++++++
 generate-list-objects-filter-extensions.sh    |  53 ++++++
 list-objects-filter-extensions.h              | 107 ++++++++++++
 list-objects-filter-options.c                 |  47 ++++++
 list-objects-filter-options.h                 |   6 +
 list-objects-filter.c                         |  84 ++++++++++
 21 files changed, 793 insertions(+), 10 deletions(-)
 create mode 100644 contrib/filter-extensions/README.txt
 create mode 100644 contrib/filter-extensions/rand/.gitignore
 create mode 100644 contrib/filter-extensions/rand/Makefile
 create mode 100644 contrib/filter-extensions/rand/rand.c
 create mode 100644 contrib/filter-extensions/rand_cpp/.gitignore
 create mode 100644 contrib/filter-extensions/rand_cpp/Makefile
 create mode 100644 contrib/filter-extensions/rand_cpp/adapter_functions.c
 create mode 100644 contrib/filter-extensions/rand_cpp/adapter_functions.h
 create mode 100644 contrib/filter-extensions/rand_cpp/rand.cpp
 create mode 100755 generate-list-objects-filter-extensions.sh
 create mode 100644 list-objects-filter-extensions.h


base-commit: e0a2f5cbc585657e757385ad918f167f519cfb96
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1031%2Fkoordinates%2Flist-objects-filter-extensions-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1031/koordinates/list-objects-filter-extensions-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1031
-- 
gitgitgadget


* [PATCH 1/4] Compile-time extensions for list-object-filter
  2021-09-05 23:51 [PATCH 0/4] Compile-time extensions for list-object-filter Andrew Olsen via GitGitGadget
@ 2021-09-05 23:51 ` Andrew Olsen via GitGitGadget
  2021-09-05 23:51 ` [PATCH 2/4] Makefile for list-object-filter extensions Andrew Olsen via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Andrew Olsen via GitGitGadget @ 2021-09-05 23:51 UTC (permalink / raw)
  To: git; +Cc: Andrew Olsen, Andrew Olsen

From: Andrew Olsen <andrew.olsen@koordinates.com>

Adds an extension:<custom-filter> option to list-object-filter.
Filter extensions are implemented as static libraries that must be
compiled into Git. C code changes only; Makefile changes follow.

Signed-off-by: Andrew Olsen <andrew.olsen@koordinates.com>
---
 .gitignore                                 |   1 +
 generate-list-objects-filter-extensions.sh |  53 ++++++++++
 list-objects-filter-extensions.h           | 107 +++++++++++++++++++++
 list-objects-filter-options.c              |  47 +++++++++
 list-objects-filter-options.h              |   6 ++
 list-objects-filter.c                      |  84 ++++++++++++++++
 6 files changed, 298 insertions(+)
 create mode 100755 generate-list-objects-filter-extensions.sh
 create mode 100644 list-objects-filter-extensions.h
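
Note for readers: the filter spec "extension:<name>[=<parameter>]" is
split at the first '=' only, so the parameter may itself contain '='.
The following standalone sketch shows that split using plain libc
rather than the strbuf helpers used in the patch. The demo_* names are
hypothetical, and rejecting an empty name is an assumption of this
sketch, not something the patch does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two parsed pieces; not a Git type. */
struct demo_spec {
	char *name;   /* text before the first '=' */
	char *value;  /* text after the first '=', or NULL if there is none */
};

/* Copy `len` bytes of `s` into a fresh NUL-terminated string. */
static char *demo_copy(const char *s, size_t len)
{
	char *out = malloc(len + 1);

	if (!out)
		return NULL;
	memcpy(out, s, len);
	out[len] = '\0';
	return out;
}

/*
 * Split "<name>[=<value>]" at the first '='.
 * Returns 0 on success, -1 on an empty name or allocation failure.
 */
int demo_parse_spec(const char *arg, struct demo_spec *spec)
{
	const char *eq = strchr(arg, '=');
	size_t name_len = eq ? (size_t)(eq - arg) : strlen(arg);

	spec->name = NULL;
	spec->value = NULL;
	if (!name_len)
		return -1;

	spec->name = demo_copy(arg, name_len);
	if (!spec->name)
		return -1;
	if (eq) {
		spec->value = demo_copy(eq + 1, strlen(eq + 1));
		if (!spec->value) {
			free(spec->name);
			spec->name = NULL;
			return -1;
		}
	}
	return 0;
}

/* Free both pieces and reset the holder. */
void demo_spec_release(struct demo_spec *spec)
{
	free(spec->name);
	free(spec->value);
	spec->name = spec->value = NULL;
}
```

For example, "rand=10" yields name "rand" and value "10", while "rand"
alone yields a NULL value.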

diff --git a/.gitignore b/.gitignore
index 311841f9bed..3564cb01ad7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -190,6 +190,7 @@
 /gitweb/static/gitweb.min.*
 /config-list.h
 /command-list.h
+/list-objects-filter-extensions.c
 *.tar.gz
 *.dsc
 *.deb
diff --git a/generate-list-objects-filter-extensions.sh b/generate-list-objects-filter-extensions.sh
new file mode 100755
index 00000000000..422b1ce837f
--- /dev/null
+++ b/generate-list-objects-filter-extensions.sh
@@ -0,0 +1,53 @@
+#!/bin/sh
+
+if [ $# -gt 0 ]; then
+
+	# ARGS has one argument per line
+	ARGS=$(echo "$@" | xargs printf '%s\n')
+
+	# Every argument should be a path to a filter extension library.
+	INVALID_ARGS=$(echo "$ARGS" | grep -v '\.a$')
+	if [ -n "$INVALID_ARGS" ] ; then
+		printf "Error: all arguments must be paths to .a files: \n%s\n" \
+			"${INVALID_ARGS}" >&2
+		exit 1
+	fi
+
+	# qux/foo.a -> foo
+	NAMES=$(echo "$ARGS" | sed -e 's!.*/!!' -e 's!\.a$!!')
+
+	# Filter extension names must be valid C symbols so they can be linked by name.
+	INVALID_NAMES=$(echo "$NAMES" | grep -v '^[A-Za-z0-9_]\+$')
+	if [ -n "$INVALID_NAMES" ] ; then
+		printf "Error: all library names must also be valid C symbols: \n%s\n" \
+			"${INVALID_NAMES}" >&2
+		exit 1
+	fi
+
+	# foo -> filter_extension_foo
+	EXTS=$(echo "$NAMES" | sed -e 's!^!filter_extension_!')
+
+	# filter_extension_foo -> [\t]filter_extension_foo,
+	DECLARATIONS=$(echo "$EXTS" | sed -e 's!^!\t!' -e 's!$!,!')
+
+	# filter_extension_foo -> [\t]&filter_extension_foo,
+	ARRAY=$(echo "$EXTS" | sed -e 's!^!\t\&!' -e 's!$!,!')
+fi
+
+echo '/* Automatically generated by generate-list-objects-filter-extensions.sh */'
+echo
+echo '#include "git-compat-util.h"'
+echo '#include "list-objects-filter-extensions.h"'
+echo
+
+if [ $# -gt 0 ]; then
+	echo 'extern const struct filter_extension'
+	echo "${DECLARATIONS%?}"
+	echo ';'
+	echo
+fi
+
+echo 'const struct filter_extension *filter_extensions[] = {'
+echo "${ARRAY}"
+echo '	NULL,'
+echo '};'
\ No newline at end of file
diff --git a/list-objects-filter-extensions.h b/list-objects-filter-extensions.h
new file mode 100644
index 00000000000..35ebe1ead31
--- /dev/null
+++ b/list-objects-filter-extensions.h
@@ -0,0 +1,107 @@
+#ifndef GIT_LIST_OBJECTS_FILTER_EXTENSIONS_H
+#define GIT_LIST_OBJECTS_FILTER_EXTENSIONS_H
+
+/**
+ * The List-Objects-Filter Extensions API can be used to develop filter
+ * extensions for git-upload-pack/git-rev-list/etc.
+ *
+ * See contrib/filter-extensions/README.txt for more details and examples.
+ *
+ * The API defines three functions to implement a filter operation. Note that
+ * each filter implementing this API must be compiled into Git as a static
+ * library. There is some plumbing in the Makefile to help with this via
+ * FILTER_EXTENSIONS.
+ *
+ * 1. You write a filter and compile it into your custom build of git.
+ *    See list_objects_filter_ext_filter_fn.
+ * 2. A filter request is received that specifically names the filter extension
+ *    that you have written, i.e. "--filter=extension:<name>[=<arg>]"
+ * 3. Your list_objects_filter_ext_init_fn() is called.
+ * 4. Your list_objects_filter_ext_filter_fn() is called for each object
+ *    at least once.
+ * 5. Your list_objects_filter_ext_free_fn() is called.
+ */
+
+#include "list-objects-filter.h"
+
+
+/* Whether to add or remove a specific object from any current omitset. */
+enum list_objects_filter_omit {
+       LOFO_KEEP = -1,
+       LOFO_IGNORE = 0,
+       LOFO_OMIT = 1,
+};
+
+/*
+ * This is the counterpart to `list_objects_filter__init()` and constructs the
+ * filter, parsing and validating any user-provided `filter_arg` (via
+ * `--filter=extension:<name>=<arg>`). Use `context` for any filter-allocated
+ * context data.
+ *
+ * Return 0 on success and non-zero on error.
+ */
+typedef
+int list_objects_filter_ext_init_fn(
+    const struct repository *r,
+    const char* filter_arg,
+    void **context
+);
+
+/*
+ * This is the counterpart to `list_objects_filter__free()`, destroying the filter
+ * and any filter-allocated context data.
+ */
+typedef
+void list_objects_filter_ext_free_fn(
+    const struct repository *r,
+    void *context
+);
+
+/*
+ * This is the counterpart to `list_objects_filter__filter_object()`, and
+ * decides how to handle the object `obj`.
+ *
+ * omit provides a flag determining whether to explicitly add or remove
+ * the object from any current omitset.
+ */
+typedef
+enum list_objects_filter_result list_objects_filter_ext_filter_fn(
+	const struct repository *r,
+	const enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	enum list_objects_filter_omit *omit,
+	void *context
+);
+
+/*
+ * To implement a filter extension called "mine", you should define
+ * a const struct filter_extension called filter_extension_mine,
+ * in the following manner:
+ *
+ * const struct filter_extension filter_extension_mine = {
+ *     "mine",
+ *     &my_init_fn,
+ *     &my_filter_object_fn,
+ *     &my_free_fn
+ * };
+ *
+ * See contrib/filter-extensions/README.txt for more details and examples.
+ */
+
+struct filter_extension {
+    const char *name;
+    list_objects_filter_ext_init_fn* init_fn;
+    list_objects_filter_ext_filter_fn* filter_object_fn;
+    list_objects_filter_ext_free_fn* free_fn;
+};
+
+/*
+ * The filter_extensions array is defined in list-objects-filter-extensions.c
+ * which is generated at compile time from the FILTER_EXTENSIONS variable.
+ */
+extern const struct filter_extension *filter_extensions[];
+
+
+#endif /* GIT_LIST_OBJECTS_FILTER_EXTENSIONS_H */
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index fd8d59f653a..e92499f29c2 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -15,6 +15,11 @@ static int parse_combine_filter(
 	const char *arg,
 	struct strbuf *errbuf);
 
+static int parse_extension_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf);
+
 const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 {
 	switch (c) {
@@ -31,6 +36,8 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 		return "sparse:oid";
 	case LOFC_OBJECT_TYPE:
 		return "object:type";
+	case LOFC_EXTENSION:
+		return "extension";
 	case LOFC_COMBINE:
 		return "combine";
 	case LOFC__COUNT:
@@ -91,6 +98,9 @@ static int gently_parse_list_objects_filter(
 		filter_options->choice = LOFC_SPARSE_OID;
 		return 0;
 
+	} else if (skip_prefix(arg, "extension:", &v0)) {
+		return parse_extension_filter(filter_options, v0, errbuf);
+
 	} else if (skip_prefix(arg, "sparse:path=", &v0)) {
 		if (errbuf) {
 			strbuf_addstr(
@@ -209,6 +219,41 @@ cleanup:
 	return result;
 }
 
+static int parse_extension_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf)
+{
+	int result = 0;
+	struct strbuf **params = strbuf_split_str(arg, '=', 2);
+
+	if (!params[0]) {
+		strbuf_addstr(errbuf, _("expected 'extension:<name>[=<parameter>]'"));
+		result = 1;
+		goto cleanup;
+	}
+
+	if (params[1]) {
+		/* Has a parameter: strip the trailing "=" from the name. */
+		size_t last = params[0]->len - 1;
+		assert(params[0]->buf[last] == '=');
+		strbuf_remove(params[0], last, 1);
+
+		filter_options->extension_value = xstrdup(params[1]->buf);
+	}
+
+	filter_options->extension_name = xstrdup(params[0]->buf);
+	filter_options->choice = LOFC_EXTENSION;
+
+cleanup:
+	strbuf_list_free(params);
+	if (result) {
+		list_objects_filter_release(filter_options);
+		memset(filter_options, 0, sizeof(*filter_options));
+	}
+	return result;
+}
+
 static int allow_unencoded(char ch)
 {
 	if (ch <= ' ' || ch == '%' || ch == '+')
@@ -349,6 +394,8 @@ void list_objects_filter_release(
 		return;
 	string_list_clear(&filter_options->filter_spec, /*free_util=*/0);
 	free(filter_options->sparse_oid_name);
+	free(filter_options->extension_name);
+	free(filter_options->extension_value);
 	for (sub = 0; sub < filter_options->sub_nr; sub++)
 		list_objects_filter_release(&filter_options->sub[sub]);
 	free(filter_options->sub);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index da5b6737e27..df3e360324e 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -15,6 +15,7 @@ enum list_objects_filter_choice {
 	LOFC_TREE_DEPTH,
 	LOFC_SPARSE_OID,
 	LOFC_OBJECT_TYPE,
+	LOFC_EXTENSION,
 	LOFC_COMBINE,
 	LOFC__COUNT /* must be last */
 };
@@ -58,6 +59,11 @@ struct list_objects_filter_options {
 	unsigned long tree_exclude_depth;
 	enum object_type object_type;
 
+	/* LOFC_EXTENSION values */
+
+	char *extension_name;
+	char *extension_value;
+
 	/* LOFC_COMBINE values */
 
 	/* This array contains all the subfilters which this filter combines. */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 1c1ee3d1bb1..037c674b1c3 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -10,6 +10,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "list-objects-filter-extensions.h"
 #include "oidmap.h"
 #include "oidset.h"
 #include "object-store.h"
@@ -620,6 +621,88 @@ static void filter_object_type__init(
 	filter->free_fn = free;
 }
 
+/*
+ * A filter which passes the objects to a compile-time extension.
+ * The extension needs to implement the filter_extension interface
+ * defined in list-objects-filter-extensions.h.
+ * See contrib/filter-extensions/README.txt
+ */
+
+struct filter_extension_data {
+	const struct filter_extension *extension;
+	void *context;
+};
+
+static enum list_objects_filter_result filter_extension_filter_object(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct oidset *omits,
+	void *filter_data)
+{
+	struct filter_extension_data *d = filter_data;
+
+	enum list_objects_filter_omit omit_it = LOFO_IGNORE;
+
+	enum list_objects_filter_result ret =
+		d->extension->filter_object_fn(
+			r,
+			filter_situation,
+			obj,
+			pathname,
+			filename,
+			&omit_it,
+			d->context);
+
+	if (omits) {
+		if (omit_it == LOFO_KEEP)
+			oidset_remove(omits, &obj->oid);
+		else if (omit_it == LOFO_OMIT)
+			oidset_insert(omits, &obj->oid);
+	}
+	return ret;
+}
+
+static void filter_extension_free(void *filter_data)
+{
+	struct filter_extension_data *d = filter_data;
+	d->extension->free_fn(the_repository, d->context);
+	free(d);
+}
+
+static void filter_extension__init(
+	struct list_objects_filter_options *filter_options,
+	struct filter *filter)
+{
+	struct filter_extension_data *d = xcalloc(1, sizeof(*d));
+	int i, r;
+
+	for (i = 0; filter_extensions[i] != NULL; i++) {
+		if (!strcmp(
+			filter_options->extension_name,
+			filter_extensions[i]->name))
+			break;
+	}
+	if (filter_extensions[i] == NULL) {
+		die(_("No filter extension found with name %s"),
+			filter_options->extension_name);
+	}
+	d->extension = filter_extensions[i];
+
+	r = d->extension->init_fn(
+		the_repository, filter_options->extension_value, &d->context);
+	if (r) {
+		die(_("Error initialising filter extension %s: %d"),
+			filter_options->extension_name, r);
+	}
+
+	filter->filter_data = d;
+	filter->filter_object_fn = &filter_extension_filter_object;
+	filter->free_fn = &filter_extension_free;
+}
+
 /* A filter which only shows objects shown by all sub-filters. */
 struct combine_filter_data {
 	struct subfilter *sub;
@@ -767,6 +850,7 @@ static filter_init_fn s_filters[] = {
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
 	filter_object_type__init,
+	filter_extension__init,
 	filter_combine__init,
 };
 
-- 
gitgitgadget



* [PATCH 2/4] Makefile for list-object-filter extensions
  2021-09-05 23:51 [PATCH 0/4] Compile-time extensions for list-object-filter Andrew Olsen via GitGitGadget
  2021-09-05 23:51 ` [PATCH 1/4] " Andrew Olsen via GitGitGadget
@ 2021-09-05 23:51 ` Andrew Olsen via GitGitGadget
  2021-09-06  6:15   ` Bagas Sanjaya
  2021-09-05 23:51 ` [PATCH 3/4] Sample " Andrew Olsen via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Andrew Olsen via GitGitGadget @ 2021-09-05 23:51 UTC (permalink / raw)
  To: git; +Cc: Andrew Olsen, Andrew Olsen

From: Andrew Olsen <andrew.olsen@koordinates.com>

Custom list-object-filter extensions can be compiled into Git using the
FILTER_EXTENSIONS Makefile argument.

Signed-off-by: Andrew Olsen <andrew.olsen@koordinates.com>
---
 Makefile                            | 35 +++++++++++++++++++++++++++--
 compat/vcbuild/README               |  5 +++--
 config.mak.uname                    |  6 ++---
 contrib/buildsystems/CMakeLists.txt |  7 ++++++
 4 files changed, 46 insertions(+), 7 deletions(-)
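
Note for readers: each entry in FILTER_EXTENSIONS is a path to a .a
file, and the generator script from patch 1/4 turns its basename into a
C symbol by prefixing it with "filter_extension_". The naming rule can
be sketched in standalone C as below. This is only an illustration of
the rule, not code from the series, and demo_symbol_for is a
hypothetical name.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Derive "filter_extension_<name>" from a library path such as
 * "contrib/filter-extensions/rand/rand.a".
 * Returns 0 on success, -1 if the path does not end in ".a", the
 * basename contains characters other than [A-Za-z0-9_], or `out` is
 * too small.
 */
int demo_symbol_for(const char *path, char *out, size_t out_size)
{
	const char *base = strrchr(path, '/');
	size_t len, i;
	int n;

	base = base ? base + 1 : path;
	len = strlen(base);
	if (len < 3 || strcmp(base + len - 2, ".a"))
		return -1;
	len -= 2; /* drop the ".a" suffix */

	for (i = 0; i < len; i++)
		if (!isalnum((unsigned char)base[i]) && base[i] != '_')
			return -1;

	n = snprintf(out, out_size, "filter_extension_%.*s", (int)len, base);
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

So a library built as rand_cpp.a has to export a struct named
filter_extension_rand_cpp, and a name such as my-filter.a is rejected
because it cannot form a C identifier.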

diff --git a/Makefile b/Makefile
index 429c276058d..0b1d0be81a5 100644
--- a/Makefile
+++ b/Makefile
@@ -471,6 +471,11 @@ all::
 # directory, and the JSON compilation database 'compile_commands.json' will be
 # created at the root of the repository.
 #
+# Define FILTER_EXTENSIONS to a space-separated list of static library plugins
+# that implement the list-objects-filter extension API. Each of these filter
+# extensions will then be available in addition to the builtin ones such as
+# "blob:limit" and "object:type". See contrib/filter-extensions/README.txt
+#
 # Define DEVELOPER to enable more compiler warnings. Compiler version
 # and family are auto detected, but could be overridden by defining
 # COMPILER_FEATURES (see config.mak.dev). You can still set
@@ -824,6 +829,7 @@ XDIFF_LIB = xdiff/lib.a
 
 GENERATED_H += command-list.h
 GENERATED_H += config-list.h
+GENERATED_C += list-objects-filter-extensions.c
 
 LIB_H := $(sort $(patsubst ./%,%,$(shell git ls-files '*.h' ':!t/' ':!Documentation/' 2>/dev/null || \
 	$(FIND) . \
@@ -916,6 +922,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += linear-assignment.o
+LIB_OBJS += list-objects-filter-extensions.o
 LIB_OBJS += list-objects-filter-options.o
 LIB_OBJS += list-objects-filter.o
 LIB_OBJS += list-objects.o
@@ -2116,6 +2123,19 @@ ifdef DEFAULT_HELP_FORMAT
 BASIC_CFLAGS += -DDEFAULT_HELP_FORMAT='"$(DEFAULT_HELP_FORMAT)"'
 endif
 
+ifneq ($(FILTER_EXTENSIONS),)
+FILTER_EXT_PATHS = $(dir $(FILTER_EXTENSIONS))
+
+$(FILTER_EXTENSIONS): $(FILTER_EXT_PATHS)
+	$(QUIET_SUBDIR0)$(@D) $(QUIET_SUBDIR1) \
+		ALL_CFLAGS='$(subst ','\'',$(ALL_CFLAGS))' \
+		ALL_LDFLAGS='$(subst ','\'',$(ALL_LDFLAGS))' \
+		PROFILE_DIR='$(subst ','\'',$(PROFILE_DIR))' \
+		$(@F)
+
+GITLIBS += $(FILTER_EXTENSIONS)
+endif
+
 PAGER_ENV_SQ = $(subst ','\'',$(PAGER_ENV))
 PAGER_ENV_CQ = "$(subst ",\",$(subst \,\\,$(PAGER_ENV)))"
 PAGER_ENV_CQ_SQ = $(subst ','\'',$(PAGER_ENV_CQ))
@@ -2222,7 +2242,7 @@ git.sp git.s git.o: EXTRA_CPPFLAGS = \
 	'-DGIT_MAN_PATH="$(mandir_relative_SQ)"' \
 	'-DGIT_INFO_PATH="$(infodir_relative_SQ)"'
 
-git$X: git.o GIT-LDFLAGS $(BUILTIN_OBJS) $(GITLIBS)
+git$X: git.o GIT-LDFLAGS $(BUILTIN_OBJS) $(GITLIBS) $(EXTENSION_LIBS)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) \
 		$(filter %.o,$^) $(LIBS)
 
@@ -2261,6 +2281,10 @@ command-list.h: $(wildcard Documentation/git*.txt)
 		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
 		command-list.txt >$@+ && mv $@+ $@
 
+list-objects-filter-extensions.c: generate-list-objects-filter-extensions.sh GIT-BUILD-OPTIONS
+	$(QUIET_GEN)$(SHELL_PATH) ./generate-list-objects-filter-extensions.sh \
+		$(FILTER_EXTENSIONS) > $@+ && mv $@+ $@
+
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
 	$(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP):$(PAGER_ENV):\
@@ -2612,6 +2636,7 @@ $(LIB_FILE): $(LIB_OBJS)
 $(XDIFF_LIB): $(XDIFF_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -2857,6 +2882,9 @@ ifdef RUNTIME_PREFIX
 	@echo RUNTIME_PREFIX=\'true\' >>$@+
 else
 	@echo RUNTIME_PREFIX=\'false\' >>$@+
+endif
+ifdef FILTER_EXTENSIONS
+	@echo FILTER_EXTENSIONS=\''$(subst ','\'',$(subst ','\'',$(FILTER_EXTENSIONS)))'\' >>$@+
 endif
 	@if cmp $@+ $@ >/dev/null 2>&1; then $(RM) $@+; else mv $@+ $@; fi
 
@@ -3241,7 +3269,7 @@ clean: profile-clean coverage-clean cocciclean
 	$(RM) $(HCC)
 	$(RM) -r bin-wrappers $(dep_dirs) $(compdb_dir) compile_commands.json
 	$(RM) -r po/build/
-	$(RM) *.pyc *.pyo */*.pyc */*.pyo $(GENERATED_H) $(ETAGS_TARGET) tags cscope*
+	$(RM) *.pyc *.pyo */*.pyc */*.pyo $(GENERATED_H) $(GENERATED_C) $(ETAGS_TARGET) tags cscope*
 	$(RM) -r .dist-tmp-dir .doc-tmp-dir
 	$(RM) $(GIT_TARNAME).tar.gz
 	$(RM) $(htmldocs).tar.gz $(manpages).tar.gz
@@ -3256,6 +3284,9 @@ endif
 ifndef NO_TCLTK
 	$(MAKE) -C gitk-git clean
 	$(MAKE) -C git-gui clean
+endif
+ifneq ($(FILTER_EXTENSIONS),)
+	$(foreach FP,$(FILTER_EXTENSIONS),$(MAKE) -C $(dir $(FP)) clean && ) true
 endif
 	$(RM) GIT-VERSION-FILE GIT-CFLAGS GIT-LDFLAGS GIT-BUILD-OPTIONS
 	$(RM) GIT-USER-AGENT GIT-PREFIX
diff --git a/compat/vcbuild/README b/compat/vcbuild/README
index 51fb083dbbe..5e39022eade 100644
--- a/compat/vcbuild/README
+++ b/compat/vcbuild/README
@@ -92,8 +92,9 @@ The Steps of Build Git with VS2008
    the git operations.
 
 3. Inside Git's directory run the command:
-       make command-list.h config-list.h
-   to generate the header file needed to compile git.
+       make command-list.h config-list.h list-objects-filter-extensions.c
+   to generate those source files that are not included in the repo, but
+   instead are automatically generated from other files.
 
 4. Then either build Git with the GNU Make Makefile in the Git projects
    root
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..405e7d91e7a 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -735,9 +735,9 @@ vcxproj:
 	 echo '</Project>') >git-remote-http/LinkOrCopyRemoteHttp.targets
 	git add -f git/LinkOrCopyBuiltins.targets git-remote-http/LinkOrCopyRemoteHttp.targets
 
-	# Add command-list.h and config-list.h
-	$(MAKE) MSVC=1 SKIP_VCPKG=1 prefix=/mingw64 config-list.h command-list.h
-	git add -f config-list.h command-list.h
+	# Add command-list.h, config-list.h and list-objects-filter-extensions.c
+	$(MAKE) MSVC=1 SKIP_VCPKG=1 prefix=/mingw64 config-list.h command-list.h list-objects-filter-extensions.c
+	git add -f config-list.h command-list.h list-objects-filter-extensions.c
 
 	# Add scripts
 	rm -f perl/perl.mak
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..60627a2892f 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -624,6 +624,13 @@ if(NOT EXISTS ${CMAKE_BINARY_DIR}/config-list.h)
 			OUTPUT_FILE ${CMAKE_BINARY_DIR}/config-list.h)
 endif()
 
+if(NOT EXISTS ${CMAKE_BINARY_DIR}/list-objects-filter-extensions.c)
+	message("Generating list-objects-filter-extensions.c")
+	execute_process(COMMAND ${SH_EXE} ${CMAKE_SOURCE_DIR}/generate-list-objects-filter-extensions.sh
+			WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
+			OUTPUT_FILE ${CMAKE_BINARY_DIR}/list-objects-filter-extensions.c)
+endif()
+
 include_directories(${CMAKE_BINARY_DIR})
 
 #build
-- 
gitgitgadget



* [PATCH 3/4] Sample list-object-filter extensions
  2021-09-05 23:51 [PATCH 0/4] Compile-time extensions for list-object-filter Andrew Olsen via GitGitGadget
  2021-09-05 23:51 ` [PATCH 1/4] " Andrew Olsen via GitGitGadget
  2021-09-05 23:51 ` [PATCH 2/4] Makefile for list-object-filter extensions Andrew Olsen via GitGitGadget
@ 2021-09-05 23:51 ` Andrew Olsen via GitGitGadget
  2021-09-05 23:51 ` [PATCH 4/4] Documentation for " Andrew Olsen via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Andrew Olsen via GitGitGadget @ 2021-09-05 23:51 UTC (permalink / raw)
  To: git; +Cc: Andrew Olsen, Andrew Olsen

From: Andrew Olsen <andrew.olsen@koordinates.com>

Basic filter extension example which filters to a random subset of
blobs, and another example which shows how to do the same in C++ and
how to link in another library required by a filter extension.
Documentation changes follow.

Signed-off-by: Andrew Olsen <andrew.olsen@koordinates.com>
---
 contrib/filter-extensions/rand/.gitignore     |   2 +
 contrib/filter-extensions/rand/Makefile       |  28 +++++
 contrib/filter-extensions/rand/rand.c         | 103 ++++++++++++++++++
 contrib/filter-extensions/rand_cpp/.gitignore |   2 +
 contrib/filter-extensions/rand_cpp/Makefile   |  34 ++++++
 .../rand_cpp/adapter_functions.c              |   6 +
 .../rand_cpp/adapter_functions.h              |  10 ++
 contrib/filter-extensions/rand_cpp/rand.cpp   | 103 ++++++++++++++++++
 8 files changed, 288 insertions(+)
 create mode 100644 contrib/filter-extensions/rand/.gitignore
 create mode 100644 contrib/filter-extensions/rand/Makefile
 create mode 100644 contrib/filter-extensions/rand/rand.c
 create mode 100644 contrib/filter-extensions/rand_cpp/.gitignore
 create mode 100644 contrib/filter-extensions/rand_cpp/Makefile
 create mode 100644 contrib/filter-extensions/rand_cpp/adapter_functions.c
 create mode 100644 contrib/filter-extensions/rand_cpp/adapter_functions.h
 create mode 100644 contrib/filter-extensions/rand_cpp/rand.cpp
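
Note for readers: both samples follow the same init / filter / free
lifecycle, with per-filter state passed around through the opaque
context pointer. The standalone sketch below shows that lifecycle for
the blob case only. The demo_* names and enum values are hypothetical
stand-ins for the types in list-objects-filter-extensions.h, and
treating a missing argument as invalid is an assumption of the sketch.

```c
#include <stdlib.h>

/* Stand-ins for the series' enums; the values here are illustrative only. */
enum demo_omit { DEMO_KEEP = -1, DEMO_IGNORE = 0, DEMO_OMIT = 1 };
enum demo_result { DEMO_SKIP = 0, DEMO_SHOW = 1 };

struct demo_context {
	int percentage;  /* share of blobs to keep, 0..100 */
	int seen;
	int matched;
};

/* Parse the percentage argument; fall back to 1% when missing or out of range. */
int demo_init(const char *arg, void **context)
{
	struct demo_context *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->percentage = arg ? atoi(arg) : -1;
	if (ctx->percentage < 0 || ctx->percentage > 100)
		ctx->percentage = 1;
	*context = ctx;  /* hand the state back to the caller */
	return 0;
}

/* Decide the fate of one blob and record an omission in *omit. */
enum demo_result demo_filter_blob(enum demo_omit *omit, void *context)
{
	struct demo_context *ctx = context;

	ctx->seen++;
	if (rand() % 100 < ctx->percentage) {
		ctx->matched++;
		return DEMO_SHOW;
	}
	*omit = DEMO_OMIT;
	return DEMO_SKIP;
}

/* Release the state allocated by demo_init(). */
void demo_free(void *context)
{
	free(context);
}
```

The important detail is that init must store its state through the
context out-parameter; otherwise the filter callback receives a NULL
context.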

diff --git a/contrib/filter-extensions/rand/.gitignore b/contrib/filter-extensions/rand/.gitignore
new file mode 100644
index 00000000000..9eca6c88cf2
--- /dev/null
+++ b/contrib/filter-extensions/rand/.gitignore
@@ -0,0 +1,2 @@
+*.a
+*.o
diff --git a/contrib/filter-extensions/rand/Makefile b/contrib/filter-extensions/rand/Makefile
new file mode 100644
index 00000000000..267221ee952
--- /dev/null
+++ b/contrib/filter-extensions/rand/Makefile
@@ -0,0 +1,28 @@
+# Run this via `FILTER_EXTENSIONS=contrib/filter-extensions/rand/rand.a make`
+# from the main git directory. That way we inherit useful variables.
+
+ifneq ($(findstring s,$(MAKEFLAGS)),s)
+ifndef V
+	QUIET_CC       = @echo '   ' CC $@;
+	QUIET_AR       = @echo '   ' AR $@;
+endif
+endif
+
+FILTER_STATIC_LIB = rand.a
+
+all: $(FILTER_STATIC_LIB)
+ifeq ($(MAKELEVEL),0)
+	$(error "Run via parent git make")
+endif
+	@:
+
+$(FILTER_STATIC_LIB): rand.o
+	$(QUIET_AR)$(AR) $(ARFLAGS) $@ $^
+
+rand.o: rand.c
+	$(QUIET_CC)$(CC) -c $(ALL_CFLAGS) $<
+
+clean:
+	$(RM) $(FILTER_STATIC_LIB) rand.o
+
+.PHONY: all clean
diff --git a/contrib/filter-extensions/rand/rand.c b/contrib/filter-extensions/rand/rand.c
new file mode 100644
index 00000000000..af153709345
--- /dev/null
+++ b/contrib/filter-extensions/rand/rand.c
@@ -0,0 +1,103 @@
+#include "../../../git-compat-util.h"
+#include "../../../list-objects-filter-extensions.h"
+#include "../../../object.h"
+#include "../../../hash.h"
+#include "../../../trace.h"
+
+
+static struct trace_key trace_filter = TRACE_KEY_INIT(FILTER);
+
+struct rand_context {
+	int percentageMatch;
+	int matchCount;
+	int blobCount;
+	int treeCount;
+	uint64_t started_at;
+};
+
+static int rand_init(
+	const struct repository *r,
+	const char *filter_arg,
+	void **context)
+{
+	struct rand_context *ctx = calloc(1, sizeof(struct rand_context));
+
+	ctx->percentageMatch = atoi(filter_arg);
+	if (ctx->percentageMatch > 100 || ctx->percentageMatch < 0) {
+		fprintf(stderr, "filter-rand: warning: invalid match %%: %s\n",
+			filter_arg);
+		ctx->percentageMatch = 1;  /* default 1% */
+	}
+	fprintf(stderr, "filter-rand: matching %d%%\n", ctx->percentageMatch);
+	ctx->started_at = getnanotime();
+	(*context) = ctx;
+
+	return 0;
+}
+
+static enum list_objects_filter_result rand_filter_object(
+	const struct repository *r,
+	const enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	enum list_objects_filter_omit *omit,
+	void *context)
+{
+	struct rand_context *ctx = (struct rand_context*)(context);
+
+	if ((ctx->blobCount + ctx->treeCount + 1) % 100000 == 0) {
+		fprintf(stderr, "filter-rand: %d...\n",
+			(ctx->blobCount + ctx->treeCount + 1));
+	}
+
+	switch (filter_situation) {
+	default:
+		die("filter-rand: unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+		ctx->treeCount++;
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_END_TREE:
+		return LOFR_ZERO;
+
+	case LOFS_BLOB:
+		ctx->blobCount++;
+
+		if ((rand() % 100) < ctx->percentageMatch) {
+			ctx->matchCount++;
+			trace_printf_key(&trace_filter,
+				"match: %s %s\n",
+				oid_to_hex(&obj->oid),
+				pathname
+			);
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		} else {
+			*omit = LOFO_OMIT;
+			return LOFR_MARK_SEEN; /* hard omit */
+		}
+	}
+}
+
+static void rand_free(const struct repository *r, void *context)
+{
+	struct rand_context *ctx = (struct rand_context*)(context);
+	double elapsed = (getnanotime() - ctx->started_at)/1E9;
+	int count = ctx->blobCount + ctx->treeCount;
+
+	fprintf(stderr, "filter-rand: done: count=%d (blob=%d tree=%d) "
+		"matched=%d elapsed=%fs rate=%0.1f/s average=%0.1fus\n",
+		count, ctx->blobCount, ctx->treeCount, ctx->matchCount,
+		elapsed, count/elapsed, elapsed/count*1E6);
+
+	free(ctx);
+}
+
+const struct filter_extension filter_extension_rand = {
+	"rand",
+	&rand_init,
+	&rand_filter_object,
+	&rand_free,
+};
diff --git a/contrib/filter-extensions/rand_cpp/.gitignore b/contrib/filter-extensions/rand_cpp/.gitignore
new file mode 100644
index 00000000000..9eca6c88cf2
--- /dev/null
+++ b/contrib/filter-extensions/rand_cpp/.gitignore
@@ -0,0 +1,2 @@
+*.a
+*.o
diff --git a/contrib/filter-extensions/rand_cpp/Makefile b/contrib/filter-extensions/rand_cpp/Makefile
new file mode 100644
index 00000000000..278121e3d5a
--- /dev/null
+++ b/contrib/filter-extensions/rand_cpp/Makefile
@@ -0,0 +1,34 @@
+# Run this via `FILTER_EXTENSIONS=contrib/filter-extensions/rand_cpp/rand_cpp.a make`
+# from the main git directory. That way we inherit useful variables.
+
+ifneq ($(findstring s,$(MAKEFLAGS)),s)
+ifndef V
+	QUIET_CC       = @echo '   ' CC $@;
+	QUIET_CXX       = @echo '   ' CXX $@;
+	QUIET_AR       = @echo '   ' AR $@;
+endif
+endif
+
+FILTER_STATIC_LIB = rand_cpp.a
+
+ALL_CXXFLAGS += -std=c++11
+
+all: $(FILTER_STATIC_LIB)
+ifeq ($(MAKELEVEL),0)
+	$(error "Run via parent git make")
+endif
+	@:
+
+$(FILTER_STATIC_LIB): rand.o adapter_functions.o
+	$(QUIET_AR)$(AR) $(ARFLAGS) $@ $^
+
+rand.o: rand.cpp
+	$(QUIET_CXX)$(CXX) -c $(ALL_CFLAGS) $(ALL_CXXFLAGS) $<
+
+adapter_functions.o: adapter_functions.c
+	$(QUIET_CC)$(CC) -c $(ALL_CFLAGS) $<
+
+clean:
+	$(RM) $(FILTER_STATIC_LIB) rand.o adapter_functions.o
+
+.PHONY: all clean
diff --git a/contrib/filter-extensions/rand_cpp/adapter_functions.c b/contrib/filter-extensions/rand_cpp/adapter_functions.c
new file mode 100644
index 00000000000..0d9d2a2aa96
--- /dev/null
+++ b/contrib/filter-extensions/rand_cpp/adapter_functions.c
@@ -0,0 +1,6 @@
+#include "../../../git-compat-util.h"
+#include "../../../object.h"
+
+char *obj_to_hex_oid(struct object *obj) {
+    return oid_to_hex(&obj->oid);
+}
diff --git a/contrib/filter-extensions/rand_cpp/adapter_functions.h b/contrib/filter-extensions/rand_cpp/adapter_functions.h
new file mode 100644
index 00000000000..1150c21a258
--- /dev/null
+++ b/contrib/filter-extensions/rand_cpp/adapter_functions.h
@@ -0,0 +1,10 @@
+#ifndef RAND_CPP_ADAPTER_FUNCTIONS_H
+#define RAND_CPP_ADAPTER_FUNCTIONS_H
+
+struct object;
+
+uint64_t getnanotime(void);
+
+char *obj_to_hex_oid(struct object *obj);
+
+#endif /* RAND_CPP_ADAPTER_FUNCTIONS_H */
diff --git a/contrib/filter-extensions/rand_cpp/rand.cpp b/contrib/filter-extensions/rand_cpp/rand.cpp
new file mode 100644
index 00000000000..cb608d14ed9
--- /dev/null
+++ b/contrib/filter-extensions/rand_cpp/rand.cpp
@@ -0,0 +1,103 @@
+#include <iomanip>
+#include <iostream>
+#include <sstream>
+
+#include <stdlib.h>
+
+extern "C" {
+	#include "../../../list-objects-filter-extensions.h"
+	#include "adapter_functions.h"
+}
+
+namespace {
+
+struct rand_context {
+	int percentageMatch = 0;
+	int matchCount = 0;
+	int blobCount = 0;
+	int treeCount = 0;
+	uint64_t started_at = 0;
+};
+
+static int rand_init(
+	const struct repository *r,
+	const char *filter_arg,
+	void **context)
+{
+	struct rand_context *ctx = new rand_context();
+
+	ctx->percentageMatch = atoi(filter_arg);
+	if (ctx->percentageMatch > 100 || ctx->percentageMatch < 0) {
+		std::cerr << "filter-rand-cpp: warning: invalid match %: " << filter_arg << "\n";
+		ctx->percentageMatch = 1;  // default 1%
+	}
+	std::cerr << "filter-rand-cpp: matching " << ctx->percentageMatch << "%\n";
+	ctx->started_at = getnanotime();
+	*context = ctx;
+	return 0;
+}
+
+enum list_objects_filter_result rand_filter_object(
+	const struct repository *r,
+	const enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	enum list_objects_filter_omit *omit,
+	void *context)
+{
+	struct rand_context *ctx = static_cast<struct rand_context*>(context);
+
+	if ((ctx->blobCount + ctx->treeCount + 1) % 100000 == 0) {
+		std::cerr << "filter-rand-cpp: " << (ctx->blobCount + ctx->treeCount + 1) << "...\n";
+	}
+	switch (filter_situation) {
+	default:
+		std::cerr << "filter-rand-cpp: unknown filter_situation: " << filter_situation << "\n";
+		abort();
+
+	case LOFS_BEGIN_TREE:
+		ctx->treeCount++;
+		/* always include all tree objects */
+		return static_cast<list_objects_filter_result>(LOFR_MARK_SEEN | LOFR_DO_SHOW);
+
+	case LOFS_END_TREE:
+		return LOFR_ZERO;
+
+	case LOFS_BLOB:
+		ctx->blobCount++;
+
+		if ((rand() % 100) < ctx->percentageMatch) {
+			ctx->matchCount++;
+			std::cout << "match: " << obj_to_hex_oid(obj) << " " << pathname << "\n";
+			return static_cast<list_objects_filter_result>(LOFR_MARK_SEEN | LOFR_DO_SHOW);
+		} else {
+			*omit = LOFO_OMIT;
+			return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+		}
+	}
+}
+
+void rand_free(const struct repository *r, void *context) {
+	struct rand_context *ctx = static_cast<struct rand_context*>(context);
+	double elapsed = (getnanotime() - ctx->started_at)/1E9;
+	int count = ctx->blobCount + ctx->treeCount;
+
+	std::cerr << "filter-rand-cpp: done: count=" << count
+		<< " (blob=" << ctx->blobCount << " tree=" << ctx->treeCount << ")"
+		<< " matched=" << ctx->matchCount
+		<< " elapsed=" << elapsed << "s"
+		<< " rate=" << count/elapsed << "/s"
+		<< " average=" << elapsed/count*1E6 << "us\n";
+
+	delete ctx;
+}
+
+} // namespace
+
+extern const struct filter_extension filter_extension_rand_cpp = {
+	"rand_cpp",
+	&rand_init,
+	&rand_filter_object,
+	&rand_free,
+};
-- 
gitgitgadget


* [PATCH 4/4] Documentation for list-object-filter extensions
  2021-09-05 23:51 [PATCH 0/4] Compile-time extensions for list-object-filter Andrew Olsen via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-09-05 23:51 ` [PATCH 3/4] Sample " Andrew Olsen via GitGitGadget
@ 2021-09-05 23:51 ` Andrew Olsen via GitGitGadget
  2021-09-06  0:49 ` [PATCH 0/4] Compile-time extensions for list-object-filter Ævar Arnfjörð Bjarmason
  2021-09-06  6:18 ` Bagas Sanjaya
  5 siblings, 0 replies; 11+ messages in thread
From: Andrew Olsen via GitGitGadget @ 2021-09-05 23:51 UTC (permalink / raw)
  To: git; +Cc: Andrew Olsen, Andrew Olsen

From: Andrew Olsen <andrew.olsen@koordinates.com>

Explains how to develop a custom extension for list-objects-filter
behavior, and how to compile it into a custom build of Git using the
FILTER_EXTENSIONS Makefile argument.

Signed-off-by: Andrew Olsen <andrew.olsen@koordinates.com>
---
 Documentation/config/uploadpack.txt  |   7 +-
 Documentation/rev-list-options.txt   |   4 +
 contrib/filter-extensions/README.txt | 153 +++++++++++++++++++++++++++
 3 files changed, 161 insertions(+), 3 deletions(-)
 create mode 100644 contrib/filter-extensions/README.txt

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index 32fad5bbe81..b2ef2421a6d 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -66,9 +66,10 @@ uploadpackfilter.allow::
 uploadpackfilter.<filter>.allow::
 	Explicitly allow or ban the object filter corresponding to
 	`<filter>`, where `<filter>` may be one of: `blob:none`,
-	`blob:limit`, `object:type`, `tree`, `sparse:oid`, or `combine`.
-	If using combined filters, both `combine` and all of the nested
-	filter kinds must be allowed. Defaults to `uploadpackfilter.allow`.
+	`blob:limit`, `object:type`, `tree`, `sparse:oid`, `combine`, or a named
+	filter extension `extension:<name>`. If using combined filters, both
+	`combine` and all of the nested filter kinds must be allowed. Defaults to
+	`uploadpackfilter.allow`.
 
 uploadpackfilter.tree.maxDepth::
 	Only allow `--filter=tree:<n>` when `<n>` is no more than the value of
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index b7bd27e1713..d7a317f0aa1 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -914,6 +914,10 @@ Note that the form '--filter=sparse:path=<path>' that wants to read
 from an arbitrary path on the filesystem has been dropped for security
 reasons.
 +
+The form '--filter=extension:<name>[=<arg>]' uses a compile-time extension
+to implement a named filter. Filter extensions may take an argument string
+which is passed via `<arg>`.
++
 Multiple '--filter=' flags can be specified to combine filters. Only
 objects which are accepted by every filter are included.
 +
diff --git a/contrib/filter-extensions/README.txt b/contrib/filter-extensions/README.txt
new file mode 100644
index 00000000000..3d5921cda9b
--- /dev/null
+++ b/contrib/filter-extensions/README.txt
@@ -0,0 +1,153 @@
+= List-Objects-Filter Extensions API
+:pp: {plus}{plus}
+
+This API can be used to develop filter extensions used for custom filtering
+behaviour with `git-upload-pack` and `git-rev-list`. The API is defined in
+link:../../list-objects-filter-extensions.h[list-objects-filter-extensions.h]
+and defines three functions to implement a filter operation.
+
+NOTE: Each filter implementing this API must be compiled into Git as a
+static library. There is some plumbing in the Makefile to help with this
+via `FILTER_EXTENSIONS`.
+
+== Overview
+
+. You write a filter and compile it into your custom build of git.
+. A filter request is received that specifically names the filter extension
+that you have written, i.e. `--filter=extension:<name>[=<arg>]`
+. The `init_fn` function of your filter is called.
+. The `filter_object_fn` function of your filter is called for each object
+at least once.
+. The `free_fn` function of your filter is called.
+
+== Examples
+
+*link:./rand/[`rand`]* is a filter that matches all trees and a random
+percentage of blobs, where the percentage is parsed from the filter arg. It
+imports and uses the `oid_to_hex()` and `trace_key_printf()` functions from the
+Git API.
+
+Build via:
+
+[,console]
+----
+$ make FILTER_EXTENSIONS=contrib/filter-extensions/rand/rand.a
+    ...
+    SUBDIR contrib/filter-extensions/rand
+    ...
+----
+
+We can run against git's own repo:
+
+[,console]
+----
+$ ./git rev-list refs/heads/master --objects --max-count 1 --filter=extension:rand=3 --filter-print-omitted | grep -c '^~'
+filter-rand: matching 3%
+filter-rand: done: count=4068 (blob=3866 tree=202) matched=117 elapsed=0.005017s rate=810843.1/s average=1.2us
+3749  # number of omitted blobs = 3866 - 117
+----
+
+== Development
+
+See the examples for a basic implementation. The comments in
+link:../../list-objects-filter.h[`list-objects-filter.h`] and the built-in
+filter implementations in
+link:../../list-objects-filter.c[`list-objects-filter.c`] are important to
+understand how filters are implemented - `filter_blobs_limit()` provides a
+simple example, and `filter_sparse()` is more complex.
+
+The API differences between the built-in filters and the filter extensions:
+
+. Filter extensions don't handle ``omitset``s directly, instead setting `omit`.
+. Filter extensions receive a void pointer they can use for context.
+
+== Building
+
+There is some plumbing in the Git Makefile to help with this via
+`FILTER_EXTENSIONS`. Setting it to the space-separated paths of the filter
+extension static libraries indicates that these filters should be compiled
+into Git. For example:
+
+[,console]
+----
+make FILTER_EXTENSIONS=contrib/filter-extensions/rand/rand.a
+----
+
+Filter extensions don't need to be within the Git source tree. A filter
+extension static library should either exist at the given path - i.e., `rand.a`
+should exist - or there should be a Makefile in that directory which will create
+it when `make rand.a` is run. (Such a Makefile should also have a `clean` target
+which deletes all object files and brings the directory back to its initial
+state.)
+
+The static library should define a struct of type `filter_extension` called
+`filter_extension_NAME` where `NAME` is the name of your extension (i.e. `rand`
+for `rand.a`). See
+link:../../list-objects-filter-extensions.h[list-objects-filter-extensions.h].
+
+The definition should follow this pattern:
+
+[,C]
+----
+#include "list-objects-filter-extensions.h"
+
+/* Definitions of rand_init, rand_filter_object, rand_free ... */
+
+const struct filter_extension filter_extension_rand = {
+    "rand",
+    &rand_init,
+    &rand_filter_object,
+    &rand_free,
+};
+----
+
+(The names of your `init_fn`, `filter_object_fn` and `free_fn` are not
+important, but the string literal should again be the name of your
+extension - `"rand"` for the filter extension in `rand.a`.)
+
+You may use library functions from Git if you include the relevant Git headers,
+since the filter extensions and Git itself will be linked together into a single
+binary.
+
+You may depend on other libraries if you indicate that they are to be linked
+into the Git binary using `LDFLAGS`. See the C{pp} example below.
+
+== Developing in C{pp} (and other languages)
+
+You can develop filter extensions with C{pp}, but many Git header files are not
+compatible with modern C{pp}, so you won't be able to directly use Git library
+functions. However, you can use them if you create wrapper functions in C that
+delegate to the Git library functions you need, but which are also C{pp}
+compatible. See link:./rand_cpp/[`rand_cpp`] for a simple example. A similar
+solution would be to implement the extension itself in C, and have the
+extension do any operations that require Git library functions, but have it
+delegate to a C wrapper API that you add to a C{pp} library that already
+contains the domain-specific operations that you need. In either case, remember
+to wrap any functions that must be C-compatible with `extern "C"` when declaring
+or defining them from within C{pp}.
+
+To build the C{pp} example:
+
+[,console]
+----
+make FILTER_EXTENSIONS=contrib/filter-extensions/rand_cpp/rand_cpp.a \
+     LDFLAGS=-lstdc++
+----
+
+For other languages you'll either need to port definitions of some internal Git
+structs (at a minimum, `object`, `object_id`, `repository`, and `hash_algo`),
+or again, you could write the extension in C but have it delegate to a
+domain-specific library in the language of your choice that has a C-compatible API.
+Extra libraries can be required using `LDFLAGS`.
+
+== Linking more than one filter extension
+
+To link in more than one extension, set `FILTER_EXTENSIONS` to the
+space-separated paths of all the extensions you want linked. For example, to
+link in both example filters at once:
+
+[,console]
+----
+make FILTER_EXTENSIONS="contrib/filter-extensions/rand/rand.a contrib/filter-extensions/rand_cpp/rand_cpp.a" \
+     LDFLAGS=-lstdc++
+----
-- 
gitgitgadget

* Re: [PATCH 0/4] Compile-time extensions for list-object-filter
  2021-09-05 23:51 [PATCH 0/4] Compile-time extensions for list-object-filter Andrew Olsen via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-09-05 23:51 ` [PATCH 4/4] Documentation for " Andrew Olsen via GitGitGadget
@ 2021-09-06  0:49 ` Ævar Arnfjörð Bjarmason
  2021-09-06  6:18 ` Bagas Sanjaya
  5 siblings, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-06  0:49 UTC (permalink / raw)
  To: Andrew Olsen via GitGitGadget; +Cc: git, Andrew Olsen


On Sun, Sep 05 2021, Andrew Olsen via GitGitGadget wrote:

> Adds an extension: option to list-object-filters, these are implemented by
> static libraries that must be compiled into Git. The Makefile argument
> FILTER_EXTENSIONS makes it easier to compile these extensions into a custom
> build of Git. When no custom filter-extensions are supplied, Git works as
> normal.

Having skimmed this and the added documentation I think what's really
missing is a "why"? What concrete use-case is this going to serve?

I.e. what is an extension you have in mind that's useful, but not so
useful as to even suggest it for inclusion in git.git before coming up
with this plug-in API mechanism?

Also, for such plug-ins the license is going to be GPL-v2 too I assume?
But that aspect isn't covered at all.

* Re: [PATCH 2/4] Makefile for list-object-filter extensions
  2021-09-05 23:51 ` [PATCH 2/4] Makefile for list-object-filter extensions Andrew Olsen via GitGitGadget
@ 2021-09-06  6:15   ` Bagas Sanjaya
  0 siblings, 0 replies; 11+ messages in thread
From: Bagas Sanjaya @ 2021-09-06  6:15 UTC (permalink / raw)
  To: Andrew Olsen via GitGitGadget, git; +Cc: Andrew Olsen, Andrew Olsen

On 06/09/21 06.51, Andrew Olsen via GitGitGadget wrote:
> From: Andrew Olsen <andrew.olsen@koordinates.com>
> 
> Custom list-object-filter extensions can be compiled into Git using the
> FILTER_EXTENSIONS Makefile argument.
> 

This can be squashed to previous patch.

-- 
An old man doll... just what I always wanted! - Clara

* Re: [PATCH 0/4] Compile-time extensions for list-object-filter
  2021-09-05 23:51 [PATCH 0/4] Compile-time extensions for list-object-filter Andrew Olsen via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-09-06  0:49 ` [PATCH 0/4] Compile-time extensions for list-object-filter Ævar Arnfjörð Bjarmason
@ 2021-09-06  6:18 ` Bagas Sanjaya
  2021-09-07  0:37   ` Andrew Olsen
  5 siblings, 1 reply; 11+ messages in thread
From: Bagas Sanjaya @ 2021-09-06  6:18 UTC (permalink / raw)
  To: Andrew Olsen via GitGitGadget, git; +Cc: Andrew Olsen

On 06/09/21 06.51, Andrew Olsen via GitGitGadget wrote:
> Adds an extension: option to list-object-filters, these are implemented by
> static libraries that must be compiled into Git. The Makefile argument
> FILTER_EXTENSIONS makes it easier to compile these extensions into a custom
> build of Git. When no custom filter-extensions are supplied, Git works as
> normal.

I don't see why this series is useful (use cases?).

-- 
An old man doll... just what I always wanted! - Clara

* Re: [PATCH 0/4] Compile-time extensions for list-object-filter
  2021-09-06  6:18 ` Bagas Sanjaya
@ 2021-09-07  0:37   ` Andrew Olsen
  2021-09-07  8:59     ` Ævar Arnfjörð Bjarmason
  2021-09-08 14:23     ` Robert Coup
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Olsen @ 2021-09-07  0:37 UTC (permalink / raw)
  To: Bagas Sanjaya, Robert Coup
  Cc: Andrew Olsen via GitGitGadget, git, Andrew Olsen

Good point - sorry I sent this out without accompanying explanation. I'm still
learning about contributing to Git.

The filter extension that I want to implement is a spatial filter - it will
return blobs that store geometries that intersect with a given geometry, eg,
"only return blobs in North America". This is useful to us at kartproject.org,
"distributed version control for geospatial data", which is built on Git. But
safe to say that this functionality is not generally useful to Git users.

However, the idea we have is that there will be others who want to implement
custom filters also - perhaps like the spatial filter, these could be
domain-specific filters that are not useful to most Git users, but allow for
a custom Git to be more powerful when storing data from a particular domain.
We could just fork git and do what we want with the fork, but defining a plugin
interface makes it possible for us to keep using Git at master, instead of
maintaining a fork indefinitely.

My colleague Robert Coup coded this up once already as a plugin library
interface that could be loaded at runtime, and I've been tasked with rewriting
it as a compile-time interface, which he thought was "more likely" (but of
course not guaranteed) to be accepted as a worthwhile change to Git. He's
unfortunately on the other side of the world to me and not working today, but
I hope when he reappears he'll be able to say something more in defence of this
idea, and perhaps give a history of the reasoning for this particular solution.

Regarding licenses: the sample extensions I'm contributing will be covered by
Git's GPL-v2 (I assume), if they make it into the Git repository. Any other
extensions that may be written by third party authors and are maintained
elsewhere could be licensed as those authors see fit, as long as they take care
not to violate the terms of Git's GPL-v2 when they distribute the extension or
Git and the extension together. I could add a link to the GPL-v2 in the README
warning developers to check it before distributing any kind of extension to Git.
I'm not a lawyer and wouldn't want to give more specific advice than that.

On Tue, Sep 7, 2021 at 11:24 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On 06/09/21 06.51, Andrew Olsen via GitGitGadget wrote:
> > Adds an extension: option to list-object-filters, these are implemented by
> > static libraries that must be compiled into Git. The Makefile argument
> > FILTER_EXTENSIONS makes it easier to compile these extensions into a custom
> > build of Git. When no custom filter-extensions are supplied, Git works as
> > normal.
>
> I don't see why this series is useful (use cases?).
>
> --
> An old man doll... just what I always wanted! - Clara
>

* Re: [PATCH 0/4] Compile-time extensions for list-object-filter
  2021-09-07  0:37   ` Andrew Olsen
@ 2021-09-07  8:59     ` Ævar Arnfjörð Bjarmason
  2021-09-08 14:23     ` Robert Coup
  1 sibling, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07  8:59 UTC (permalink / raw)
  To: Andrew Olsen
  Cc: Bagas Sanjaya, Robert Coup, Andrew Olsen via GitGitGadget, git,
	Andrew Olsen


On Tue, Sep 07 2021, Andrew Olsen wrote:

> Good point - sorry I sent this out without accompanying explanation. I'm still
> learning about contributing to Git.
>
> The filter extension that I want to implement is a spatial filter - it will
> return blobs that store geometries that intersect with a given geometry, eg,
> "only return blobs in North America". This is useful to us at kartproject.org,
> "distributed version control for geospatial data", which is built on Git. But
> safe to say that this functionality is not generally useful to Git users.

That's interesting. I think you're probably right that returning blobs
by a GEO filter is not going to be generally useful to git (I assume
it's aware of some specially-encoded blobs?), but a mechanism that
enables that might be...

> However, the idea we have is that there will be others who want to implement
> custom filters also - perhaps like the spatial filter, these could be
> domain-specific filters that are not useful to most Git users, but allow for
> a custom Git to be more powerful when storing data from a particular domain.
> We could just fork git and do what we want with the fork, but defining a plugin
> interface makes it possible for us to keep using Git at master, instead of
> maintaining a fork indefinitely.
>
> [...]
>
> My colleague Robert Coup coded this up once already as a plugin library
> interface that could be loaded at runtime, and I've been tasked with rewriting
> it as a compile-time interface, which he thought was "more likely" (but of
> course not guaranteed) to be accepted as a worthwhile change to Git. He's
> unfortunately on the other side of the world to me and not working today, but
> I hope when he reappears he'll be able to say something more in defence of this
> idea, and perhaps give a history of the reasoning for this particular solution.

While it would be easier for you it would leave this project stuck
maintaining a C API interface, and indeed your documentation suggests
that not only should users use the narrow C API provided here, but any
arbitrary internal structs in git.git.

Personally I'm not per se opposed to such a thing, but I think that we
should really be considering and trying something like the clean/smudge
hook interface first rather than a full C API.

This seems like a perfect fit for such an IPC interface, i.e. we'd have
a hook to register custom filters, and when it came to filtering objects
git would communicate with that hook, which in turn would query data
with something like the "git cat-file --batch" interface.

> Regarding licenses: the sample extensions I'm contributing will be covered by
> Git's GPL-v2 (I assume), if they make it into the Git repository. Any other
> extensions that may be written by third party authors and are maintained
> elsewhere could be licensed as those authors see fit, as long as they take care
> not to violate the terms of Git's GPL-v2 when they distribute the extension or
> Git and the extension together. I could add a link to the GPL-v2 in the README
> warning developers to check it before distributing any kind of extension to Git.
> I'm not a lawyer and wouldn't want to give more specific advice than that.

I believe you've misunderstood how the GPL works, those third party
authors would not be free to license their plugins as they see fit. The
reason the LGPL license exists is to allow what you're describing, but
git uses the full GPL v2.

See https://en.wikipedia.org/wiki/GPL_linking_exception and
https://www.gnu.org/licenses/gpl-faq.html#LinkingWithGPL

The project you've linked to even has a GPL linking exception of its own
(but git.git does not):
https://github.com/koordinates/kart/blob/2934f2b951d61233cbaab9ff627aa3c8cbfb82bc/COPYING#L7-L16

> On Tue, Sep 7, 2021 at 11:24 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>>
>> On 06/09/21 06.51, Andrew Olsen via GitGitGadget wrote:
>> > Adds an extension: option to list-object-filters, these are implemented by
>> > static libraries that must be compiled into Git. The Makefile argument
>> > FILTER_EXTENSIONS makes it easier to compile these extensions into a custom
>> > build of Git. When no custom filter-extensions are supplied, Git works as
>> > normal.
>>
>> I don't see why this series is useful (use cases?).
>>
>> --
>> An old man doll... just what I always wanted! - Clara
>>


* Re: [PATCH 0/4] Compile-time extensions for list-object-filter
  2021-09-07  0:37   ` Andrew Olsen
  2021-09-07  8:59     ` Ævar Arnfjörð Bjarmason
@ 2021-09-08 14:23     ` Robert Coup
  1 sibling, 0 replies; 11+ messages in thread
From: Robert Coup @ 2021-09-08 14:23 UTC (permalink / raw)
  To: Bagas Sanjaya, Ævar Arnfjörð Bjarmason
  Cc: Andrew Olsen via GitGitGadget, git, Andrew Olsen, Jeff Hostetler,
	Andrew Olsen, Derrick Stolee

Hi all,

Sorry, life got in the way at an unfortunate moment. And it should
very much be tagged "RFC" — thanks Ævar and Bagas for reading. Here's
the additional background you could have used earlier on — I've
bundled it together, but I'll happily follow up specific questions
individually. I've CCed in a couple of other people who might find it
interesting too.

So Andrew's and my motivation here is to provide some specialised
filtering at clone/fetch time. In Kart[1] datasets are organised
(simplistically) by primary key, but for spatial data we want to
provide an orthogonal spatial extent filter which isn't part of the
tree path, so we can't reuse the work done in the sparse filters. For
a fetch obviously the server-side will require support for any
indexing and ultimately deciding whether a particular blob should be
part of the tree or not.

In the original filter implementation [2], various "profiles" were
alluded to as a case where the server operator might know a lot more
about how the developer would want to use the repository than the
client does, and a named profile for the server to interpret would be
a reasonably clean approach. This is referred to again in [3]. Sparse filters,
subject to the performance issues hopefully being improved by the
cone-mode changes, cater to a lot of them. The existing built-in
filters are fairly simple and there's a relatively simple interface
for them to implement, extending them seems like a reasonable approach
to me — potentially allowing people doing interesting things with
partial clones to take it and run in a general way without too much
effort.

So the key element to clarify/understand for this proposal is that the
main change to Git is the ability to use
`--filter=extension:<name>[=<param>]` which passes through to
git-upload-pack on the server side, to rev-list, which looks up /
validates the filter name/parameter and applies it. So if you want to
offer a custom filter, you build & set it up on the server and any Git
client (if this is merged) can make use of it without any additional
code.
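
To make the shape of the `extension:<name>[=<param>]` spec concrete, here is a
minimal sketch of splitting it into a name and an optional argument. This is
not the parser from the series: `parse_extension_spec()` and its signature are
hypothetical and written only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's actual parser: split a filter spec of
 * the form "extension:<name>[=<arg>]" into its name and optional argument.
 *
 * Returns 0 on success, or -1 if the spec lacks the "extension:" prefix,
 * the name is empty, or the name does not fit into name_out.
 * On success *arg_out points into spec just after the first '=', or is
 * NULL when no argument was given.
 */
int parse_extension_spec(const char *spec,
			 char *name_out, size_t name_size,
			 const char **arg_out)
{
	static const char prefix[] = "extension:";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *name, *eq;
	size_t name_len;

	assert(spec && name_out && arg_out);

	if (strncmp(spec, prefix, prefix_len) != 0)
		return -1;

	name = spec + prefix_len;
	eq = strchr(name, '=');
	name_len = eq ? (size_t)(eq - name) : strlen(name);

	if (name_len == 0 || name_len >= name_size)
		return -1;

	memcpy(name_out, name, name_len);
	name_out[name_len] = '\0';
	*arg_out = eq ? eq + 1 : NULL;
	return 0;
}
```

With this sketch, `extension:rand=3` yields the name `rand` and the argument
`3`, while `extension:rand_cpp` yields the name `rand_cpp` and no argument.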

Wrt IPC, my very first proof of concept used an external process that
rev-list launched, passed a series of oids/types via stdin, receiving
yes/no responses via stdout. Even after quite a lot of OS-specific
efforts to optimise the data flow across the pipes it was slow for
non-trivial sized repositories (where it matters) — essentially
boiling down to too much context switching between processes.
Reorganising the existing filtering approach to do batching with
deferred responses and parallelising the filtering into threads seemed
like an awful lot of effort for potentially little gain, in a niche
use case.

Moving it in-process made it perform well: CPU use moves into the
"deciding whether this object is in or out" phase rather than burning
it in IPC & context-switching. I did build up a basic runtime-loadable
plugin approach, but there was a reasonable amount of the internal git
API that the filters need/touched (even things like hash sizes add a
pile of complexity to it) unless it was reduced back to passing
oids+types. My approach for plugins was basically "could I potentially
implement the existing filters?" Without more of the git API I don't
think this would be feasible. Plus Git would have to agree on and
support a public ABI going forward, which for a potentially niche use
case didn't seem reasonable to propose.
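
The in-process callback shape described above can be sketched roughly as
follows. The types are simplified stand-ins and every `demo_`/`limit_` name
is hypothetical. The real callbacks in the series also receive the
repository, the object, its path and an omit flag, and this "size limit"
filter is an illustration only, not one of the sample extensions.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the filter result; not the real LOFR_* flags. */
enum demo_result { DEMO_OMIT = 0, DEMO_SHOW = 1 };

/* Simplified stand-in for the series' struct filter_extension. */
struct demo_extension {
	const char *name;
	int (*init_fn)(const char *filter_arg, void **context);
	enum demo_result (*filter_object_fn)(unsigned long size, void *context);
	void (*free_fn)(void *context);
};

struct limit_context {
	unsigned long max_size;
};

static int limit_init(const char *filter_arg, void **context)
{
	struct limit_context *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->max_size = filter_arg ? strtoul(filter_arg, NULL, 10) : 0;
	*context = ctx; /* hand the per-request state back to the caller */
	return 0;
}

static enum demo_result limit_filter_object(unsigned long size, void *context)
{
	struct limit_context *ctx = context;

	return size > ctx->max_size ? DEMO_OMIT : DEMO_SHOW;
}

static void limit_free(void *context)
{
	free(context);
}

const struct demo_extension demo_extension_limit = {
	"limit", limit_init, limit_filter_object, limit_free,
};

/*
 * Drive one extension over a list of object sizes: init once, filter each
 * object, free once. Returns the number of objects shown, or -1 if init
 * fails.
 */
long demo_run(const struct demo_extension *ext, const char *arg,
	      const unsigned long *sizes, size_t n)
{
	void *context = NULL;
	long shown = 0;
	size_t i;

	assert(ext && sizes);

	if (ext->init_fn(arg, &context))
		return -1;
	for (i = 0; i < n; i++)
		if (ext->filter_object_fn(sizes[i], context) == DEMO_SHOW)
			shown++;
	ext->free_fn(context);
	return shown;
}
```

The point of the sketch is that all per-request state lives behind the
`void *context` pointer, so each object decision is a plain function call with
no process boundary to cross.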

Hence compile time: simpler; no ABI issues; the internal API doesn't
change that much wrt things that filters are likely to do — if someone
creates a plugin then it's on them to keep it building across git
upgrades on their server; platform support is simpler; and if others
find exciting uses for it then a runtime-loadable plugin API is always
possible in future. And only the server ever needs any custom
binaries.
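
One way a compile-time registry could look is a fixed table of extension
structs that is searched by name. This is a sketch under assumptions: the
series has a generate-list-objects-filter-extensions.sh script whose output is
not shown in this part of the thread, so the table layout and all `demo_`
names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct filter_extension: only the name is modelled. */
struct demo_filter_extension {
	const char *name;
};

/* Each statically linked extension contributes one such definition. */
const struct demo_filter_extension demo_extension_rand = { "rand" };
const struct demo_filter_extension demo_extension_rand_cpp = { "rand_cpp" };

/* NULL-terminated table, fixed at build time (assumed layout). */
const struct demo_filter_extension *const demo_extensions[] = {
	&demo_extension_rand,
	&demo_extension_rand_cpp,
	NULL,
};

/* Look up an extension by name; returns NULL if it was not compiled in. */
const struct demo_filter_extension *demo_find_extension(const char *name)
{
	size_t i;

	assert(name);

	for (i = 0; demo_extensions[i]; i++)
		if (!strcmp(demo_extensions[i]->name, name))
			return demo_extensions[i];
	return NULL;
}
```

Because the table is resolved at link time, a request for an extension that
was not compiled in simply finds no entry, and there is no loader or ABI
involved.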

Licensing — yes, any filters would need to be GPL-licensed since
they're compiled with Git. Only the server operator needs to concern
themselves with complying with this (& associated licensing for any
external libraries/etc a plugin might need) since that's where the
plugin code is linked & runs. The usual caveat applies that internal use
within an organisation does not qualify as "distribution" under the GPL.
FWIW, for Kart we'll be GPL-licensing the server-side spatial filter
plugin code for anyone who's interested.

Hope this clarifies a bit.

Rob :)

[1] https://kartproject.org — building on Git to version geospatial
datasets. Not sure if the videos ever got released (thanks Covid), but
I did a talk at Git Merge 2020 on it when we released the first alpha.
[2] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/
[3] https://public-inbox.org/git/79b06312-75ca-5a50-c337-dc6715305edb@jeffhostetler.com/
