From: larsxschneider@gmail.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, jnareb@gmail.com, peff@peff.net,
ramsay@ramsayjones.plus.com, tboegi@web.de,
Lars Schneider <larsxschneider@gmail.com>
Subject: [PATCH v11 00/14] Git filter protocol
Date: Sun, 16 Oct 2016 16:20:24 -0700 [thread overview]
Message-ID: <20161016232038.84951-1-larsxschneider@gmail.com> (raw)
From: Lars Schneider <larsxschneider@gmail.com>
The goal of this series is to avoid launching a new clean/smudge filter
process for each file that is filtered.
A short summary about v1 to v5 can be found here:
https://git.github.io/rev_news/2016/08/17/edition-18/
This series is also published on web:
https://github.com/larsxschneider/git/pull/15
Patches 1 and 2 are cleanups and not strictly necessary for the series.
Patches 3 to 12 are required preparation. Patch 13 is the main patch.
Patch 14 adds an example how to use the Git filter protocol in contrib.
Thanks a lot to
Jakub, Junio, Ramsay, Dscho, Torsten and Peff
for very helpful reviews,
Lars
## Changes since v10
* clarify documentation (Jakub)
* change "<capability>=true" to "capability=<capability>" in protocol (Jakub)
* make stop_multi_file_filter() static (Ramsay)
* add flush packet after version negotiation for consistency (Jakub)
* fix pid_t translation error on Windows (Dscho)
* fix Unquoted references in tests (Junio)
* remove smudge invocation count in tests (Junio)
Lars Schneider (14):
convert: quote filter names in error messages
convert: modernize tests
run-command: move check_pipe() from write_or_die to run_command
run-command: add clean_on_exit_handler
pkt-line: rename packet_write() to packet_write_fmt()
pkt-line: extract set_packet_header()
pkt-line: add packet_write_fmt_gently()
pkt-line: add packet_flush_gently()
pkt-line: add packet_write_gently()
pkt-line: add functions to read/write flush terminated packet streams
convert: make apply_filter() adhere to standard Git error handling
convert: prepare filter.<driver>.process option
convert: add filter.<driver>.process option
contrib/long-running-filter: add long running filter example
Documentation/gitattributes.txt | 157 ++++++++++-
builtin/archive.c | 4 +-
builtin/receive-pack.c | 4 +-
builtin/remote-ext.c | 4 +-
builtin/upload-archive.c | 4 +-
connect.c | 2 +-
contrib/long-running-filter/example.pl | 128 +++++++++
convert.c | 375 +++++++++++++++++++++---
daemon.c | 2 +-
http-backend.c | 2 +-
pkt-line.c | 152 +++++++++-
pkt-line.h | 12 +-
run-command.c | 39 ++-
run-command.h | 4 +-
shallow.c | 2 +-
t/t0021-conversion.sh | 502 ++++++++++++++++++++++++++++++---
t/t0021/rot13-filter.pl | 192 +++++++++++++
upload-pack.c | 30 +-
write_or_die.c | 13 -
19 files changed, 1495 insertions(+), 133 deletions(-)
create mode 100755 contrib/long-running-filter/example.pl
create mode 100755 t/t0021/rot13-filter.pl
## Interdiff (v10..v11)
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index a182ef2..976243a 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -297,9 +297,11 @@ upon checkin. By default these commands process only a single
blob and terminate. If a long running `process` filter is used
in place of `clean` and/or `smudge` filters, then Git can process
all blobs with a single filter command invocation for the entire
-life of a single Git command, for example `git add --all`. See
-section below for the description of the protocol used to
-communicate with a `process` filter.
+life of a single Git command, for example `git add --all`. If a
+long running `process` filter is configured then it always takes
+precedence over a configured single blob filter. See section
+below for the description of the protocol used to communicate with
+a `process` filter.
One use of the content filtering is to massage the content into a shape
that is more convenient for the platform, filesystem, and the user to use.
@@ -393,10 +395,10 @@ text and therefore are terminated by a LF.
Git starts the filter when it encounters the first file
that needs to be cleaned or smudged. After the filter started
-Git sends a welcome message ("git-filter-client"), a list of
-supported protocol version numbers, and a flush packet. Git expects
-to read a welcome response message ("git-filter-server") and exactly
-one protocol version number from the previously sent list. All further
+Git sends a welcome message ("git-filter-client"), a list of supported
+protocol version numbers, and a flush packet. Git expects to read a welcome
+response message ("git-filter-server"), exactly one protocol version number
+from the previously sent list, and a flush packet. All further
communication will be based on the selected version. The remaining
protocol description below documents "version=2". Please note that
"version=42" in the example below does not exist and is only there
@@ -414,12 +416,13 @@ packet: git> version=42
packet: git> 0000
packet: git< git-filter-server
packet: git< version=2
-packet: git> clean=true
-packet: git> smudge=true
-packet: git> not-yet-invented=true
+packet: git< 0000
+packet: git> capability=clean
+packet: git> capability=smudge
+packet: git> capability=not-yet-invented
packet: git> 0000
-packet: git< clean=true
-packet: git< smudge=true
+packet: git< capability=clean
+packet: git< capability=smudge
packet: git< 0000
------------------------
Supported filter capabilities in version 2 are "clean" and
@@ -428,7 +431,7 @@ Supported filter capabilities in version 2 are "clean" and
Afterwards Git sends a list of "key=value" pairs terminated with
a flush packet. The list will contain at least the filter command
(based on the supported capabilities) and the pathname of the file
-to filter relative to the repository root. Right after these packets
+to filter relative to the repository root. Right after the flush packet
Git sends the content split in zero or more pkt-line packets and a
flush packet to terminate content. Please note, that the filter
must not send any response before it received the content and the
@@ -447,7 +450,10 @@ problems then the list must contain a "success" status. Right after
these packets the filter is expected to send the content in zero
or more pkt-line packets and a flush packet at the end. Finally, a
second list of "key=value" pairs terminated with a flush packet
-is expected. The filter can change the status in the second list.
+is expected. The filter can change the status in the second list
+or keep the status as is with an empty list. Please note that the
+empty list must be terminated with a flush packet regardless.
+
------------------------
packet: git< status=success
packet: git< 0000
@@ -457,7 +463,7 @@ packet: git< 0000 # empty list, keep "status=success" unchanged!
------------------------
If the result content is empty then the filter is expected to respond
-with a "success" status and an empty list.
+with a "success" status and a flush packet to signal the empty content.
------------------------
packet: git< status=success
packet: git< 0000
@@ -466,9 +472,7 @@ packet: git< 0000 # empty list, keep "status=success" unchanged!
------------------------
In case the filter cannot or does not want to process the content,
-it is expected to respond with an "error" status. Depending on the
-`filter.<driver>.required` flag Git will interpret that as error
-but it will not stop or restart the filter process.
+it is expected to respond with an "error" status.
------------------------
packet: git< status=error
packet: git< 0000
@@ -476,9 +480,7 @@ packet: git< 0000
If the filter experiences an error during processing, then it can
send the status "error" after the content was (partially or
-completely) sent. Depending on the `filter.<driver>.required` flag
-Git will interpret that as error but it will not stop or restart the
-filter process.
+completely) sent.
------------------------
packet: git< status=success
packet: git< 0000
@@ -488,31 +490,31 @@ packet: git< status=error
packet: git< 0000
------------------------
-If the filter dies during the communication or does not adhere to
-the protocol then Git will stop the filter process and restart it
-with the next file that needs to be processed. Depending on the
-`filter.<driver>.required` flag Git will interpret that as error.
-
-The error handling for all cases above mimic the behavior of
-the `filter.<driver>.clean` / `filter.<driver>.smudge` error
-handling.
-
In case the filter cannot or does not want to process the content
as well as any future content for the lifetime of the Git process,
-it is expected to respond with an "abort" status at any point in
-the protocol. Depending on the `filter.<driver>.required` flag Git
-will interpret that as error for the content as well as any future
-content for the lifetime of the Git process but it will not stop or
-restart the filter process.
+then it is expected to respond with an "abort" status at any point
+in the protocol.
------------------------
packet: git< status=abort
packet: git< 0000
------------------------
+Git neither stops nor restarts the filter process in case the
+"error"/"abort" status is set. However, Git sets its exit code
+according to the `filter.<driver>.required` flag, mimicking the
+behavior of the `filter.<driver>.clean` / `filter.<driver>.smudge`
+mechanism.
+
+If the filter dies during the communication or does not adhere to
+the protocol then Git will stop the filter process and restart it
+with the next file that needs to be processed. Depending on the
+`filter.<driver>.required` flag Git will interpret that as error.
+
After the filter has processed a blob it is expected to wait for
the next "key=value" list containing a command. Git will close
the command pipe on exit. The filter is expected to detect EOF
-and exit gracefully on its own.
+and exit gracefully on its own. Git will wait until the filter
+process has stopped.
A long running filter demo implementation can be found in
`contrib/long-running-filter/example.pl` located in the Git
@@ -520,10 +522,6 @@ core repository. If you develop your own long running filter
process then the `GIT_TRACE_PACKET` environment variables can be
very helpful for debugging (see linkgit:git[1]).
-If a `filter.<driver>.process` command is configured then it
-always takes precedence over a configured `filter.<driver>.clean`
-or `filter.<driver>.smudge` command.
-
Please note that you cannot use an existing `filter.<driver>.clean`
or `filter.<driver>.smudge` command with `filter.<driver>.process`
because the former two use a different inter process communication
diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
index f4102d2..3945705 100755
--- a/contrib/long-running-filter/example.pl
+++ b/contrib/long-running-filter/example.pl
@@ -70,13 +70,14 @@ sub packet_flush {
packet_txt_write("git-filter-server");
packet_txt_write("version=2");
+packet_flush();
-( packet_txt_read() eq ( 0, "clean=true" ) ) || die "bad capability";
-( packet_txt_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=clean" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";
-packet_txt_write("clean=true");
-packet_txt_write("smudge=true");
+packet_txt_write("capability=clean");
+packet_txt_write("capability=smudge");
packet_flush();
while (1) {
diff --git a/convert.c b/convert.c
index 1d89632..9d2aa68 100644
--- a/convert.c
+++ b/convert.c
@@ -567,7 +567,7 @@ static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *
free(entry);
}
-void stop_multi_file_filter(struct child_process *process)
+static void stop_multi_file_filter(struct child_process *process)
{
sigchain_push(SIGPIPE, SIG_IGN);
/* Closing the pipe signals the filter to initiate a shutdown. */
@@ -622,8 +622,11 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
err = strcmp(packet_read_line(process->out, NULL), "version=2");
if (err)
goto done;
+ err = packet_read_line(process->out, NULL) != NULL;
+ if (err)
+ goto done;
- err = packet_write_list(process->in, "clean=true", "smudge=true", NULL);
+ err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);
for (;;) {
cap_buf = packet_read_line(process->out, NULL);
@@ -631,10 +634,10 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
break;
string_list_split_in_place(&cap_list, cap_buf, '=', 1);
- if (cap_list.nr != 2 || strcmp(cap_list.items[1].string, "true"))
+ if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, "capability"))
continue;
- cap_name = cap_list.items[0].string;
+ cap_name = cap_list.items[1].string;
if (!strcmp(cap_name, "clean")) {
entry->supported_capabilities |= CAP_CLEAN;
} else if (!strcmp(cap_name, "smudge")) {
diff --git a/run-command.c b/run-command.c
index e5fd6ff..ca905a9 100644
--- a/run-command.c
+++ b/run-command.c
@@ -36,7 +36,10 @@ static void cleanup_children(int sig, int in_signal)
if (p->process && !in_signal) {
struct child_process *process = p->process;
if (process->clean_on_exit_handler) {
- trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
+ trace_printf(
+ "trace: run_command: running exit handler for pid %"
+ PRIuMAX, (uintmax_t)p->pid
+ );
process->clean_on_exit_handler(process);
}
}
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 9f892c0..a20b9f5 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -31,38 +31,35 @@ filter_git () {
rm -f git-stderr.log
}
-# Count unique lines in two files and compare them.
-test_cmp_count () {
- for FILE in $@
- do
- sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
- cat $FILE.tmp >$FILE
- done &&
- test_cmp $@
-}
-
-# Count unique lines except clean invocations in two files and compare
-# them. Clean invocations are not counted because their number can vary.
+# Compare two files and ensure that `clean` and `smudge` respectively are
+# called at least once if specified in the `expect` file. The actual
+# invocation count is not relevant because their number can vary.
# c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
-test_cmp_count_except_clean () {
- for FILE in $@
+test_cmp_count () {
+ expect=$1
+ actual=$2
+ for FILE in "$expect" "$actual"
do
- sort $FILE | uniq -c | sed "s/^[ ]*//" |
- sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
- cat $FILE.tmp >$FILE
+ sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
+ sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
+ sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
+ mv "$FILE.tmp" "$FILE"
done &&
- test_cmp $@
+ test_cmp "$expect" "$actual"
}
-# Compare two files but exclude clean invocations because they can vary.
+# Compare two files but exclude all `clean` invocations because Git can
+# call `clean` zero or more times.
# c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
test_cmp_exclude_clean () {
- for FILE in $@
+ expect=$1
+ actual=$2
+ for FILE in "$expect" "$actual"
do
- grep -v "IN: clean" $FILE >$FILE.tmp
- cat $FILE.tmp >$FILE
+ grep -v "IN: clean" "$FILE" >"$FILE.tmp" &&
+ mv "$FILE.tmp" "$FILE"
done &&
- test_cmp $@
+ test_cmp "$expect" "$actual"
}
# Check that the contents of two files are equal and that their rot13 version
@@ -395,7 +392,7 @@ test_expect_success PERL 'required process filter should filter data' '
IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
STOP
EOF
- test_cmp_count_except_clean expected.log rot13-filter.log &&
+ test_cmp_count expected.log rot13-filter.log &&
rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x.r" &&
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 1a6959c..ae4c50f 100755
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -95,13 +95,14 @@ $debug->flush();
packet_txt_write("git-filter-server");
packet_txt_write("version=2");
+packet_flush();
-( packet_txt_read() eq ( 0, "clean=true" ) ) || die "bad capability";
-( packet_txt_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=clean" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";
foreach (@capabilities) {
- packet_txt_write( $_ . "=true" );
+ packet_txt_write( "capability=" . $_ );
}
packet_flush();
print $debug "init handshake complete\n";
--
2.10.0
next reply other threads:[~2016-10-16 23:20 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-16 23:20 larsxschneider [this message]
2016-10-16 23:20 ` [PATCH v11 01/14] convert: quote filter names in error messages larsxschneider
2016-10-16 23:20 ` [PATCH v11 02/14] convert: modernize tests larsxschneider
2016-10-16 23:20 ` [PATCH v11 03/14] run-command: move check_pipe() from write_or_die to run_command larsxschneider
2016-10-16 23:20 ` [PATCH v11 04/14] run-command: add clean_on_exit_handler larsxschneider
2016-10-16 23:20 ` [PATCH v11 05/14] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
2016-10-16 23:20 ` [PATCH v11 06/14] pkt-line: extract set_packet_header() larsxschneider
2016-10-16 23:20 ` [PATCH v11 07/14] pkt-line: add packet_write_fmt_gently() larsxschneider
2016-10-16 23:20 ` [PATCH v11 08/14] pkt-line: add packet_flush_gently() larsxschneider
2016-10-16 23:20 ` [PATCH v11 09/14] pkt-line: add packet_write_gently() larsxschneider
2016-10-16 23:20 ` [PATCH v11 10/14] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-10-16 23:20 ` [PATCH v11 11/14] convert: make apply_filter() adhere to standard Git error handling larsxschneider
2016-10-16 23:20 ` [PATCH v11 12/14] convert: prepare filter.<driver>.process option larsxschneider
2016-10-16 23:20 ` [PATCH v11 13/14] convert: add " larsxschneider
2016-11-02 18:03 ` Johannes Sixt
2016-11-03 0:41 ` Lars Schneider
2016-11-03 20:12 ` [PATCH] t0021: expect more variations in the output of uniq -c Johannes Sixt
2016-11-03 20:22 ` [PATCH (optional)] t0021: use arithmetic expansion to trim whitespace from wc -c output Johannes Sixt
2016-11-06 15:45 ` Lars Schneider
2016-11-06 19:31 ` Johannes Sixt
2016-11-06 19:43 ` Lars Schneider
2016-11-06 15:31 ` [PATCH] t0021: expect more variations in the output of uniq -c Lars Schneider
2016-11-06 21:40 ` Johannes Sixt
2016-10-16 23:20 ` [PATCH v11 14/14] contrib/long-running-filter: add long running filter example larsxschneider
2016-10-17 18:47 ` [PATCH v11 00/14] Git filter protocol Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161016232038.84951-1-larsxschneider@gmail.com \
--to=larsxschneider@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
--cc=peff@peff.net \
--cc=ramsay@ramsayjones.plus.com \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).