[PATCH v5 00/15] Git filter protocol

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH v5 00/15] Git filter protocol
       [not found] <20160803164225.46355-1-larsxschneider@gmail.com/>
@ 2016-08-10 13:03 ` larsxschneider
  2016-08-10 13:03   ` [PATCH v5 01/15] pkt-line: extract set_packet_header() larsxschneider
                     ` (14 more replies)
  0 siblings, 15 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:03 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Hi all,

## Notable changes since v4
* drop the shutdown capability
* add error-all response to signal that the filter does not want to filter anymore
* extend init sequence to negotiate version number and capabilities
  (plus detect wrongly configured version 1 filters)
* improve status response format according to Peff's suggestions
* fix Git for Window support


## Patches
* 01-09: Make pktline ready to be used for filters.
* 01 - @Junio: I know you told me twice that you don't like this patch but
               please look closely: the patch does not add a new interface to
               pktline. It is a private functions used 03.
* 08 - This patch changes the pktline interface as it renames packet_write() to
       packet_write_fmt() as suggested by Junio.
       (This patch is not required for the the series and could be dropped)
* 10-13: Prepare convert code.
*    14: Main contribution.
*    15: Proposed fix to make the long running filters work on Windows. See
         the discussion here:
         https://github.com/git-for-windows/git/issues/770#issuecomment-238361829
         (This patch is not required in Git core and could be dropped)


## Jakub
* http://public-inbox.org/git/607c07fe-5b6f-fd67-13e1-705020c267ee%40gmail.com/
  * make Git start the handshake with the filter to detect one-shot filter
    mis configured as process-filter
  * filter driver refusing to filter further

* http://public-inbox.org/git/e8b550ed-1765-764f-49e5-72e5a609d936%40gmail.com/
  * improve commit messages

* http://public-inbox.org/git/0b7d7d96-dfdc-54a4-2c24-2aead6743ae1%40gmail.com/
  * handle sigpipe correctly
  * use cp instead of cat where appropriate
  * explicitly name the test with a binary file (containing \0)


## Junio
* http://public-inbox.org/git/xmqq8twd8uld.fsf%40gitster.mtv.corp.google.com/
  * shorten variable names
  * do not initialize statics explicitly to 0
  * no // comments
  * replace "ret" with "!error"

* http://public-inbox.org/git/CAPc5daV3Tke4qHjtpri%3D6QCRaOax_K3uYhpFzRcd271%3DGHj1%2BQ%40mail.gmail.com/
  * rename packet_write() to packet_write_fmt()


## Peff
* http://public-inbox.org/git/20160803224619.bwtbvmslhuicx2qi%40sigill.intra.peff.net/
  * change capabilities format

* http://public-inbox.org/git/20160808150255.2otm3z5fluimpiqw%40sigill.intra.peff.net/
  * change status response format


## Eric
* http://public-inbox.org/git/20160805185559.GB463%40starla/
  * die if larger-than-4GB files are processed on 32-bit


Thanks a lot for the valuable reviews!

Best,
Lars

You can find a branch with the source on GitHub here:
https://github.com/larsxschneider/git/tree/protocol-filter/v5

I also rebased the series on top of the latest Git for Windows master:
https://github.com/larsxschneider/git/tree/protocol-filter/v5-win


Lars Schneider (15):
  pkt-line: extract set_packet_header()
  pkt-line: call packet_trace() only if a packet is actually send
  pkt-line: add `gentle` parameter to format_packet()
  pkt-line: add packet_write_gently()
  pkt-line: add packet_write_gently_fmt()
  pkt-line: add packet_flush_gently()
  pkt-line: add functions to read/write flush terminated packet streams
  pkt-line: rename packet_write() to packet_write_fmt()
  pack-protocol: fix maximum pkt-line size
  convert: quote filter names in error messages
  convert: modernize tests
  convert: generate large test files only once
  convert: make apply_filter() adhere to standard Git error handling
  convert: add filter.<driver>.process option
  read-cache: make sure file handles are not inherited by child
    processes

 Documentation/gitattributes.txt             | 139 +++++++-
 Documentation/technical/protocol-common.txt |   6 +-
 builtin/archive.c                           |   4 +-
 builtin/receive-pack.c                      |   4 +-
 builtin/remote-ext.c                        |   4 +-
 builtin/upload-archive.c                    |   4 +-
 connect.c                                   |   2 +-
 convert.c                                   | 367 ++++++++++++++++++---
 daemon.c                                    |   2 +-
 http-backend.c                              |   2 +-
 pkt-line.c                                  | 155 ++++++++-
 pkt-line.h                                  |  12 +-
 read-cache.c                                |   2 +-
 shallow.c                                   |   2 +-
 t/t0021-conversion.sh                       | 482 +++++++++++++++++++++++++---
 t/t0021/rot13-filter.pl                     | 176 ++++++++++
 unpack-trees.c                              |   1 +
 upload-pack.c                               |  30 +-
 18 files changed, 1267 insertions(+), 127 deletions(-)
 create mode 100755 t/t0021/rot13-filter.pl

--
2.9.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 01/15] pkt-line: extract set_packet_header()
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
@ 2016-08-10 13:03   ` larsxschneider
  2016-08-10 13:03   ` [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:03 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

set_packet_header() converts an integer to a 4 byte hex string. Make
this function locally available so that other pkt-line functions can
use it.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 62fdb37..177dc73 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -97,10 +97,19 @@ void packet_buf_flush(struct strbuf *buf)
 	strbuf_add(buf, "0000", 4);
 }
 
-#define hex(a) (hexchar[(a) & 15])
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static void set_packet_header(char *buf, const int size)
 {
 	static char hexchar[] = "0123456789abcdef";
+	#define hex(a) (hexchar[(a) & 15])
+	buf[0] = hex(size >> 12);
+	buf[1] = hex(size >> 8);
+	buf[2] = hex(size >> 4);
+	buf[3] = hex(size);
+	#undef hex
+}
+
+static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+{
 	size_t orig_len, n;
 
 	orig_len = out->len;
@@ -111,10 +120,7 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	if (n > LARGE_PACKET_MAX)
 		die("protocol error: impossibly long line");
 
-	out->buf[orig_len + 0] = hex(n >> 12);
-	out->buf[orig_len + 1] = hex(n >> 8);
-	out->buf[orig_len + 2] = hex(n >> 4);
-	out->buf[orig_len + 3] = hex(n);
+	set_packet_header(&out->buf[orig_len], n);
 	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
  2016-08-10 13:03   ` [PATCH v5 01/15] pkt-line: extract set_packet_header() larsxschneider
@ 2016-08-10 13:03   ` larsxschneider
  2016-08-10 13:13     ` Jeff King
  2016-08-10 13:03   ` [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet() larsxschneider
                     ` (12 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:03 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

The packet_trace() call is not ideal in format_packet() as we would print
a trace when a packet is formatted and (potentially) when the packet is
actually send. This was no problem up until now because format_packet()
was only used by one function. Fix it by moving the trace call into the
function that actually sends the packet.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pkt-line.c b/pkt-line.c
index 177dc73..9400b47 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -121,7 +121,6 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 		die("protocol error: impossibly long line");
 
 	set_packet_header(&out->buf[orig_len], n);
-	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
 void packet_write(int fd, const char *fmt, ...)
@@ -133,6 +132,7 @@ void packet_write(int fd, const char *fmt, ...)
 	va_start(args, fmt);
 	format_packet(&buf, fmt, args);
 	va_end(args);
+	packet_trace(buf.buf + 4, buf.len - 4, 1);
 	write_or_die(fd, buf.buf, buf.len);
 }
 
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
  2016-08-10 13:03   ` [PATCH v5 01/15] pkt-line: extract set_packet_header() larsxschneider
  2016-08-10 13:03   ` [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
@ 2016-08-10 13:03   ` larsxschneider
  2016-08-10 13:15     ` Jeff King
  2016-08-10 13:04   ` [PATCH v5 04/15] pkt-line: add packet_write_gently() larsxschneider
                     ` (11 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:03 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

format_packet() dies if the caller wants to format a packet larger than
LARGE_PACKET_MAX. Certain callers might prefer an error response instead.

Add a parameter `gentle` to define if the function should signal an error
with the return value (gentle=1) or die (gentle=0).

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 9400b47..e6b8410 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -108,7 +108,7 @@ static void set_packet_header(char *buf, const int size)
 	#undef hex
 }
 
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static int format_packet(int gentle, struct strbuf *out, const char *fmt, va_list args)
 {
 	size_t orig_len, n;
 
@@ -117,10 +117,15 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	strbuf_vaddf(out, fmt, args);
 	n = out->len - orig_len;
 
-	if (n > LARGE_PACKET_MAX)
-		die("protocol error: impossibly long line");
+	if (n > LARGE_PACKET_MAX) {
+		if (gentle)
+			return -1;
+		else
+			die("protocol error: impossibly long line");
+	}
 
 	set_packet_header(&out->buf[orig_len], n);
+	return 0;
 }
 
 void packet_write(int fd, const char *fmt, ...)
@@ -130,7 +135,7 @@ void packet_write(int fd, const char *fmt, ...)
 
 	strbuf_reset(&buf);
 	va_start(args, fmt);
-	format_packet(&buf, fmt, args);
+	format_packet(0, &buf, fmt, args);
 	va_end(args);
 	packet_trace(buf.buf + 4, buf.len - 4, 1);
 	write_or_die(fd, buf.buf, buf.len);
@@ -141,7 +146,7 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 	va_list args;
 
 	va_start(args, fmt);
-	format_packet(buf, fmt, args);
+	format_packet(0, buf, fmt, args);
 	va_end(args);
 }
 
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (2 preceding siblings ...)
  2016-08-10 13:03   ` [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet() larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:28     ` Jeff King
  2016-08-10 13:04   ` [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt() larsxschneider
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write() has two shortcomings. First, it uses format_packet() which
lets the caller only send string data via "%s". That means it cannot be
used for arbitrary data that may contain NULs. Second, it will always
die on error.

Add packet_write_gently() which writes arbitrary data and returns `0` for
success and `-1` for an error.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 12 ++++++++++++
 pkt-line.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index e6b8410..4f25748 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 
 char packet_buffer[LARGE_PACKET_MAX];
+char packet_write_buffer[LARGE_PACKET_MAX];
 static const char *packet_trace_prefix = "git";
 static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET);
 static struct trace_key trace_pack = TRACE_KEY_INIT(PACKFILE);
@@ -141,6 +142,17 @@ void packet_write(int fd, const char *fmt, ...)
 	write_or_die(fd, buf.buf, buf.len);
 }
 
+int packet_write_gently(const int fd_out, const char *buf, size_t size)
+{
+	if (size > PKTLINE_DATA_MAXLEN)
+		return -1;
+	packet_trace(buf, size, 1);
+	memmove(packet_write_buffer + 4, buf, size);
+	size += 4;
+	set_packet_header(packet_write_buffer, size);
+	return (write_in_full(fd_out, packet_write_buffer, size) == size ? 0 : -1);
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
 	va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 3cb9d91..88584f1 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -77,6 +77,7 @@ char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
+#define PKTLINE_DATA_MAXLEN (LARGE_PACKET_MAX - 4)
 extern char packet_buffer[LARGE_PACKET_MAX];
 
 #endif
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (3 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 04/15] pkt-line: add packet_write_gently() larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:43     ` Jeff King
  2016-08-10 13:04   ` [PATCH v5 06/15] pkt-line: add packet_flush_gently() larsxschneider
                     ` (9 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write() would die in case of a write error even though for some callers
an error would be acceptable. Add packet_write_gently_fmt() which writes a
formatted pkt-line and returns `0` for success and `-1` for an error.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 13 +++++++++++++
 pkt-line.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 4f25748..a8207dd 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -142,6 +142,19 @@ void packet_write(int fd, const char *fmt, ...)
 	write_or_die(fd, buf.buf, buf.len);
 }
 
+int packet_write_gently_fmt(int fd, const char *fmt, ...)
+{
+	static struct strbuf buf = STRBUF_INIT;
+	va_list args;
+
+	strbuf_reset(&buf);
+	va_start(args, fmt);
+	format_packet(1, &buf, fmt, args);
+	va_end(args);
+	packet_trace(buf.buf + 4, buf.len - 4, 1);
+	return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
+}
+
 int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
 	if (size > PKTLINE_DATA_MAXLEN)
diff --git a/pkt-line.h b/pkt-line.h
index 88584f1..82676f4 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_gently_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 06/15] pkt-line: add packet_flush_gently()
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (4 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt() larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 07/15] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_flush() would die in case of a write error even though for some callers
an error would be acceptable. Add packet_flush_gently() which writes a pkt-line
flush packet and returns `0` for success and `-1` for failure.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 6 ++++++
 pkt-line.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index a8207dd..e6a0924 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -92,6 +92,12 @@ void packet_flush(int fd)
 	write_or_die(fd, "0000", 4);
 }
 
+int packet_flush_gently(int fd)
+{
+	packet_trace("0000", 4, 1);
+	return (write_in_full(fd, "0000", 4) == 4 ? 0 : -1);
+}
+
 void packet_buf_flush(struct strbuf *buf)
 {
 	packet_trace("0000", 4, 1);
diff --git a/pkt-line.h b/pkt-line.h
index 82676f4..b6c8bcd 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_flush_gently(int fd);
 int packet_write_gently_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
 /*
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 07/15] pkt-line: add functions to read/write flush terminated packet streams
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (5 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 06/15] pkt-line: add packet_flush_gently() larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write_stream_with_flush_from_fd() and
packet_write_stream_with_flush_from_buf() write a stream of packets. All
content packets use the maximal packet size except for the last one.
After the last content packet a `flush` control packet is written.

packet_read_till_flush() reads arbitary sized packets until it detects
a `flush` packet.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pkt-line.h |  7 +++++
 2 files changed, 96 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index e6a0924..4b526e1 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -181,6 +181,45 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 	va_end(args);
 }
 
+int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out)
+{
+	int err = 0;
+	ssize_t bytes_to_write;
+	while (!err) {
+		bytes_to_write = xread(fd_in, packet_write_buffer, PKTLINE_DATA_MAXLEN);
+		if (bytes_to_write < 0)
+			return COPY_READ_ERROR;
+		if (bytes_to_write == 0)
+			break;
+		if (bytes_to_write > PKTLINE_DATA_MAXLEN)
+			return COPY_WRITE_ERROR;
+		err = packet_write_gently(fd_out, packet_write_buffer, bytes_to_write);
+	}
+	if (!err)
+		err = packet_flush_gently(fd_out);
+	return err;
+}
+
+int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, int fd_out)
+{
+	int err = 0;
+	size_t bytes_written = 0;
+	size_t bytes_to_write;
+	while (!err) {
+		if ((len - bytes_written) > PKTLINE_DATA_MAXLEN)
+			bytes_to_write = PKTLINE_DATA_MAXLEN;
+		else
+			bytes_to_write = len - bytes_written;
+		if (bytes_to_write == 0)
+			break;
+		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
+		bytes_written += bytes_to_write;
+	}
+	if (!err)
+		err = packet_flush_gently(fd_out);
+	return err;
+}
+
 static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 			   void *dst, unsigned size, int options)
 {
@@ -290,3 +329,53 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
 {
 	return packet_read_line_generic(-1, src, src_len, dst_len);
 }
+
+ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out)
+{
+	int len, ret;
+	int options = PACKET_READ_GENTLE_ON_EOF;
+	char linelen[4];
+
+	size_t oldlen = sb_out->len;
+	size_t oldalloc = sb_out->alloc;
+
+	for (;;) {
+		/* Read packet header */
+		ret = get_packet_data(fd_in, NULL, NULL, linelen, 4, options);
+		if (ret < 0)
+			goto done;
+		len = packet_length(linelen);
+		if (len < 0)
+			die("protocol error: bad line length character: %.4s", linelen);
+		if (!len) {
+			/* Found a flush packet - Done! */
+			packet_trace("0000", 4, 0);
+			break;
+		}
+		len -= 4;
+
+		/* Read packet content */
+		strbuf_grow(sb_out, len);
+		ret = get_packet_data(fd_in, NULL, NULL, sb_out->buf + sb_out->len, len, options);
+		if (ret < 0)
+			goto done;
+
+		if (ret != len) {
+			error("protocol error: incomplete read (expected %d, got %d)", len, ret);
+			goto done;
+		}
+
+		packet_trace(sb_out->buf + sb_out->len, len, 0);
+		sb_out->len += len;
+	}
+
+done:
+	if (ret < 0) {
+		if (oldalloc == 0)
+			strbuf_release(sb_out);
+		else
+			strbuf_setlen(sb_out, oldlen);
+		return ret;  /* unexpected EOF */
+	}
+	return sb_out->len - oldlen;
+}
diff --git a/pkt-line.h b/pkt-line.h
index b6c8bcd..89063ee 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,8 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_gently_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out);
+int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, int fd_out);
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
@@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
+/*
+ * Reads a stream of variable sized packets until a flush packet is detected.
+ */
+ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 #define PKTLINE_DATA_MAXLEN (LARGE_PACKET_MAX - 4)
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (6 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 07/15] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 09/15] pack-protocol: fix maximum pkt-line size larsxschneider
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write() should be called packet_write_fmt() as the string
parameter can be formatted.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 builtin/archive.c        |  4 ++--
 builtin/receive-pack.c   |  4 ++--
 builtin/remote-ext.c     |  4 ++--
 builtin/upload-archive.c |  4 ++--
 connect.c                |  2 +-
 daemon.c                 |  2 +-
 http-backend.c           |  2 +-
 pkt-line.c               |  2 +-
 pkt-line.h               |  2 +-
 shallow.c                |  2 +-
 upload-pack.c            | 30 +++++++++++++++---------------
 11 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index a1e3b94..49f4914 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -47,10 +47,10 @@ static int run_remote_archiver(int argc, const char **argv,
 	if (name_hint) {
 		const char *format = archive_format_from_filename(name_hint);
 		if (format)
-			packet_write(fd[1], "argument --format=%s\n", format);
+			packet_write_fmt(fd[1], "argument --format=%s\n", format);
 	}
 	for (i = 1; i < argc; i++)
-		packet_write(fd[1], "argument %s\n", argv[i]);
+		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
 	packet_flush(fd[1]);
 
 	buf = packet_read_line(fd[0], NULL);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 15c323a..d60a412 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -199,7 +199,7 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 static void show_ref(const char *path, const unsigned char *sha1)
 {
 	if (sent_capabilities) {
-		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
+		packet_write_fmt(1, "%s %s\n", sha1_to_hex(sha1), path);
 	} else {
 		struct strbuf cap = STRBUF_INIT;
 
@@ -212,7 +212,7 @@ static void show_ref(const char *path, const unsigned char *sha1)
 		if (push_cert_nonce)
 			strbuf_addf(&cap, " push-cert=%s", push_cert_nonce);
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
-		packet_write(1, "%s %s%c%s\n",
+		packet_write_fmt(1, "%s %s%c%s\n",
 			     sha1_to_hex(sha1), path, 0, cap.buf);
 		strbuf_release(&cap);
 		sent_capabilities = 1;
diff --git a/builtin/remote-ext.c b/builtin/remote-ext.c
index 88eb8f9..11b48bf 100644
--- a/builtin/remote-ext.c
+++ b/builtin/remote-ext.c
@@ -128,9 +128,9 @@ static void send_git_request(int stdin_fd, const char *serv, const char *repo,
 	const char *vhost)
 {
 	if (!vhost)
-		packet_write(stdin_fd, "%s %s%c", serv, repo, 0);
+		packet_write_fmt(stdin_fd, "%s %s%c", serv, repo, 0);
 	else
-		packet_write(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
+		packet_write_fmt(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
 			     vhost, 0);
 }
 
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 2caedf1..dc872f6 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -88,11 +88,11 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 	writer.git_cmd = 1;
 	if (start_command(&writer)) {
 		int err = errno;
-		packet_write(1, "NACK unable to spawn subprocess\n");
+		packet_write_fmt(1, "NACK unable to spawn subprocess\n");
 		die("upload-archive: %s", strerror(err));
 	}
 
-	packet_write(1, "ACK\n");
+	packet_write_fmt(1, "ACK\n");
 	packet_flush(1);
 
 	while (1) {
diff --git a/connect.c b/connect.c
index 722dc3f..5330d9c 100644
--- a/connect.c
+++ b/connect.c
@@ -730,7 +730,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		 * Note: Do not add any other headers here!  Doing so
 		 * will cause older git-daemon servers to crash.
 		 */
-		packet_write(fd[1],
+		packet_write_fmt(fd[1],
 			     "%s %s%chost=%s%c",
 			     prog, path, 0,
 			     target_host, 0);
diff --git a/daemon.c b/daemon.c
index e647254..2646d0f 100644
--- a/daemon.c
+++ b/daemon.c
@@ -281,7 +281,7 @@ static int daemon_error(const char *dir, const char *msg)
 {
 	if (!informative_errors)
 		msg = "access denied or repository not exported";
-	packet_write(1, "ERR %s: %s", msg, dir);
+	packet_write_fmt(1, "ERR %s: %s", msg, dir);
 	return -1;
 }
 
diff --git a/http-backend.c b/http-backend.c
index 0d59499..aa61c18 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -460,7 +460,7 @@ static void get_info_refs(char *arg)
 		hdr_str(content_type, buf.buf);
 		end_headers();
 
-		packet_write(1, "# service=git-%s\n", svc->name);
+		packet_write_fmt(1, "# service=git-%s\n", svc->name);
 		packet_flush(1);
 
 		argv[0] = svc->name;
diff --git a/pkt-line.c b/pkt-line.c
index 4b526e1..b98e291 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -135,7 +135,7 @@ static int format_packet(int gentle, struct strbuf *out, const char *fmt, va_lis
 	return 0;
 }
 
-void packet_write(int fd, const char *fmt, ...)
+void packet_write_fmt(int fd, const char *fmt, ...)
 {
 	static struct strbuf buf = STRBUF_INIT;
 	va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 89063ee..ddd6041 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,7 +20,7 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
-void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
diff --git a/shallow.c b/shallow.c
index 4d554ca..1826114 100644
--- a/shallow.c
+++ b/shallow.c
@@ -260,7 +260,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
 {
 	int fd = *(int *)cb;
 	if (graft->nr_parent == -1)
-		packet_write(fd, "shallow %s\n", oid_to_hex(&graft->oid));
+		packet_write_fmt(fd, "shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
diff --git a/upload-pack.c b/upload-pack.c
index d4cc414..8e77b00 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -393,13 +393,13 @@ static int get_common_commits(void)
 			if (multi_ack == 2 && got_common
 			    && !got_other && ok_to_give_up()) {
 				sent_ready = 1;
-				packet_write(1, "ACK %s ready\n", last_hex);
+				packet_write_fmt(1, "ACK %s ready\n", last_hex);
 			}
 			if (have_obj.nr == 0 || multi_ack)
-				packet_write(1, "NAK\n");
+				packet_write_fmt(1, "NAK\n");
 
 			if (no_done && sent_ready) {
-				packet_write(1, "ACK %s\n", last_hex);
+				packet_write_fmt(1, "ACK %s\n", last_hex);
 				return 0;
 			}
 			if (stateless_rpc)
@@ -416,20 +416,20 @@ static int get_common_commits(void)
 					const char *hex = sha1_to_hex(sha1);
 					if (multi_ack == 2) {
 						sent_ready = 1;
-						packet_write(1, "ACK %s ready\n", hex);
+						packet_write_fmt(1, "ACK %s ready\n", hex);
 					} else
-						packet_write(1, "ACK %s continue\n", hex);
+						packet_write_fmt(1, "ACK %s continue\n", hex);
 				}
 				break;
 			default:
 				got_common = 1;
 				memcpy(last_hex, sha1_to_hex(sha1), 41);
 				if (multi_ack == 2)
-					packet_write(1, "ACK %s common\n", last_hex);
+					packet_write_fmt(1, "ACK %s common\n", last_hex);
 				else if (multi_ack)
-					packet_write(1, "ACK %s continue\n", last_hex);
+					packet_write_fmt(1, "ACK %s continue\n", last_hex);
 				else if (have_obj.nr == 1)
-					packet_write(1, "ACK %s\n", last_hex);
+					packet_write_fmt(1, "ACK %s\n", last_hex);
 				break;
 			}
 			continue;
@@ -437,10 +437,10 @@ static int get_common_commits(void)
 		if (!strcmp(line, "done")) {
 			if (have_obj.nr > 0) {
 				if (multi_ack)
-					packet_write(1, "ACK %s\n", last_hex);
+					packet_write_fmt(1, "ACK %s\n", last_hex);
 				return 0;
 			}
-			packet_write(1, "NAK\n");
+			packet_write_fmt(1, "NAK\n");
 			return -1;
 		}
 		die("git upload-pack: expected SHA1 list, got '%s'", line);
@@ -650,7 +650,7 @@ static void receive_needs(void)
 		while (result) {
 			struct object *object = &result->item->object;
 			if (!(object->flags & (CLIENT_SHALLOW|NOT_SHALLOW))) {
-				packet_write(1, "shallow %s",
+				packet_write_fmt(1, "shallow %s",
 						oid_to_hex(&object->oid));
 				register_shallow(object->oid.hash);
 				shallow_nr++;
@@ -662,7 +662,7 @@ static void receive_needs(void)
 			struct object *object = shallows.objects[i].item;
 			if (object->flags & NOT_SHALLOW) {
 				struct commit_list *parents;
-				packet_write(1, "unshallow %s",
+				packet_write_fmt(1, "unshallow %s",
 					oid_to_hex(&object->oid));
 				object->flags &= ~CLIENT_SHALLOW;
 				/* make sure the real parents are parsed */
@@ -741,7 +741,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, cb_data);
-		packet_write(1, "%s %s%c%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -753,11 +753,11 @@ static int send_ref(const char *refname, const struct object_id *oid,
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {
-		packet_write(1, "%s %s\n", oid_to_hex(oid), refname_nons);
+		packet_write_fmt(1, "%s %s\n", oid_to_hex(oid), refname_nons);
 	}
 	capabilities = NULL;
 	if (!peel_ref(refname, peeled.hash))
-		packet_write(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
+		packet_write_fmt(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
 	return 0;
 }
 
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 09/15] pack-protocol: fix maximum pkt-line size
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (7 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 10/15] convert: quote filter names in error messages larsxschneider
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

According to LARGE_PACKET_MAX in pkt-line.h the maximal length of a
pkt-line packet is 65520 bytes. The pkt-line header takes 4 bytes and
therefore the pkt-line data component must not exceed 65516 bytes.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/technical/protocol-common.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt
index bf30167..ecedb34 100644
--- a/Documentation/technical/protocol-common.txt
+++ b/Documentation/technical/protocol-common.txt
@@ -67,9 +67,9 @@ with non-binary data the same whether or not they contain the trailing
 LF (stripping the LF if present, and not complaining when it is
 missing).
 
-The maximum length of a pkt-line's data component is 65520 bytes.
-Implementations MUST NOT send pkt-line whose length exceeds 65524
-(65520 bytes of payload + 4 bytes of length data).
+The maximum length of a pkt-line's data component is 65516 bytes.
+Implementations MUST NOT send pkt-line whose length exceeds 65520
+(65516 bytes of payload + 4 bytes of length data).
 
 Implementations SHOULD NOT send an empty pkt-line ("0004").
 
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 10/15] convert: quote filter names in error messages
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (8 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 09/15] pack-protocol: fix maximum pkt-line size larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 11/15] convert: modernize tests larsxschneider
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to
read in error messages. Quote them to improve the readability.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 convert.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index b1614bf..522e2c5 100644
--- a/convert.c
+++ b/convert.c
@@ -397,7 +397,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	child_process.out = out;
 
 	if (start_command(&child_process))
-		return error("cannot fork to run external filter %s", params->cmd);
+		return error("cannot fork to run external filter '%s'", params->cmd);
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -415,13 +415,13 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	if (close(child_process.in))
 		write_err = 1;
 	if (write_err)
-		error("cannot feed the input to external filter %s", params->cmd);
+		error("cannot feed the input to external filter '%s'", params->cmd);
 
 	sigchain_pop(SIGPIPE);
 
 	status = finish_command(&child_process);
 	if (status)
-		error("external filter %s failed %d", params->cmd, status);
+		error("external filter '%s' failed %d", params->cmd, status);
 
 	strbuf_release(&cmd);
 	return (write_err || status);
@@ -462,15 +462,15 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 		return 0;	/* error was already reported */
 
 	if (strbuf_read(&nbuf, async.out, len) < 0) {
-		error("read from external filter %s failed", cmd);
+		error("read from external filter '%s' failed", cmd);
 		ret = 0;
 	}
 	if (close(async.out)) {
-		error("read from external filter %s failed", cmd);
+		error("read from external filter '%s' failed", cmd);
 		ret = 0;
 	}
 	if (finish_async(&async)) {
-		error("external filter %s failed", cmd);
+		error("external filter '%s' failed", cmd);
 		ret = 0;
 	}
 
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 11/15] convert: modernize tests
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (9 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 10/15] convert: quote filter names in error messages larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 12/15] convert: generate large test files only once larsxschneider
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Use `test_config` to set the config, check that files are empty with
`test_must_be_empty`, compare files with `test_cmp`, and remove spaces
after ">" and "<".

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh | 62 +++++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..7b45136 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -13,8 +13,8 @@ EOF
 chmod +x rot13.sh
 
 test_expect_success setup '
-	git config filter.rot13.smudge ./rot13.sh &&
-	git config filter.rot13.clean ./rot13.sh &&
+	test_config filter.rot13.smudge ./rot13.sh &&
+	test_config filter.rot13.clean ./rot13.sh &&
 
 	{
 	    echo "*.t filter=rot13"
@@ -38,8 +38,8 @@ script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
 
 test_expect_success check '
 
-	cmp test.o test &&
-	cmp test.o test.t &&
+	test_cmp test.o test &&
+	test_cmp test.o test.t &&
 
 	# ident should be stripped in the repository
 	git diff --raw --exit-code :test :test.i &&
@@ -47,10 +47,10 @@ test_expect_success check '
 	embedded=$(sed -ne "$script" test.i) &&
 	test "z$id" = "z$embedded" &&
 
-	git cat-file blob :test.t > test.r &&
+	git cat-file blob :test.t >test.r &&
 
-	./rot13.sh < test.o > test.t &&
-	cmp test.r test.t
+	./rot13.sh <test.o >test.t &&
+	test_cmp test.r test.t
 '
 
 # If an expanded ident ever gets into the repository, we want to make sure that
@@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
 
 	# delete the files and check them out again, using a smudge filter
 	# that will count the args and echo the command-line back to us
-	git config filter.argc.smudge "sh ./argc.sh %f" &&
+	test_config filter.argc.smudge "sh ./argc.sh %f" &&
 	rm "$normal" "$special" &&
 	git checkout -- "$normal" "$special" &&
 
@@ -141,7 +141,7 @@ test_expect_success 'filter shell-escaped filenames' '
 	test_cmp expect "$special" &&
 
 	# do the same thing, but with more args in the filter expression
-	git config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
+	test_config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
 	rm "$normal" "$special" &&
 	git checkout -- "$normal" "$special" &&
 
@@ -154,9 +154,9 @@ test_expect_success 'filter shell-escaped filenames' '
 '
 
 test_expect_success 'required filter should filter data' '
-	git config filter.required.smudge ./rot13.sh &&
-	git config filter.required.clean ./rot13.sh &&
-	git config filter.required.required true &&
+	test_config filter.required.smudge ./rot13.sh &&
+	test_config filter.required.clean ./rot13.sh &&
+	test_config filter.required.required true &&
 
 	echo "*.r filter=required" >.gitattributes &&
 
@@ -165,17 +165,17 @@ test_expect_success 'required filter should filter data' '
 
 	rm -f test.r &&
 	git checkout -- test.r &&
-	cmp test.o test.r &&
+	test_cmp test.o test.r &&
 
 	./rot13.sh <test.o >expected &&
 	git cat-file blob :test.r >actual &&
-	cmp expected actual
+	test_cmp expected actual
 '
 
 test_expect_success 'required filter smudge failure' '
-	git config filter.failsmudge.smudge false &&
-	git config filter.failsmudge.clean cat &&
-	git config filter.failsmudge.required true &&
+	test_config filter.failsmudge.smudge false &&
+	test_config filter.failsmudge.clean cat &&
+	test_config filter.failsmudge.required true &&
 
 	echo "*.fs filter=failsmudge" >.gitattributes &&
 
@@ -186,9 +186,9 @@ test_expect_success 'required filter smudge failure' '
 '
 
 test_expect_success 'required filter clean failure' '
-	git config filter.failclean.smudge cat &&
-	git config filter.failclean.clean false &&
-	git config filter.failclean.required true &&
+	test_config filter.failclean.smudge cat &&
+	test_config filter.failclean.clean false &&
+	test_config filter.failclean.required true &&
 
 	echo "*.fc filter=failclean" >.gitattributes &&
 
@@ -197,8 +197,8 @@ test_expect_success 'required filter clean failure' '
 '
 
 test_expect_success 'filtering large input to small output should use little memory' '
-	git config filter.devnull.clean "cat >/dev/null" &&
-	git config filter.devnull.required true &&
+	test_config filter.devnull.clean "cat >/dev/null" &&
+	test_config filter.devnull.required true &&
 	for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
 	echo "30MB filter=devnull" >.gitattributes &&
 	GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
@@ -207,7 +207,7 @@ test_expect_success 'filtering large input to small output should use little mem
 test_expect_success 'filter that does not read is fine' '
 	test-genrandom foo $((128 * 1024 + 1)) >big &&
 	echo "big filter=epipe" >.gitattributes &&
-	git config filter.epipe.clean "echo xyzzy" &&
+	test_config filter.epipe.clean "echo xyzzy" &&
 	git add big &&
 	git cat-file blob :big >actual &&
 	echo xyzzy >expect &&
@@ -215,20 +215,20 @@ test_expect_success 'filter that does not read is fine' '
 '
 
 test_expect_success EXPENSIVE 'filter large file' '
-	git config filter.largefile.smudge cat &&
-	git config filter.largefile.clean cat &&
+	test_config filter.largefile.smudge cat &&
+	test_config filter.largefile.clean cat &&
 	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
 	echo "2GB filter=largefile" >.gitattributes &&
 	git add 2GB 2>err &&
-	! test -s err &&
+	test_must_be_empty err &&
 	rm -f 2GB &&
 	git checkout -- 2GB 2>err &&
-	! test -s err
+	test_must_be_empty err
 '
 
 test_expect_success "filter: clean empty file" '
-	git config filter.in-repo-header.clean  "echo cleaned && cat" &&
-	git config filter.in-repo-header.smudge "sed 1d" &&
+	test_config filter.in-repo-header.clean  "echo cleaned && cat" &&
+	test_config filter.in-repo-header.smudge "sed 1d" &&
 
 	echo "empty-in-worktree    filter=in-repo-header" >>.gitattributes &&
 	>empty-in-worktree &&
@@ -240,8 +240,8 @@ test_expect_success "filter: clean empty file" '
 '
 
 test_expect_success "filter: smudge empty file" '
-	git config filter.empty-in-repo.clean "cat >/dev/null" &&
-	git config filter.empty-in-repo.smudge "echo smudged && cat" &&
+	test_config filter.empty-in-repo.clean "cat >/dev/null" &&
+	test_config filter.empty-in-repo.smudge "echo smudged && cat" &&
 
 	echo "empty-in-repo filter=empty-in-repo" >>.gitattributes &&
 	echo dead data walking >empty-in-repo &&
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 12/15] convert: generate large test files only once
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (10 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 11/15] convert: modernize tests larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 13/15] convert: make apply_filter() adhere to standard Git error handling larsxschneider
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Generate more interesting large test files with pseudo random characters
in between and reuse these test files in multiple tests. Run tests formerly
marked as EXPENSIVE every time but with a smaller data set.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7b45136..34c8eb9 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -4,6 +4,15 @@ test_description='blob conversion via gitattributes'
 
 . ./test-lib.sh
 
+if test_have_prereq EXPENSIVE
+then
+	T0021_LARGE_FILE_SIZE=2048
+	T0021_LARGISH_FILE_SIZE=100
+else
+	T0021_LARGE_FILE_SIZE=30
+	T0021_LARGISH_FILE_SIZE=2
+fi
+
 cat <<EOF >rot13.sh
 #!$SHELL_PATH
 tr \
@@ -31,7 +40,26 @@ test_expect_success setup '
 	cat test >test.i &&
 	git add test test.t test.i &&
 	rm -f test test.t test.i &&
-	git checkout -- test test.t test.i
+	git checkout -- test test.t test.i &&
+
+	mkdir generated-test-data &&
+	for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
+	do
+		RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
+		ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"
+		# Generate 1MB of empty data and 100 bytes of random characters
+		# printf "$(test-genrandom start $i)"
+		printf "%1048576d" 1 >>generated-test-data/large.file &&
+		printf "$RANDOM_STRING" >>generated-test-data/large.file &&
+		printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
+		printf "$ROT_RANDOM_STRING" >>generated-test-data/large.file.rot13 &&
+
+		if test $i = $T0021_LARGISH_FILE_SIZE
+		then
+			cat generated-test-data/large.file >generated-test-data/largish.file &&
+			cat generated-test-data/large.file.rot13 >generated-test-data/largish.file.rot13
+		fi
+	done
 '
 
 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -199,9 +227,9 @@ test_expect_success 'required filter clean failure' '
 test_expect_success 'filtering large input to small output should use little memory' '
 	test_config filter.devnull.clean "cat >/dev/null" &&
 	test_config filter.devnull.required true &&
-	for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
-	echo "30MB filter=devnull" >.gitattributes &&
-	GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
+	cp generated-test-data/large.file large.file &&
+	echo "large.file filter=devnull" >.gitattributes &&
+	GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add large.file
 '
 
 test_expect_success 'filter that does not read is fine' '
@@ -214,15 +242,15 @@ test_expect_success 'filter that does not read is fine' '
 	test_cmp expect actual
 '
 
-test_expect_success EXPENSIVE 'filter large file' '
+test_expect_success 'filter large file' '
 	test_config filter.largefile.smudge cat &&
 	test_config filter.largefile.clean cat &&
-	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
-	echo "2GB filter=largefile" >.gitattributes &&
-	git add 2GB 2>err &&
+	echo "large.file filter=largefile" >.gitattributes &&
+	cp generated-test-data/large.file large.file &&
+	git add large.file 2>err &&
 	test_must_be_empty err &&
-	rm -f 2GB &&
-	git checkout -- 2GB 2>err &&
+	rm -f large.file &&
+	git checkout -- large.file 2>err &&
 	test_must_be_empty err
 '
 
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 13/15] convert: make apply_filter() adhere to standard Git error handling
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (11 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 12/15] convert: generate large test files only once larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-10 13:04   ` [PATCH v5 14/15] convert: add filter.<driver>.process option larsxschneider
  2016-08-10 13:04   ` [PATCH v5 15/15] read-cache: make sure file handles are not inherited by child processes larsxschneider
  14 siblings, 0 replies; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

apply_filter() returns a boolean that tells the caller if it
"did convert or did not convert". The variable `ret` was used throughout
the function to track errors wheras `1` denoted success and `0` failure.
This is unusual for the Git source where `0` denotes success.

Rename the variable and flip its value to make the function easier
readable for Git developers.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 convert.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 522e2c5..bd17340 100644
--- a/convert.c
+++ b/convert.c
@@ -436,7 +436,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	 *
 	 * (child --> cmd) --> us
 	 */
-	int ret = 1;
+	int err = 0;
 	struct strbuf nbuf = STRBUF_INIT;
 	struct async async;
 	struct filter_params params;
@@ -463,22 +463,22 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 
 	if (strbuf_read(&nbuf, async.out, len) < 0) {
 		error("read from external filter '%s' failed", cmd);
-		ret = 0;
+		err = -1;
 	}
 	if (close(async.out)) {
 		error("read from external filter '%s' failed", cmd);
-		ret = 0;
+		err = -1;
 	}
 	if (finish_async(&async)) {
 		error("external filter '%s' failed", cmd);
-		ret = 0;
+		err = -1;
 	}
 
-	if (ret) {
+	if (!err) {
 		strbuf_swap(dst, &nbuf);
 	}
 	strbuf_release(&nbuf);
-	return ret;
+	return !err;
 }
 
 static struct convert_driver {
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (12 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 13/15] convert: make apply_filter() adhere to standard Git error handling larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-12 16:33     ` Stefan Beller
  2016-08-10 13:04   ` [PATCH v5 15/15] read-cache: make sure file handles are not inherited by child processes larsxschneider
  14 siblings, 1 reply; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Git's clean/smudge mechanism invokes an external filter process for every
single blob that is affected by a filter. If Git filters a lot of blobs
then the startup time of the external filter processes can become a
significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge filter
written in golang to filter 12,000 files. This process took 364s with the
existing filter mechanism and 5s with the new mechanism. See details here:
https://github.com/github/git-lfs/pull/1382

This patch adds the `filter.<driver>.process` string option which, if used,
keeps the external filter process running and processes all blobs with
the packet format (pkt-line) based protocol over standard input and standard
output described below.

Git starts the filter when it encounters the first file
that needs to be cleaned or smudged. After the filter started
Git sends a welcome message, a list of supported protocol
version numbers, and a flush packet. Git expects to read the
welcome message and one protocol version number from the
previously sent list. Afterwards Git sends a list of supported
capabilities and a flush packet. Git expects to read a list of
desired capabilities, which must be a subset of the supported
capabilities list, and a flush packet as response:
------------------------
packet:          git> git-filter-client
packet:          git> version=2
packet:          git> version=42
packet:          git> 0000
packet:          git< git-filter-server
packet:          git< version=2
packet:          git> clean=true
packet:          git> smudge=true
packet:          git> not-yet-invented=true
packet:          git> 0000
packet:          git< clean=true
packet:          git< smudge=true
packet:          git< 0000
------------------------
Supported filter capabilities in version 2 are "clean" and
"smudge".

Afterwards Git sends a list of "key=value" pairs terminated with
a flush packet. The list will contain at least the filter command
(based on the supported capabilities) and the pathname of the file
to filter relative to the repository root. Right after these packets
Git sends the content split in zero or more pkt-line packets and a
flush packet to terminate content.
------------------------
packet:          git> command=smudge\n
packet:          git> pathname=path/testfile.dat\n
packet:          git> 0000
packet:          git> CONTENT
packet:          git> 0000
------------------------

The filter is expected to respond with a list of "key=value" pairs
terminated with a flush packet. If the filter does not experience
problems then the list must contain a "success" status. Right after
these packets the filter is expected to send the content in zero
or more pkt-line packets and a flush packet at the end. Finally, a
second list of "key=value" pairs terminated with a flush packet
is expected. The filter can change the status in the second list.
------------------------
packet:          git< status=success\n
packet:          git< 0000
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< 0000  # empty list!
------------------------

If the result content is empty then the filter is expected to respond
with a success status and an empty list.
------------------------
packet:          git< status=success\n
packet:          git< 0000
packet:          git< 0000  # empty content!
packet:          git< 0000  # empty list!
------------------------

In case the filter cannot or does not want to process the content,
it is expected to respond with an "error" status. Depending on the
`filter.<driver>.required` flag Git will interpret that as error
but it will not stop or restart the filter process.
------------------------
packet:          git< status=error\n
packet:          git< 0000
------------------------

In case the filter cannot or does not want to process the content
as well as any future content for the lifetime of the Git process,
it is expected to respond with an "error-all" status. Depending on
the `filter.<driver>.required` flag Git will interpret that as error
but it will not stop or restart the filter process.
------------------------
packet:          git< status=error-all\n
packet:          git< 0000
------------------------

If the filter experiences an error during processing, then it can
send the status "error". Depending on the `filter.<driver>.required`
flag Git will interpret that as error but it will not stop or restart
the filter process.
------------------------
packet:          git< status=success\n
packet:          git< 0000
packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
packet:          git< 0000
packet:          git< status=error\n
packet:          git< 0000
------------------------

If the filter dies during the communication or does not adhere to
the protocol then Git will stop the filter process and restart it
with the next file that needs to be processed.

After the filter has processed a blob it is expected to wait for
the next "key=value" list containing a command. When the Git process
terminates, it will send a kill signal to the filter in that stage.

If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
is configured then these commands always take precedence over
a configured `filter.<driver>.process` command.

Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/gitattributes.txt | 139 ++++++++++++++-
 convert.c                       | 343 ++++++++++++++++++++++++++++++++----
 t/t0021-conversion.sh           | 372 ++++++++++++++++++++++++++++++++++++++++
 t/t0021/rot13-filter.pl         | 176 +++++++++++++++++++
 unpack-trees.c                  |   1 +
 5 files changed, 1001 insertions(+), 30 deletions(-)
 create mode 100755 t/t0021/rot13-filter.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8882a3e..6e563a6 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -300,7 +300,13 @@ checkout, when the `smudge` command is specified, the command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file.  Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate.  If a long running `process` filter is used
+in place of `clean` and/or `smudge` filters, then Git can process
+all blobs with a single filter command invocation for the entire
+life of a single Git command, for example `git add --all`.  See
+section below for the description of the protocol used to
+communicate with a `process` filter.
 
 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -375,6 +381,137 @@ substitution.  For example:
 ------------------------
 
 
+Long Running Filter Process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the filter command (a string value) is defined via
+`filter.<driver>.process` then Git can process all blobs with a
+single filter invocation for the entire life of a single Git
+command. This is achieved by using the following packet format
+(pkt-line, see technical/protocol-common.txt) based protocol over
+standard input and standard output.
+
+Git starts the filter when it encounters the first file
+that needs to be cleaned or smudged. After the filter started
+Git sends a welcome message, a list of supported protocol
+version numbers, and a flush packet. Git expects to read the
+welcome message and one protocol version number from the
+previously sent list. Afterwards Git sends a list of supported
+capabilities and a flush packet. Git expects to read a list of
+desired capabilities, which must be a subset of the supported
+capabilities list, and a flush packet as response:
+------------------------
+packet:          git> git-filter-client
+packet:          git> version=2
+packet:          git> version=42
+packet:          git> 0000
+packet:          git< git-filter-server
+packet:          git< version=2
+packet:          git> clean=true
+packet:          git> smudge=true
+packet:          git> not-yet-invented=true
+packet:          git> 0000
+packet:          git< clean=true
+packet:          git< smudge=true
+packet:          git< 0000
+------------------------
+Supported filter capabilities in version 2 are "clean" and
+"smudge".
+
+Afterwards Git sends a list of "key=value" pairs terminated with
+a flush packet. The list will contain at least the filter command
+(based on the supported capabilities) and the pathname of the file
+to filter relative to the repository root. Right after these packets
+Git sends the content split in zero or more pkt-line packets and a
+flush packet to terminate content.
+------------------------
+packet:          git> command=smudge\n
+packet:          git> pathname=path/testfile.dat\n
+packet:          git> 0000
+packet:          git> CONTENT
+packet:          git> 0000
+------------------------
+
+The filter is expected to respond with a list of "key=value" pairs
+terminated with a flush packet. If the filter does not experience
+problems then the list must contain a "success" status. Right after
+these packets the filter is expected to send the content in zero
+or more pkt-line packets and a flush packet at the end. Finally, a
+second list of "key=value" pairs terminated with a flush packet
+is expected. The filter can change the status in the second list.
+------------------------
+packet:          git< status=success\n
+packet:          git< 0000
+packet:          git< SMUDGED_CONTENT
+packet:          git< 0000
+packet:          git< 0000  # empty list!
+------------------------
+
+If the result content is empty then the filter is expected to respond
+with a success status and an empty list.
+------------------------
+packet:          git< status=success\n
+packet:          git< 0000
+packet:          git< 0000  # empty content!
+packet:          git< 0000  # empty list!
+------------------------
+
+In case the filter cannot or does not want to process the content,
+it is expected to respond with an "error" status. Depending on the
+`filter.<driver>.required` flag Git will interpret that as error
+but it will not stop or restart the filter process.
+------------------------
+packet:          git< status=error\n
+packet:          git< 0000
+------------------------
+
+In case the filter cannot or does not want to process the content
+as well as any future content for the lifetime of the Git process,
+it is expected to respond with an "error-all" status. Depending on
+the `filter.<driver>.required` flag Git will interpret that as error
+but it will not stop or restart the filter process.
+------------------------
+packet:          git< status=error-all\n
+packet:          git< 0000
+------------------------
+
+If the filter experiences an error during processing, then it can
+send the status "error". Depending on the `filter.<driver>.required`
+flag Git will interpret that as error but it will not stop or restart
+the filter process.
+------------------------
+packet:          git< status=success\n
+packet:          git< 0000
+packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
+packet:          git< 0000
+packet:          git< status=error\n
+packet:          git< 0000
+------------------------
+
+If the filter dies during the communication or does not adhere to
+the protocol then Git will stop the filter process and restart it
+with the next file that needs to be processed.
+
+After the filter has processed a blob it is expected to wait for
+the next "key=value" list containing a command. When the Git process
+terminates, it will send a kill signal to the filter in that stage.
+
+A long running filter demo implementation can be found in
+`t/t0021/rot13-filter.pl` located in the Git core repository.
+If you develop your own long running filter process then the
+`GIT_TRACE_PACKET` environment variables can be very helpful
+for debugging (see linkgit:git[1]).
+
+If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
+is configured then these commands always take precedence over
+a configured `filter.<driver>.process` command.
+
+Please note that you cannot use an existing `filter.<driver>.clean`
+or `filter.<driver>.smudge` command with `filter.<driver>.process`
+because the former two use a different inter process communication
+protocol than the latter one.
+
+
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/convert.c b/convert.c
index bd17340..e421f4a 100644
--- a/convert.c
+++ b/convert.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 #include "quote.h"
 #include "sigchain.h"
+#include "pkt-line.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -427,7 +428,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len, int fd,
+static int apply_single_file_filter(const char *path, const char *src, size_t len, int fd,
                         struct strbuf *dst, const char *cmd)
 {
 	/*
@@ -441,12 +442,6 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	struct async async;
 	struct filter_params params;
 
-	if (!cmd || !*cmd)
-		return 0;
-
-	if (!dst)
-		return 1;
-
 	memset(&async, 0, sizeof(async));
 	async.proc = filter_buffer_or_fd;
 	async.data = &params;
@@ -481,14 +476,311 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	return !err;
 }
 
+#define CAP_CLEAN    (1u<<0)
+#define CAP_SMUDGE   (1u<<1)
+
+struct cmd2process {
+	struct hashmap_entry ent; /* must be the first member! */
+	int supported_capabilities;
+	const char *cmd;
+	struct child_process process;
+};
+
+static int cmd_process_map_initialized;
+static struct hashmap cmd_process_map;
+
+static int cmd2process_cmp(const struct cmd2process *e1,
+                           const struct cmd2process *e2,
+                           const void *unused)
+{
+	return strcmp(e1->cmd, e2->cmd);
+}
+
+static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
+{
+	struct cmd2process key;
+	hashmap_entry_init(&key, strhash(cmd));
+	key.cmd = cmd;
+	return hashmap_get(hashmap, &key, NULL);
+}
+
+static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
+{
+	if (!entry)
+		return;
+	sigchain_push(SIGPIPE, SIG_IGN);
+	/*
+	 * We kill the filter most likely because an error happened already.
+	 * That's why we are not interested in any error code here.
+	 */
+	close(entry->process.in);
+	close(entry->process.out);
+	sigchain_pop(SIGPIPE);
+	finish_command(&entry->process);
+	hashmap_remove(hashmap, entry, NULL);
+	free(entry);
+}
+
+static int packet_write_list(int fd, const char *line, ...)
+{
+	va_list args;
+	int err;
+	va_start(args, line);
+	for (;;)
+	{
+		if (!line)
+			break;
+		err = packet_write_gently_fmt(fd, "%s", line);
+		if (err)
+			return err;
+		line = va_arg(args, const char*);
+	}
+	va_end(args);
+	return packet_flush_gently(fd);
+}
+
+static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
+{
+	int err;
+	struct cmd2process *entry;
+	struct child_process *process;
+	const char *argv[] = { cmd, NULL };
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+	char *cap_buf;
+	const char *cap_name;
+
+	entry = xmalloc(sizeof(*entry));
+	hashmap_entry_init(entry, strhash(cmd));
+	entry->cmd = cmd;
+	entry->supported_capabilities = 0;
+	process = &entry->process;
+
+	child_process_init(process);
+	process->argv = argv;
+	process->use_shell = 1;
+	process->in = -1;
+	process->out = -1;
+
+	if (start_command(process)) {
+		error("cannot fork to run external filter '%s'", cmd);
+		kill_multi_file_filter(hashmap, entry);
+		return NULL;
+	}
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
+	if (err)
+		goto done;
+
+	err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
+	if (err) {
+		error("external filter '%s' does not support long running filter protocol", cmd);
+		goto done;
+	}
+	err = strcmp(packet_read_line(process->out, NULL), "version=2");
+	if (err)
+		goto done;
+
+	err = packet_write_list(process->in, "clean=true", "smudge=true", NULL);
+
+	for (;;)
+	{
+		cap_buf = packet_read_line(process->out, NULL);
+		if (!cap_buf)
+			break;
+		string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+		if (cap_list.nr != 2 || strcmp(cap_list.items[1].string, "true"))
+			continue;
+
+		cap_name = cap_list.items[0].string;
+		if (!strcmp(cap_name, "clean")) {
+			entry->supported_capabilities |= CAP_CLEAN;
+		} else if (!strcmp(cap_name, "smudge")) {
+			entry->supported_capabilities |= CAP_SMUDGE;
+		} else {
+			warning(
+				"external filter '%s' requested unsupported filter capability '%s'",
+				cmd, cap_name
+			);
+		}
+
+		string_list_clear(&cap_list, 0);
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		error("initialization for external filter '%s' failed", cmd);
+		kill_multi_file_filter(hashmap, entry);
+		return NULL;
+	}
+
+	hashmap_add(hashmap, entry);
+	return entry;
+}
+
+static void read_multi_file_filter_values(int fd, struct strbuf *status) {
+	struct strbuf **pair;
+	char *line;
+	for (;;) {
+		line = packet_read_line(fd, NULL);
+		if (!line)
+			break;
+		pair = strbuf_split_str(line, '=', 2);
+		if (pair[0] && pair[0]->len && pair[1]) {
+			if (!strcmp(pair[0]->buf, "status=")) {
+				strbuf_reset(status);
+				strbuf_addbuf(status, pair[1]);
+			}
+		}
+	}
+}
+
+static int apply_multi_file_filter(const char *path, const char *src, size_t len,
+                                   int fd, struct strbuf *dst, const char *cmd,
+                                   const int wanted_capability)
+{
+	int err;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct stat file_stat;
+	struct strbuf nbuf = STRBUF_INIT;
+	struct strbuf filter_status = STRBUF_INIT;
+	char *filter_type;
+
+	if (!cmd_process_map_initialized) {
+		cmd_process_map_initialized = 1;
+		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
+		entry = NULL;
+	} else {
+		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
+	}
+
+	fflush(NULL);
+
+	if (!entry) {
+		entry = start_multi_file_filter(&cmd_process_map, cmd);
+		if (!entry)
+			return 0;
+	}
+	process = &entry->process;
+
+	if (!(wanted_capability & entry->supported_capabilities))
+		return 0;
+
+	if (CAP_CLEAN & wanted_capability)
+		filter_type = "clean";
+	else if (CAP_SMUDGE & wanted_capability)
+		filter_type = "smudge";
+	else
+		die("unexpected filter type");
+
+	if (fd >= 0 && !src) {
+		if (fstat(fd, &file_stat) == -1)
+			return 0;
+		len = xsize_t(file_stat.st_size);
+	}
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_gently_fmt(process->in, "command=%s\n", filter_type);
+	if (err)
+		goto done;
+
+	err = packet_write_gently_fmt(process->in, "pathname=%s\n", path);
+	if (err)
+		goto done;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		goto done;
+
+	if (fd >= 0)
+		err = packet_write_stream_with_flush_from_fd(fd, process->in);
+	else
+		err = packet_write_stream_with_flush_from_buf(src, len, process->in);
+	if (err)
+		goto done;
+
+	read_multi_file_filter_values(process->out, &filter_status);
+	err = strcmp(filter_status.buf, "success");
+	if (err)
+		goto done;
+
+	err = packet_read_till_flush(process->out, &nbuf) < 0;
+	if (err)
+		goto done;
+
+	read_multi_file_filter_values(process->out, &filter_status);
+	err = strcmp(filter_status.buf, "success");
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		if (!strcmp(filter_status.buf, "error")) {
+			/*
+		     * The filter signaled a problem with the file.
+		     */
+		} else if (!strcmp(filter_status.buf, "error-all")) {
+			/*
+			 * The filter signaled a permanent problem. Don't try to filter
+			 * files with the same command for the lifetime of the current
+			 * Git process.
+			 */
+			 entry->supported_capabilities &= ~wanted_capability;
+		} else {
+			/*
+			 * Something went wrong with the protocol filter.
+			 * Force shutdown and restart if another blob requires filtering!
+			 */
+			error("external filter '%s' failed", cmd);
+			kill_multi_file_filter(&cmd_process_map, entry);
+		}
+	} else {
+		strbuf_swap(dst, &nbuf);
+	}
+	strbuf_release(&nbuf);
+	return !err;
+}
+
 static struct convert_driver {
 	const char *name;
 	struct convert_driver *next;
 	const char *smudge;
 	const char *clean;
+	const char *process;
 	int required;
 } *user_convert, **user_convert_tail;
 
+static int apply_filter(const char *path, const char *src, size_t len,
+                        int fd, struct strbuf *dst, struct convert_driver *drv,
+                        const int wanted_capability)
+{
+	const char* cmd = NULL;
+
+	if (!drv)
+		return 0;
+
+	if (!dst)
+		return 1;
+
+	if ((CAP_CLEAN & wanted_capability) && drv->clean)
+		cmd = drv->clean;
+	else if ((CAP_SMUDGE & wanted_capability) && drv->smudge)
+		cmd = drv->smudge;
+
+	if (cmd && *cmd)
+		return apply_single_file_filter(path, src, len, fd, dst, cmd);
+	else if (drv->process && *drv->process)
+		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
+
+	return 0;
+}
+
 static int read_convert_config(const char *var, const char *value, void *cb)
 {
 	const char *key, *name;
@@ -526,6 +818,10 @@ static int read_convert_config(const char *var, const char *value, void *cb)
 	if (!strcmp("clean", key))
 		return git_config_string(&drv->clean, var, value);
 
+	if (!strcmp("process", key)) {
+		return git_config_string(&drv->process, var, value);
+	}
+
 	if (!strcmp("required", key)) {
 		drv->required = git_config_bool(var, value);
 		return 0;
@@ -823,7 +1119,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -856,18 +1152,12 @@ int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->clean;
-		required = ca.drv->required;
-	}
 
-	ret |= apply_filter(path, src, len, -1, dst, filter);
-	if (!ret && required)
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);
+	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	if (ret && dst) {
@@ -889,9 +1179,9 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	convert_attrs(&ca, path);
 
 	assert(ca.drv);
-	assert(ca.drv->clean);
+	assert(ca.drv->clean || ca.drv->process);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -903,15 +1193,9 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 					    int normalizing)
 {
 	int ret = 0, ret_filter = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->smudge;
-		required = ca.drv->required;
-	}
 
 	ret |= ident_to_worktree(path, src, len, dst, ca.ident);
 	if (ret) {
@@ -920,9 +1204,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 	}
 	/*
 	 * CRLF conversion can be skipped if normalizing, unless there
-	 * is a smudge filter.  The filter might expect CRLFs.
+	 * is a smudge or process filter (even if the process filter doesn't
+	 * support smudge).  The filters might expect CRLFs.
 	 */
-	if (filter || !normalizing) {
+	if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
 		if (ret) {
 			src = dst->buf;
@@ -930,8 +1215,8 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, filter);
-	if (!ret_filter && required)
+	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);
+	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
@@ -1383,7 +1668,7 @@ struct stream_filter *get_stream_filter(const char *path, const unsigned char *s
 	struct stream_filter *filter = NULL;
 
 	convert_attrs(&ca, path);
-	if (ca.drv && (ca.drv->smudge || ca.drv->clean))
+	if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
 		return NULL;
 
 	if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 34c8eb9..5ba694b 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -42,6 +42,9 @@ test_expect_success setup '
 	rm -f test test.t test.i &&
 	git checkout -- test test.t test.i &&
 
+	echo "content-test2" >test2.o &&
+	echo "content-test3-subdir" >test3-subdir.o &&
+
 	mkdir generated-test-data &&
 	for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
 	do
@@ -296,4 +299,373 @@ test_expect_success 'disable filter with empty override' '
 	test_must_be_empty err
 '
 
+check_filter () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	cat >expected.log &&
+	sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_count_clean () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	cat >expected.log &&
+	sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" |
+		sed "s/^\([0-9]\) IN: clean/x IN: clean/" >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_ignore_clean () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" &&
+	cat >expected.log &&
+	grep -v "IN: clean" rot13-filter.log >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_no_call () {
+	rm -f rot13-filter.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	test_must_be_empty rot13-filter.log
+}
+
+check_rot13 () {
+	test_cmp $1 $2 &&
+	./../rot13.sh <$1 >expected &&
+	git cat-file blob :$2 >actual &&
+	test_cmp expected actual
+}
+
+test_expect_success PERL 'required process filter should filter data' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		mkdir testsubdir &&
+		cp ../test3-subdir.o testsubdir/test3-subdir.r &&
+		>test4-empty.r &&
+
+		check_filter \
+			git add . \
+				<<-\EOF &&
+					1 IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					1 IN: clean test2.r 14 [OK] -- OUT: 14 [OK]
+					1 IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK]
+					1 IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+
+		check_filter_count_clean \
+			git commit . -m "test commit" \
+				<<-\EOF &&
+					x IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					x IN: clean test2.r 14 [OK] -- OUT: 14 [OK]
+					x IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK]
+					x IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+
+		rm -f test?.r testsubdir/test3-subdir.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+					IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+				EOF
+
+		check_filter_ignore_clean \
+			git checkout empty \
+				<<-\EOF &&
+					start
+					wrote filter header
+				EOF
+
+		check_filter_ignore_clean \
+			git checkout master \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+					IN: smudge test4-empty.r 0 [OK] -- OUT: 0 [OK]
+					IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+		check_rot13 ../test3-subdir.o testsubdir/test3-subdir.r
+	)
+'
+
+test_expect_success PERL 'required process filter should filter smudge data and one-shot filter should clean' '
+	test_config_global filter.protocol.clean ./../rot13.sh &&
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+
+		check_filter_no_call \
+			git add . &&
+
+		check_filter_no_call \
+			git commit . -m "test commit" &&
+
+		rm -f test?.r testsubdir/test3-subdir.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		git checkout empty &&
+
+		check_filter_ignore_clean \
+			git checkout master\
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r
+	)
+'
+
+test_expect_success PERL 'required process filter should clean only' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cp ../test.o test.r &&
+
+		check_filter \
+			git add . \
+				<<-\EOF &&
+					1 IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+
+		check_filter_count_clean \
+			git commit . -m "test commit" \
+				<<-\EOF
+					x IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+	)
+'
+
+test_expect_success PERL 'required process filter should process binary files larger LARGE_PACKET_MAX' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.file filter=protocol" >.gitattributes &&
+		cat ../generated-test-data/largish.file.rot13 >large.rot13 &&
+		cat ../generated-test-data/largish.file >large.file &&
+		cat large.file >large.original &&
+
+		git add large.file .gitattributes &&
+		git commit . -m "test commit" &&
+
+		rm -f large.file &&
+		git checkout -- large.file &&
+		git cat-file blob :large.file >actual &&
+		test_cmp large.rot13 actual
+	)
+'
+
+test_expect_success PERL 'required process filter should with clean error should fail' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		echo "this is going to fail" >clean-write-fail.r &&
+		echo "content-test3-subdir" >test3.r &&
+
+		# Note: There are three clean paths in convert.c we just test one here.
+		test_must_fail git add .
+	)
+'
+
+test_expect_success PERL 'process filter should restart after unexpected write failure' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		echo "this is going to fail" >smudge-write-fail.o &&
+		cat smudge-write-fail.o >smudge-write-fail.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge smudge-write-fail.r 22 [OK] -- OUT: 22 [WRITE FAIL]
+					start
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+
+		! test_cmp smudge-write-fail.o smudge-write-fail.r && # Smudge failed!
+		./../rot13.sh <smudge-write-fail.o >expected &&
+		git cat-file blob :smudge-write-fail.r >actual &&
+		test_cmp expected actual							  # Clean worked!
+	)
+'
+
+test_expect_success PERL 'process filter should not restart in case of an error' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		echo "this will cause an error" >error.o &&
+		cp error.o error.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge error.r 25 [OK] -- OUT: 0 [ERROR]
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+		test_cmp error.o error.r
+	)
+'
+
+test_expect_success PERL 'process filter should be able to signal an error for all future files' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		echo "error this blob and all future blobs" >error-all.o &&
+		cp error-all.o error-all.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge error-all.r 37 [OK] -- OUT: 0 [ERROR-ALL]
+				EOF
+
+		test_cmp ../test.o test.r &&
+		test_cmp ../test2.o test2.r &&
+		test_cmp error-all.o error-all.r
+	)
+'
+
+test_expect_success PERL 'invalid process filter must fail (and not hang!)' '
+	test_config_global filter.protocol.process cat &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		test_must_fail git add . 2> git_stderr.log &&
+		grep "not support long running filter protocol" git_stderr.log
+	)
+'
+
 test_done
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
new file mode 100755
index 0000000..0e1181d
--- /dev/null
+++ b/t/t0021/rot13-filter.pl
@@ -0,0 +1,176 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git filter protocol version 2
+# See Documentation/gitattributes.txt, section "Filter Protocol"
+#
+# The script takes the list of supported protocol capabilities as
+# arguments ("clean", "smudge", etc).
+#
+# This implementation supports special test cases:
+# (1) If data with the pathname "clean-write-fail.r" is processed with
+#     a "clean" operation then the write operation will die.
+# (2) If data with the pathname "smudge-write-fail.r" is processed with
+#     a "smudge" operation then the write operation will die.
+# (3) If data with the pathname "error.r" is processed with any
+#     operation then the filter signals that it cannot or does not want
+#     to process the file.
+# (4) If data with the pathname "error-all.r" is processed with any
+#     operation then the filter signals that it cannot or does not want
+#     to process the file and any file after that is processed with the
+#     same command.
+#
+
+use strict;
+use warnings;
+
+my $MAX_PACKET_CONTENT_SIZE = 65516;
+my @capabilities            = @ARGV;
+
+sub rot13 {
+    my ($str) = @_;
+    $str =~ y/A-Za-z/N-ZA-Mn-za-m/;
+    return $str;
+}
+
+sub packet_read {
+    my $buffer;
+    my $bytes_read = read STDIN, $buffer, 4;
+    if ( $bytes_read == 0 ) {
+
+        # EOF - Git stopped talking to us!
+        exit();
+    }
+    elsif ( $bytes_read != 4 ) {
+        die "invalid packet size '$bytes_read' field";
+    }
+    my $pkt_size = hex($buffer);
+    if ( $pkt_size == 0 ) {
+        return ( 1, "" );
+    }
+    elsif ( $pkt_size > 4 ) {
+        my $content_size = $pkt_size - 4;
+        $bytes_read = read STDIN, $buffer, $content_size;
+        if ( $bytes_read != $content_size ) {
+            die "invalid packet ($content_size expected; $bytes_read read)";
+        }
+        return ( 0, $buffer );
+    }
+    else {
+        die "invalid packet size";
+    }
+}
+
+sub packet_write {
+    my ($packet) = @_;
+    print STDOUT sprintf( "%04x", length($packet) + 4 );
+    print STDOUT $packet;
+    STDOUT->flush();
+}
+
+sub packet_flush {
+    print STDOUT sprintf( "%04x", 0 );
+    STDOUT->flush();
+}
+
+open my $debug, ">>", "rot13-filter.log";
+print $debug "start\n";
+$debug->flush();
+
+( packet_read() eq ( 0, "git-filter-client" ) ) || die "bad initialization";
+( packet_read() eq ( 0, "version=2" ) )         || die "bad version";
+( packet_read() eq ( 1, "" ) )                  || die "bad version end";
+
+packet_write("git-filter-server\n");
+packet_write("version=2\n");
+
+( packet_read() eq ( 0, "clean=true" ) )  || die "bad capability";
+( packet_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_read() eq ( 1, "" ) )            || die "bad capability end";
+
+foreach (@capabilities) {
+    packet_write( $_ . "=true\n" );
+}
+packet_flush();
+print $debug "wrote filter header\n";
+$debug->flush();
+
+while (1) {
+    my ($command) = packet_read() =~ /^command=([^=]+)\n$/;
+    print $debug "IN: $command";
+    $debug->flush();
+
+    my ($pathname) = packet_read() =~ /^pathname=([^=]+)\n$/;
+    print $debug " $pathname";
+    $debug->flush();
+
+    # Flush
+    packet_read();
+
+    my $input = "";
+    {
+        binmode(STDIN);
+        my $buffer;
+        my $done = 0;
+        while ( !$done ) {
+            ( $done, $buffer ) = packet_read();
+            $input .= $buffer;
+        }
+        print $debug " " . length($input) . " [OK] -- ";
+        $debug->flush();
+    }
+
+    my $output;
+    if ( $pathname eq "error.r" or $pathname eq "error-all.r" ) {
+        $output = "";
+    }
+    elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+        $output = rot13($input);
+    }
+    elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+        $output = rot13($input);
+    }
+    else {
+        die "bad command '$command'";
+    }
+
+    print $debug "OUT: " . length($output) . " ";
+    $debug->flush();
+
+    if ( $pathname eq "error.r" ) {
+        print $debug "[ERROR]\n";
+        $debug->flush();
+        packet_write("status=error\n");
+        packet_flush();
+    }
+    elsif ( $pathname eq "error-all.r" ) {
+        print $debug "[ERROR-ALL]\n";
+        $debug->flush();
+        packet_write("status=error-all\n");
+        packet_flush();
+    }
+    else {
+        packet_write("status=success\n");
+        packet_flush();
+
+        if ( $pathname eq "${command}-write-fail.r" ) {
+            print $debug "[WRITE FAIL]\n";
+            $debug->flush();
+            die "${command} write error";
+        }
+
+        while ( length($output) > 0 ) {
+            my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+            packet_write($packet);
+            if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+                $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+            }
+            else {
+                $output = "";
+            }
+        }
+        packet_flush();
+        print $debug "[OK]\n";
+        $debug->flush();
+        packet_flush();
+    }
+}
diff --git a/unpack-trees.c b/unpack-trees.c
index 11c37fb..f6798f8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -10,6 +10,7 @@
 #include "attr.h"
 #include "split-index.h"
 #include "dir.h"
+#include "convert.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 15/15] read-cache: make sure file handles are not inherited by child processes
  2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
                     ` (13 preceding siblings ...)
  2016-08-10 13:04   ` [PATCH v5 14/15] convert: add filter.<driver>.process option larsxschneider
@ 2016-08-10 13:04   ` larsxschneider
  2016-08-18 14:23     ` Johannes Schindelin
  14 siblings, 1 reply; 50+ messages in thread
From: larsxschneider @ 2016-08-10 13:04 UTC (permalink / raw)
  To: git
  Cc: gitster, jnareb, mlbright, e, peff, Johannes.Schindelin, ben,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Consider the case of a file that requires filtering and is present in branch A
but not in branch B. If A is the current HEAD and we checkout B then the
following happens:

1. ce_compare_data() opens the file
2.   index_fd() detects that the file requires to run a clean filter and
     calls index_stream_convert_blob()
4.     index_stream_convert_blob() calls convert_to_git_filter_fd()
5.       convert_to_git_filter_fd() calls apply_filter() which creates a new
         long running filter process (in case it is the first file of this kind
         to be filtered)
6.       The new filter process inherits all file handles. This is the default
         on Linux/OSX and is explicitly defined in the `CreateProcessW` call
         in `mingw.c` on Windows.
7. ce_compare_data() closes the file
8. Git unlinks the file as it is not present in B

The unlink operation does not work on Windows because the filter process has
still an open handle to the file. Apparently that is no problem on Linux/OSX.
Probably because "[...] the two file descriptors share open file status flags"
(see fork(2)).

Fix this problem by opening files in read-cache with the `O_CLOEXEC` flag to
ensure that the file descriptor does not remain open in a newly spawned process.
`O_CLOEXEX` is defined as `O_NOINHERIT` on Windows. A similar fix for temporary
file handles was applied on Git for Windows already:
https://github.com/git-for-windows/git/commit/667b8b51ec850c3e1c7d75dee69dc13c29d1f162

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index db27766..f481dee 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -159,7 +159,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
 	int match = -1;
-	int fd = open(ce->name, O_RDONLY);
+	int fd = open(ce->name, O_RDONLY | O_CLOEXEC);
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-- 
2.9.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send
  2016-08-10 13:03   ` [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
@ 2016-08-10 13:13     ` Jeff King
  2016-08-10 13:24       ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2016-08-10 13:13 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschneider@gmail.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> The packet_trace() call is not ideal in format_packet() as we would print
> a trace when a packet is formatted and (potentially) when the packet is
> actually send. This was no problem up until now because format_packet()
> was only used by one function. Fix it by moving the trace call into the
> function that actually sends the packet.

It looks like there are two functions: packet_write() and
packet_buf_write().

Your patch only touches one of them, and it looks like we would fail to
trace many packets (e.g., see receive-pack.c:report(), which uses
packet_buf_write() and then write()s out the result).

-Peff

PS Also, s/send/sent/ in the commit message.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()
  2016-08-10 13:03   ` [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet() larsxschneider
@ 2016-08-10 13:15     ` Jeff King
  2016-08-10 13:29       ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2016-08-10 13:15 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschneider@gmail.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> format_packet() dies if the caller wants to format a packet larger than
> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.

I am not sure I agree here. Certainly I see the usefulness of gently
handling a failure to write(). But if you are passing in too-large
buffers, isn't that a bug in the program?

How would you recover, except by splitting up the content? That might
not be possible depending on how you are using the pkt-lines. And even
if it is, wouldn't it be simpler to split it up before sending it to
format_packet()?

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send
  2016-08-10 13:13     ` Jeff King
@ 2016-08-10 13:24       ` Lars Schneider
  2016-08-10 13:30         ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 13:24 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, gitster, jnareb, mlbright, e,
	Johannes.Schindelin, ben


> On 10 Aug 2016, at 15:13, Jeff King <peff@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschneider@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> The packet_trace() call is not ideal in format_packet() as we would print
>> a trace when a packet is formatted and (potentially) when the packet is
>> actually send. This was no problem up until now because format_packet()
>> was only used by one function. Fix it by moving the trace call into the
>> function that actually sends the packet.
> 
> It looks like there are two functions: packet_write() and
> packet_buf_write().

I did not call trace in packet_buf_write() because this function does not
perform any writes.


> Your patch only touches one of them, and it looks like we would fail to
> trace many packets (e.g., see receive-pack.c:report(), which uses
> packet_buf_write() and then write()s out the result).

I see. But isn't it confusing if packet_buf_write() issues a trace call?
If I just call this function then nothing happens at all. Shouldn't the
trace call be made in receive-pack.c:report() ? Or shouldn't receive-pack
let pkt-line.c perform the write calls?

-Lars


> PS Also, s/send/sent/ in the commit message.

Thanks :)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 13:04   ` [PATCH v5 04/15] pkt-line: add packet_write_gently() larsxschneider
@ 2016-08-10 13:28     ` Jeff King
  2016-08-10 13:36       ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2016-08-10 13:28 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:04:00PM +0200, larsxschneider@gmail.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> packet_write() has two shortcomings. First, it uses format_packet() which
> lets the caller only send string data via "%s". That means it cannot be
> used for arbitrary data that may contain NULs. Second, it will always
> die on error.
> 
> Add packet_write_gently() which writes arbitrary data and returns `0` for
> success and `-1` for an error.

So now we have packet_write() and packet_write_gently(), but they differ
in more than just whether they are gentle. That seems like a weird
interface.

Should we either be picking a new name (e.g., packet_write_mem() or
something), or migrating packet_write() to packet_write_fmt()?

> diff --git a/pkt-line.c b/pkt-line.c
> index e6b8410..4f25748 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -3,6 +3,7 @@
>  #include "run-command.h"
>  
>  char packet_buffer[LARGE_PACKET_MAX];
> +char packet_write_buffer[LARGE_PACKET_MAX];

Should this be static? I.e., are random other bits of the code allowed
to write into it (I guess not because it's not declared in pkt-line.h).

> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
> +{
> +	if (size > PKTLINE_DATA_MAXLEN)
> +		return -1;
> +	packet_trace(buf, size, 1);
> +	memmove(packet_write_buffer + 4, buf, size);

It looks like this iteration drops the idea of callers using a
LARGE_PACKET_MAX buffer and only filling it at "buf+4" with
PKTLINE_DATA_MAXLEN bytes (which is fine).

I wonder if we still need PKTLINE_DATA_MAXLEN, or of it is just
obscuring things. The magic number "4" still appears separately here,
and it actually makes it harder to see that things are correct.  I.e.,
doing:

  if (size > sizeof(packet_write_buffer) - 4)
	return -1;
  memmove(packet_write_buffer + 4, buf, size);

is more obviously correct, because you do not have to wonder about the
relationship between the size of your buffer and the macro.

It might still be worth having PKTLINE_DATA_MAXLEN public, though, if
callers use it to size their input to packet_write_gently().

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()
  2016-08-10 13:15     ` Jeff King
@ 2016-08-10 13:29       ` Lars Schneider
  2016-08-10 13:37         ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 13:29 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben


> On 10 Aug 2016, at 15:15, Jeff King <peff@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschneider@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> format_packet() dies if the caller wants to format a packet larger than
>> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.
> 
> I am not sure I agree here. Certainly I see the usefulness of gently
> handling a failure to write(). But if you are passing in too-large
> buffers, isn't that a bug in the program?
> 
> How would you recover, except by splitting up the content? That might
> not be possible depending on how you are using the pkt-lines. And even
> if it is, wouldn't it be simpler to split it up before sending it to
> format_packet()?

Good argument. I agree - this patch should be dropped.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send
  2016-08-10 13:24       ` Lars Schneider
@ 2016-08-10 13:30         ` Jeff King
  2016-08-10 13:51           ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2016-08-10 13:30 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git Mailing List, gitster, jnareb, mlbright, e,
	Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:24:38PM +0200, Lars Schneider wrote:

> > On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschneider@gmail.com wrote:
> > 
> >> From: Lars Schneider <larsxschneider@gmail.com>
> >> 
> >> The packet_trace() call is not ideal in format_packet() as we would print
> >> a trace when a packet is formatted and (potentially) when the packet is
> >> actually send. This was no problem up until now because format_packet()
> >> was only used by one function. Fix it by moving the trace call into the
> >> function that actually sends the packet.
> > 
> > It looks like there are two functions: packet_write() and
> > packet_buf_write().
> 
> I did not call trace in packet_buf_write() because this function does not
> perform any writes.

Yes, but then who is responsible for the trace? The caller?

And why is it a bad thing to do it some time other than writing? It is
if you format and then _don't_ write the packet, but the current callers
are not doing that.

> > Your patch only touches one of them, and it looks like we would fail to
> > trace many packets (e.g., see receive-pack.c:report(), which uses
> > packet_buf_write() and then write()s out the result).
> 
> I see. But isn't it confusing if packet_buf_write() issues a trace call?
> If I just call this function then nothing happens at all. Shouldn't the
> trace call be made in receive-pack.c:report() ? Or shouldn't receive-pack
> let pkt-line.c perform the write calls?

How would report() do that without re-parsing each of the packets?

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 13:28     ` Jeff King
@ 2016-08-10 13:36       ` Lars Schneider
  2016-08-10 13:40         ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 13:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben


> On 10 Aug 2016, at 15:28, Jeff King <peff@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:04:00PM +0200, larsxschneider@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> packet_write() has two shortcomings. First, it uses format_packet() which
>> lets the caller only send string data via "%s". That means it cannot be
>> used for arbitrary data that may contain NULs. Second, it will always
>> die on error.
>> 
>> Add packet_write_gently() which writes arbitrary data and returns `0` for
>> success and `-1` for an error.
> 
> So now we have packet_write() and packet_write_gently(), but they differ
> in more than just whether they are gentle. That seems like a weird
> interface.
> 
> Should we either be picking a new name (e.g., packet_write_mem() or
> something), or migrating packet_write() to packet_write_fmt()?

Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()"


>> diff --git a/pkt-line.c b/pkt-line.c
>> index e6b8410..4f25748 100644
>> --- a/pkt-line.c
>> +++ b/pkt-line.c
>> @@ -3,6 +3,7 @@
>> #include "run-command.h"
>> 
>> char packet_buffer[LARGE_PACKET_MAX];
>> +char packet_write_buffer[LARGE_PACKET_MAX];
> 
> Should this be static? I.e., are random other bits of the code allowed
> to write into it (I guess not because it's not declared in pkt-line.h).

static is better!


>> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
>> +{
>> +	if (size > PKTLINE_DATA_MAXLEN)
>> +		return -1;
>> +	packet_trace(buf, size, 1);
>> +	memmove(packet_write_buffer + 4, buf, size);
> 
> It looks like this iteration drops the idea of callers using a
> LARGE_PACKET_MAX buffer and only filling it at "buf+4" with
> PKTLINE_DATA_MAXLEN bytes (which is fine).
> 
> I wonder if we still need PKTLINE_DATA_MAXLEN, or of it is just
> obscuring things. The magic number "4" still appears separately here,
> and it actually makes it harder to see that things are correct.  I.e.,
> doing:
> 
>  if (size > sizeof(packet_write_buffer) - 4)
> 	return -1;
>  memmove(packet_write_buffer + 4, buf, size);
> 
> is more obviously correct, because you do not have to wonder about the
> relationship between the size of your buffer and the macro.
> 
> It might still be worth having PKTLINE_DATA_MAXLEN public, though, if
> callers use it to size their input to packet_write_gently().

I agree. In a later patch I am using PKTLINE_DATA_MAXLEN inside pkt-line.c,
too. I will change it to your suggestion.

For now I would remove PKTLINE_DATA_MAXLEN because it should be an implementation
detail of pkt-line.c (plus it is not used by anyone).

Thanks,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()
  2016-08-10 13:29       ` Lars Schneider
@ 2016-08-10 13:37         ` Jeff King
  2016-08-10 13:59           ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2016-08-10 13:37 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:29:26PM +0200, Lars Schneider wrote:

> 
> > On 10 Aug 2016, at 15:15, Jeff King <peff@peff.net> wrote:
> > 
> > On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschneider@gmail.com wrote:
> > 
> >> From: Lars Schneider <larsxschneider@gmail.com>
> >> 
> >> format_packet() dies if the caller wants to format a packet larger than
> >> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.
> > 
> > I am not sure I agree here. Certainly I see the usefulness of gently
> > handling a failure to write(). But if you are passing in too-large
> > buffers, isn't that a bug in the program?
> > 
> > How would you recover, except by splitting up the content? That might
> > not be possible depending on how you are using the pkt-lines. And even
> > if it is, wouldn't it be simpler to split it up before sending it to
> > format_packet()?
> 
> Good argument. I agree - this patch should be dropped.

Actually, after reading further, one thought did occur to me. Let's say
you are writing to a smudge filter, and one of the header packets you
send has the filename in it. So you might do something like:

  if (packet_write_fmt_gently(fd, "filename=%s", filename) < 0) {
	if (filter_required)
		die(...);
	else
		return -1; /* we tried our best; skip smudge */
  }

The "recovery" there is not to try sending again, but rather to give up.
And that is presumably a sane outcome for somebody who tries to checkout
a filename larger than 64K.

It does still feel a little weird that you cannot tell the difference
between a write() error and bad input. Because you really might want to
do something different between the two. Like:

  #define MAX_FILENAME (PKTLINE_DATA_MAXLEN - strlen("filename"))

  if (filename > MAX_FILENAME) {
	warning("woah, that name is ridiculous; truncating");
	ret = packet_write_fmt_gently(fd, "%.*s", MAX_FILENAME, filename);
  } else
	ret = packet_write_fmt_gently(fd, "%s", filename);

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 13:36       ` Lars Schneider
@ 2016-08-10 13:40         ` Jeff King
  2016-08-10 17:17           ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2016-08-10 13:40 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:36:45PM +0200, Lars Schneider wrote:

> > So now we have packet_write() and packet_write_gently(), but they differ
> > in more than just whether they are gentle. That seems like a weird
> > interface.
> > 
> > Should we either be picking a new name (e.g., packet_write_mem() or
> > something), or migrating packet_write() to packet_write_fmt()?
> 
> Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()"

Ah, OK. Generally I'd suggest to reorder things so that each patch looks
like a step forward (and so the early patches become preparatory steps,
and the justification in them is something like "we're going to add more
write functions, so let's give this a more descriptive name").

> I agree. In a later patch I am using PKTLINE_DATA_MAXLEN inside pkt-line.c,
> too. I will change it to your suggestion.
> 
> For now I would remove PKTLINE_DATA_MAXLEN because it should be an implementation
> detail of pkt-line.c (plus it is not used by anyone).

Sounds reasonable.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()
  2016-08-10 13:04   ` [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt() larsxschneider
@ 2016-08-10 13:43     ` Jeff King
  2016-08-10 14:10       ` Lars Schneider
  2016-08-10 17:18       ` Junio C Hamano
  0 siblings, 2 replies; 50+ messages in thread
From: Jeff King @ 2016-08-10 13:43 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschneider@gmail.com wrote:

> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
> +{
> +	static struct strbuf buf = STRBUF_INIT;
> +	va_list args;
> +
> +	strbuf_reset(&buf);
> +	va_start(args, fmt);
> +	format_packet(1, &buf, fmt, args);
> +	va_end(args);
> +	packet_trace(buf.buf + 4, buf.len - 4, 1);
> +	return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
> +}

Could the end of this function just be:

  return packet_write_gently(fd, buf.buf, buf.len);

? I guess we'd prefer to avoid that, because it incurs an extra
memmove() of the data.

Similarly, I'd think this could share code with the non-gentle form
(which should be able to just call this and die() if returns an error).
Though sometimes the va_list transformation makes that awkward.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send
  2016-08-10 13:30         ` Jeff King
@ 2016-08-10 13:51           ` Lars Schneider
  2016-08-10 14:33             ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 13:51 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, gitster, jnareb, mlbright, e,
	Johannes.Schindelin, ben


> On 10 Aug 2016, at 15:30, Jeff King <peff@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:24:38PM +0200, Lars Schneider wrote:
> 
>>> On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschneider@gmail.com wrote:
>>> 
>>>> From: Lars Schneider <larsxschneider@gmail.com>
>>>> 
>>>> The packet_trace() call is not ideal in format_packet() as we would print
>>>> a trace when a packet is formatted and (potentially) when the packet is
>>>> actually send. This was no problem up until now because format_packet()
>>>> was only used by one function. Fix it by moving the trace call into the
>>>> function that actually sends the packet.
>>> 
>>> It looks like there are two functions: packet_write() and
>>> packet_buf_write().
>> 
>> I did not call trace in packet_buf_write() because this function does not
>> perform any writes.
> 
> Yes, but then who is responsible for the trace? The caller?

From my point of view the one that issues the write call.


> And why is it a bad thing to do it some time other than writing? It is
> if you format and then _don't_ write the packet, but the current callers
> are not doing that.

True, they don't do that. However, I don't think that is intuitive behavior
for future callers. If I call "format" and then trace tells me that a packet
was sent although it was not (yet).

> 
>>> Your patch only touches one of them, and it looks like we would fail to
>>> trace many packets (e.g., see receive-pack.c:report(), which uses
>>> packet_buf_write() and then write()s out the result).
>> 
>> I see. But isn't it confusing if packet_buf_write() issues a trace call?
>> If I just call this function then nothing happens at all. Shouldn't the
>> trace call be made in receive-pack.c:report() ? Or shouldn't receive-pack
>> let pkt-line.c perform the write calls?
> 
> How would report() do that without re-parsing each of the packets?

We could add a function to pkt-line that takes a list of strings and
generates a list of packets with a terminating flush packet out of it.
Then it sends the packets.

As a quick compromise we could also introduce a function "packet_buf_write_with_trace()"
that wraps the "packet_buf_write()" and calls trace. However, I can imagine that
would be perceived as ugly.

I guess my point is that I stumbled over the un-intutiive format_packet() behavior
and I wanted to improve the situation in a way that others don't run into this
trap. If you think that is no issue then it would be OK for me if we leave the
current behavior as is.

Thanks,
Lars





^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()
  2016-08-10 13:37         ` Jeff King
@ 2016-08-10 13:59           ` Lars Schneider
  2016-08-10 14:34             ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 13:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben


> On 10 Aug 2016, at 15:37, Jeff King <peff@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:29:26PM +0200, Lars Schneider wrote:
> 
>> 
>>> On 10 Aug 2016, at 15:15, Jeff King <peff@peff.net> wrote:
>>> 
>>> On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschneider@gmail.com wrote:
>>> 
>>>> From: Lars Schneider <larsxschneider@gmail.com>
>>>> 
>>>> format_packet() dies if the caller wants to format a packet larger than
>>>> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.
>>> 
>>> I am not sure I agree here. Certainly I see the usefulness of gently
>>> handling a failure to write(). But if you are passing in too-large
>>> buffers, isn't that a bug in the program?
>>> 
>>> How would you recover, except by splitting up the content? That might
>>> not be possible depending on how you are using the pkt-lines. And even
>>> if it is, wouldn't it be simpler to split it up before sending it to
>>> format_packet()?
>> 
>> Good argument. I agree - this patch should be dropped.
> 
> Actually, after reading further, one thought did occur to me. Let's say
> you are writing to a smudge filter, and one of the header packets you
> send has the filename in it. So you might do something like:
> 
>  if (packet_write_fmt_gently(fd, "filename=%s", filename) < 0) {
> 	if (filter_required)
> 		die(...);
> 	else
> 		return -1; /* we tried our best; skip smudge */
>  }
> 
> The "recovery" there is not to try sending again, but rather to give up.
> And that is presumably a sane outcome for somebody who tries to checkout
> a filename larger than 64K.

Yes!


> It does still feel a little weird that you cannot tell the difference
> between a write() error and bad input. Because you really might want to
> do something different between the two. Like:
> 
>  #define MAX_FILENAME (PKTLINE_DATA_MAXLEN - strlen("filename"))
> 
>  if (filename > MAX_FILENAME) {
> 	warning("woah, that name is ridiculous; truncating");
> 	ret = packet_write_fmt_gently(fd, "%.*s", MAX_FILENAME, filename);
>  } else
> 	ret = packet_write_fmt_gently(fd, "%s", filename);


I can do that. However, I wouldn't truncate the filename as this
might create a weird outcome. I would just let the filter fail.

OK?

- Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()
  2016-08-10 13:43     ` Jeff King
@ 2016-08-10 14:10       ` Lars Schneider
  2016-08-10 15:01         ` Jeff King
  2016-08-10 17:18       ` Junio C Hamano
  1 sibling, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 14:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben


> On 10 Aug 2016, at 15:43, Jeff King <peff@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschneider@gmail.com wrote:
> 
>> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
>> +{
>> +	static struct strbuf buf = STRBUF_INIT;
>> +	va_list args;
>> +
>> +	strbuf_reset(&buf);
>> +	va_start(args, fmt);
>> +	format_packet(1, &buf, fmt, args);
>> +	va_end(args);
>> +	packet_trace(buf.buf + 4, buf.len - 4, 1);
>> +	return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>> +}
> 
> Could the end of this function just be:
> 
>  return packet_write_gently(fd, buf.buf, buf.len);
> 
> ? I guess we'd prefer to avoid that, because it incurs an extra
> memmove() of the data.

I don't think the memmove would be that expensive. However, format_packet()
already creates the packet_header and packet_write_gently would do the same
again, no?


> Similarly, I'd think this could share code with the non-gentle form
> (which should be able to just call this and die() if returns an error).
> Though sometimes the va_list transformation makes that awkward.

Yeah, the code duplication annoyed me, too. va_list was the reason I did it
that way. Do you think that is something that needs to be addressed in the
series?

Thanks,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send
  2016-08-10 13:51           ` Lars Schneider
@ 2016-08-10 14:33             ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2016-08-10 14:33 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git Mailing List, gitster, jnareb, mlbright, e,
	Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:51:35PM +0200, Lars Schneider wrote:

> I guess my point is that I stumbled over the un-intutiive format_packet() behavior
> and I wanted to improve the situation in a way that others don't run into this
> trap. If you think that is no issue then it would be OK for me if we leave the
> current behavior as is.

I don't think the behavior as-is is a problem, and it would remain OK as
long as no callers are added who format a packet but don't write it (or
don't write it in a timely manner).

But most importantly, if you are going to refactor code, you can't
regress the existing callers. So even if we did want to change how this
worked, this patch is not acceptable as-is; it would need to fix up all
of the callers of packet_buf_write().

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()
  2016-08-10 13:59           ` Lars Schneider
@ 2016-08-10 14:34             ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2016-08-10 14:34 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 03:59:19PM +0200, Lars Schneider wrote:

> > It does still feel a little weird that you cannot tell the difference
> > between a write() error and bad input. Because you really might want to
> > do something different between the two. Like:
> > 
> >  #define MAX_FILENAME (PKTLINE_DATA_MAXLEN - strlen("filename"))
> > 
> >  if (filename > MAX_FILENAME) {
> > 	warning("woah, that name is ridiculous; truncating");
> > 	ret = packet_write_fmt_gently(fd, "%.*s", MAX_FILENAME, filename);
> >  } else
> > 	ret = packet_write_fmt_gently(fd, "%s", filename);
> 
> 
> I can do that. However, I wouldn't truncate the filename as this
> might create a weird outcome. I would just let the filter fail.

Yeah, I think that is probably fine (I don't have a real opinion for
this particular case, but was mostly just trying to think about whether
the pktline interface was suitably flexible).

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()
  2016-08-10 14:10       ` Lars Schneider
@ 2016-08-10 15:01         ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2016-08-10 15:01 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, gitster, jnareb, mlbright, e, Johannes.Schindelin, ben

On Wed, Aug 10, 2016 at 04:10:02PM +0200, Lars Schneider wrote:

> 
> > On 10 Aug 2016, at 15:43, Jeff King <peff@peff.net> wrote:
> > 
> > On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschneider@gmail.com wrote:
> > 
> >> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
> >> +{
> >> +	static struct strbuf buf = STRBUF_INIT;
> >> +	va_list args;
> >> +
> >> +	strbuf_reset(&buf);
> >> +	va_start(args, fmt);
> >> +	format_packet(1, &buf, fmt, args);
> >> +	va_end(args);
> >> +	packet_trace(buf.buf + 4, buf.len - 4, 1);
> >> +	return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
> >> +}
> > 
> > Could the end of this function just be:
> > 
> >  return packet_write_gently(fd, buf.buf, buf.len);
> > 
> > ? I guess we'd prefer to avoid that, because it incurs an extra
> > memmove() of the data.
> 
> I don't think the memmove would be that expensive. However, format_packet()
> already creates the packet_header and packet_write_gently would do the same
> again, no?

Yeah, I think you would want extra refactoring to have a shared common
function. I took a stab at it, but the result ends up pretty ugly; the
amount of boilerplate exceeds the duplication here (the really nasty
thing is that format_packet() is hard to split up, because the part you
want to switch out is in the middle, but it needs to keep some context
between the start and the end. In a higher level language you'd pass it
a callback to fill in the strbuf in the middle, but in C that just ends
up horrible).

> > Similarly, I'd think this could share code with the non-gentle form
> > (which should be able to just call this and die() if returns an error).
> > Though sometimes the va_list transformation makes that awkward.
> 
> Yeah, the code duplication annoyed me, too. va_list was the reason I did it
> that way. Do you think that is something that needs to be addressed in the
> series?

No, I don't think it needs to be. It's just a case of making sure that
the internals don't grow too crufty and unmanageable for future
maintainability.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 13:40         ` Jeff King
@ 2016-08-10 17:17           ` Junio C Hamano
  2016-08-10 17:49             ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2016-08-10 17:17 UTC (permalink / raw)
  To: Jeff King
  Cc: Lars Schneider, git, jnareb, mlbright, e, Johannes.Schindelin,
	ben

Jeff King <peff@peff.net> writes:

> On Wed, Aug 10, 2016 at 03:36:45PM +0200, Lars Schneider wrote:
>
>> > So now we have packet_write() and packet_write_gently(), but they differ
>> > in more than just whether they are gentle. That seems like a weird
>> > interface.
>> > 
>> > Should we either be picking a new name (e.g., packet_write_mem() or
>> > something), or migrating packet_write() to packet_write_fmt()?
>> 
>> Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()"
>
> Ah, OK. Generally I'd suggest to reorder things so that each patch looks
> like a step forward (and so the early patches become preparatory steps,
> and the justification in them is something like "we're going to add more
> write functions, so let's give this a more descriptive name").

I am guilty for saying "packet_write() should have been similar to
write(2)".  We may want to have a time-period during which there is
no "packet_write()" in the codebase, before we get to that stage.
I.e. rename it to packet_write_fmt() to vacate the name and add
packet_write_mem(), and then later rename packet_write_mem() to its
final name packet_write(), or something like that.  The two-step
process would reduce the chance of misconversion.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()
  2016-08-10 13:43     ` Jeff King
  2016-08-10 14:10       ` Lars Schneider
@ 2016-08-10 17:18       ` Junio C Hamano
  2016-08-10 17:53         ` Lars Schneider
  1 sibling, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2016-08-10 17:18 UTC (permalink / raw)
  To: Jeff King
  Cc: larsxschneider, git, jnareb, mlbright, e, Johannes.Schindelin,
	ben

Jeff King <peff@peff.net> writes:

> On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschneider@gmail.com wrote:
>
>> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
>> +{
>> +	static struct strbuf buf = STRBUF_INIT;
>> +	va_list args;
>> +
>> +	strbuf_reset(&buf);
>> +	va_start(args, fmt);
>> +	format_packet(1, &buf, fmt, args);
>> +	va_end(args);
>> +	packet_trace(buf.buf + 4, buf.len - 4, 1);
>> +	return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>> +}
>
> Could the end of this function just be:
>
>   return packet_write_gently(fd, buf.buf, buf.len);
>
> ? I guess we'd prefer to avoid that, because it incurs an extra
> memmove() of the data.
>
> Similarly, I'd think this could share code with the non-gentle form
> (which should be able to just call this and die() if returns an error).
> Though sometimes the va_list transformation makes that awkward.

Yes.

Also regarding the naming, please have "_gently" at the end; that is
how all other function families with _gently variant are named, I
think.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 17:17           ` Junio C Hamano
@ 2016-08-10 17:49             ` Lars Schneider
  2016-08-10 18:21               ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 17:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, jnareb, mlbright, e, Johannes.Schindelin, ben


> On 10 Aug 2016, at 19:17, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Jeff King <peff@peff.net> writes:
> 
>> On Wed, Aug 10, 2016 at 03:36:45PM +0200, Lars Schneider wrote:
>> 
>>>> So now we have packet_write() and packet_write_gently(), but they differ
>>>> in more than just whether they are gentle. That seems like a weird
>>>> interface.
>>>> 
>>>> Should we either be picking a new name (e.g., packet_write_mem() or
>>>> something), or migrating packet_write() to packet_write_fmt()?
>>> 
>>> Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()"
>> 
>> Ah, OK. Generally I'd suggest to reorder things so that each patch looks
>> like a step forward (and so the early patches become preparatory steps,
>> and the justification in them is something like "we're going to add more
>> write functions, so let's give this a more descriptive name").
> 
> I am guilty for saying "packet_write() should have been similar to
> write(2)".  We may want to have a time-period during which there is
> no "packet_write()" in the codebase, before we get to that stage.
> I.e. rename it to packet_write_fmt() to vacate the name and add
> packet_write_mem(), and then later rename packet_write_mem() to its
> final name packet_write(), or something like that.  The two-step
> process would reduce the chance of misconversion.

OK. Does this mean I can leave the "packet_write()" to "packet_write_fmt()"
rename as is in this series?

Thanks,
Lars


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()
  2016-08-10 17:18       ` Junio C Hamano
@ 2016-08-10 17:53         ` Lars Schneider
  2016-08-10 18:42           ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 17:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, jnareb, mlbright, e, Johannes.Schindelin, ben


> On 10 Aug 2016, at 19:18, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Jeff King <peff@peff.net> writes:
> 
>> On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschneider@gmail.com wrote:
>> 
>>> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
>>> +{
>>> +	static struct strbuf buf = STRBUF_INIT;
>>> +	va_list args;
>>> +
>>> +	strbuf_reset(&buf);
>>> +	va_start(args, fmt);
>>> +	format_packet(1, &buf, fmt, args);
>>> +	va_end(args);
>>> +	packet_trace(buf.buf + 4, buf.len - 4, 1);
>>> +	return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>>> +}
>> 
>> Could the end of this function just be:
>> 
>>  return packet_write_gently(fd, buf.buf, buf.len);
>> 
>> ? I guess we'd prefer to avoid that, because it incurs an extra
>> memmove() of the data.
>> 
>> Similarly, I'd think this could share code with the non-gentle form
>> (which should be able to just call this and die() if returns an error).
>> Though sometimes the va_list transformation makes that awkward.
> 
> Yes.

Peff just posted that he tried the shared code idea but the result
ended up ugly.


> Also regarding the naming, please have "_gently" at the end; that is
> how all other function families with _gently variant are named, I
> think.

OK, I will rename them.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 17:49             ` Lars Schneider
@ 2016-08-10 18:21               ` Junio C Hamano
  2016-08-10 19:15                 ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2016-08-10 18:21 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Jeff King, git, jnareb, mlbright, e, Johannes.Schindelin, ben

Lars Schneider <larsxschneider@gmail.com> writes:

>> On 10 Aug 2016, at 19:17, Junio C Hamano <gitster@pobox.com> wrote:
>> 
> OK. Does this mean I can leave the "packet_write()" to "packet_write_fmt()"
> rename as is in this series?

I didn't really check what order you are doing things to answer
that.

If the function that is introduced in this step is a version of
packet_write_fmt() that does its thing only gently, you would want
to do the rename s/packet_write/packet_write_fmt/ before this step,
and then add the new function as packet_write_fmt_gently(), I would
think.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()
  2016-08-10 17:53         ` Lars Schneider
@ 2016-08-10 18:42           ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2016-08-10 18:42 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Jeff King, git, jnareb, mlbright, e, Johannes.Schindelin, ben

Lars Schneider <larsxschneider@gmail.com> writes:

>>> Similarly, I'd think this could share code with the non-gentle form
>>> (which should be able to just call this and die() if returns an error).
>>> Though sometimes the va_list transformation makes that awkward.
>> 
>> Yes.
>
> Peff just posted that he tried the shared code idea but the result
> ended up ugly.

OK.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()
  2016-08-10 18:21               ` Junio C Hamano
@ 2016-08-10 19:15                 ` Lars Schneider
  0 siblings, 0 replies; 50+ messages in thread
From: Lars Schneider @ 2016-08-10 19:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, jnareb, mlbright, e, Johannes.Schindelin, ben


> On 10 Aug 2016, at 20:21, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>>> On 10 Aug 2016, at 19:17, Junio C Hamano <gitster@pobox.com> wrote:
>>> 
>> OK. Does this mean I can leave the "packet_write()" to "packet_write_fmt()"
>> rename as is in this series?
> 
> I didn't really check what order you are doing things to answer
> that.
> 
> If the function that is introduced in this step is a version of
> packet_write_fmt() that does its thing only gently, you would want
> to do the rename s/packet_write/packet_write_fmt/ before this step,
> and then add the new function as packet_write_fmt_gently(), I would
> think.

OK - will fix. I did it that way because I thought it would be easier
if we decide to drop the big rename patch.

Thanks,
Lars


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-10 13:04   ` [PATCH v5 14/15] convert: add filter.<driver>.process option larsxschneider
@ 2016-08-12 16:33     ` Stefan Beller
  2016-08-12 16:38       ` Jeff King
  2016-08-12 16:59       ` Lars Schneider
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Beller @ 2016-08-12 16:33 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git@vger.kernel.org, Junio C Hamano, Jakub Narębski,
	mlbright, Eric Wong, Jeff King, Johannes Schindelin, ben

On Wed, Aug 10, 2016 at 6:04 AM,  <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> Git's clean/smudge mechanism invokes an external filter process for every
> single blob that is affected by a filter. If Git filters a lot of blobs
> then the startup time of the external filter processes can become a
> significant part of the overall Git execution time.
>
> In a preliminary performance test this developer used a clean/smudge filter
> written in golang to filter 12,000 files. This process took 364s with the
> existing filter mechanism and 5s with the new mechanism. See details here:
> https://github.com/github/git-lfs/pull/1382
>
> This patch adds the `filter.<driver>.process` string option which, if used,
> keeps the external filter process running and processes all blobs with
> the packet format (pkt-line) based protocol over standard input and standard
> output described below.
>
> Git starts the filter when it encounters the first file
> that needs to be cleaned or smudged. After the filter started
> Git sends a welcome message, a list of supported protocol
> version numbers, and a flush packet. Git expects to read the
> welcome message and one protocol version number from the
> previously sent list. Afterwards Git sends a list of supported
> capabilities and a flush packet. Git expects to read a list of
> desired capabilities, which must be a subset of the supported
> capabilities list, and a flush packet as response:
> ------------------------
> packet:          git> git-filter-client
> packet:          git> version=2
> packet:          git> version=42
> packet:          git> 0000
> packet:          git< git-filter-server
> packet:          git< version=2

what follows is specific to version=2?
version 42 may deem capabilities a bad idea?

> packet:          git> clean=true
> packet:          git> smudge=true
> packet:          git> not-yet-invented=true
> packet:          git> 0000
> packet:          git< clean=true
> packet:          git< smudge=true
> packet:          git< 0000
> ------------------------
> Supported filter capabilities in version 2 are "clean" and
> "smudge".

I assume version 2 is an example here and we actually start with v1?

Can you clarify why we need welcome messages?
(Is there a technical reason, or better debuggability for humans?)

>
> Afterwards Git sends a list of "key=value" pairs terminated with
> a flush packet. The list will contain at least the filter command
> (based on the supported capabilities) and the pathname of the file
> to filter relative to the repository root. Right after these packets
> Git sends the content split in zero or more pkt-line packets and a
> flush packet to terminate content.
> ------------------------
> packet:          git> command=smudge\n
> packet:          git> pathname=path/testfile.dat\n
> packet:          git> 0000
> packet:          git> CONTENT
> packet:          git> 0000
> ------------------------
>
> The filter is expected to respond with a list of "key=value" pairs
> terminated with a flush packet. If the filter does not experience
> problems then the list must contain a "success" status. Right after
> these packets the filter is expected to send the content in zero
> or more pkt-line packets and a flush packet at the end. Finally, a
> second list of "key=value" pairs terminated with a flush packet
> is expected. The filter can change the status in the second list.
> ------------------------
> packet:          git< status=success\n
> packet:          git< 0000
> packet:          git< SMUDGED_CONTENT
> packet:          git< 0000
> packet:          git< 0000  # empty list!
> ------------------------
>
> If the result content is empty then the filter is expected to respond
> with a success status and an empty list.
> ------------------------
> packet:          git< status=success\n
> packet:          git< 0000
> packet:          git< 0000  # empty content!
> packet:          git< 0000  # empty list!
> ------------------------

Why do we need the last flush packet? We'd expect as many successes
as we send out contents? Do we plan on interleaving operation, i.e.
Git sends out 10 files but the filter process is not as fast as Git sending
out and the answers trickle in slowly?

>
> In case the filter cannot or does not want to process the content,
> it is expected to respond with an "error" status. Depending on the
> `filter.<driver>.required` flag Git will interpret that as error
> but it will not stop or restart the filter process.
> ------------------------
> packet:          git< status=error\n
> packet:          git< 0000
> ------------------------
>
> In case the filter cannot or does not want to process the content
> as well as any future content for the lifetime of the Git process,
> it is expected to respond with an "error-all" status. Depending on
> the `filter.<driver>.required` flag Git will interpret that as error
> but it will not stop or restart the filter process.
> ------------------------
> packet:          git< status=error-all\n
> packet:          git< 0000
> ------------------------
>
> If the filter experiences an error during processing, then it can
> send the status "error". Depending on the `filter.<driver>.required`
> flag Git will interpret that as error but it will not stop or restart
> the filter process.
> ------------------------
> packet:          git< status=success\n

So the first success is meaningless essentially?
Would it make sense to move the sucess behind the content sending
in all cases?

> packet:          git< 0000
> packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
> packet:          git< 0000
> packet:          git< status=error\n
> packet:          git< 0000
> ------------------------
>
> If the filter dies during the communication or does not adhere to
> the protocol then Git will stop the filter process and restart it
> with the next file that needs to be processed.
>
> After the filter has processed a blob it is expected to wait for
> the next "key=value" list containing a command. When the Git process
> terminates, it will send a kill signal to the filter in that stage.
>
> If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
> is configured then these commands always take precedence over
> a configured `filter.<driver>.process` command.

okay. I think you can omit most of the commit message as it is a duplicate
of the documentation?

Instead the commit message can answer questions that are not part of
the documentation. (See the questions above which can be summarized
as "Why do we do it this way and not differently?")


> +       if (err || errno == EPIPE) {
> +               if (!strcmp(filter_status.buf, "error")) {
> +                       /*
> +                    * The filter signaled a problem with the file.
> +                    */

/* This could go into a single line comment. */

> +               } else if (!strcmp(filter_status.buf, "error-all")) {
> +                       /*
> +                        * The filter signaled a permanent problem. Don't try to filter
> +                        * files with the same command for the lifetime of the current
> +                        * Git process.
> +                        */
> +                        entry->supported_capabilities &= ~wanted_capability;
> +               } else {
> +                       /*
> +                        * Something went wrong with the protocol filter.
> +                        * Force shutdown and restart if another blob requires filtering!
> +                        */
> +                       error("external filter '%s' failed", cmd);

failed .. Can you give more information to the user such that they can easier
debug? (blob/path or state / expected state)


> +
>  static int read_convert_config(const char *var, const char *value, void *cb)
>  {
>         const char *key, *name;
> @@ -526,6 +818,10 @@ static int read_convert_config(const char *var, const char *value, void *cb)
>         if (!strcmp("clean", key))
>                 return git_config_string(&drv->clean, var, value);
>
> +       if (!strcmp("process", key)) {
> +               return git_config_string(&drv->process, var, value);
> +       }

optional nit: braces unnecessary

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 16:33     ` Stefan Beller
@ 2016-08-12 16:38       ` Jeff King
  2016-08-12 16:48         ` Stefan Beller
  2016-08-12 16:59       ` Lars Schneider
  1 sibling, 1 reply; 50+ messages in thread
From: Jeff King @ 2016-08-12 16:38 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Jakub Narębski, mlbright, Eric Wong, Johannes Schindelin,
	ben

On Fri, Aug 12, 2016 at 09:33:18AM -0700, Stefan Beller wrote:

> > If the result content is empty then the filter is expected to respond
> > with a success status and an empty list.
> > ------------------------
> > packet:          git< status=success\n
> > packet:          git< 0000
> > packet:          git< 0000  # empty content!
> > packet:          git< 0000  # empty list!
> > ------------------------
> 
> Why do we need the last flush packet? We'd expect as many successes
> as we send out contents? Do we plan on interleaving operation, i.e.
> Git sends out 10 files but the filter process is not as fast as Git sending
> out and the answers trickle in slowly?

There was prior discussion in the thread, but basically, it is there to
be able to signal an error that is encountered midway through sending
the file (i.e., to say "status=error"). If you do not have a final
flush, then you would send nothing, and the receiver would be left
wondering if you were successful, or if it simply did not get your error
report yet.

> > If the filter experiences an error during processing, then it can
> > send the status "error". Depending on the `filter.<driver>.required`
> > flag Git will interpret that as error but it will not stop or restart
> > the filter process.
> > ------------------------
> > packet:          git< status=success\n
> 
> So the first success is meaningless essentially?
> Would it make sense to move the sucess behind the content sending
> in all cases?

No, the first success says "good so far, here's the file content". The
second says "I succeeded in sending you the file content".

You _can_ drop the first one, but it may be more convenient for the
receiver to know up-front that there was a failure.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 16:38       ` Jeff King
@ 2016-08-12 16:48         ` Stefan Beller
  2016-08-12 17:08           ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Beller @ 2016-08-12 16:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Jakub Narębski, mlbright, Eric Wong, Johannes Schindelin,
	ben

On Fri, Aug 12, 2016 at 9:38 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Aug 12, 2016 at 09:33:18AM -0700, Stefan Beller wrote:
>
>> > If the result content is empty then the filter is expected to respond
>> > with a success status and an empty list.
>> > ------------------------
>> > packet:          git< status=success\n
>> > packet:          git< 0000
>> > packet:          git< 0000  # empty content!
>> > packet:          git< 0000  # empty list!
>> > ------------------------
>>
>> Why do we need the last flush packet? We'd expect as many successes
>> as we send out contents? Do we plan on interleaving operation, i.e.
>> Git sends out 10 files but the filter process is not as fast as Git sending
>> out and the answers trickle in slowly?
>
> There was prior discussion in the thread, but basically, it is there to
> be able to signal an error that is encountered midway through sending
> the file (i.e., to say "status=error"). If you do not have a final
> flush, then you would send nothing, and the receiver would be left
> wondering if you were successful, or if it simply did not get your error
> report yet.

    I did not follow the prior discussion, so I approached this review with
    no prior knowledge from prior reviews, but instead read through and
    was asking a lot of questions that came up immediately. In case my
    questions are too dumb just omit them, but I thought they were good
    material to answer in a commit message ("Why did we do it this way
    and not differently").

Thanks for the explanation. So this is similar as the situation below
that we wait for the flush and then an error/success report?

>
>> > If the filter experiences an error during processing, then it can
>> > send the status "error". Depending on the `filter.<driver>.required`
>> > flag Git will interpret that as error but it will not stop or restart
>> > the filter process.
>> > ------------------------
>> > packet:          git< status=success\n
>>
>> So the first success is meaningless essentially?
>> Would it make sense to move the sucess behind the content sending
>> in all cases?
>
> No, the first success says "good so far, here's the file content". The
> second says "I succeeded in sending you the file content".
>
> You _can_ drop the first one, but it may be more convenient for the
> receiver to know up-front that there was a failure.


If there was a failure upfront, it would become

packet:          git< 0000
# no content is encapsulated here
packet:          git< 0000
packet:          git< status=error\n
packet:          git< 0000

so from a protocol side I'd claim it doesn't look bad.
I assume with convenient you mean the implementation
side of things?

If we do the success first and then error out halfway, we
still have to clean up, so I do not see how this impacts
implementation?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 16:33     ` Stefan Beller
  2016-08-12 16:38       ` Jeff King
@ 2016-08-12 16:59       ` Lars Schneider
  2016-08-12 17:07         ` Stefan Beller
  1 sibling, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-12 16:59 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Junio C Hamano, Jakub Narębski,
	Martin-Louis Bright, Eric Wong, Jeff King, Johannes Schindelin,
	ben


> On 12 Aug 2016, at 18:33, Stefan Beller <sbeller@google.com> wrote:
> 
> On Wed, Aug 10, 2016 at 6:04 AM,  <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> Git's clean/smudge mechanism invokes an external filter process for every
>> single blob that is affected by a filter. If Git filters a lot of blobs
>> then the startup time of the external filter processes can become a
>> significant part of the overall Git execution time.
>> 
>> In a preliminary performance test this developer used a clean/smudge filter
>> written in golang to filter 12,000 files. This process took 364s with the
>> existing filter mechanism and 5s with the new mechanism. See details here:
>> https://github.com/github/git-lfs/pull/1382
>> 
>> This patch adds the `filter.<driver>.process` string option which, if used,
>> keeps the external filter process running and processes all blobs with
>> the packet format (pkt-line) based protocol over standard input and standard
>> output described below.
>> 
>> Git starts the filter when it encounters the first file
>> that needs to be cleaned or smudged. After the filter started
>> Git sends a welcome message, a list of supported protocol
>> version numbers, and a flush packet. Git expects to read the
>> welcome message and one protocol version number from the
>> previously sent list. Afterwards Git sends a list of supported
>> capabilities and a flush packet. Git expects to read a list of
>> desired capabilities, which must be a subset of the supported
>> capabilities list, and a flush packet as response:
>> ------------------------
>> packet:          git> git-filter-client
>> packet:          git> version=2
>> packet:          git> version=42
>> packet:          git> 0000
>> packet:          git< git-filter-server
>> packet:          git< version=2
> 
> what follows is specific to version=2?
> version 42 may deem capabilities a bad idea?

"version=42" is just an example to show how the initialization could look
like in a distant future when we support even another protocol version.

You are correct, what follows is specific to version=2. I will state
that more clearly in the documentation.

Can you try to rephrase "version 42 may deem capabilities a bad idea?"
I am not sure I understand what you mean.


> 
>> packet:          git> clean=true
>> packet:          git> smudge=true
>> packet:          git> not-yet-invented=true
>> packet:          git> 0000
>> packet:          git< clean=true
>> packet:          git< smudge=true
>> packet:          git< 0000
>> ------------------------
>> Supported filter capabilities in version 2 are "clean" and
>> "smudge".
> 
> I assume version 2 is an example here and we actually start with v1?

No, it is actually called version 2 because I consider the current
clean/smudge protocol version 1.


> Can you clarify why we need welcome messages?
> (Is there a technical reason, or better debuggability for humans?)

The welcome message is necessary to distinguish the long running
filter protocol (v2) from the current one-shot filter protocol (v1).
This is becomes important if a users tries to use a v1 clean/smudge
filter with the v2 git config settings.


>> Afterwards Git sends a list of "key=value" pairs terminated with
>> a flush packet. The list will contain at least the filter command
>> (based on the supported capabilities) and the pathname of the file
>> to filter relative to the repository root. Right after these packets
>> Git sends the content split in zero or more pkt-line packets and a
>> flush packet to terminate content.
>> ------------------------
>> packet:          git> command=smudge\n
>> packet:          git> pathname=path/testfile.dat\n
>> packet:          git> 0000
>> packet:          git> CONTENT
>> packet:          git> 0000
>> ------------------------
>> 
>> The filter is expected to respond with a list of "key=value" pairs
>> terminated with a flush packet. If the filter does not experience
>> problems then the list must contain a "success" status. Right after
>> these packets the filter is expected to send the content in zero
>> or more pkt-line packets and a flush packet at the end. Finally, a
>> second list of "key=value" pairs terminated with a flush packet
>> is expected. The filter can change the status in the second list.
>> ------------------------
>> packet:          git< status=success\n
>> packet:          git< 0000
>> packet:          git< SMUDGED_CONTENT
>> packet:          git< 0000
>> packet:          git< 0000  # empty list!
>> ------------------------
>> 
>> If the result content is empty then the filter is expected to respond
>> with a success status and an empty list.
>> ------------------------
>> packet:          git< status=success\n
>> packet:          git< 0000
>> packet:          git< 0000  # empty content!
>> packet:          git< 0000  # empty list!
>> ------------------------
> 
> Why do we need the last flush packet? We'd expect as many successes
> as we send out contents? Do we plan on interleaving operation, i.e.
> Git sends out 10 files but the filter process is not as fast as Git sending
> out and the answers trickle in slowly?

Git filter processes run sequentially right now (unfortunately).

re flush: please see Peff's answer:
http://public-inbox.org/git/20160812163809.3wdkuqegxfjam2yn%40sigill.intra.peff.net/


>> In case the filter cannot or does not want to process the content,
>> it is expected to respond with an "error" status. Depending on the
>> `filter.<driver>.required` flag Git will interpret that as error
>> but it will not stop or restart the filter process.
>> ------------------------
>> packet:          git< status=error\n
>> packet:          git< 0000
>> ------------------------
>> 
>> In case the filter cannot or does not want to process the content
>> as well as any future content for the lifetime of the Git process,
>> it is expected to respond with an "error-all" status. Depending on
>> the `filter.<driver>.required` flag Git will interpret that as error
>> but it will not stop or restart the filter process.
>> ------------------------
>> packet:          git< status=error-all\n
>> packet:          git< 0000
>> ------------------------
>> 
>> If the filter experiences an error during processing, then it can
>> send the status "error". Depending on the `filter.<driver>.required`
>> flag Git will interpret that as error but it will not stop or restart
>> the filter process.
>> ------------------------
>> packet:          git< status=success\n
> 
> So the first success is meaningless essentially?
> Would it make sense to move the sucess behind the content sending
> in all cases?

Again, I refer to Peff's answer.


>> packet:          git< 0000
>> packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
>> packet:          git< 0000
>> packet:          git< status=error\n
>> packet:          git< 0000
>> ------------------------
>> 
>> If the filter dies during the communication or does not adhere to
>> the protocol then Git will stop the filter process and restart it
>> with the next file that needs to be processed.
>> 
>> After the filter has processed a blob it is expected to wait for
>> the next "key=value" list containing a command. When the Git process
>> terminates, it will send a kill signal to the filter in that stage.
>> 
>> If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
>> is configured then these commands always take precedence over
>> a configured `filter.<driver>.process` command.
> 
> okay. I think you can omit most of the commit message as it is a duplicate
> of the documentation?

Yes it duplicates the documentation. 


> Instead the commit message can answer questions that are not part of
> the documentation. (See the questions above which can be summarized
> as "Why do we do it this way and not differently?")

OK, point taken. I will write a new commit message for v6.


> 
>> +       if (err || errno == EPIPE) {
>> +               if (!strcmp(filter_status.buf, "error")) {
>> +                       /*
>> +                    * The filter signaled a problem with the file.
>> +                    */
> 
> /* This could go into a single line comment. */

OK, will change.


>> +               } else if (!strcmp(filter_status.buf, "error-all")) {
>> +                       /*
>> +                        * The filter signaled a permanent problem. Don't try to filter
>> +                        * files with the same command for the lifetime of the current
>> +                        * Git process.
>> +                        */
>> +                        entry->supported_capabilities &= ~wanted_capability;
>> +               } else {
>> +                       /*
>> +                        * Something went wrong with the protocol filter.
>> +                        * Force shutdown and restart if another blob requires filtering!
>> +                        */
>> +                       error("external filter '%s' failed", cmd);
> 
> failed .. Can you give more information to the user such that they can easier
> debug? (blob/path or state / expected state)

Agreed, will add!
However, we don't give this information with the current clean/smudge interface.


>> +
>> static int read_convert_config(const char *var, const char *value, void *cb)
>> {
>>        const char *key, *name;
>> @@ -526,6 +818,10 @@ static int read_convert_config(const char *var, const char *value, void *cb)
>>        if (!strcmp("clean", key))
>>                return git_config_string(&drv->clean, var, value);
>> 
>> +       if (!strcmp("process", key)) {
>> +               return git_config_string(&drv->process, var, value);
>> +       }
> 
> optional nit: braces unnecessary

Agreed, will remove!


Thanks a lot for the review,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 16:59       ` Lars Schneider
@ 2016-08-12 17:07         ` Stefan Beller
  2016-08-12 17:14           ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Beller @ 2016-08-12 17:07 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git@vger.kernel.org, Junio C Hamano, Jakub Narębski,
	Martin-Louis Bright, Eric Wong, Jeff King, Johannes Schindelin,
	ben

On Fri, Aug 12, 2016 at 9:59 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>
> The welcome message is necessary to distinguish the long running
> filter protocol (v2) from the current one-shot filter protocol (v1).
> This is becomes important if a users tries to use a v1 clean/smudge
> filter with the v2 git config settings.

Oh I see, that's why we're at v2 now.
How do you distinguish between v1 and v2? Does the welcome message
need to follow a certain pattern to be recognized to make it v2+ ?

>
> Thanks a lot for the review,
> Lars

Sorry for repeating the questions (it seems I missed
a lot of the prior discussion), but I think these questions
may help future readers of the commit message,

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 16:48         ` Stefan Beller
@ 2016-08-12 17:08           ` Lars Schneider
  2016-08-12 17:13             ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-12 17:08 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff King, git@vger.kernel.org, Junio C Hamano,
	Jakub Narębski, mlbright, Eric Wong, Johannes Schindelin,
	ben


> On 12 Aug 2016, at 18:48, Stefan Beller <sbeller@google.com> wrote:
> 
> On Fri, Aug 12, 2016 at 9:38 AM, Jeff King <peff@peff.net> wrote:
>> On Fri, Aug 12, 2016 at 09:33:18AM -0700, Stefan Beller wrote:
>> 
>>>> If the result content is empty then the filter is expected to respond
>>>> with a success status and an empty list.
>>>> ------------------------
>>>> packet:          git< status=success\n
>>>> packet:          git< 0000
>>>> packet:          git< 0000  # empty content!
>>>> packet:          git< 0000  # empty list!
>>>> ------------------------
>>> 
>>> Why do we need the last flush packet? We'd expect as many successes
>>> as we send out contents? Do we plan on interleaving operation, i.e.
>>> Git sends out 10 files but the filter process is not as fast as Git sending
>>> out and the answers trickle in slowly?
>> 
>> There was prior discussion in the thread, but basically, it is there to
>> be able to signal an error that is encountered midway through sending
>> the file (i.e., to say "status=error"). If you do not have a final
>> flush, then you would send nothing, and the receiver would be left
>> wondering if you were successful, or if it simply did not get your error
>> report yet.
> 
>    I did not follow the prior discussion, so I approached this review with
>    no prior knowledge from prior reviews, but instead read through and
>    was asking a lot of questions that came up immediately. In case my
>    questions are too dumb just omit them, but I thought they were good
>    material to answer in a commit message ("Why did we do it this way
>    and not differently").

Thanks! That's very helpful and I will address your questions in the commit
message as anyone looking at this commit in the future will have no prior 
knowledge, too.


> Thanks for the explanation. So this is similar as the situation below
> that we wait for the flush and then an error/success report?

Correct. If we would just process the status packet then we wouldn't
even need to wait for the flush. I added flush because that allows us
to send an arbitrary list of key=value pairs in the future.


>>>> If the filter experiences an error during processing, then it can
>>>> send the status "error". Depending on the `filter.<driver>.required`
>>>> flag Git will interpret that as error but it will not stop or restart
>>>> the filter process.
>>>> ------------------------
>>>> packet:          git< status=success\n
>>> 
>>> So the first success is meaningless essentially?
>>> Would it make sense to move the sucess behind the content sending
>>> in all cases?
>> 
>> No, the first success says "good so far, here's the file content". The
>> second says "I succeeded in sending you the file content".
>> 
>> You _can_ drop the first one, but it may be more convenient for the
>> receiver to know up-front that there was a failure.
> 
> 
> If there was a failure upfront, it would become
> 
> packet:          git< 0000
> # no content is encapsulated here
> packet:          git< 0000
> packet:          git< status=error\n
> packet:          git< 0000

No, a failure upfront would look like this (see documentation):

------------------------
packet:          git< status=error\n
packet:          git< 0000
------------------------

No content and no 2nd key=value pair list is exchanged after an error.



> so from a protocol side I'd claim it doesn't look bad.
> I assume with convenient you mean the implementation
> side of things?
> 
> If we do the success first and then error out halfway, we
> still have to clean up, so I do not see how this impacts
> implementation?

That is true. The reasoning is that an error in between is somewhat
less expected. Therefore additional work is OK.

An error upfront is much more likely because it is also a mechanism
for the filter to reject certain files. If the filter is configured
as "required=false" then this reject would actually be OK.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 17:08           ` Lars Schneider
@ 2016-08-12 17:13             ` Junio C Hamano
  2016-08-12 17:21               ` Lars Schneider
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2016-08-12 17:13 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Stefan Beller, Jeff King, git@vger.kernel.org,
	Jakub Narębski, mlbright, Eric Wong, Johannes Schindelin,
	ben

Lars Schneider <larsxschneider@gmail.com> writes:

>> If we do the success first and then error out halfway, we
>> still have to clean up, so I do not see how this impacts
>> implementation?
>
> That is true. The reasoning is that an error in between is somewhat
> less expected. Therefore additional work is OK.
>
> An error upfront is much more likely because it is also a mechanism
> for the filter to reject certain files. If the filter is configured
> as "required=false" then this reject would actually be OK.

Unless the reasoning is "an error in between is so rare that we are
OK if the protocol misbehaves and the receiving end omits error
handing", I am not so sure how "therefore additional work is OK" is
a reasonable conclusion.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 17:07         ` Stefan Beller
@ 2016-08-12 17:14           ` Lars Schneider
  0 siblings, 0 replies; 50+ messages in thread
From: Lars Schneider @ 2016-08-12 17:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Junio C Hamano, Jakub Narębski,
	Martin-Louis Bright, Eric Wong, Jeff King, Johannes Schindelin,
	ben


> On 12 Aug 2016, at 19:07, Stefan Beller <sbeller@google.com> wrote:
> 
> On Fri, Aug 12, 2016 at 9:59 AM, Lars Schneider
> <larsxschneider@gmail.com> wrote:
>> 
>> The welcome message is necessary to distinguish the long running
>> filter protocol (v2) from the current one-shot filter protocol (v1).
>> This is becomes important if a users tries to use a v1 clean/smudge
>> filter with the v2 git config settings.
> 
> Oh I see, that's why we're at v2 now.
> How do you distinguish between v1 and v2? Does the welcome message
> need to follow a certain pattern to be recognized to make it v2+ ?

v1 has no format at all. It works like this:

1. Git starts the filter process
2. Git writes the entire file via stdin to the filter process
3. Git reads the result via stdout from the filter process
3. Git stops the filter process


Any v2+ would need to deal with the following:

packet:          git> git-filter-client
packet:          git> version=2
packet:          git> version=2+
packet:          git> 0000
packet:          git< git-filter-server

Everything after could be different in v2+ compared to v2.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 17:13             ` Junio C Hamano
@ 2016-08-12 17:21               ` Lars Schneider
  2016-08-12 18:03                 ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Schneider @ 2016-08-12 17:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jeff King, git@vger.kernel.org,
	Jakub Narębski, mlbright, Eric Wong, Johannes Schindelin,
	ben


> On 12 Aug 2016, at 19:13, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>>> If we do the success first and then error out halfway, we
>>> still have to clean up, so I do not see how this impacts
>>> implementation?
>> 
>> That is true. The reasoning is that an error in between is somewhat
>> less expected. Therefore additional work is OK.
>> 
>> An error upfront is much more likely because it is also a mechanism
>> for the filter to reject certain files. If the filter is configured
>> as "required=false" then this reject would actually be OK.
> 
> Unless the reasoning is "an error in between is so rare that we are
> OK if the protocol misbehaves and the receiving end omits error
> handing", I am not so sure how "therefore additional work is OK" is
> a reasonable conclusion.

Maybe I need to reword. An error is detected in either way if something
goes wrong. The advantage of the two step status is that if we fail early
then Git does not even need to create structures to read the response.

See Peff's answer here:
http://public-inbox.org/git/20160806121421.bs7n4lhed7phdshb%40sigill.intra.peff.net/
http://public-inbox.org/git/20160805222710.chefh5kiktyzketh%40sigill.intra.peff.net/

Thanks,
Lars

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/15] convert: add filter.<driver>.process option
  2016-08-12 17:21               ` Lars Schneider
@ 2016-08-12 18:03                 ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2016-08-12 18:03 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Stefan Beller, Jeff King, git@vger.kernel.org,
	Jakub Narębski, mlbright, Eric Wong, Johannes Schindelin,
	ben

Lars Schneider <larsxschneider@gmail.com> writes:

>> Unless the reasoning is "an error in between is so rare that we are
>> OK if the protocol misbehaves and the receiving end omits error
>> handing", I am not so sure how "therefore additional work is OK" is
>> a reasonable conclusion.
>
> Maybe I need to reword. An error is detected in either way if something
> goes wrong. The advantage of the two step status is that if we fail early
> then Git does not even need to create structures to read the response.

OK, that's much better.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 15/15] read-cache: make sure file handles are not inherited by child processes
  2016-08-10 13:04   ` [PATCH v5 15/15] read-cache: make sure file handles are not inherited by child processes larsxschneider
@ 2016-08-18 14:23     ` Johannes Schindelin
  0 siblings, 0 replies; 50+ messages in thread
From: Johannes Schindelin @ 2016-08-18 14:23 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, gitster, jnareb, mlbright, e, peff, ben

Hi Lars,

On Wed, 10 Aug 2016, larsxschneider@gmail.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> Consider the case of a file that requires filtering and is present in branch A
> but not in branch B. If A is the current HEAD and we checkout B then the
> following happens:
> 
> 1. ce_compare_data() opens the file
> 2.   index_fd() detects that the file requires to run a clean filter and
>      calls index_stream_convert_blob()
> 4.     index_stream_convert_blob() calls convert_to_git_filter_fd()
> 5.       convert_to_git_filter_fd() calls apply_filter() which creates a new
>          long running filter process (in case it is the first file of this kind
>          to be filtered)
> 6.       The new filter process inherits all file handles. This is the default
>          on Linux/OSX and is explicitly defined in the `CreateProcessW` call
>          in `mingw.c` on Windows.
> 7. ce_compare_data() closes the file
> 8. Git unlinks the file as it is not present in B
> 
> The unlink operation does not work on Windows because the filter process has
> still an open handle to the file. Apparently that is no problem on Linux/OSX.
> Probably because "[...] the two file descriptors share open file status flags"
> (see fork(2)).

We typically wrap the commit messages at 76 columns per row (personally,
I wrap already at 72, and it seems Junio wraps at 70).

> Fix this problem by opening files in read-cache with the `O_CLOEXEC` flag to
> ensure that the file descriptor does not remain open in a newly spawned process.
> `O_CLOEXEX` is defined as `O_NOINHERIT` on Windows. A similar fix for temporary

s/CLOEXEX/CLOEXEC/

> file handles was applied on Git for Windows already:
> https://github.com/git-for-windows/git/commit/667b8b51ec850c3e1c7d75dee69dc13c29d1f162

In response to your commit note on GitHub, I submitted this patch series
(slightly cleaned up) yesterday (and you already commented on it):

https://public-inbox.org/git/cover.1471437637.git.johannes.schindelin@gmx.de

The patch is obviously correct, and needs the patch series I submitted to
compile on Windows (this note is more for Junio's interest than a comment
on this patch).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2016-08-18 14:24 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20160803164225.46355-1-larsxschneider@gmail.com/>
2016-08-10 13:03 ` [PATCH v5 00/15] Git filter protocol larsxschneider
2016-08-10 13:03   ` [PATCH v5 01/15] pkt-line: extract set_packet_header() larsxschneider
2016-08-10 13:03   ` [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-10 13:13     ` Jeff King
2016-08-10 13:24       ` Lars Schneider
2016-08-10 13:30         ` Jeff King
2016-08-10 13:51           ` Lars Schneider
2016-08-10 14:33             ` Jeff King
2016-08-10 13:03   ` [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet() larsxschneider
2016-08-10 13:15     ` Jeff King
2016-08-10 13:29       ` Lars Schneider
2016-08-10 13:37         ` Jeff King
2016-08-10 13:59           ` Lars Schneider
2016-08-10 14:34             ` Jeff King
2016-08-10 13:04   ` [PATCH v5 04/15] pkt-line: add packet_write_gently() larsxschneider
2016-08-10 13:28     ` Jeff King
2016-08-10 13:36       ` Lars Schneider
2016-08-10 13:40         ` Jeff King
2016-08-10 17:17           ` Junio C Hamano
2016-08-10 17:49             ` Lars Schneider
2016-08-10 18:21               ` Junio C Hamano
2016-08-10 19:15                 ` Lars Schneider
2016-08-10 13:04   ` [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt() larsxschneider
2016-08-10 13:43     ` Jeff King
2016-08-10 14:10       ` Lars Schneider
2016-08-10 15:01         ` Jeff King
2016-08-10 17:18       ` Junio C Hamano
2016-08-10 17:53         ` Lars Schneider
2016-08-10 18:42           ` Junio C Hamano
2016-08-10 13:04   ` [PATCH v5 06/15] pkt-line: add packet_flush_gently() larsxschneider
2016-08-10 13:04   ` [PATCH v5 07/15] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-10 13:04   ` [PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
2016-08-10 13:04   ` [PATCH v5 09/15] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-10 13:04   ` [PATCH v5 10/15] convert: quote filter names in error messages larsxschneider
2016-08-10 13:04   ` [PATCH v5 11/15] convert: modernize tests larsxschneider
2016-08-10 13:04   ` [PATCH v5 12/15] convert: generate large test files only once larsxschneider
2016-08-10 13:04   ` [PATCH v5 13/15] convert: make apply_filter() adhere to standard Git error handling larsxschneider
2016-08-10 13:04   ` [PATCH v5 14/15] convert: add filter.<driver>.process option larsxschneider
2016-08-12 16:33     ` Stefan Beller
2016-08-12 16:38       ` Jeff King
2016-08-12 16:48         ` Stefan Beller
2016-08-12 17:08           ` Lars Schneider
2016-08-12 17:13             ` Junio C Hamano
2016-08-12 17:21               ` Lars Schneider
2016-08-12 18:03                 ` Junio C Hamano
2016-08-12 16:59       ` Lars Schneider
2016-08-12 17:07         ` Stefan Beller
2016-08-12 17:14           ` Lars Schneider
2016-08-10 13:04   ` [PATCH v5 15/15] read-cache: make sure file handles are not inherited by child processes larsxschneider
2016-08-18 14:23     ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).