git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v1 0/3] Add support for downloading blobs on demand
@ 2017-03-22 16:52 Ben Peart
  2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider

We have a couple of patch series we’re working on (ObjectDB/Read-Object,
Watchman integration) where we could use the ability to have a
background process running that can accept multiple commands, thus
avoiding the overhead of spawning a new process for every command.

The ability to do this was added in:

	Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16)
	keeps the external process running and processes all commands

but it is integrated into the convert code.  This patch series takes the
code from convert.c and refactors it into a separate “sub-process”
module so that we can centralize and reuse this logic in other areas.
Once the code was refactored into the sub-process module, convert.c was
updated to use it.

Ben Peart (3):
  pkt-line: add packet_write_list_gently()
  sub-process: refactor the filter process code into a reusable module
  convert: use new sub-process module for filter processes

 Documentation/technical/api-sub-process.txt |  55 ++++++++++
 Makefile                                    |   1 +
 convert.c                                   | 154 +++++-----------------------
 pkt-line.c                                  |  19 ++++
 pkt-line.h                                  |   1 +
 sub-process.c                               | 113 ++++++++++++++++++++
 sub-process.h                               |  46 +++++++++
 7 files changed, 259 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/technical/api-sub-process.txt
 create mode 100644 sub-process.c
 create mode 100644 sub-process.h

-- 
2.12.0.gvfs.1.42.g0b7328eac2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 1/3] pkt-line: add packet_write_list_gently()
  2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart
@ 2017-03-22 16:52 ` Ben Peart
  2017-03-22 20:21   ` Junio C Hamano
  2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider

Add packet_write_list_gently() which writes multiple lines in a single
call and then calls packet_flush_gently(). This is used later in this
patch series.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 pkt-line.c | 19 +++++++++++++++++++
 pkt-line.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index d4b6bfe076..fccdac1352 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
 	return status;
 }
 
+int packet_write_list_gently(int fd, const char *line, ...)
+{
+	va_list args;
+	int err;
+	va_start(args, line);
+	for (;;) {
+		if (!line)
+			break;
+		if (strlen(line) > LARGE_PACKET_DATA_MAX)
+			return -1;
+		err = packet_write_fmt_gently(fd, "%s\n", line);
+		if (err)
+			return err;
+		line = va_arg(args, const char*);
+	}
+	va_end(args);
+	return packet_flush_gently(fd);
+}
+
 static int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
 	static char packet_write_buffer[LARGE_PACKET_MAX];
diff --git a/pkt-line.h b/pkt-line.h
index 18eac64830..3674d04509 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_list_gently(int fd, const char *line, ...);
 int write_packetized_from_fd(int fd_in, int fd_out);
 int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
 
-- 
2.12.0.gvfs.1.42.g0b7328eac2



* [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module
  2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart
  2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart
@ 2017-03-22 16:52 ` Ben Peart
  2017-03-23  6:16   ` Junio C Hamano
  2017-03-22 16:52 ` [PATCH v1 3/3] convert: use new sub-process module for filter processes Ben Peart
  2017-03-25 11:59 ` [PATCH v1 0/3] Add support for downloading blobs on demand Duy Nguyen
  3 siblings, 1 reply; 12+ messages in thread
From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider

Create a new sub-process module that can be used to reduce the cost of
starting up a sub-process for multiple commands.  It does this by
keeping the external process running and handling all commands over
standard input and standard output, using the packet format (pkt-line)
based protocol.  Full documentation is contained in
Documentation/technical/api-sub-process.txt.

This code is refactored from:

	Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16)
	keeps the external process running and processes all commands

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Documentation/technical/api-sub-process.txt |  55 ++++++++++++++
 Makefile                                    |   1 +
 sub-process.c                               | 113 ++++++++++++++++++++++++++++
 sub-process.h                               |  46 +++++++++++
 4 files changed, 215 insertions(+)
 create mode 100644 Documentation/technical/api-sub-process.txt
 create mode 100644 sub-process.c
 create mode 100644 sub-process.h

diff --git a/Documentation/technical/api-sub-process.txt b/Documentation/technical/api-sub-process.txt
new file mode 100644
index 0000000000..8471875611
--- /dev/null
+++ b/Documentation/technical/api-sub-process.txt
@@ -0,0 +1,55 @@
+sub-process API
+===============
+
+The sub-process API makes it possible to run background sub-processes
+that run until the git command exits and to communicate with them
+through stdin and stdout.  This reduces the overhead of having to fork
+a new process each time one needs to be communicated with.
+
+The sub-processes are kept in a hashmap by command name and looked up
+via the subprocess_find_entry function.  If an existing instance cannot
+be found then a new process should be created and started.  When the
+parent git command terminates, all sub-processes are also terminated.
+
+This API is based on the run-command API.
+
+Data structures
+---------------
+
+* `struct subprocess_entry`
+
+The sub-process structure.  Members should not be accessed directly.
+
+Types
+-----
+
+'int(*subprocess_start_fn)(struct subprocess_entry *entry)'::
+
+	User-supplied function to initialize the sub-process.  This is
+	typically used to negotiate the interface version and capabilities.
+
+
+Functions
+---------
+
+`subprocess_start`::
+
+	Start a subprocess and add it to the subprocess hashmap.
+
+`subprocess_stop`::
+
+	Kill a subprocess and remove it from the subprocess hashmap.
+
+`subprocess_find_entry`::
+
+	Find a subprocess in the subprocess hashmap.
+
+`subprocess_get_child_process`::
+
+	Get the underlying `struct child_process` from a subprocess.
+
+`subprocess_read_status`::
+
+	Helper function to read packets looking for the last "status=<foo>"
+	key/value pair.
+
diff --git a/Makefile b/Makefile
index a5a11e721a..8afe733092 100644
--- a/Makefile
+++ b/Makefile
@@ -830,6 +830,7 @@ LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
 LIB_OBJS += submodule.o
 LIB_OBJS += submodule-config.o
+LIB_OBJS += sub-process.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
diff --git a/sub-process.c b/sub-process.c
new file mode 100644
index 0000000000..02050b6867
--- /dev/null
+++ b/sub-process.c
@@ -0,0 +1,113 @@
+/*
+ * Generic implementation of background process infrastructure.
+ */
+#include "sub-process.h"
+#include "sigchain.h"
+#include "pkt-line.h"
+
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
+
+static int name2process_cmp(const struct subprocess_entry *e1,
+		const struct subprocess_entry *e2, const void *unused)
+{
+	return strcmp(e1->cmd, e2->cmd);
+}
+
+static void subprocess_exit_handler(struct child_process *process)
+{
+	sigchain_push(SIGPIPE, SIG_IGN);
+	/* Closing the pipe signals the filter to initiate a shutdown. */
+	close(process->in);
+	close(process->out);
+	sigchain_pop(SIGPIPE);
+	/* Finish command will wait until the shutdown is complete. */
+	finish_command(process);
+}
+
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
+		subprocess_start_fn startfn)
+{
+	int err;
+	const char *argv[] = { cmd, NULL };
+
+	if (!subprocess_map_initialized) {
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)name2process_cmp, 0);
+		subprocess_map_initialized = 1;
+	}
+
+	entry->cmd = cmd;
+
+	child_process_init(&entry->process);
+	entry->process.argv = argv;
+	entry->process.use_shell = 1;
+	entry->process.in = -1;
+	entry->process.out = -1;
+	entry->process.clean_on_exit = 1;
+	entry->process.clean_on_exit_handler = subprocess_exit_handler;
+
+	err = start_command(&entry->process);
+	if (err) {
+		error("cannot fork to run sub-process '%s'", entry->cmd);
+		return err;
+	}
+
+	err = startfn(entry);
+	if (err) {
+		error("initialization for sub-process '%s' failed", entry->cmd);
+		subprocess_stop(entry);
+		return err;
+	}
+
+	hashmap_entry_init(entry, strhash(entry->cmd));
+	hashmap_add(&subprocess_map, entry);
+
+	return 0;
+}
+
+void subprocess_stop(struct subprocess_entry *entry)
+{
+	if (!entry)
+		return;
+
+	entry->process.clean_on_exit = 0;
+	kill(entry->process.pid, SIGTERM);
+	finish_command(&entry->process);
+
+	hashmap_remove(&subprocess_map, entry, NULL);
+}
+
+struct subprocess_entry *subprocess_find_entry(const char *cmd)
+{
+	struct subprocess_entry key;
+
+	if (!subprocess_map_initialized) {
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)name2process_cmp, 0);
+		subprocess_map_initialized = 1;
+		return NULL;
+	}
+
+	hashmap_entry_init(&key, strhash(cmd));
+	key.cmd = cmd;
+	return hashmap_get(&subprocess_map, &key, NULL);
+}
+
+void subprocess_read_status(int fd, struct strbuf *status)
+{
+	struct strbuf **pair;
+	char *line;
+	for (;;) {
+		line = packet_read_line(fd, NULL);
+		if (!line)
+			break;
+		pair = strbuf_split_str(line, '=', 2);
+		if (pair[0] && pair[0]->len && pair[1]) {
+			/* the last "status=<foo>" line wins */
+			if (!strcmp(pair[0]->buf, "status=")) {
+				strbuf_reset(status);
+				strbuf_addbuf(status, pair[1]);
+			}
+		}
+		strbuf_list_free(pair);
+	}
+}
diff --git a/sub-process.h b/sub-process.h
new file mode 100644
index 0000000000..235f1e5fa3
--- /dev/null
+++ b/sub-process.h
@@ -0,0 +1,46 @@
+#ifndef SUBPROCESS_H
+#define SUBPROCESS_H
+
+#include "git-compat-util.h"
+#include "hashmap.h"
+#include "run-command.h"
+
+/*
+ * Generic implementation of background process infrastructure.
+ * See Documentation/technical/api-sub-process.txt.
+ */
+
+/* data structures */
+
+struct subprocess_entry {
+	struct hashmap_entry ent; /* must be the first member! */
+	struct child_process process;
+	const char *cmd;
+};
+
+/* subprocess functions */
+
+typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
+		subprocess_start_fn startfn);
+
+void subprocess_stop(struct subprocess_entry *entry);
+
+struct subprocess_entry *subprocess_find_entry(const char *cmd);
+
+/* subprocess helper functions */
+
+static inline struct child_process *subprocess_get_child_process(
+		struct subprocess_entry *entry)
+{
+	return &entry->process;
+}
+
+/*
+ * Helper function that will read packets looking for "status=<foo>"
+ * key/value pairs and return the value from the last "status" packet
+ */
+
+void subprocess_read_status(int fd, struct strbuf *status);
+
+#endif
-- 
2.12.0.gvfs.1.42.g0b7328eac2



* [PATCH v1 3/3] convert: use new sub-process module for filter processes
  2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart
  2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart
  2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart
@ 2017-03-22 16:52 ` Ben Peart
  2017-03-25 11:59 ` [PATCH v1 0/3] Add support for downloading blobs on demand Duy Nguyen
  3 siblings, 0 replies; 12+ messages in thread
From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider

The filter.<driver>.process code was refactored into a separate
“sub-process” module to facilitate reuse in future patch series. This
centralizes the logic in a single place to simplify maintenance. As a
side benefit, it also improves the readability of convert.c.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 convert.c | 154 ++++++++++----------------------------------------------------
 1 file changed, 24 insertions(+), 130 deletions(-)

diff --git a/convert.c b/convert.c
index 8d652bf27c..ca8e0d26c0 100644
--- a/convert.c
+++ b/convert.c
@@ -4,6 +4,7 @@
 #include "quote.h"
 #include "sigchain.h"
 #include "pkt-line.h"
+#include "sub-process.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -497,132 +498,30 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 #define CAP_SMUDGE   (1u<<1)
 
 struct cmd2process {
-	struct hashmap_entry ent; /* must be the first member! */
+	struct subprocess_entry subprocess;
 	unsigned int supported_capabilities;
-	const char *cmd;
-	struct child_process process;
 };
 
-static int cmd_process_map_initialized;
-static struct hashmap cmd_process_map;
-
-static int cmd2process_cmp(const struct cmd2process *e1,
-			   const struct cmd2process *e2,
-			   const void *unused)
-{
-	return strcmp(e1->cmd, e2->cmd);
-}
-
-static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
-{
-	struct cmd2process key;
-	hashmap_entry_init(&key, strhash(cmd));
-	key.cmd = cmd;
-	return hashmap_get(hashmap, &key, NULL);
-}
-
-static int packet_write_list(int fd, const char *line, ...)
+static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 {
-	va_list args;
 	int err;
-	va_start(args, line);
-	for (;;) {
-		if (!line)
-			break;
-		if (strlen(line) > LARGE_PACKET_DATA_MAX)
-			return -1;
-		err = packet_write_fmt_gently(fd, "%s\n", line);
-		if (err)
-			return err;
-		line = va_arg(args, const char*);
-	}
-	va_end(args);
-	return packet_flush_gently(fd);
-}
-
-static void read_multi_file_filter_status(int fd, struct strbuf *status)
-{
-	struct strbuf **pair;
-	char *line;
-	for (;;) {
-		line = packet_read_line(fd, NULL);
-		if (!line)
-			break;
-		pair = strbuf_split_str(line, '=', 2);
-		if (pair[0] && pair[0]->len && pair[1]) {
-			/* the last "status=<foo>" line wins */
-			if (!strcmp(pair[0]->buf, "status=")) {
-				strbuf_reset(status);
-				strbuf_addbuf(status, pair[1]);
-			}
-		}
-		strbuf_list_free(pair);
-	}
-}
-
-static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
-{
-	if (!entry)
-		return;
-
-	entry->process.clean_on_exit = 0;
-	kill(entry->process.pid, SIGTERM);
-	finish_command(&entry->process);
-
-	hashmap_remove(hashmap, entry, NULL);
-	free(entry);
-}
-
-static void stop_multi_file_filter(struct child_process *process)
-{
-	sigchain_push(SIGPIPE, SIG_IGN);
-	/* Closing the pipe signals the filter to initiate a shutdown. */
-	close(process->in);
-	close(process->out);
-	sigchain_pop(SIGPIPE);
-	/* Finish command will wait until the shutdown is complete. */
-	finish_command(process);
-}
-
-static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
-{
-	int err;
-	struct cmd2process *entry;
+	struct cmd2process *entry = (struct cmd2process *)subprocess;
 	struct child_process *process;
-	const char *argv[] = { cmd, NULL };
 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
 	char *cap_buf;
 	const char *cap_name;
 
-	entry = xmalloc(sizeof(*entry));
-	entry->cmd = cmd;
-	entry->supported_capabilities = 0;
-	process = &entry->process;
-
-	child_process_init(process);
-	process->argv = argv;
-	process->use_shell = 1;
-	process->in = -1;
-	process->out = -1;
-	process->clean_on_exit = 1;
-	process->clean_on_exit_handler = stop_multi_file_filter;
-
-	if (start_command(process)) {
-		error("cannot fork to run external filter '%s'", cmd);
-		return NULL;
-	}
-
-	hashmap_entry_init(entry, strhash(cmd));
+	process = subprocess_get_child_process(&entry->subprocess);
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
-	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
+	err = packet_write_list_gently(process->in, "git-filter-client", "version=2", NULL);
 	if (err)
 		goto done;
 
 	err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
 	if (err) {
-		error("external filter '%s' does not support filter protocol version 2", cmd);
+		error("external filter '%s' does not support filter protocol version 2", subprocess->cmd);
 		goto done;
 	}
 	err = strcmp(packet_read_line(process->out, NULL), "version=2");
@@ -632,7 +531,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	if (err)
 		goto done;
 
-	err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);
+	err = packet_write_list_gently(process->in, "capability=clean", "capability=smudge", NULL);
 
 	for (;;) {
 		cap_buf = packet_read_line(process->out, NULL);
@@ -651,7 +550,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 		} else {
 			warning(
 				"external filter '%s' requested unsupported filter capability '%s'",
-				cmd, cap_name
+				subprocess->cmd, cap_name
 			);
 		}
 
@@ -661,14 +560,10 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 done:
 	sigchain_pop(SIGPIPE);
 
-	if (err || errno == EPIPE) {
-		error("initialization for external filter '%s' failed", cmd);
-		kill_multi_file_filter(hashmap, entry);
-		return NULL;
-	}
+	if (err || errno == EPIPE)
+		err = err ? err : errno;
 
-	hashmap_add(hashmap, entry);
-	return entry;
+	return err;
 }
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
@@ -682,22 +577,20 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	struct strbuf filter_status = STRBUF_INIT;
 	const char *filter_type;
 
-	if (!cmd_process_map_initialized) {
-		cmd_process_map_initialized = 1;
-		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
-		entry = NULL;
-	} else {
-		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
-	}
+	entry = (struct cmd2process *)subprocess_find_entry(cmd);
 
 	fflush(NULL);
 
 	if (!entry) {
-		entry = start_multi_file_filter(&cmd_process_map, cmd);
-		if (!entry)
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&entry->subprocess, cmd, start_multi_file_filter_fn)) {
+			free(entry);
 			return 0;
+		}
 	}
-	process = &entry->process;
+	process = subprocess_get_child_process(&entry->subprocess);
 
 	if (!(wanted_capability & entry->supported_capabilities))
 		return 0;
@@ -737,7 +630,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	read_multi_file_filter_status(process->out, &filter_status);
+	subprocess_read_status(process->out, &filter_status);
 	err = strcmp(filter_status.buf, "success");
 	if (err)
 		goto done;
@@ -746,7 +639,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	read_multi_file_filter_status(process->out, &filter_status);
+	subprocess_read_status(process->out, &filter_status);
 	err = strcmp(filter_status.buf, "success");
 
 done:
@@ -768,7 +661,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 			 * Force shutdown and restart if another blob requires filtering.
 			 */
 			error("external filter '%s' failed", cmd);
-			kill_multi_file_filter(&cmd_process_map, entry);
+			subprocess_stop((struct subprocess_entry *)entry);
+			free(entry);
 		}
 	} else {
 		strbuf_swap(dst, &nbuf);
-- 
2.12.0.gvfs.1.42.g0b7328eac2



* Re: [PATCH v1 1/3] pkt-line: add packet_write_list_gently()
  2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart
@ 2017-03-22 20:21   ` Junio C Hamano
  2017-03-24 12:34     ` Ben Peart
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2017-03-22 20:21 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart, christian.couder, larsxschneider

Ben Peart <peartben@gmail.com> writes:

> Add packet_write_list_gently() which writes multiple lines in a single
> call and then calls packet_flush_gently(). This is used later in this
> patch series.

I can see how it would be convenient to have a function like this.
I'd name it without _gently(), though.  We call something _gently()
when we initially only have a function that dies hard on error and
later want to have a variant that returns an error for the caller to
handle.  You are starting without a dying variant (which is probably
a preferable way to structure the API).

Also I am hesitant to take a function that does not take any "list"
type argument and yet calls itself "write_list".  IOW, I'd expect
something like these

	write_list(int fd, const char **lines);
	write_list(int fd, struct string_list *lines);

given that name, and not "varargs, each of which is a line".  I am
tempted to suggest

	packet_writel(int fd, const char *line, ...);

noticing similarity with execl(), but perhaps others may be able to
come up with better names.

> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>  pkt-line.c | 19 +++++++++++++++++++
>  pkt-line.h |  1 +
>  2 files changed, 20 insertions(+)
>
> diff --git a/pkt-line.c b/pkt-line.c
> index d4b6bfe076..fccdac1352 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
>  	return status;
>  }
>  
> +int packet_write_list_gently(int fd, const char *line, ...)
> +{
> +	va_list args;
> +	int err;
> +	va_start(args, line);
> +	for (;;) {
> +		if (!line)
> +			break;
> +		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> +			return -1;
> +		err = packet_write_fmt_gently(fd, "%s\n", line);
> +		if (err)
> +			return err;
> +		line = va_arg(args, const char*);
> +	}
> +	va_end(args);
> +	return packet_flush_gently(fd);
> +}
> +
>  static int packet_write_gently(const int fd_out, const char *buf, size_t size)
>  {
>  	static char packet_write_buffer[LARGE_PACKET_MAX];
> diff --git a/pkt-line.h b/pkt-line.h
> index 18eac64830..3674d04509 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf);
>  void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
>  int packet_flush_gently(int fd);
>  int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
> +int packet_write_list_gently(int fd, const char *line, ...);
>  int write_packetized_from_fd(int fd_in, int fd_out);
>  int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);


* Re: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module
  2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart
@ 2017-03-23  6:16   ` Junio C Hamano
  2017-03-24 12:39     ` Ben Peart
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2017-03-23  6:16 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart, christian.couder, larsxschneider

Ben Peart <peartben@gmail.com> writes:

> This code is refactored from:
>
> 	Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16)
> 	keeps the external process running and processes all commands

I am afraid that this organization of the patch series is making it
harder than necessary to review, by duplicating the same code first
_WITH_ renaming of symbols, etc., in this step and then updaing the
original callers while removing the now-stale original callee
implementation in a separate next step.

Would it perhaps make it clearer to understand what is going on if
you instead started to update the code in convert.c in place to make
it more suitable for reuse as the patch title advertises, and then
move the resulting cleaned-up code to a separate file, I wonder.



* RE: [PATCH v1 1/3] pkt-line: add packet_write_list_gently()
  2017-03-22 20:21   ` Junio C Hamano
@ 2017-03-24 12:34     ` Ben Peart
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Peart @ 2017-03-24 12:34 UTC (permalink / raw)
  To: Junio C Hamano, Ben Peart
  Cc: git@vger.kernel.org, christian.couder@gmail.com,
	larsxschneider@gmail.com

> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com]
> Sent: Wednesday, March 22, 2017 4:21 PM
> To: Ben Peart <peartben@gmail.com>
> Cc: git@vger.kernel.org; Ben Peart <Ben.Peart@microsoft.com>;
> christian.couder@gmail.com; larsxschneider@gmail.com
> Subject: Re: [PATCH v1 1/3] pkt-line: add packet_write_list_gently()
> 
> Ben Peart <peartben@gmail.com> writes:
> 
> > Add packet_write_list_gently() which writes multiple lines in a single
> > call and then calls packet_flush_gently(). This is used later in this
> > patch series.
> 
> I can see how it would be convenient to have a function like this.
> I'd name it without _gently(), though.  We call something _gently() when we
> initially only have a function that dies hard on error and later want to have a
> variant that returns an error for the caller to handle.  You are starting
> without a dying variant (which is probably a preferable way to structure the
> API).
> 
> Also I am hesitant to take a function that does not take any "list"
> type argument and yet calls itself "write_list".  IOW, I'd expect something like
> these
> 
> 	write_list(int fd, const char **lines);
> 	write_list(int fd, struct string_list *lines);
> 
> given that name, and not "varargs, each of which is a line".  I am tempted to
> suggest
> 
> 	packet_writel(int fd, const char *line, ...);
> 
> noticing similarity with execl(), but perhaps others may be able to come up
> with better names.

Given there haven't been any better names suggested, I'll go ahead and update it to be packet_writel.

> 
> > Signed-off-by: Ben Peart <benpeart@microsoft.com>
> > ---
> >  pkt-line.c | 19 +++++++++++++++++++
> >  pkt-line.h |  1 +
> >  2 files changed, 20 insertions(+)
> >
> > diff --git a/pkt-line.c b/pkt-line.c
> > index d4b6bfe076..fccdac1352 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char
> *fmt, ...)
> >  	return status;
> >  }
> >
> > +int packet_write_list_gently(int fd, const char *line, ...) {
> > +	va_list args;
> > +	int err;
> > +	va_start(args, line);
> > +	for (;;) {
> > +		if (!line)
> > +			break;
> > +		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> > +			return -1;
> > +		err = packet_write_fmt_gently(fd, "%s\n", line);
> > +		if (err)
> > +			return err;
> > +		line = va_arg(args, const char*);
> > +	}
> > +	va_end(args);
> > +	return packet_flush_gently(fd);
> > +}
> > +
> >  static int packet_write_gently(const int fd_out, const char *buf,
> > size_t size)  {
> >  	static char packet_write_buffer[LARGE_PACKET_MAX];
> > diff --git a/pkt-line.h b/pkt-line.h
> > index 18eac64830..3674d04509 100644
> > --- a/pkt-line.h
> > +++ b/pkt-line.h
> > @@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf);  void
> > packet_buf_write(struct strbuf *buf, const char *fmt, ...)
> > __attribute__((format (printf, 2, 3)));  int packet_flush_gently(int
> > fd);  int packet_write_fmt_gently(int fd, const char *fmt, ...)
> > __attribute__((format (printf, 2, 3)));
> > +int packet_write_list_gently(int fd, const char *line, ...);
> >  int write_packetized_from_fd(int fd_in, int fd_out);  int
> > write_packetized_from_buf(const char *src_in, size_t len, int fd_out);


* RE: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module
  2017-03-23  6:16   ` Junio C Hamano
@ 2017-03-24 12:39     ` Ben Peart
  2017-03-24 16:10       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Peart @ 2017-03-24 12:39 UTC (permalink / raw)
  To: Junio C Hamano, Ben Peart
  Cc: git@vger.kernel.org, christian.couder@gmail.com,
	larsxschneider@gmail.com

> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com]
> Sent: Thursday, March 23, 2017 2:17 AM
> To: Ben Peart <peartben@gmail.com>
> Cc: git@vger.kernel.org; Ben Peart <Ben.Peart@microsoft.com>;
> christian.couder@gmail.com; larsxschneider@gmail.com
> Subject: Re: [PATCH v1 2/3] sub-process: refactor the filter process code into
> a reusable module
> 
> Ben Peart <peartben@gmail.com> writes:
> 
> > This code is refactored from:
> >
> > 	Commit edcc85814c ("convert: add filter.<driver>.process option",
> 2016-10-16)
> > 	keeps the external process running and processes all commands
> 
> I am afraid that this organization of the patch series is making it harder than
> necessary to review, by duplicating the same code first _WITH_ renaming of
> symbols, etc., in this step and then updating the original callers while
> removing the now-stale original callee implementation in a separate next
> step.
> 
> Would it perhaps make it clearer to understand what is going on if you
> instead started to update the code in convert.c in place to make it more
> suitable for reuse as the patch title advertises, and then move the resulting
> cleaned-up code to a separate file, I wonder.

I'm not entirely sure what you're asking for here, but I think the
challenge may be that by splitting the refactoring into two separate
commits, it's hard to see the before and after when looking at the
commit in isolation.  By splitting them, it's more "a bunch of new code"
followed by "using new code" than it is a refactoring of existing code.

How about I squash the last two patches together so that it's more
apparent that it's just a refactoring of existing code with the before
and after in a single patch?


* Re: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module
  2017-03-24 12:39     ` Ben Peart
@ 2017-03-24 16:10       ` Junio C Hamano
  2017-03-24 17:15         ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2017-03-24 16:10 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ben Peart, git@vger.kernel.org, christian.couder@gmail.com,
	larsxschneider@gmail.com

Ben Peart <Ben.Peart@microsoft.com> writes:

> How about I squash the last two patches together so that it's more
> apparent that it's just a refactoring of existing code with the before
> and after in a single patch?

I do not think making a pair of patches, each of which already does
too much, into one patch would make things easier to chew.  It would
make it even harder to understand.

I offhand do not know how many patches the ideal split of this
series should be, but I know it shouldn't be one.

More likely, it should be N+1, and I know the last one should be
"Now all data structures and variables have been renamed, all helper
functions have been renamed and generalized for reuse while the
original code these helper functions came from have been updated to
call them, we can move them from convert.c to sub-process.[ch];
let's create these two files".  This last step will be moving lines
from an old file to two new files, almost without any modification
(of course, "#ifndef SUB_PROCESS_H", "#include sub-process.h", etc.
would be new lines so this will not be strictly pure movement, but
all readers of this message are intelligent enough to understand
what I mean).

What should the other N patches before the last step contain, then?

For example, your name2process_cmp() is just a renamed version of
the original.  Some of its callers may also just be straight renames.
These "only renaming, doing nothing else" changes can go into a
single large patch and as long as you mark as such, the review
burden can be lessened.  It would be a "boring mechanical" part of
the whole thing.

On the other hand, your subprocess_start() shares quite a lot with
the original start_multi_file_filter() but the latter does some more
than the former (because the latter is more specific to the need of
convert.c).  

A patch series must be structured to make it easier to review such
changes, because they are the likely places where mistakes happen
(both in implementation and in the helper API design).  To get from
the original start_multi_file_filter() to a new version that calls
subprocess_start(), such a patch would _MOVE_ a bunch of lines from
the old function to the new function, the new function may acquire
its own unique new lines, the old function would lose these moved
lines but instead call the new function, etc.  You can describe it as
"I refactored subprocess_start() out of start_multi_file_filter() and
updated the latter to call the former," and that would be a single
logical unit of the change.  To make patches easier to understand,
these logical units would want to be reasonably small.

And for something of sub-process.[ch]'s size, I suspect that it
would have more than 1 such logical unit to be independently
refactored, so in total, I would suspect the series would become

    1 (boring mechanical part) +
    2 or more (refactoring) +
    1 (final movement)

i.e. 4 or more patches?


* Re: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module
  2017-03-24 16:10       ` Junio C Hamano
@ 2017-03-24 17:15         ` Junio C Hamano
  2017-03-27 22:04           ` Ben Peart
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2017-03-24 17:15 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ben Peart, git@vger.kernel.org, christian.couder@gmail.com,
	larsxschneider@gmail.com

Junio C Hamano <gitster@pobox.com> writes:

> ...
> And for something of sub-process.[ch]'s size, I suspect that it
> would have more than 1 such logical unit to be independently
> refactored, so in total, I would suspect the series would become
>
>     1 (boring mechanical part) +
>     2 or more (refactoring) +
>     1 (final movement)
>
> i.e. 4 or more patches?

To avoid confusion (although readers may not require it): even though
I explained the "boring mechanical part" first and "refactoring"
second, that ordering was purely for explanation.

In practice, I would expect that it would be easier to both do and
review if refactoring is done with the original name.  

A function that will keep its name in the final result (e.g.
start_multi_file_filter()) will call a new and more generic helper
function.  The new helper may start using the new name from the
get-go (e.g. subprocess_start()), but the data types it shares with
the original part of the code (e.g. 'struct cmd2process') may still
be using the original name.

And after completing the "2 or more" refactoring patches would be a
good place to do the remaining "boring mechanical rename".  IOW, the
count above could be

     2 or more (refactoring) +
     1 (boring mechanical part) +
     1 (final movement)

and I didn't mean to say that you need to rename first.  What we
want is "if you need to have a single large patch that cannot be
split, see if you can make it purely mechanical," as a single large
patch that is _not_ a mechanical conversion is the worst kind of
patch for reviewers.


* Re: [PATCH v1 0/3] Add support for downloading blobs on demand
  2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart
                   ` (2 preceding siblings ...)
  2017-03-22 16:52 ` [PATCH v1 3/3] convert: use new sub-process module for filter processes Ben Peart
@ 2017-03-25 11:59 ` Duy Nguyen
  3 siblings, 0 replies; 12+ messages in thread
From: Duy Nguyen @ 2017-03-25 11:59 UTC (permalink / raw)
  To: Ben Peart; +Cc: Git Mailing List, benpeart, Christian Couder, Lars Schneider

On Wed, Mar 22, 2017 at 11:52 PM, Ben Peart <peartben@gmail.com> wrote:
> We have a couple of patch series we’re working on (ObjectDB/Read-Object,
> Watchman integration)

Oops, sorry. I should be reworking the index-helper series for
watchman support, but I haven't had time for it. Yes, I'm also eyeing
Lars's code as the communication channel for any external helper
(which in turn can talk to watchman or whatever). I guess I'll let
you do it then.
-- 
Duy


* RE: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module
  2017-03-24 17:15         ` Junio C Hamano
@ 2017-03-27 22:04           ` Ben Peart
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Peart @ 2017-03-27 22:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ben Peart, git@vger.kernel.org, christian.couder@gmail.com,
	larsxschneider@gmail.com

> Junio C Hamano <gitster@pobox.com> writes:
> 
> 
> To avoid confusion (although readers may not require it): even though I
> explained the "boring mechanical part" first and "refactoring" second, that
> ordering was purely for explanation.
> 
> In practice, I would expect that it would be easier to both do and review if
> refactoring is done with the original name.
> 
> A function that will keep its name in the final result (e.g.
> start_multi_file_filter()) will call a new and more generic helper function.  The
> new helper may start using the new name from the get-go (e.g.
> subprocess_start()), but the data types it shares with the original part of the
> code (e.g. 'struct cmd2process') may still be using the original name.
> 
> And after completing the "2 or more" refactoring patches would be a good
> place to do the remaining "boring mechanical rename".  IOW, the count
> above could be
> 
>      2 or more (refactoring) +
>      1 (boring mechanical part) +
>      1 (final movement)
> 
> and I didn't mean to say that you need to rename first.  What we want is "if
> you need to have a single large patch that cannot be split, see if you can
> make it purely mechanical," as a single large patch that is _not_ a
> mechanical conversion is the worst kind of patch for reviewers.

Thanks, I think I better understand what you are looking for in a patch
series.  In short: any non-trivial refactoring should take place within
the same file, using one or more patches to keep each patch as simple
as possible, and any large or cross-file refactoring should be made as
boring/mechanical as possible.  This makes it easier to see any complex
changes within a single section of the format-patch output, and avoids
having to look between two file patches to verify that the refactoring
didn't unintentionally change behavior.

I'll throw out my current refactoring and do it again, attempting to
follow these guidelines, as soon as I can find the time ($DAYJOB tends
to take priority over my open-source work).



end of thread, other threads:[~2017-03-27 22:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart
2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart
2017-03-22 20:21   ` Junio C Hamano
2017-03-24 12:34     ` Ben Peart
2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart
2017-03-23  6:16   ` Junio C Hamano
2017-03-24 12:39     ` Ben Peart
2017-03-24 16:10       ` Junio C Hamano
2017-03-24 17:15         ` Junio C Hamano
2017-03-27 22:04           ` Ben Peart
2017-03-22 16:52 ` [PATCH v1 3/3] convert: use new sub-process module for filter processes Ben Peart
2017-03-25 11:59 ` [PATCH v1 0/3] Add support for downloading blobs on demand Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).