* [PATCH v1 0/3] Add support for downloading blobs on demand @ 2017-03-22 16:52 Ben Peart 2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart ` (3 more replies) 0 siblings, 4 replies; 12+ messages in thread From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw) To: git; +Cc: benpeart, christian.couder, larsxschneider We have a couple of patch series we’re working on (ObjectDB/Read-Object, Watchman integration) where we could use the ability to have a background process running that can accept multiple commands, thus avoiding the overhead of spawning a new process for every command. The ability to do this was added in commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16), which keeps the external process running and processes all commands, but that implementation is integrated into the convert code. This patch series takes the code from convert.c and refactors it into a separate “sub-process” module so that we can centralize and reuse this logic in other areas. Once the code was refactored into sub-process, convert.c was updated to use the new module. Ben Peart (3): pkt-line: add packet_write_list_gently() sub-process: refactor the filter process code into a reusable module convert: use new sub-process module for filter processes Documentation/technical/api-sub-process.txt | 55 ++++++++++ Makefile | 1 + convert.c | 154 +++++----------------------- pkt-line.c | 19 ++++ pkt-line.h | 1 + sub-process.c | 113 ++++++++++++++++++++ sub-process.h | 46 +++++++++ 7 files changed, 259 insertions(+), 130 deletions(-) create mode 100644 Documentation/technical/api-sub-process.txt create mode 100644 sub-process.c create mode 100644 sub-process.h -- 2.12.0.gvfs.1.42.g0b7328eac2 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v1 1/3] pkt-line: add packet_write_list_gently() 2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart @ 2017-03-22 16:52 ` Ben Peart 2017-03-22 20:21 ` Junio C Hamano 2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart ` (2 subsequent siblings) 3 siblings, 1 reply; 12+ messages in thread From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw) To: git; +Cc: benpeart, christian.couder, larsxschneider Add packet_write_list_gently() which writes multiple lines in a single call and then calls packet_flush_gently(). This is used later in this patch series. Signed-off-by: Ben Peart <benpeart@microsoft.com> --- pkt-line.c | 19 +++++++++++++++++++ pkt-line.h | 1 + 2 files changed, 20 insertions(+) diff --git a/pkt-line.c b/pkt-line.c index d4b6bfe076..fccdac1352 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...) return status; } +int packet_write_list_gently(int fd, const char *line, ...) +{ + va_list args; + int err; + va_start(args, line); + for (;;) { + if (!line) + break; + if (strlen(line) > LARGE_PACKET_DATA_MAX) + return -1; + err = packet_write_fmt_gently(fd, "%s\n", line); + if (err) + return err; + line = va_arg(args, const char*); + } + va_end(args); + return packet_flush_gently(fd); +} + static int packet_write_gently(const int fd_out, const char *buf, size_t size) { static char packet_write_buffer[LARGE_PACKET_MAX]; diff --git a/pkt-line.h b/pkt-line.h index 18eac64830..3674d04509 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf); void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3))); int packet_flush_gently(int fd); int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3))); +int packet_write_list_gently(int fd, const char *line, ...); int write_packetized_from_fd(int fd_in, int fd_out); int write_packetized_from_buf(const char *src_in, size_t len, int fd_out); -- 2.12.0.gvfs.1.42.g0b7328eac2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v1 1/3] pkt-line: add packet_write_list_gently() 2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart @ 2017-03-22 20:21 ` Junio C Hamano 2017-03-24 12:34 ` Ben Peart 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2017-03-22 20:21 UTC (permalink / raw) To: Ben Peart; +Cc: git, benpeart, christian.couder, larsxschneider Ben Peart <peartben@gmail.com> writes: > Add packet_write_list_gently() which writes multiple lines in a single > call and then calls packet_flush_gently(). This is used later in this > patch series. I can see how it would be convenient to have a function like this. I'd name it without _gently(), though. We call something _gently() when we initially only have a function that dies hard on error and later want to have a variant that returns an error for the caller to handle. You are starting without a dying variant (which is probably a preferable way to structure the API). Also I am hesitant to take a function that does not take any "list" type argument and yet calls itself "write_list". IOW, I'd expect something like these write_list(int fd, const char **lines); write_list(int fd, struct string_list *lines); given that name, and not "varargs, each of which is a line". I am tempted to suggest packet_writel(int fd, const char *line, ...); noticing similarity with execl(), but perhaps others may be able to come up with better names. > Signed-off-by: Ben Peart <benpeart@microsoft.com> > --- > pkt-line.c | 19 +++++++++++++++++++ > pkt-line.h | 1 + > 2 files changed, 20 insertions(+) > > diff --git a/pkt-line.c b/pkt-line.c > index d4b6bfe076..fccdac1352 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...) > return status; > } > > +int packet_write_list_gently(int fd, const char *line, ...) 
> +{ > + va_list args; > + int err; > + va_start(args, line); > + for (;;) { > + if (!line) > + break; > + if (strlen(line) > LARGE_PACKET_DATA_MAX) > + return -1; > + err = packet_write_fmt_gently(fd, "%s\n", line); > + if (err) > + return err; > + line = va_arg(args, const char*); > + } > + va_end(args); > + return packet_flush_gently(fd); > +} > + > static int packet_write_gently(const int fd_out, const char *buf, size_t size) > { > static char packet_write_buffer[LARGE_PACKET_MAX]; > diff --git a/pkt-line.h b/pkt-line.h > index 18eac64830..3674d04509 100644 > --- a/pkt-line.h > +++ b/pkt-line.h > @@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf); > void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3))); > int packet_flush_gently(int fd); > int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3))); > +int packet_write_list_gently(int fd, const char *line, ...); > int write_packetized_from_fd(int fd_in, int fd_out); > int write_packetized_from_buf(const char *src_in, size_t len, int fd_out); ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v1 1/3] pkt-line: add packet_write_list_gently() 2017-03-22 20:21 ` Junio C Hamano @ 2017-03-24 12:34 ` Ben Peart 0 siblings, 0 replies; 12+ messages in thread From: Ben Peart @ 2017-03-24 12:34 UTC (permalink / raw) To: Junio C Hamano, Ben Peart Cc: git@vger.kernel.org, christian.couder@gmail.com, larsxschneider@gmail.com > -----Original Message----- > From: Junio C Hamano [mailto:gitster@pobox.com] > Sent: Wednesday, March 22, 2017 4:21 PM > To: Ben Peart <peartben@gmail.com> > Cc: git@vger.kernel.org; Ben Peart <Ben.Peart@microsoft.com>; > christian.couder@gmail.com; larsxschneider@gmail.com > Subject: Re: [PATCH v1 1/3] pkt-line: add packet_write_list_gently() > > Ben Peart <peartben@gmail.com> writes: > > > Add packet_write_list_gently() which writes multiple lines in a single > > call and then calls packet_flush_gently(). This is used later in this > > patch series. > > I can see how it would be convenient to have a function like this. > I'd name it without _gently(), though. We call something _gently() when we > initially only have a function that dies hard on error and later want to have a > variant that returns an error for the caller to handle. You are starting > without a dying variant (which is probably a preferable way to structure the > API). > > Also I am hesitant to take a function that does not take any "list" > type argument and yet calls itself "write_list". IOW, I'd expect something like > these > > write_list(int fd, const char **lines); > write_list(int fd, struct string_list *lines); > > given that name, and not "varargs, each of which is a line". I am tempted to > suggest > > packet_writel(int fd, const char *line, ...); > > noticing similarity with execl(), but perhaps others may be able to come up > with better names. Given there haven't been any better names suggested, I'll go ahead and update it to be packet_writel. 
> > > Signed-off-by: Ben Peart <benpeart@microsoft.com> > > --- > > pkt-line.c | 19 +++++++++++++++++++ > > pkt-line.h | 1 + > > 2 files changed, 20 insertions(+) > > > > diff --git a/pkt-line.c b/pkt-line.c > > index d4b6bfe076..fccdac1352 100644 > > --- a/pkt-line.c > > +++ b/pkt-line.c > > @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char > *fmt, ...) > > return status; > > } > > > > +int packet_write_list_gently(int fd, const char *line, ...) { > > + va_list args; > > + int err; > > + va_start(args, line); > > + for (;;) { > > + if (!line) > > + break; > > + if (strlen(line) > LARGE_PACKET_DATA_MAX) > > + return -1; > > + err = packet_write_fmt_gently(fd, "%s\n", line); > > + if (err) > > + return err; > > + line = va_arg(args, const char*); > > + } > > + va_end(args); > > + return packet_flush_gently(fd); > > +} > > + > > static int packet_write_gently(const int fd_out, const char *buf, > > size_t size) { > > static char packet_write_buffer[LARGE_PACKET_MAX]; > > diff --git a/pkt-line.h b/pkt-line.h > > index 18eac64830..3674d04509 100644 > > --- a/pkt-line.h > > +++ b/pkt-line.h > > @@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf); void > > packet_buf_write(struct strbuf *buf, const char *fmt, ...) > > __attribute__((format (printf, 2, 3))); int packet_flush_gently(int > > fd); int packet_write_fmt_gently(int fd, const char *fmt, ...) > > __attribute__((format (printf, 2, 3))); > > +int packet_write_list_gently(int fd, const char *line, ...); > > int write_packetized_from_fd(int fd_in, int fd_out); int > > write_packetized_from_buf(const char *src_in, size_t len, int fd_out); ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module 2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart 2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart @ 2017-03-22 16:52 ` Ben Peart 2017-03-23 6:16 ` Junio C Hamano 2017-03-22 16:52 ` [PATCH v1 3/3] convert: use new sub-process module for filter processes Ben Peart 2017-03-25 11:59 ` [PATCH v1 0/3] Add support for downloading blobs on demand Duy Nguyen 3 siblings, 1 reply; 12+ messages in thread From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw) To: git; +Cc: benpeart, christian.couder, larsxschneider Create a new sub-process module that can be used to reduce the cost of starting up a sub-process for multiple commands. It does this by keeping the external process running and processing all commands by communicating over standard input and standard output using the packet format (pkt-line) based protocol. Full documentation is contained in Documentation/technical/api-sub-process.txt. 
This code is refactored from: Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16) keeps the external process running and processes all commands Signed-off-by: Ben Peart <benpeart@microsoft.com> --- Documentation/technical/api-sub-process.txt | 55 ++++++++++++++ Makefile | 1 + sub-process.c | 113 ++++++++++++++++++++++++++++ sub-process.h | 46 +++++++++++ 4 files changed, 215 insertions(+) create mode 100644 Documentation/technical/api-sub-process.txt create mode 100644 sub-process.c create mode 100644 sub-process.h diff --git a/Documentation/technical/api-sub-process.txt b/Documentation/technical/api-sub-process.txt new file mode 100644 index 0000000000..8471875611 --- /dev/null +++ b/Documentation/technical/api-sub-process.txt @@ -0,0 +1,55 @@ +sub-process API +=============== + +The sub-process API makes it possible to run background sub-processes +that should run until the git command exits and communicate with it +through stdin and stdout. This reduces the overhead of having to fork +a new process each time it needs to be communicated with. + +The sub-processes are kept in a hashmap by command name and looked up +via the subprocess_find_entry function. If an existing instance can not +be found then a new process should be created and started. When the +parent git command terminates, all sub-processes are also terminated. + +This API is based on the run-command API. + +Data structures +--------------- + +* `struct subprocess_entry` + +The sub-process structure. Members should not be accessed directly. + +Types +----- + +'int(*subprocess_start_fn)(struct subprocess_entry *entry)':: + + User-supplied function to initialize the sub-process. This is + typically used to negotiate the interface version and capabilities. + + +Functions +--------- + +`subprocess_start`:: + + Start a subprocess and add it to the subprocess hashmap. + +`subprocess_stop`:: + + Kill a subprocess and remove it from the subprocess hashmap. 
+ +`subprocess_find_entry`:: + + Find a subprocess in the subprocess hashmap. + +`subprocess_get_child_process`:: + + Get the underlying `struct child_process` from a subprocess. + +`subprocess_read_status`:: + + Helper function to read packets looking for the last "status=<foo>" + key/value pair. + diff --git a/Makefile b/Makefile index a5a11e721a..8afe733092 100644 --- a/Makefile +++ b/Makefile @@ -830,6 +830,7 @@ LIB_OBJS += streaming.o LIB_OBJS += string-list.o LIB_OBJS += submodule.o LIB_OBJS += submodule-config.o +LIB_OBJS += sub-process.o LIB_OBJS += symlinks.o LIB_OBJS += tag.o LIB_OBJS += tempfile.o diff --git a/sub-process.c b/sub-process.c new file mode 100644 index 0000000000..02050b6867 --- /dev/null +++ b/sub-process.c @@ -0,0 +1,113 @@ +/* + * Generic implementation of background process infrastructure. + */ +#include "sub-process.h" +#include "sigchain.h" +#include "pkt-line.h" + +static int subprocess_map_initialized; +static struct hashmap subprocess_map; + +static int name2process_cmp(const struct subprocess_entry *e1, + const struct subprocess_entry *e2, const void *unused) +{ + return strcmp(e1->cmd, e2->cmd); +} + +static void subprocess_exit_handler(struct child_process *process) +{ + sigchain_push(SIGPIPE, SIG_IGN); + /* Closing the pipe signals the filter to initiate a shutdown. */ + close(process->in); + close(process->out); + sigchain_pop(SIGPIPE); + /* Finish command will wait until the shutdown is complete. 
*/ + finish_command(process); +} + +int subprocess_start(struct subprocess_entry *entry, const char *cmd, + subprocess_start_fn startfn) +{ + int err; + const char *argv[] = { cmd, NULL }; + + if (!subprocess_map_initialized) { + hashmap_init(&subprocess_map, (hashmap_cmp_fn)name2process_cmp, 0); + subprocess_map_initialized = 1; + } + + entry->cmd = cmd; + + child_process_init(&entry->process); + entry->process.argv = argv; + entry->process.use_shell = 1; + entry->process.in = -1; + entry->process.out = -1; + entry->process.clean_on_exit = 1; + entry->process.clean_on_exit_handler = subprocess_exit_handler; + + err = start_command(&entry->process); + if (err) { + error("cannot fork to run sub-process '%s'", entry->cmd); + return err; + } + + err = startfn(entry); + if (err) { + error("initialization for sub-process '%s' failed", entry->cmd); + subprocess_stop(entry); + return err; + } + + hashmap_entry_init(entry, strhash(entry->cmd)); + hashmap_add(&subprocess_map, entry); + + return 0; +} + +void subprocess_stop(struct subprocess_entry *entry) +{ + if (!entry) + return; + + entry->process.clean_on_exit = 0; + kill(entry->process.pid, SIGTERM); + finish_command(&entry->process); + + hashmap_remove(&subprocess_map, entry, NULL); +} + +struct subprocess_entry *subprocess_find_entry(const char *cmd) +{ + struct subprocess_entry key; + + if (!subprocess_map_initialized) { + hashmap_init(&subprocess_map, (hashmap_cmp_fn)name2process_cmp, 0); + subprocess_map_initialized = 1; + return NULL; + } + + hashmap_entry_init(&key, strhash(cmd)); + key.cmd = cmd; + return hashmap_get(&subprocess_map, &key, NULL); +} + +void subprocess_read_status(int fd, struct strbuf *status) +{ + struct strbuf **pair; + char *line; + for (;;) { + line = packet_read_line(fd, NULL); + if (!line) + break; + pair = strbuf_split_str(line, '=', 2); + if (pair[0] && pair[0]->len && pair[1]) { + /* the last "status=<foo>" line wins */ + if (!strcmp(pair[0]->buf, "status=")) { + strbuf_reset(status); 
+ strbuf_addbuf(status, pair[1]); + } + } + strbuf_list_free(pair); + } +} diff --git a/sub-process.h b/sub-process.h new file mode 100644 index 0000000000..235f1e5fa3 --- /dev/null +++ b/sub-process.h @@ -0,0 +1,46 @@ +#ifndef SUBPROCESS_H +#define SUBPROCESS_H + +#include "git-compat-util.h" +#include "hashmap.h" +#include "run-command.h" + +/* + * Generic implementation of background process infrastructure. + * See Documentation/technical/api-sub-process.txt. + */ + + /* data structures */ + +struct subprocess_entry { + struct hashmap_entry ent; /* must be the first member! */ + struct child_process process; + const char *cmd; +}; + +/* subprocess functions */ + +typedef int(*subprocess_start_fn)(struct subprocess_entry *entry); +int subprocess_start(struct subprocess_entry *entry, const char *cmd, + subprocess_start_fn startfn); + +void subprocess_stop(struct subprocess_entry *entry); + +struct subprocess_entry *subprocess_find_entry(const char *cmd); + +/* subprocess helper functions */ + +static inline struct child_process *subprocess_get_child_process( + struct subprocess_entry *entry) +{ + return &entry->process; +} + +/* + * Helper function that will read packets looking for "status=<foo>" + * key/value pairs and return the value from the last "status" packet + */ + +void subprocess_read_status(int fd, struct strbuf *status); + +#endif -- 2.12.0.gvfs.1.42.g0b7328eac2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module 2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart @ 2017-03-23 6:16 ` Junio C Hamano 2017-03-24 12:39 ` Ben Peart 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2017-03-23 6:16 UTC (permalink / raw) To: Ben Peart; +Cc: git, benpeart, christian.couder, larsxschneider Ben Peart <peartben@gmail.com> writes: > This code is refactored from: > > Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16) > keeps the external process running and processes all commands I am afraid that this organization of the patch series is making it harder than necessary to review, by duplicating the same code first _WITH_ renaming of symbols, etc., in this step and then updating the original callers while removing the now-stale original callee implementation in a separate next step. Would it perhaps make it clearer to understand what is going on if you instead started to update the code in convert.c in place to make it more suitable for reuse as the patch title advertises, and then move the resulting cleaned-up code to a separate file, I wonder. ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module 2017-03-23 6:16 ` Junio C Hamano @ 2017-03-24 12:39 ` Ben Peart 2017-03-24 16:10 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Ben Peart @ 2017-03-24 12:39 UTC (permalink / raw) To: Junio C Hamano, Ben Peart Cc: git@vger.kernel.org, christian.couder@gmail.com, larsxschneider@gmail.com > -----Original Message----- > From: Junio C Hamano [mailto:gitster@pobox.com] > Sent: Thursday, March 23, 2017 2:17 AM > To: Ben Peart <peartben@gmail.com> > Cc: git@vger.kernel.org; Ben Peart <Ben.Peart@microsoft.com>; > christian.couder@gmail.com; larsxschneider@gmail.com > Subject: Re: [PATCH v1 2/3] sub-process: refactor the filter process code into > a reusable module > > Ben Peart <peartben@gmail.com> writes: > > > This code is refactored from: > > > > Commit edcc85814c ("convert: add filter.<driver>.process option", > 2016-10-16) > > keeps the external process running and processes all commands > > I am afraid that this organization of the patch series is making it harder than > necessary to review, by duplicating the same code first _WITH_ renaming of > symbols, etc., in this step and then updating the original callers while > removing the now-stale original callee implementation in a separate next > step. > > Would it perhaps make it clearer to understand what is going on if you > instead started to update the code in convert.c in place to make it more > suitable for reuse as the patch title advertises, and then move the resulting > cleaned-up code to a separate file, I wonder. I'm not entirely sure what you're asking for here but I think the challenge may be that by splitting the refactoring into two separate commits, it's hard to see the before and after when looking at the commit in isolation. By splitting them, it's more "a bunch of new code" followed by "using new code" than it is a refactoring of existing code. 
How about I squash the last two patches together so that it's more apparent that it's just a refactoring of existing code with the before and after in a single patch? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module 2017-03-24 12:39 ` Ben Peart @ 2017-03-24 16:10 ` Junio C Hamano 2017-03-24 17:15 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2017-03-24 16:10 UTC (permalink / raw) To: Ben Peart Cc: Ben Peart, git@vger.kernel.org, christian.couder@gmail.com, larsxschneider@gmail.com Ben Peart <Ben.Peart@microsoft.com> writes: > How about I squash the last two patches together so that it's more > apparent that it's just a refactoring of existing code with the before > and after in a single patch? I do not think making a pair of patches, each of which already does too much, into one patch would make things easier to chew. It would make it even harder to understand. I offhand do not know how many patches the ideal split of this series should be, but I know it shouldn't be one. More likely that it should be N+1, and I know the last one should be "Now all data structures and variables have been renamed, all helper functions have been renamed and generalized for reuse while the original code these helper functions came from have been updated to call them, we can move them from convert.c to sub-process.[ch]; let's create these two files". This last step will be moving lines from an old file to two new files, almost without any modification (of course, "#ifndef SUB_PROCESS_H", "#include sub-process.h", etc. would be new lines so this will not be strictly pure movement, but all readers of this message are intelligent enough to understand what I mean). What should the other N patches before the last step contain, then? For example, your name2process_cmp() is just a renamed version of the original. Some of its callers may also just straight rename. These "only renaming, doing nothing else" changes can go into a single large patch and as long as you mark it as such, the review burden can be lessened. It would be a "boring mechanical" part of the whole thing. 
On the other hand, your subprocess_start() shares quite a lot with the original start_multi_file_filter() but the latter does somewhat more than the former (because the latter is more specific to the needs of convert.c). A patch series must be structured to make it easier to review such changes, because they are the likely places where mistakes happen (both in implementation and in the helper API design). To get from the original start_multi_file_filter() to a new version that calls subprocess_start(), such a patch would _MOVE_ a bunch of lines from the old function to the new function, the new function may acquire its own unique new lines, the old function would lose these moved lines but instead call the new function, etc. You can describe it as "I refactored subprocess_start() out of start_multi_file_filter() and updated the latter to call the former." and that would be a single logical unit of the change. To make patches easier to understand, these logical units would want to be reasonably small. And for something of sub-process.[ch]'s size, I suspect that it would have more than 1 such logical unit to be independently refactored, so in total, I would suspect the series would become 1 (boring mechanical part) + 2 or more (refactoring) + 1 (final movement) i.e. 4 or more patches? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module 2017-03-24 16:10 ` Junio C Hamano @ 2017-03-24 17:15 ` Junio C Hamano 2017-03-27 22:04 ` Ben Peart 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2017-03-24 17:15 UTC (permalink / raw) To: Ben Peart Cc: Ben Peart, git@vger.kernel.org, christian.couder@gmail.com, larsxschneider@gmail.com Junio C Hamano <gitster@pobox.com> writes: > ... > And for something of sub-process.[ch]'s size, I suspect that it > would have more than 1 such logical unit to be independently > refactored, so in total, I would suspect the series would become > > 1 (boring mechanical part) + > 2 or more (refactoring) + > 1 (final movement) > > i.e. 4 or more patches? To avoid confusion (although readers may not require it), even though I explained "boring mechanical part" first and "refactoring", that was purely for explanation. In practice, I would expect that it would be easier to both do and review if refactoring is done with the original name. A function that will keep its name in the final result (e.g. start_multi_file_filter()) will call a new and more generic helper function. The new helper may start using the new name from the get-go (e.g. subprocess_start()), but the data types it shares with the original part of the code (e.g. 'struct cmd2process') may still be using the original name. And after completing "2 or more" refactoring would be a good place to do the remaining "boring mechanical rename". IOW, the count above could be 2 or more (refactoring) + 1 (boring mechanical part) + 1 (final movement) and I didn't mean to say that you need to rename first. What we want is "if you need to have a single large patch that cannot be split, see if you can make it purely mechanical.", as a single large patch that is _not_ mechanical conversion is the worst kind of patch for reviewers. ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module 2017-03-24 17:15 ` Junio C Hamano @ 2017-03-27 22:04 ` Ben Peart 0 siblings, 0 replies; 12+ messages in thread From: Ben Peart @ 2017-03-27 22:04 UTC (permalink / raw) To: Junio C Hamano Cc: Ben Peart, git@vger.kernel.org, christian.couder@gmail.com, larsxschneider@gmail.com > Junio C Hamano <gitster@pobox.com> writes: > > > To avoid confusion (although readers may not require it), even though I > explained "boring mechanical part" first and "refactoring", that was purely > for explanation. > > In practice, I would expect that it would be easier to both do and review if > refactoring is done with the original name. > > A function that will keep its name in the final result (e.g. > start_multi_file_filter()) will call a new and more generic helper function. The > new helper may start using the new name from the get-go (e.g. > subprocess_start()), but the data types it shares with the original part of the > code (e.g. 'struct cmd2process') may still be using the original name. > > And after completing "2 or more" refactoring would be a good place to do > the remaining "boring mechanical rename". IOW, the count above could be > > 2 or more (refactoring) + > 1 (boring mechanical part) + > 1 (final movement) > > and I didn't mean to say that you need to rename first. What we want is "if > you need to have a single large patch that cannot be split, see if you can > make it purely mechanical.", as a single large patch that is _not_ mechanical > conversion is the worst kind of patch for reviewers. Thanks, I think I better understand what you are looking for in a patch series. In short, any non-trivial refactoring should take place within the same file using one or more patches to keep each patch as simple as possible. Any large or cross-file refactoring should be made as boring/mechanical as possible. 
This is to make it easier to see any complex changes within a single format patch section and avoid having to look between two file patches to ensure the refactoring didn't unintentionally change behavior. I'll throw out my current refactoring and do it again attempting to follow these guidelines as soon as I can find the time ($DAYJOB tends to take priority over my open source work). ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v1 3/3] convert: use new sub-process module for filter processes 2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart 2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart 2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart @ 2017-03-22 16:52 ` Ben Peart 2017-03-25 11:59 ` [PATCH v1 0/3] Add support for downloading blobs on demand Duy Nguyen 3 siblings, 0 replies; 12+ messages in thread From: Ben Peart @ 2017-03-22 16:52 UTC (permalink / raw) To: git; +Cc: benpeart, christian.couder, larsxschneider The filter.<driver>.process code was refactored into a separate sub-process module to facilitate reuse in future patch series. This centralizes the logic in a single place to simplify maintenance. As a side benefit, it also improves the readability of convert.c. Signed-off-by: Ben Peart <benpeart@microsoft.com> --- convert.c | 154 ++++++++++---------------------------------------------------- 1 file changed, 24 insertions(+), 130 deletions(-) diff --git a/convert.c b/convert.c index 8d652bf27c..ca8e0d26c0 100644 --- a/convert.c +++ b/convert.c @@ -4,6 +4,7 @@ #include "quote.h" #include "sigchain.h" #include "pkt-line.h" +#include "sub-process.h" /* * convert.c - convert a file when checking it out and checking it in. @@ -497,132 +498,30 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le #define CAP_SMUDGE (1u<<1) struct cmd2process { - struct hashmap_entry ent; /* must be the first member! 
*/ + struct subprocess_entry subprocess; unsigned int supported_capabilities; - const char *cmd; - struct child_process process; }; -static int cmd_process_map_initialized; -static struct hashmap cmd_process_map; - -static int cmd2process_cmp(const struct cmd2process *e1, - const struct cmd2process *e2, - const void *unused) -{ - return strcmp(e1->cmd, e2->cmd); -} - -static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd) -{ - struct cmd2process key; - hashmap_entry_init(&key, strhash(cmd)); - key.cmd = cmd; - return hashmap_get(hashmap, &key, NULL); -} - -static int packet_write_list(int fd, const char *line, ...) +static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) { - va_list args; int err; - va_start(args, line); - for (;;) { - if (!line) - break; - if (strlen(line) > LARGE_PACKET_DATA_MAX) - return -1; - err = packet_write_fmt_gently(fd, "%s\n", line); - if (err) - return err; - line = va_arg(args, const char*); - } - va_end(args); - return packet_flush_gently(fd); -} - -static void read_multi_file_filter_status(int fd, struct strbuf *status) -{ - struct strbuf **pair; - char *line; - for (;;) { - line = packet_read_line(fd, NULL); - if (!line) - break; - pair = strbuf_split_str(line, '=', 2); - if (pair[0] && pair[0]->len && pair[1]) { - /* the last "status=<foo>" line wins */ - if (!strcmp(pair[0]->buf, "status=")) { - strbuf_reset(status); - strbuf_addbuf(status, pair[1]); - } - } - strbuf_list_free(pair); - } -} - -static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry) -{ - if (!entry) - return; - - entry->process.clean_on_exit = 0; - kill(entry->process.pid, SIGTERM); - finish_command(&entry->process); - - hashmap_remove(hashmap, entry, NULL); - free(entry); -} - -static void stop_multi_file_filter(struct child_process *process) -{ - sigchain_push(SIGPIPE, SIG_IGN); - /* Closing the pipe signals the filter to initiate a shutdown. 
*/ - close(process->in); - close(process->out); - sigchain_pop(SIGPIPE); - /* Finish command will wait until the shutdown is complete. */ - finish_command(process); -} - -static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd) -{ - int err; - struct cmd2process *entry; + struct cmd2process *entry = (struct cmd2process *)subprocess; struct child_process *process; - const char *argv[] = { cmd, NULL }; struct string_list cap_list = STRING_LIST_INIT_NODUP; char *cap_buf; const char *cap_name; - entry = xmalloc(sizeof(*entry)); - entry->cmd = cmd; - entry->supported_capabilities = 0; - process = &entry->process; - - child_process_init(process); - process->argv = argv; - process->use_shell = 1; - process->in = -1; - process->out = -1; - process->clean_on_exit = 1; - process->clean_on_exit_handler = stop_multi_file_filter; - - if (start_command(process)) { - error("cannot fork to run external filter '%s'", cmd); - return NULL; - } - - hashmap_entry_init(entry, strhash(cmd)); + process = subprocess_get_child_process(&entry->subprocess); sigchain_push(SIGPIPE, SIG_IGN); - err = packet_write_list(process->in, "git-filter-client", "version=2", NULL); + err = packet_write_list_gently(process->in, "git-filter-client", "version=2", NULL); if (err) goto done; err = strcmp(packet_read_line(process->out, NULL), "git-filter-server"); if (err) { - error("external filter '%s' does not support filter protocol version 2", cmd); + error("external filter '%s' does not support filter protocol version 2", subprocess->cmd); goto done; } err = strcmp(packet_read_line(process->out, NULL), "version=2"); @@ -632,7 +531,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons if (err) goto done; - err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL); + err = packet_write_list_gently(process->in, "capability=clean", "capability=smudge", NULL); for (;;) { cap_buf = packet_read_line(process->out, NULL); 
@@ -651,7 +550,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons } else { warning( "external filter '%s' requested unsupported filter capability '%s'", - cmd, cap_name + subprocess->cmd, cap_name ); } @@ -661,14 +560,10 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons done: sigchain_pop(SIGPIPE); - if (err || errno == EPIPE) { - error("initialization for external filter '%s' failed", cmd); - kill_multi_file_filter(hashmap, entry); - return NULL; - } + if (err || errno == EPIPE) + err = err ? err : errno; - hashmap_add(hashmap, entry); - return entry; + return err; } static int apply_multi_file_filter(const char *path, const char *src, size_t len, @@ -682,22 +577,20 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len struct strbuf filter_status = STRBUF_INIT; const char *filter_type; - if (!cmd_process_map_initialized) { - cmd_process_map_initialized = 1; - hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0); - entry = NULL; - } else { - entry = find_multi_file_filter_entry(&cmd_process_map, cmd); - } + entry = (struct cmd2process *)subprocess_find_entry(cmd); fflush(NULL); if (!entry) { - entry = start_multi_file_filter(&cmd_process_map, cmd); - if (!entry) + entry = xmalloc(sizeof(*entry)); + entry->supported_capabilities = 0; + + if (subprocess_start(&entry->subprocess, cmd, start_multi_file_filter_fn)) { + free(entry); return 0; + } } - process = &entry->process; + process = subprocess_get_child_process(&entry->subprocess); if (!(wanted_capability & entry->supported_capabilities)) return 0; @@ -737,7 +630,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len if (err) goto done; - read_multi_file_filter_status(process->out, &filter_status); + subprocess_read_status(process->out, &filter_status); err = strcmp(filter_status.buf, "success"); if (err) goto done; @@ -746,7 +639,7 @@ static int 
apply_multi_file_filter(const char *path, const char *src, size_t len if (err) goto done; - read_multi_file_filter_status(process->out, &filter_status); + subprocess_read_status(process->out, &filter_status); err = strcmp(filter_status.buf, "success"); done: @@ -768,7 +661,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len * Force shutdown and restart if another blob requires filtering. */ error("external filter '%s' failed", cmd); - kill_multi_file_filter(&cmd_process_map, entry); + subprocess_stop((struct subprocess_entry *)entry); + free(entry); } } else { strbuf_swap(dst, &nbuf); -- 2.12.0.gvfs.1.42.g0b7328eac2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v1 0/3] Add support for downloading blobs on demand 2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart ` (2 preceding siblings ...) 2017-03-22 16:52 ` [PATCH v1 3/3] convert: use new sub-process module for filter processes Ben Peart @ 2017-03-25 11:59 ` Duy Nguyen 3 siblings, 0 replies; 12+ messages in thread From: Duy Nguyen @ 2017-03-25 11:59 UTC (permalink / raw) To: Ben Peart; +Cc: Git Mailing List, benpeart, Christian Couder, Lars Schneider On Wed, Mar 22, 2017 at 11:52 PM, Ben Peart <peartben@gmail.com> wrote: > We have a couple of patch series we’re working on (ObjectDB/Read-Object, > Watchman integration) Oops, sorry. I should be reworking the index-helper series for watchman support, but I haven't had time for it. Yes, I'm also eyeing Lars' code as the communication channel for any external helper (which in turn can talk to watchman or whatever). I guess I'll let you do it then. -- Duy ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-03-27 22:21 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-03-22 16:52 [PATCH v1 0/3] Add support for downloading blobs on demand Ben Peart 2017-03-22 16:52 ` [PATCH v1 1/3] pkt-line: add packet_write_list_gently() Ben Peart 2017-03-22 20:21 ` Junio C Hamano 2017-03-24 12:34 ` Ben Peart 2017-03-22 16:52 ` [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module Ben Peart 2017-03-23 6:16 ` Junio C Hamano 2017-03-24 12:39 ` Ben Peart 2017-03-24 16:10 ` Junio C Hamano 2017-03-24 17:15 ` Junio C Hamano 2017-03-27 22:04 ` Ben Peart 2017-03-22 16:52 ` [PATCH v1 3/3] convert: use new sub-process module for filter processes Ben Peart 2017-03-25 11:59 ` [PATCH v1 0/3] Add support for downloading blobs on demand Duy Nguyen