* [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol @ 2017-06-25 18:21 Lars Schneider 2017-06-25 18:21 ` [PATCH v6 1/6] t0021: keep filter log files on comparison Lars Schneider ` (5 more replies) 0 siblings, 6 replies; 24+ messages in thread From: Lars Schneider @ 2017-06-25 18:21 UTC (permalink / raw) To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben Hi, here is the 6th iteration of my "status delayed" topic. Patch 1 to 3 are minor t0021 test adjustments and haven't been changed since v3. Patch 4 is new and a minor style adjustment. Patch 5 is a minor "extract method" refactoring in convert.c with an additional minor style adjustment in this round. Patch 6 is the new feature. Changes since v5 (full inter diff below!): * commit message improvements (Torsten/Peff) * reorder checkout struct to keep base_dir* member together (Peff) * remove unnecessary initialization in CHECKOUT_INIT (Peff) * adhere to Git style (";" instead "{}") for empty if blocks (Torsten) * flag fields before flags (Peff) * make operator precedence explicit with parenthesis (Peff) * use skip_prefix() to simplify parsing and silently parse unknown keys (Peff) * add ce_delay_state explanation (Peff) * print files that the filter delayed but not delivered; clean up data structure (Peff) If you review this series then please read the "Delay" section in "Documentation/gitattributes.txt" first for an overview of the delay mechanism. The changes in "t/t0021/rot13-filter.pl" are easier to review if you ignore whitespace changes. I also noticed that the diff around the "static int apply_multi_file_filter" looks really weird as my new function "int async_query_available_blobs" is right behind it with similar lines. Thanks, Lars RFC: http://public-inbox.org/git/D10F7C47-14E8-465B-8B7A-A09A1B28A39F@gmail.com/ v1: http://public-inbox.org/git/20170108191736.47359-1-larsxschneider@gmail.com/ v2: http://public-inbox.org/git/20170226184816.30010-1-larsxschneider@gmail.com/ v3: http://public-inbox.org/git/20170409191107.20547-1-larsxschneider@gmail.com/ v4: http://public-inbox.org/git/20170522135001.54506-1-larsxschneider@gmail.com/ v5: http://public-inbox.org/git/20170601082203.50397-1-larsxschneider@gmail.com/ Base Ref: master Web-Diff: https://github.com/larsxschneider/git/commit/ab3535aa60 Checkout: git fetch https://github.com/larsxschneider/git filter-process/delay-v6 && git checkout ab3535aa60 Interdiff (v5..v6): diff --git a/cache.h b/cache.h index 523ead1d95..2ec12c4477 100644 --- a/cache.h +++ b/cache.h @@ -1543,14 +1543,14 @@ extern int ident_cmp(const struct ident_split *, const struct ident_split *); struct checkout { struct index_state *istate; const char *base_dir; - struct delayed_checkout *delayed_checkout; int base_dir_len; + struct delayed_checkout *delayed_checkout; unsigned force:1, quiet:1, not_new:1, refresh_cache:1; }; -#define CHECKOUT_INIT { NULL, "", NULL } +#define CHECKOUT_INIT { NULL, "" } #define TEMPORARY_FILENAME_LENGTH 25 diff --git a/convert.c b/convert.c index c4df174378..6452ab546a 100644 --- a/convert.c +++ b/convert.c @@ -572,9 +572,9 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) static void handle_filter_error(const struct strbuf *filter_status, struct cmd2process *entry, const unsigned int wanted_capability) { - if (!strcmp(filter_status->buf, "error")) { - /* The filter signaled a problem with the file. */ - } else if (!strcmp(filter_status->buf, "abort") && wanted_capability) { + if (!strcmp(filter_status->buf, "error")) + ; /* The filter signaled a problem with the file. */ + else if (!strcmp(filter_status->buf, "abort") && wanted_capability) { /* * The filter signaled a permanent problem. Don't try to filter * files with the same command for the lifetime of the current @@ -626,12 +626,12 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len } process = &entry->subprocess.process; - if (!(wanted_capability & entry->supported_capabilities)) + if (!(entry->supported_capabilities & wanted_capability)) return 0; - if (CAP_CLEAN & wanted_capability) + if (wanted_capability & CAP_CLEAN) filter_type = "clean"; - else if (CAP_SMUDGE & wanted_capability) + else if (wanted_capability & CAP_SMUDGE) filter_type = "smudge"; else die("unexpected filter type"); @@ -653,7 +653,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len if (err) goto done; - if (CAP_DELAY & entry->supported_capabilities && + if ((entry->supported_capabilities & CAP_DELAY) && dco && dco->state == CE_CAN_DELAY) { can_delay = 1; err = packet_write_fmt_gently(process->in, "can-delay=1\n"); @@ -736,16 +736,12 @@ int async_query_available_blobs(const char *cmd, struct string_list *delayed_pat if (err) goto done; - for (;;) { - const char* pre = "pathname="; - const int pre_len = strlen(pre); - line = packet_read_line(process->out, NULL); - if (!line) - break; - err = strlen(line) <= pre_len || strncmp(line, pre, pre_len); - if (err) - goto done; - string_list_insert(delayed_paths, xstrdup(line+pre_len)); + while ((line = packet_read_line(process->out, NULL))) { + const char *path; + if (skip_prefix(line, "pathname=", &path)) + string_list_insert(delayed_paths, xstrdup(path)); + else + ; /* ignore unknown keys */ } err = subprocess_read_status(process->out, &filter_status); @@ -784,9 +780,9 @@ static int apply_filter(const char *path, const char *src, size_t len, if (!dst) return 1; - if ((CAP_CLEAN & wanted_capability) && !drv->process && drv->clean) + if ((wanted_capability & CAP_CLEAN) && !drv->process && drv->clean) cmd = drv->clean; - else if ((CAP_SMUDGE & wanted_capability) && !drv->process && drv->smudge) + else if ((wanted_capability & CAP_SMUDGE) && !drv->process && drv->smudge) cmd = drv->smudge; if (cmd && *cmd) diff --git a/convert.h b/convert.h index c4beaa5101..51f0fb7fbe 100644 --- a/convert.h +++ b/convert.h @@ -42,6 +42,10 @@ enum ce_delay_state { }; struct delayed_checkout { + /* State of the currently processed cache entry. If the state is + CE_CAN_DELAY, then the filter can change this to CE_DELAYED. If + the state is CE_RETRY, then this signals the filter that the cache + entry was requested before. */ enum ce_delay_state state; /* List of filter drivers that signaled delayed blobs. */ struct string_list filters; diff --git a/entry.c b/entry.c index c9cb557559..c6d5cc01dc 100644 --- a/entry.c +++ b/entry.c @@ -159,10 +159,10 @@ int finish_delayed_checkout(struct checkout *state) struct string_list_item *filter, *path; struct delayed_checkout *dco = state->delayed_checkout; - if (!state->delayed_checkout) { + if (!state->delayed_checkout) return errs; - } + dco->state = CE_RETRY; while (dco->filters.nr > 0) { for_each_string_list_item(filter, &dco->filters) { struct string_list available_paths; @@ -195,7 +195,7 @@ int finish_delayed_checkout(struct checkout *state) struct cache_entry* ce = index_file_exists( state->istate, path->string, strlen(path->string), 0); - dco->state = CE_RETRY; + assert(dco->state == CE_RETRY); errs |= (ce ? checkout_entry(ce, state, NULL) : 1); } } @@ -205,6 +205,10 @@ int finish_delayed_checkout(struct checkout *state) /* At this point we should not have any delayed paths anymore. */ errs |= dco->paths.nr; + for_each_string_list_item(path, &dco->paths) { + warning(_("%s was not filtered properly."), path->string); + } + string_list_clear(&dco->paths, 0); free(dco); state->delayed_checkout = NULL; Lars Schneider (6): t0021: keep filter log files on comparison t0021: make debug log file name configurable t0021: write "OUT" only on success convert: put the flags field before the flag itself for consistent style convert: move multiple file filter error handling to separate function convert: add "status=delayed" to filter process protocol Documentation/gitattributes.txt | 65 ++++++++++++- builtin/checkout.c | 3 + cache.h | 4 + convert.c | 168 ++++++++++++++++++++++++--------- convert.h | 25 +++++ entry.c | 114 ++++++++++++++++++++++- t/t0021-conversion.sh | 136 ++++++++++++++++++++------- t/t0021/rot13-filter.pl | 199 ++++++++++++++++++++++++++-------------- unpack-trees.c | 2 + 9 files changed, 567 insertions(+), 149 deletions(-) base-commit: 0339965c70d68fd3831c9a5306443c869de3f6a8 -- 2.13.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v6 1/6] t0021: keep filter log files on comparison 2017-06-25 18:21 [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider @ 2017-06-25 18:21 ` Lars Schneider 2017-06-25 22:12 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 2/6] t0021: make debug log file name configurable Lars Schneider ` (4 subsequent siblings) 5 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-25 18:21 UTC (permalink / raw) To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben The filter log files are modified on comparison. Write the modified log files to temp files for comparison to fix this. This is useful for the subsequent patch 'convert: add "status=delayed" to filter process protocol'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> --- t/t0021-conversion.sh | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 161f560446..ff2424225b 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -42,10 +42,10 @@ test_cmp_count () { for FILE in "$expect" "$actual" do sort "$FILE" | uniq -c | - sed -e "s/^ *[0-9][0-9]*[ ]*IN: /x IN: /" >"$FILE.tmp" && - mv "$FILE.tmp" "$FILE" || return + sed -e "s/^ *[0-9][0-9]*[ ]*IN: /x IN: /" >"$FILE.tmp" done && - test_cmp "$expect" "$actual" + test_cmp "$expect.tmp" "$actual.tmp" && + rm "$expect.tmp" "$actual.tmp" } # Compare two files but exclude all `clean` invocations because Git can @@ -56,10 +56,10 @@ test_cmp_exclude_clean () { actual=$2 for FILE in "$expect" "$actual" do - grep -v "IN: clean" "$FILE" >"$FILE.tmp" && - mv "$FILE.tmp" "$FILE" + grep -v "IN: clean" "$FILE" >"$FILE.tmp" done && - test_cmp "$expect" "$actual" + test_cmp "$expect.tmp" "$actual.tmp" && + rm "$expect.tmp" "$actual.tmp" } # Check that the contents of two files are equal and that their rot13 version -- 2.13.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v6 1/6] t0021: keep filter log files on comparison 2017-06-25 18:21 ` [PATCH v6 1/6] t0021: keep filter log files on comparison Lars Schneider @ 2017-06-25 22:12 ` Junio C Hamano 2017-06-26 9:02 ` Lars Schneider 0 siblings, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2017-06-25 22:12 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: > The filter log files are modified on comparison. Write the modified log files > to temp files for comparison to fix this. The phrase "to fix this" implies that it is _wrong_ to modify after comparing it, but it is unclear _why_ you think is wrong. After all, the purpose of this comparison helper is to see if these two are the same with cruft removed, and once the helper finds the answer to that question, the current users of the comparison helper do not reuse these files, so from _their_ point of view, there is nothing to "fix", is there? It would become a problem _if_ we want future users of this helper to reuse the same expect (or actual) multiple times and start from an unmodified one. There may be some other reason why you do not want the comparison to smudge these files. Please state what that reason is before saying "fix this". Thanks. > > This is useful for the subsequent patch 'convert: add "status=delayed" to > filter process protocol'. > > Signed-off-by: Lars Schneider <larsxschneider@gmail.com> > --- > t/t0021-conversion.sh | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh > index 161f560446..ff2424225b 100755 > --- a/t/t0021-conversion.sh > +++ b/t/t0021-conversion.sh > @@ -42,10 +42,10 @@ test_cmp_count () { > for FILE in "$expect" "$actual" > do > sort "$FILE" | uniq -c | > - sed -e "s/^ *[0-9][0-9]*[ ]*IN: /x IN: /" >"$FILE.tmp" && > - mv "$FILE.tmp" "$FILE" || return > + sed -e "s/^ *[0-9][0-9]*[ ]*IN: /x IN: /" >"$FILE.tmp" > done && > - test_cmp "$expect" "$actual" > + test_cmp "$expect.tmp" "$actual.tmp" && > + rm "$expect.tmp" "$actual.tmp" > } > > # Compare two files but exclude all `clean` invocations because Git can > @@ -56,10 +56,10 @@ test_cmp_exclude_clean () { > actual=$2 > for FILE in "$expect" "$actual" > do > - grep -v "IN: clean" "$FILE" >"$FILE.tmp" && > - mv "$FILE.tmp" "$FILE" > + grep -v "IN: clean" "$FILE" >"$FILE.tmp" > done && > - test_cmp "$expect" "$actual" > + test_cmp "$expect.tmp" "$actual.tmp" && > + rm "$expect.tmp" "$actual.tmp" > } > > # Check that the contents of two files are equal and that their rot13 version ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 1/6] t0021: keep filter log files on comparison 2017-06-25 22:12 ` Junio C Hamano @ 2017-06-26 9:02 ` Lars Schneider 2017-06-26 17:31 ` Junio C Hamano 0 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-26 9:02 UTC (permalink / raw) To: Junio C Hamano Cc: Git Users, Jeff King, Torsten Bögershausen, Eric Wong, Taylor Blau, peartben > On 26 Jun 2017, at 00:12, Junio C Hamano <gitster@pobox.com> wrote: > > Lars Schneider <larsxschneider@gmail.com> writes: > >> The filter log files are modified on comparison. Write the modified log files >> to temp files for comparison to fix this. > > The phrase "to fix this" implies that it is _wrong_ to modify after > comparing it, but it is unclear _why_ you think is wrong. After > all, the purpose of this comparison helper is to see if these two > are the same with cruft removed, and once the helper finds the > answer to that question, the current users of the comparison helper > do not reuse these files, so from _their_ point of view, there is > nothing to "fix", is there? > > It would become a problem _if_ we want future users of this helper > to reuse the same expect (or actual) multiple times and start from > an unmodified one. There may be some other reason why you do not > want the comparison to smudge these files. Please state what that > reason is before saying "fix this". Understood. How about this? The filter log files are modified on comparison. That might be unexpected by the caller. It would be even undesirable if the caller wants to reuse the original log files. Address these issues by using temp files for modifications. This is useful for the subsequent patch 'convert: add "status=delayed" to filter process protocol'. If this is OK, then do you want me to resend the series or can you fix it in place? Thanks, Lars ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 1/6] t0021: keep filter log files on comparison 2017-06-26 9:02 ` Lars Schneider @ 2017-06-26 17:31 ` Junio C Hamano 0 siblings, 0 replies; 24+ messages in thread From: Junio C Hamano @ 2017-06-26 17:31 UTC (permalink / raw) To: Lars Schneider Cc: Git Users, Jeff King, Torsten Bögershausen, Eric Wong, Taylor Blau, peartben Lars Schneider <larsxschneider@gmail.com> writes: >> It would become a problem _if_ we want future users of this helper >> to reuse the same expect (or actual) multiple times and start from >> an unmodified one. There may be some other reason why you do not >> want the comparison to smudge these files. Please state what that >> reason is before saying "fix this". > > Understood. How about this? > > The filter log files are modified on comparison. That might be > unexpected by the caller. It would be even undesirable if the caller > wants to reuse the original log files. > > Address these issues by using temp files for modifications. This is > useful for the subsequent patch 'convert: add "status=delayed" to > filter process protocol'. The updated one is much more understandable. Thanks. > If this is OK, then do you want me to resend the series or can you fix it > in place? In general, I am OK running "rebase -i" to polish the log message unless there are other changes to the patches planned. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v6 2/6] t0021: make debug log file name configurable 2017-06-25 18:21 [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider 2017-06-25 18:21 ` [PATCH v6 1/6] t0021: keep filter log files on comparison Lars Schneider @ 2017-06-25 18:21 ` Lars Schneider 2017-06-25 22:15 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 3/6] t0021: write "OUT" only on success Lars Schneider ` (3 subsequent siblings) 5 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-25 18:21 UTC (permalink / raw) To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben The "rot13-filter.pl" helper wrote its debug logs always to "rot13-filter.log". Make this configurable by defining the log file as first parameter of "rot13-filter.pl". This is useful if "rot13-filter.pl" is configured multiple times similar to the subsequent patch 'convert: add "status=delayed" to filter process protocol'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> --- t/t0021-conversion.sh | 44 ++++++++++++++++++++++---------------------- t/t0021/rot13-filter.pl | 8 +++++--- 2 files changed, 27 insertions(+), 25 deletions(-) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index ff2424225b..0139b460e7 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -28,7 +28,7 @@ file_size () { } filter_git () { - rm -f rot13-filter.log && + rm -f *.log && git "$@" } @@ -342,7 +342,7 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' ' ' test_expect_success PERL 'required process filter should filter data' ' - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && test_config_global filter.protocol.required true && rm -rf repo && mkdir repo && @@ -375,7 +375,7 @@ test_expect_success PERL 'required process filter should filter data' ' IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK] STOP EOF - test_cmp_count expected.log rot13-filter.log && + test_cmp_count expected.log debug.log && git commit -m "test commit 2" && rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x=.r" && @@ -388,7 +388,7 @@ test_expect_success PERL 'required process filter should filter data' ' IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK] STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log && + test_cmp_exclude_clean expected.log debug.log && filter_git checkout --quiet --no-progress empty-branch && cat >expected.log <<-EOF && @@ -397,7 +397,7 @@ test_expect_success PERL 'required process filter should filter data' ' IN: clean test.r $S [OK] -- OUT: $S . [OK] STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log && + test_cmp_exclude_clean expected.log debug.log && filter_git checkout --quiet --no-progress master && cat >expected.log <<-EOF && @@ -409,7 +409,7 @@ test_expect_success PERL 'required process filter should filter data' ' IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK] STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log && + test_cmp_exclude_clean expected.log debug.log && test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r && test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r && @@ -419,7 +419,7 @@ test_expect_success PERL 'required process filter should filter data' ' test_expect_success PERL 'required process filter takes precedence' ' test_config_global filter.protocol.clean false && - test_config_global filter.protocol.process "rot13-filter.pl clean" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" && test_config_global filter.protocol.required true && rm -rf repo && mkdir repo && @@ -439,12 +439,12 @@ test_expect_success PERL 'required process filter takes precedence' ' IN: clean test.r $S [OK] -- OUT: $S . [OK] STOP EOF - test_cmp_count expected.log rot13-filter.log + test_cmp_count expected.log debug.log ) ' test_expect_success PERL 'required process filter should be used only for "clean" operation only' ' - test_config_global filter.protocol.process "rot13-filter.pl clean" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" && rm -rf repo && mkdir repo && ( @@ -462,7 +462,7 @@ test_expect_success PERL 'required process filter should be used only for "clean IN: clean test.r $S [OK] -- OUT: $S . [OK] STOP EOF - test_cmp_count expected.log rot13-filter.log && + test_cmp_count expected.log debug.log && rm test.r && @@ -474,12 +474,12 @@ test_expect_success PERL 'required process filter should be used only for "clean init handshake complete STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log + test_cmp_exclude_clean expected.log debug.log ) ' test_expect_success PERL 'required process filter should process multiple packets' ' - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && test_config_global filter.protocol.required true && rm -rf repo && @@ -514,7 +514,7 @@ test_expect_success PERL 'required process filter should process multiple packet IN: clean 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK] STOP EOF - test_cmp_count expected.log rot13-filter.log && + test_cmp_count expected.log debug.log && rm -f *.file && @@ -529,7 +529,7 @@ test_expect_success PERL 'required process filter should process multiple packet IN: smudge 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK] STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log && + test_cmp_exclude_clean expected.log debug.log && for FILE in *.file do @@ -539,7 +539,7 @@ test_expect_success PERL 'required process filter should process multiple packet ' test_expect_success PERL 'required process filter with clean error should fail' ' - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && test_config_global filter.protocol.required true && rm -rf repo && mkdir repo && @@ -558,7 +558,7 @@ test_expect_success PERL 'required process filter with clean error should fail' ' test_expect_success PERL 'process filter should restart after unexpected write failure' ' - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && rm -rf repo && mkdir repo && ( @@ -579,7 +579,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f git add . && rm -f *.r && - rm -f rot13-filter.log && + rm -f debug.log && git checkout --quiet --no-progress . 2>git-stderr.log && grep "smudge write error at" git-stderr.log && @@ -595,7 +595,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK] STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log && + test_cmp_exclude_clean expected.log debug.log && test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r && test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r && @@ -609,7 +609,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f ' test_expect_success PERL 'process filter should not be restarted if it signals an error' ' - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && rm -rf repo && mkdir repo && ( @@ -639,7 +639,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK] STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log && + test_cmp_exclude_clean expected.log debug.log && test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r && test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r && @@ -648,7 +648,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a ' test_expect_success PERL 'process filter abort stops processing of all further files' ' - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && rm -rf repo && mkdir repo && ( @@ -676,7 +676,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT] STOP EOF - test_cmp_exclude_clean expected.log rot13-filter.log && + test_cmp_exclude_clean expected.log debug.log && test_cmp "$TEST_ROOT/test.o" test.r && test_cmp "$TEST_ROOT/test2.o" test2.r && diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl index 617f581e56..0b943bb377 100644 --- a/t/t0021/rot13-filter.pl +++ b/t/t0021/rot13-filter.pl @@ -2,8 +2,9 @@ # Example implementation for the Git filter protocol version 2 # See Documentation/gitattributes.txt, section "Filter Protocol" # -# The script takes the list of supported protocol capabilities as -# arguments ("clean", "smudge", etc). +# The first argument defines a debug log file that the script write to. +# All remaining arguments define a list of supported protocol +# capabilities ("clean", "smudge", etc). # # This implementation supports special test cases: # (1) If data with the pathname "clean-write-fail.r" is processed with @@ -24,9 +25,10 @@ use warnings; use IO::File; my $MAX_PACKET_CONTENT_SIZE = 65516; +my $log_file = shift @ARGV; my @capabilities = @ARGV; -open my $debug, ">>", "rot13-filter.log" or die "cannot open log file: $!"; +open my $debug, ">>", $log_file or die "cannot open log file: $!"; sub rot13 { my $str = shift; -- 2.13.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v6 2/6] t0021: make debug log file name configurable 2017-06-25 18:21 ` [PATCH v6 2/6] t0021: make debug log file name configurable Lars Schneider @ 2017-06-25 22:15 ` Junio C Hamano 0 siblings, 0 replies; 24+ messages in thread From: Junio C Hamano @ 2017-06-25 22:15 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: > The "rot13-filter.pl" helper wrote its debug logs always to "rot13-filter.log". > Make this configurable by defining the log file as first parameter of > "rot13-filter.pl". > > This is useful if "rot13-filter.pl" is configured multiple times similar to the > subsequent patch 'convert: add "status=delayed" to filter process protocol'. Makes sense. If you didn't rename the output file to "debug.log" the patch may have been less noisy (because you do not have to touch the test_cmp_count calls), but I am guessing that it helped you to make sure that you covered all the existing users of rot13 filter, so I am OK with that part of the change as well. Looks good. Thanks. > > Signed-off-by: Lars Schneider <larsxschneider@gmail.com> > --- > t/t0021-conversion.sh | 44 ++++++++++++++++++++++---------------------- > t/t0021/rot13-filter.pl | 8 +++++--- > 2 files changed, 27 insertions(+), 25 deletions(-) > > diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh > index ff2424225b..0139b460e7 100755 > --- a/t/t0021-conversion.sh > +++ b/t/t0021-conversion.sh > @@ -28,7 +28,7 @@ file_size () { > } > > filter_git () { > - rm -f rot13-filter.log && > + rm -f *.log && > git "$@" > } > > @@ -342,7 +342,7 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' ' > ' > > test_expect_success PERL 'required process filter should filter data' ' > - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && > test_config_global filter.protocol.required true && > rm -rf repo && > mkdir repo && > @@ -375,7 +375,7 @@ test_expect_success PERL 'required process filter should filter data' ' > IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK] > STOP > EOF > - test_cmp_count expected.log rot13-filter.log && > + test_cmp_count expected.log debug.log && > > git commit -m "test commit 2" && > rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x=.r" && > @@ -388,7 +388,7 @@ test_expect_success PERL 'required process filter should filter data' ' > IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK] > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log && > + test_cmp_exclude_clean expected.log debug.log && > > filter_git checkout --quiet --no-progress empty-branch && > cat >expected.log <<-EOF && > @@ -397,7 +397,7 @@ test_expect_success PERL 'required process filter should filter data' ' > IN: clean test.r $S [OK] -- OUT: $S . [OK] > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log && > + test_cmp_exclude_clean expected.log debug.log && > > filter_git checkout --quiet --no-progress master && > cat >expected.log <<-EOF && > @@ -409,7 +409,7 @@ test_expect_success PERL 'required process filter should filter data' ' > IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK] > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log && > + test_cmp_exclude_clean expected.log debug.log && > > test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r && > test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r && > @@ -419,7 +419,7 @@ test_expect_success PERL 'required process filter should filter data' ' > > test_expect_success PERL 'required process filter takes precedence' ' > test_config_global filter.protocol.clean false && > - test_config_global filter.protocol.process "rot13-filter.pl clean" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" && > test_config_global filter.protocol.required true && > rm -rf repo && > mkdir repo && > @@ -439,12 +439,12 @@ test_expect_success PERL 'required process filter takes precedence' ' > IN: clean test.r $S [OK] -- OUT: $S . [OK] > STOP > EOF > - test_cmp_count expected.log rot13-filter.log > + test_cmp_count expected.log debug.log > ) > ' > > test_expect_success PERL 'required process filter should be used only for "clean" operation only' ' > - test_config_global filter.protocol.process "rot13-filter.pl clean" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" && > rm -rf repo && > mkdir repo && > ( > @@ -462,7 +462,7 @@ test_expect_success PERL 'required process filter should be used only for "clean > IN: clean test.r $S [OK] -- OUT: $S . [OK] > STOP > EOF > - test_cmp_count expected.log rot13-filter.log && > + test_cmp_count expected.log debug.log && > > rm test.r && > > @@ -474,12 +474,12 @@ test_expect_success PERL 'required process filter should be used only for "clean > init handshake complete > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log > + test_cmp_exclude_clean expected.log debug.log > ) > ' > > test_expect_success PERL 'required process filter should process multiple packets' ' > - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && > test_config_global filter.protocol.required true && > > rm -rf repo && > @@ -514,7 +514,7 @@ test_expect_success PERL 'required process filter should process multiple packet > IN: clean 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK] > STOP > EOF > - test_cmp_count expected.log rot13-filter.log && > + test_cmp_count expected.log debug.log && > > rm -f *.file && > > @@ -529,7 +529,7 @@ test_expect_success PERL 'required process filter should process multiple packet > IN: smudge 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK] > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log && > + test_cmp_exclude_clean expected.log debug.log && > > for FILE in *.file > do > @@ -539,7 +539,7 @@ test_expect_success PERL 'required process filter should process multiple packet > ' > > test_expect_success PERL 'required process filter with clean error should fail' ' > - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && > test_config_global filter.protocol.required true && > rm -rf repo && > mkdir repo && > @@ -558,7 +558,7 @@ test_expect_success PERL 'required process filter with clean error should fail' > ' > > test_expect_success PERL 'process filter should restart after unexpected write failure' ' > - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && > rm -rf repo && > mkdir repo && > ( > @@ -579,7 +579,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f > git add . && > rm -f *.r && > > - rm -f rot13-filter.log && > + rm -f debug.log && > git checkout --quiet --no-progress . 2>git-stderr.log && > > grep "smudge write error at" git-stderr.log && > @@ -595,7 +595,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f > IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK] > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log && > + test_cmp_exclude_clean expected.log debug.log && > > test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r && > test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r && > @@ -609,7 +609,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f > ' > > test_expect_success PERL 'process filter should not be restarted if it signals an error' ' > - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && > rm -rf repo && > mkdir repo && > ( > @@ -639,7 +639,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a > IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK] > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log && > + test_cmp_exclude_clean expected.log debug.log && > > test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r && > test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r && > @@ -648,7 +648,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a > ' > > test_expect_success PERL 'process filter abort stops processing of all further files' ' > - test_config_global filter.protocol.process "rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" && > rm -rf repo && > mkdir repo && > ( > @@ -676,7 +676,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f > IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT] > STOP > EOF > - test_cmp_exclude_clean expected.log rot13-filter.log && > + test_cmp_exclude_clean expected.log debug.log && > > test_cmp "$TEST_ROOT/test.o" test.r && > test_cmp "$TEST_ROOT/test2.o" test2.r && > diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl > index 617f581e56..0b943bb377 100644 > --- a/t/t0021/rot13-filter.pl > +++ b/t/t0021/rot13-filter.pl > @@ -2,8 +2,9 @@ > # Example implementation for the Git filter protocol version 2 > # See Documentation/gitattributes.txt, section "Filter Protocol" > # > -# The script takes the list of supported protocol capabilities as > -# arguments ("clean", "smudge", etc). > +# The first argument defines a debug log file that the script write to. > +# All remaining arguments define a list of supported protocol > +# capabilities ("clean", "smudge", etc). > # > # This implementation supports special test cases: > # (1) If data with the pathname "clean-write-fail.r" is processed with > @@ -24,9 +25,10 @@ use warnings; > use IO::File; > > my $MAX_PACKET_CONTENT_SIZE = 65516; > +my $log_file = shift @ARGV; > my @capabilities = @ARGV; > > -open my $debug, ">>", "rot13-filter.log" or die "cannot open log file: $!"; > +open my $debug, ">>", $log_file or die "cannot open log file: $!"; > > sub rot13 { > my $str = shift; ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v6 3/6] t0021: write "OUT" only on success 2017-06-25 18:21 [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider 2017-06-25 18:21 ` [PATCH v6 1/6] t0021: keep filter log files on comparison Lars Schneider 2017-06-25 18:21 ` [PATCH v6 2/6] t0021: make debug log file name configurable Lars Schneider @ 2017-06-25 18:21 ` Lars Schneider 2017-06-25 22:17 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 4/6] convert: put the flags field before the flag itself for consistent style Lars Schneider ` (2 subsequent siblings) 5 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-25 18:21 UTC (permalink / raw) To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben "rot13-filter.pl" used to write "OUT <size>" to the debug log even in case of an abort or error. Fix this by writing "OUT <size>" to the debug log only in the successful case if output is actually written. This is useful for the subsequent patch 'convert: add "status=delayed" to filter process protocol'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> --- t/t0021-conversion.sh | 6 +++--- t/t0021/rot13-filter.pl | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 0139b460e7..0c04d346a1 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -588,7 +588,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f cat >expected.log <<-EOF && START init handshake complete - IN: smudge smudge-write-fail.r $SF [OK] -- OUT: $SF [WRITE FAIL] + IN: smudge smudge-write-fail.r $SF [OK] -- [WRITE FAIL] START init handshake complete IN: smudge test.r $S [OK] -- OUT: $S . [OK] @@ -634,7 +634,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a cat >expected.log <<-EOF && START init handshake complete - IN: smudge error.r $SE [OK] -- OUT: 0 [ERROR] + IN: smudge error.r $SE [OK] -- [ERROR] IN: smudge test.r $S [OK] -- OUT: $S . [OK] IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK] STOP @@ -673,7 +673,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f cat >expected.log <<-EOF && START init handshake complete - IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT] + IN: smudge abort.r $SA [OK] -- [ABORT] STOP EOF test_cmp_exclude_clean expected.log debug.log && diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl index 0b943bb377..5e43faeec1 100644 --- a/t/t0021/rot13-filter.pl +++ b/t/t0021/rot13-filter.pl @@ -153,9 +153,6 @@ while (1) { die "bad command '$command'"; } - print $debug "OUT: " . length($output) . " "; - $debug->flush(); - if ( $pathname eq "error.r" ) { print $debug "[ERROR]\n"; $debug->flush(); @@ -178,6 +175,9 @@ while (1) { die "${command} write error"; } + print $debug "OUT: " . length($output) . " "; + $debug->flush(); + while ( length($output) > 0 ) { my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE ); packet_bin_write($packet); -- 2.13.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v6 3/6] t0021: write "OUT" only on success 2017-06-25 18:21 ` [PATCH v6 3/6] t0021: write "OUT" only on success Lars Schneider @ 2017-06-25 22:17 ` Junio C Hamano 2017-06-26 9:26 ` Lars Schneider 0 siblings, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2017-06-25 22:17 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: > "rot13-filter.pl" used to write "OUT <size>" to the debug log even in case of > an abort or error. Fix this by writing "OUT <size>" to the debug log only in > the successful case if output is actually written. Again, use of "Fix this" without clarifying what the problem is. Is this change needed because the size may not be known when the new filter protocol is in use, or something? ` ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 3/6] t0021: write "OUT" only on success 2017-06-25 22:17 ` Junio C Hamano @ 2017-06-26 9:26 ` Lars Schneider 2017-06-26 17:50 ` Junio C Hamano 0 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-26 9:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, peff, tboegi, e, ttaylorr, peartben > On 26 Jun 2017, at 00:17, Junio C Hamano <gitster@pobox.com> wrote: > > Lars Schneider <larsxschneider@gmail.com> writes: > >> "rot13-filter.pl" used to write "OUT <size>" to the debug log even in case of >> an abort or error. Fix this by writing "OUT <size>" to the debug log only in >> the successful case if output is actually written. > > Again, use of "Fix this" without clarifying what the problem is. Is > this change needed because the size may not be known when the new > filter protocol is in use, or something? How about this? "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end of an interaction. This works without issues for the existing cases "abort", "error", and "success". In a subsequent patch 'convert: add "status=delayed" to filter process protocol' we will add a new case "delayed". In that case we do not send the data right away and it would be wrong/misleading to the reader if we would write "OUT <size>" to the debug log. Address this issue by writing "OUT <size>" to the debug log only if output is actually written in the successful case. - Lars ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 3/6] t0021: write "OUT" only on success 2017-06-26 9:26 ` Lars Schneider @ 2017-06-26 17:50 ` Junio C Hamano 2017-06-26 18:32 ` Lars Schneider 0 siblings, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2017-06-26 17:50 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: >> On 26 Jun 2017, at 00:17, Junio C Hamano <gitster@pobox.com> wrote: >> >> Lars Schneider <larsxschneider@gmail.com> writes: >> >>> "rot13-filter.pl" used to write "OUT <size>" to the debug log even in case of >>> an abort or error. Fix this by writing "OUT <size>" to the debug log only in >>> the successful case if output is actually written. >> >> Again, use of "Fix this" without clarifying what the problem is. Is >> this change needed because the size may not be known when the new >> filter protocol is in use, or something? > > How about this? > > "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end > of an interaction. > > This works without issues for the existing cases "abort", "error", and > "success". In a subsequent patch 'convert: add "status=delayed" to > filter process protocol' we will add a new case "delayed". In that case > we do not send the data right away and it would be wrong/misleading to > the reader if we would write "OUT <size>" to the debug log. I still do not quite get "we do not send the data right away" as opposed to "at the end of an interaction" makes it wrong/misleading to write "OUT <size>"? A new response "delayed" that will be introduced in subsequent patches accepts the input, without giving the filtered result right away, and at that moment, we do not even have the output size yet. might be a very reasonable rationale for omitting "OUT: <size>" in the output when "delayed" request is seen, but that still does not explain why it is sensible to drop "OUT: <size>" for error or abort case. Having said all that, I tend to agree with the actual change. When we abort or error, we aren't producing any useful output, so it may be sensible _not_ to say "OUT: 0" in the log output from the test helper in these cases. And if the change is explained that way, readers would understand why this step is a good thing to have, with or without the future "delayed" step in the series. Thanks. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 3/6] t0021: write "OUT" only on success 2017-06-26 17:50 ` Junio C Hamano @ 2017-06-26 18:32 ` Lars Schneider 0 siblings, 0 replies; 24+ messages in thread From: Lars Schneider @ 2017-06-26 18:32 UTC (permalink / raw) To: Junio C Hamano Cc: Git Users, Jeff King, Torsten Bögershausen, Eric Wong, Taylor Blau, peartben > On 26 Jun 2017, at 19:50, Junio C Hamano <gitster@pobox.com> wrote: > > Lars Schneider <larsxschneider@gmail.com> writes: > >>> On 26 Jun 2017, at 00:17, Junio C Hamano <gitster@pobox.com> wrote: >>> >>> Lars Schneider <larsxschneider@gmail.com> writes: >>> >>>> "rot13-filter.pl" used to write "OUT <size>" to the debug log even in case of >>>> an abort or error. Fix this by writing "OUT <size>" to the debug log only in >>>> the successful case if output is actually written. >>> >>> Again, use of "Fix this" without clarifying what the problem is. Is >>> this change needed because the size may not be known when the new >>> filter protocol is in use, or something? >> >> How about this? >> >> "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end >> of an interaction. >> >> This works without issues for the existing cases "abort", "error", and >> "success". In a subsequent patch 'convert: add "status=delayed" to >> filter process protocol' we will add a new case "delayed". In that case >> we do not send the data right away and it would be wrong/misleading to >> the reader if we would write "OUT <size>" to the debug log. > > I still do not quite get "we do not send the data right away" as > opposed to "at the end of an interaction" makes it wrong/misleading > to write "OUT <size>"? > > A new response "delayed" that will be introduced in subsequent > patches accepts the input, without giving the filtered result > right away, and at that moment, we do not even have the output > size yet. > > might be a very reasonable rationale for omitting "OUT: <size>" in > the output when "delayed" request is seen, but that still does not > explain why it is sensible to drop "OUT: <size>" for error or abort > case. Well, "rot13-filter.pl" *does* have the output available right away even in the delayed case (because of the mock implementation). If we print "OUT: <size>" all the time then this would lead to misleading log messages in this situation. It's not necessary to drop "OUT: <size>" for abort and error. It was just the way that required the least number of lines of code. > Having said all that, I tend to agree with the actual change. When > we abort or error, we aren't producing any useful output, so it may > be sensible _not_ to say "OUT: 0" in the log output from the test > helper in these cases. And if the change is explained that way, > readers would understand why this step is a good thing to have, with > or without the future "delayed" step in the series. How about this? "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end of a response. This works without issues for the existing responses "abort", "error", and "success". A new response "delayed", that will be introduced in subsequent patches, accepts the input without giving the filtered result right away. Since we actually have the data already available in our mock filter the debug log output would be wrong/misleading. Therefore, we do not write "OUT <size>" for "delayed" responses. To simplify the code we do not write "OUT <size>" for "abort" and "error" responses, too, as their size is always zero. - Lars ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v6 4/6] convert: put the flags field before the flag itself for consistent style 2017-06-25 18:21 [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider ` (2 preceding siblings ...) 2017-06-25 18:21 ` [PATCH v6 3/6] t0021: write "OUT" only on success Lars Schneider @ 2017-06-25 18:21 ` Lars Schneider 2017-06-25 22:19 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 5/6] convert: move multiple file filter error handling to separate function Lars Schneider 2017-06-25 18:21 ` [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider 5 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-25 18:21 UTC (permalink / raw) To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> --- convert.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/convert.c b/convert.c index f1e168bc30..9907e3b9ba 100644 --- a/convert.c +++ b/convert.c @@ -597,12 +597,12 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len } process = &entry->subprocess.process; - if (!(wanted_capability & entry->supported_capabilities)) + if (!(entry->supported_capabilities & wanted_capability)) return 0; - if (CAP_CLEAN & wanted_capability) + if (wanted_capability & CAP_CLEAN) filter_type = "clean"; - else if (CAP_SMUDGE & wanted_capability) + else if (wanted_capability & CAP_SMUDGE) filter_type = "smudge"; else die("unexpected filter type"); @@ -703,9 +703,9 @@ static int apply_filter(const char *path, const char *src, size_t len, if (!dst) return 1; - if ((CAP_CLEAN & wanted_capability) && !drv->process && drv->clean) + if ((wanted_capability & CAP_CLEAN) && !drv->process && drv->clean) cmd = drv->clean; - else if ((CAP_SMUDGE & wanted_capability) && !drv->process && drv->smudge) + else if ((wanted_capability & CAP_SMUDGE) && !drv->process && drv->smudge) cmd = drv->smudge; if (cmd && *cmd) -- 2.13.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v6 4/6] convert: put the flags field before the flag itself for consistent style 2017-06-25 18:21 ` [PATCH v6 4/6] convert: put the flags field before the flag itself for consistent style Lars Schneider @ 2017-06-25 22:19 ` Junio C Hamano 0 siblings, 0 replies; 24+ messages in thread From: Junio C Hamano @ 2017-06-25 22:19 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: > Suggested-by: Jeff King <peff@peff.net> > Signed-off-by: Lars Schneider <larsxschneider@gmail.com> > --- > convert.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) An obviously correct no-op. I do not particularly see the first one making any improvement (but it is not making things any worse, either), but turning (CONST & var) to (var & CONST) may make it easier to read for some people so I wouldn't complain. Will queue. > > diff --git a/convert.c b/convert.c > index f1e168bc30..9907e3b9ba 100644 > --- a/convert.c > +++ b/convert.c > @@ -597,12 +597,12 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len > } > process = &entry->subprocess.process; > > - if (!(wanted_capability & entry->supported_capabilities)) > + if (!(entry->supported_capabilities & wanted_capability)) > return 0; > > - if (CAP_CLEAN & wanted_capability) > + if (wanted_capability & CAP_CLEAN) > filter_type = "clean"; > - else if (CAP_SMUDGE & wanted_capability) > + else if (wanted_capability & CAP_SMUDGE) > filter_type = "smudge"; > else > die("unexpected filter type"); > @@ -703,9 +703,9 @@ static int apply_filter(const char *path, const char *src, size_t len, > if (!dst) > return 1; > > - if ((CAP_CLEAN & wanted_capability) && !drv->process && drv->clean) > + if ((wanted_capability & CAP_CLEAN) && !drv->process && drv->clean) > cmd = drv->clean; > - else if ((CAP_SMUDGE & wanted_capability) && !drv->process && drv->smudge) > + else if ((wanted_capability & CAP_SMUDGE) && !drv->process && drv->smudge) > cmd = drv->smudge; > > if (cmd && *cmd) ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v6 5/6] convert: move multiple file filter error handling to separate function 2017-06-25 18:21 [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider ` (3 preceding siblings ...) 2017-06-25 18:21 ` [PATCH v6 4/6] convert: put the flags field before the flag itself for consistent style Lars Schneider @ 2017-06-25 18:21 ` Lars Schneider 2017-06-26 17:56 ` Junio C Hamano 2017-06-26 18:00 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider 5 siblings, 2 replies; 24+ messages in thread From: Lars Schneider @ 2017-06-25 18:21 UTC (permalink / raw) To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben Refactoring the filter error handling is useful for the subsequent patch 'convert: add "status=delayed" to filter process protocol'. In addition, replace the parentheses around the empty "if" block with a single semicolon to adhere to the Git style guide. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> --- convert.c | 47 ++++++++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/convert.c b/convert.c index 9907e3b9ba..e55c034d86 100644 --- a/convert.c +++ b/convert.c @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) return err; } +static void handle_filter_error(const struct strbuf *filter_status, + struct cmd2process *entry, + const unsigned int wanted_capability) { + if (!strcmp(filter_status->buf, "error")) + ; /* The filter signaled a problem with the file. */ + else if (!strcmp(filter_status->buf, "abort") && wanted_capability) { + /* + * The filter signaled a permanent problem. Don't try to filter + * files with the same command for the lifetime of the current + * Git process. + */ + entry->supported_capabilities &= ~wanted_capability; + } else { + /* + * Something went wrong with the protocol filter. + * Force shutdown and restart if another blob requires filtering. + */ + error("external filter '%s' failed", entry->subprocess.cmd); + subprocess_stop(&subprocess_map, &entry->subprocess); + free(entry); + } +} + static int apply_multi_file_filter(const char *path, const char *src, size_t len, int fd, struct strbuf *dst, const char *cmd, const unsigned int wanted_capability) @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len done: sigchain_pop(SIGPIPE); - if (err) { - if (!strcmp(filter_status.buf, "error")) { - /* The filter signaled a problem with the file. */ - } else if (!strcmp(filter_status.buf, "abort")) { - /* - * The filter signaled a permanent problem. Don't try to filter - * files with the same command for the lifetime of the current - * Git process. - */ - entry->supported_capabilities &= ~wanted_capability; - } else { - /* - * Something went wrong with the protocol filter. - * Force shutdown and restart if another blob requires filtering. - */ - error("external filter '%s' failed", cmd); - subprocess_stop(&subprocess_map, &entry->subprocess); - free(entry); - } - } else { + if (err) + handle_filter_error(&filter_status, entry, wanted_capability); + else strbuf_swap(dst, &nbuf); - } strbuf_release(&nbuf); return !err; } -- 2.13.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v6 5/6] convert: move multiple file filter error handling to separate function 2017-06-25 18:21 ` [PATCH v6 5/6] convert: move multiple file filter error handling to separate function Lars Schneider @ 2017-06-26 17:56 ` Junio C Hamano 2017-06-27 2:51 ` Stefan Beller 2017-06-26 18:00 ` Junio C Hamano 1 sibling, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2017-06-26 17:56 UTC (permalink / raw) To: Stefan Beller; +Cc: git, peff, Lars Schneider Not about this patch, but viewing this with git show -w --color-moved=zebra gives an interesting result. The bulk of the part moved are re-indented, and the comment string gets zebra stripes, as if the line movement detection code does not realize that these came from the same place. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 5/6] convert: move multiple file filter error handling to separate function 2017-06-26 17:56 ` Junio C Hamano @ 2017-06-27 2:51 ` Stefan Beller 0 siblings, 0 replies; 24+ messages in thread From: Stefan Beller @ 2017-06-27 2:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: git@vger.kernel.org, Jeff King, Lars Schneider On Mon, Jun 26, 2017 at 10:56 AM, Junio C Hamano <gitster@pobox.com> wrote: > Not about this patch, but viewing this with > > git show -w --color-moved=zebra > > gives an interesting result. The bulk of the part moved are > re-indented, and the comment string gets zebra stripes, as if the > line movement detection code does not realize that these came from > the same place. > Thanks for the pointer, I'll include the whitespace-less comparison in tests. Thanks, Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 5/6] convert: move multiple file filter error handling to separate function 2017-06-25 18:21 ` [PATCH v6 5/6] convert: move multiple file filter error handling to separate function Lars Schneider 2017-06-26 17:56 ` Junio C Hamano @ 2017-06-26 18:00 ` Junio C Hamano 1 sibling, 0 replies; 24+ messages in thread From: Junio C Hamano @ 2017-06-26 18:00 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: > Refactoring the filter error handling is useful for the subsequent patch > 'convert: add "status=delayed" to filter process protocol'. > > In addition, replace the parentheses around the empty "if" block with a > single semicolon to adhere to the Git style guide. > > Signed-off-by: Lars Schneider <larsxschneider@gmail.com> > --- > convert.c | 47 ++++++++++++++++++++++++++--------------------- > 1 file changed, 26 insertions(+), 21 deletions(-) The patch looks obviously correct. Thanks. > diff --git a/convert.c b/convert.c > index 9907e3b9ba..e55c034d86 100644 > --- a/convert.c > +++ b/convert.c > @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) > return err; > } > > +static void handle_filter_error(const struct strbuf *filter_status, > + struct cmd2process *entry, > + const unsigned int wanted_capability) { > + if (!strcmp(filter_status->buf, "error")) > + ; /* The filter signaled a problem with the file. */ > + else if (!strcmp(filter_status->buf, "abort") && wanted_capability) { > + /* > + * The filter signaled a permanent problem. Don't try to filter > + * files with the same command for the lifetime of the current > + * Git process. > + */ > + entry->supported_capabilities &= ~wanted_capability; > + } else { > + /* > + * Something went wrong with the protocol filter. > + * Force shutdown and restart if another blob requires filtering. > + */ > + error("external filter '%s' failed", entry->subprocess.cmd); > + subprocess_stop(&subprocess_map, &entry->subprocess); > + free(entry); > + } > +} > + > static int apply_multi_file_filter(const char *path, const char *src, size_t len, > int fd, struct strbuf *dst, const char *cmd, > const unsigned int wanted_capability) > @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len > done: > sigchain_pop(SIGPIPE); > > - if (err) { > - if (!strcmp(filter_status.buf, "error")) { > - /* The filter signaled a problem with the file. */ > - } else if (!strcmp(filter_status.buf, "abort")) { > - /* > - * The filter signaled a permanent problem. Don't try to filter > - * files with the same command for the lifetime of the current > - * Git process. > - */ > - entry->supported_capabilities &= ~wanted_capability; > - } else { > - /* > - * Something went wrong with the protocol filter. > - * Force shutdown and restart if another blob requires filtering. > - */ > - error("external filter '%s' failed", cmd); > - subprocess_stop(&subprocess_map, &entry->subprocess); > - free(entry); > - } > - } else { > + if (err) > + handle_filter_error(&filter_status, entry, wanted_capability); > + else > strbuf_swap(dst, &nbuf); > - } > strbuf_release(&nbuf); > return !err; > } ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol 2017-06-25 18:21 [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider ` (4 preceding siblings ...) 2017-06-25 18:21 ` [PATCH v6 5/6] convert: move multiple file filter error handling to separate function Lars Schneider @ 2017-06-25 18:21 ` Lars Schneider 2017-06-26 19:02 ` Junio C Hamano 5 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-25 18:21 UTC (permalink / raw) To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben Some `clean` / `smudge` filters might require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout. Teach the filter process protocol (introduced in edcc858) to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected. Git has a multiple code paths that checkout a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths as all files of the tree are processed. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> --- Documentation/gitattributes.txt | 65 +++++++++++++- builtin/checkout.c | 3 + cache.h | 4 + convert.c | 115 ++++++++++++++++++++---- convert.h | 25 ++++++ entry.c | 114 ++++++++++++++++++++++-- t/t0021-conversion.sh | 74 ++++++++++++++++ t/t0021/rot13-filter.pl | 189 ++++++++++++++++++++++++++-------------- unpack-trees.c | 2 + 9 files changed, 501 insertions(+), 90 deletions(-) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 4736483865..5489e8b723 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -425,8 +425,8 @@ packet: git< capability=clean packet: git< capability=smudge packet: git< 0000 ------------------------ -Supported filter capabilities in version 2 are "clean" and -"smudge". +Supported filter capabilities in version 2 are "clean", "smudge", +and "delay". Afterwards Git sends a list of "key=value" pairs terminated with a flush packet. The list will contain at least the filter command @@ -512,12 +512,69 @@ the protocol then Git will stop the filter process and restart it with the next file that needs to be processed. Depending on the `filter.<driver>.required` flag Git will interpret that as error. -After the filter has processed a blob it is expected to wait for -the next "key=value" list containing a command. Git will close +After the filter has processed a command it is expected to wait for +a "key=value" list containing the next command. Git will close the command pipe on exit. The filter is expected to detect EOF and exit gracefully on its own. Git will wait until the filter process has stopped. +Delay +^^^^^ + +If the filter supports the "delay" capability, then Git can send the +flag "can-delay" after the filter command and pathname. This flag +denotes that the filter can delay filtering the current blob (e.g. to +compensate network latencies) by responding with no content but with +the status "delayed" and a flush packet. +------------------------ +packet: git> command=smudge +packet: git> pathname=path/testfile.dat +packet: git> can-delay=1 +packet: git> 0000 +packet: git> CONTENT +packet: git> 0000 +packet: git< status=delayed +packet: git< 0000 +------------------------ + +If the filter supports the "delay" capability then it must support the +"list_available_blobs" command. If Git sends this command, then the +filter is expected to return a list of pathnames of blobs that are +available. The list must be terminated with a flush packet followed +by a "success" status that is also terminated with a flush packet. If +no blobs for the delayed paths are available, yet, then the filter is +expected to block the response until at least one blob becomes +available. The filter can tell Git that it has no more delayed blobs +by sending an empty list. +------------------------ +packet: git> command=list_available_blobs +packet: git> 0000 +packet: git< pathname=path/testfile.dat +packet: git< pathname=path/otherfile.dat +packet: git< 0000 +packet: git< status=success +packet: git< 0000 +------------------------ + +After Git received the pathnames, it will request the corresponding +blobs again. These requests contain a pathname and an empty content +section. The filter is expected to respond with the smudged content +in the usual way as explained above. +------------------------ +packet: git> command=smudge +packet: git> pathname=path/testfile.dat +packet: git> 0000 +packet: git> 0000 # empty content! +packet: git< status=success +packet: git< 0000 +packet: git< SMUDGED_CONTENT +packet: git< 0000 +packet: git< 0000 # empty list, keep "status=success" unchanged! +------------------------ + +Example +^^^^^^^ + A long running filter demo implementation can be found in `contrib/long-running-filter/example.pl` located in the Git core repository. If you develop your own long running filter diff --git a/builtin/checkout.c b/builtin/checkout.c index a6b2af39d3..c1a256df8d 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts, state.force = 1; state.refresh_cache = 1; state.istate = &the_index; + + enable_delayed_checkout(&state); for (pos = 0; pos < active_nr; pos++) { struct cache_entry *ce = active_cache[pos]; if (ce->ce_flags & CE_MATCHED) { @@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts, pos = skip_same_name(ce, pos) - 1; } } + errs |= finish_delayed_checkout(&state); if (write_locked_index(&the_index, lock_file, COMMIT_LOCK)) die(_("unable to write new index file")); diff --git a/cache.h b/cache.h index ae4c45d379..2ec12c4477 100644 --- a/cache.h +++ b/cache.h @@ -1544,6 +1544,7 @@ struct checkout { struct index_state *istate; const char *base_dir; int base_dir_len; + struct delayed_checkout *delayed_checkout; unsigned force:1, quiet:1, not_new:1, @@ -1551,8 +1552,11 @@ struct checkout { }; #define CHECKOUT_INIT { NULL, "" } + #define TEMPORARY_FILENAME_LENGTH 25 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath); +extern void enable_delayed_checkout(struct checkout *state); +extern int finish_delayed_checkout(struct checkout *state); struct cache_def { struct strbuf path; diff --git a/convert.c b/convert.c index e55c034d86..6452ab546a 100644 --- a/convert.c +++ b/convert.c @@ -496,6 +496,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le #define CAP_CLEAN (1u<<0) #define CAP_SMUDGE (1u<<1) +#define CAP_DELAY (1u<<2) struct cmd2process { struct subprocess_entry subprocess; /* must be the first member! */ @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) if (err) goto done; - err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL); + err = packet_writel(process->in, + "capability=clean", "capability=smudge", "capability=delay", NULL); for (;;) { cap_buf = packet_read_line(process->out, NULL); @@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) entry->supported_capabilities |= CAP_CLEAN; } else if (!strcmp(cap_name, "smudge")) { entry->supported_capabilities |= CAP_SMUDGE; + } else if (!strcmp(cap_name, "delay")) { + entry->supported_capabilities |= CAP_DELAY; } else { warning( "external filter '%s' requested unsupported filter capability '%s'", @@ -590,9 +594,11 @@ static void handle_filter_error(const struct strbuf *filter_status, static int apply_multi_file_filter(const char *path, const char *src, size_t len, int fd, struct strbuf *dst, const char *cmd, - const unsigned int wanted_capability) + const unsigned int wanted_capability, + struct delayed_checkout *dco) { int err; + int can_delay = 0; struct cmd2process *entry; struct child_process *process; struct strbuf nbuf = STRBUF_INIT; @@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len if (err) goto done; + if ((entry->supported_capabilities & CAP_DELAY) && + dco && dco->state == CE_CAN_DELAY) { + can_delay = 1; + err = packet_write_fmt_gently(process->in, "can-delay=1\n"); + if (err) + goto done; + } + err = packet_flush_gently(process->in); if (err) goto done; @@ -662,14 +676,74 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len if (err) goto done; - err = strcmp(filter_status.buf, "success"); + if (can_delay && !strcmp(filter_status.buf, "delayed")) { + dco->state = CE_DELAYED; + string_list_insert(&dco->filters, cmd); + string_list_insert(&dco->paths, path); + } else { + /* The filter got the blob and wants to send us a response. */ + err = strcmp(filter_status.buf, "success"); + if (err) + goto done; + + err = read_packetized_to_strbuf(process->out, &nbuf) < 0; + if (err) + goto done; + + err = subprocess_read_status(process->out, &filter_status); + if (err) + goto done; + + err = strcmp(filter_status.buf, "success"); + } + +done: + sigchain_pop(SIGPIPE); + + if (err) + handle_filter_error(&filter_status, entry, wanted_capability); + else + strbuf_swap(dst, &nbuf); + strbuf_release(&nbuf); + return !err; +} + + +int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths) +{ + int err; + char *line; + struct cmd2process *entry; + struct child_process *process; + struct strbuf filter_status = STRBUF_INIT; + + assert(subprocess_map_initialized); + entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd); + if (!entry) { + error("external filter '%s' is not available anymore although " + "not all paths have been filtered", cmd); + return 0; + } + process = &entry->subprocess.process; + sigchain_push(SIGPIPE, SIG_IGN); + + err = packet_write_fmt_gently( + process->in, "command=list_available_blobs\n"); if (err) goto done; - err = read_packetized_to_strbuf(process->out, &nbuf) < 0; + err = packet_flush_gently(process->in); if (err) goto done; + while ((line = packet_read_line(process->out, NULL))) { + const char *path; + if (skip_prefix(line, "pathname=", &path)) + string_list_insert(delayed_paths, xstrdup(path)); + else + ; /* ignore unknown keys */ + } + err = subprocess_read_status(process->out, &filter_status); if (err) goto done; @@ -680,10 +754,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len sigchain_pop(SIGPIPE); if (err) - handle_filter_error(&filter_status, entry, wanted_capability); - else - strbuf_swap(dst, &nbuf); - strbuf_release(&nbuf); + handle_filter_error(&filter_status, entry, 0); return !err; } @@ -698,7 +769,8 @@ static struct convert_driver { static int apply_filter(const char *path, const char *src, size_t len, int fd, struct strbuf *dst, struct convert_driver *drv, - const unsigned int wanted_capability) + const unsigned int wanted_capability, + struct delayed_checkout *dco) { const char *cmd = NULL; @@ -716,7 +788,8 @@ static int apply_filter(const char *path, const char *src, size_t len, if (cmd && *cmd) return apply_single_file_filter(path, src, len, fd, dst, cmd); else if (drv->process && *drv->process) - return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability); + return apply_multi_file_filter(path, src, len, fd, dst, + drv->process, wanted_capability, dco); return 0; } @@ -1057,7 +1130,7 @@ int would_convert_to_git_filter_fd(const char *path) if (!ca.drv->required) return 0; - return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN); + return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL); } const char *get_convert_attr_ascii(const char *path) @@ -1094,7 +1167,7 @@ int convert_to_git(const char *path, const char *src, size_t len, convert_attrs(&ca, path); - ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN); + ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL); if (!ret && ca.drv && ca.drv->required) die("%s: clean filter '%s' failed", path, ca.drv->name); @@ -1119,7 +1192,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst, assert(ca.drv); assert(ca.drv->clean || ca.drv->process); - if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN)) + if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN, NULL)) die("%s: clean filter '%s' failed", path, ca.drv->name); crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe); @@ -1128,7 +1201,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst, static int convert_to_working_tree_internal(const char *path, const char *src, size_t len, struct strbuf *dst, - int normalizing) + int normalizing, struct delayed_checkout *dco) { int ret = 0, ret_filter = 0; struct conv_attrs ca; @@ -1153,21 +1226,29 @@ static int convert_to_working_tree_internal(const char *path, const char *src, } } - ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE); + ret_filter = apply_filter( + path, src, len, -1, dst, ca.drv, CAP_SMUDGE, dco); if (!ret_filter && ca.drv && ca.drv->required) die("%s: smudge filter %s failed", path, ca.drv->name); return ret | ret_filter; } +int async_convert_to_working_tree(const char *path, const char *src, + size_t len, struct strbuf *dst, + void *dco) +{ + return convert_to_working_tree_internal(path, src, len, dst, 0, dco); +} + int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst) { - return convert_to_working_tree_internal(path, src, len, dst, 0); + return convert_to_working_tree_internal(path, src, len, dst, 0, NULL); } int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst) { - int ret = convert_to_working_tree_internal(path, src, len, dst, 1); + int ret = convert_to_working_tree_internal(path, src, len, dst, 1, NULL); if (ret) { src = dst->buf; len = dst->len; diff --git a/convert.h b/convert.h index 82871a11d5..51f0fb7fbe 100644 --- a/convert.h +++ b/convert.h @@ -4,6 +4,8 @@ #ifndef CONVERT_H #define CONVERT_H +#include "string-list.h" + enum safe_crlf { SAFE_CRLF_FALSE = 0, SAFE_CRLF_FAIL = 1, @@ -32,6 +34,25 @@ enum eol { #endif }; +enum ce_delay_state { + CE_NO_DELAY = 0, + CE_CAN_DELAY = 1, + CE_DELAYED = 2, + CE_RETRY = 3 +}; + +struct delayed_checkout { + /* State of the currently processed cache entry. If the state is + CE_CAN_DELAY, then the filter can change this to CE_DELAYED. If + the state is CE_RETRY, then this signals the filter that the cache + entry was requested before. */ + enum ce_delay_state state; + /* List of filter drivers that signaled delayed blobs. */ + struct string_list filters; + /* List of delayed blobs identified by their path. */ + struct string_list paths; +}; + extern enum eol core_eol; extern const char *get_cached_convert_stats_ascii(const char *path); extern const char *get_wt_convert_stats_ascii(const char *path); @@ -42,6 +63,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len, struct strbuf *dst, enum safe_crlf checksafe); extern int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst); +extern int async_convert_to_working_tree(const char *path, const char *src, + size_t len, struct strbuf *dst, + void *dco); +extern int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths); extern int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst); static inline int would_convert_to_git(const char *path) diff --git a/entry.c b/entry.c index d6b263f78e..c6d5cc01dc 100644 --- a/entry.c +++ b/entry.c @@ -137,6 +137,85 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path, return result; } +void enable_delayed_checkout(struct checkout *state) +{ + if (!state->delayed_checkout) { + state->delayed_checkout = xmalloc(sizeof(*state->delayed_checkout)); + state->delayed_checkout->state = CE_CAN_DELAY; + string_list_init(&state->delayed_checkout->filters, 0); + string_list_init(&state->delayed_checkout->paths, 0); + } +} + +static int remove_available_paths(struct string_list_item *item, void *cb_data) +{ + struct string_list *available_paths = cb_data; + return !string_list_has_string(available_paths, item->string); +} + +int finish_delayed_checkout(struct checkout *state) +{ + int errs = 0; + struct string_list_item *filter, *path; + struct delayed_checkout *dco = state->delayed_checkout; + + if (!state->delayed_checkout) + return errs; + + dco->state = CE_RETRY; + while (dco->filters.nr > 0) { + for_each_string_list_item(filter, &dco->filters) { + struct string_list available_paths; + string_list_init(&available_paths, 0); + + if (!async_query_available_blobs(filter->string, &available_paths)) { + /* Filter reported an error */ + errs = 1; + filter->string = ""; + continue; + } + if (available_paths.nr <= 0) { + /* Filter responded with no entries. That means + the filter is done and we can remove the + filter from the list (see + "string_list_remove_empty_items" call below). + */ + filter->string = ""; + continue; + } + + /* In dco->paths we store a list of all delayed paths. + The filter just send us a list of available paths. + Remove them from the list. + */ + filter_string_list(&dco->paths, 0, + &remove_available_paths, &available_paths); + + for_each_string_list_item(path, &available_paths) { + struct cache_entry* ce = index_file_exists( + state->istate, path->string, + strlen(path->string), 0); + assert(dco->state == CE_RETRY); + errs |= (ce ? checkout_entry(ce, state, NULL) : 1); + } + } + string_list_remove_empty_items(&dco->filters, 0); + } + string_list_clear(&dco->filters, 0); + + /* At this point we should not have any delayed paths anymore. */ + errs |= dco->paths.nr; + for_each_string_list_item(path, &dco->paths) { + warning(_("%s was not filtered properly."), path->string); + } + string_list_clear(&dco->paths, 0); + + free(dco); + state->delayed_checkout = NULL; + + return errs; +} + static int write_entry(struct cache_entry *ce, char *path, const struct checkout *state, int to_tempfile) { @@ -179,11 +258,36 @@ static int write_entry(struct cache_entry *ce, /* * Convert from git internal format to working tree format */ - if (ce_mode_s_ifmt == S_IFREG && - convert_to_working_tree(ce->name, new, size, &buf)) { - free(new); - new = strbuf_detach(&buf, &newsize); - size = newsize; + if (ce_mode_s_ifmt == S_IFREG) { + struct delayed_checkout *dco = state->delayed_checkout; + if (dco && dco->state != CE_NO_DELAY) { + /* Do not send the blob in case of a retry. */ + if (dco->state == CE_RETRY) { + new = NULL; + size = 0; + } + ret = async_convert_to_working_tree( + ce->name, new, size, &buf, dco); + if (ret && dco->state == CE_DELAYED) { + free(new); + /* Reset the state of the next blob */ + dco->state = CE_CAN_DELAY; + goto finish; + } + } else + ret = convert_to_working_tree( + ce->name, new, size, &buf); + + if (ret) { + free(new); + new = strbuf_detach(&buf, &newsize); + size = newsize; + } + /* + * No "else" here as errors from convert are OK at this + * point. If the error would have been fatal (e.g. + * filter is required), then we would have died already. + */ } fd = open_output_fd(path, ce, to_tempfile); diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 0c04d346a1..4b5a45fd43 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -701,4 +701,78 @@ test_expect_success PERL 'invalid process filter must fail (and not hang!)' ' ) ' +test_expect_success PERL 'delayed checkout in process filter' ' + test_config_global filter.a.process "rot13-filter.pl a.log clean smudge delay" && + test_config_global filter.a.required true && + test_config_global filter.b.process "rot13-filter.pl b.log clean smudge delay" && + test_config_global filter.b.required true && + + rm -rf repo && + mkdir repo && + ( + cd repo && + git init && + echo "*.a filter=a" >.gitattributes && + echo "*.b filter=b" >>.gitattributes && + cp "$TEST_ROOT/test.o" test.a && + cp "$TEST_ROOT/test.o" test-delay10.a && + cp "$TEST_ROOT/test.o" test-delay11.a && + cp "$TEST_ROOT/test.o" test-delay20.a && + cp "$TEST_ROOT/test.o" test-delay10.b && + git add . && + git commit -m "test commit 1" + ) && + + S=$(file_size "$TEST_ROOT/test.o") && + cat >a.exp <<-EOF && + START + init handshake complete + IN: smudge test.a $S [OK] -- OUT: $S . [OK] + IN: smudge test-delay10.a $S [OK] -- [DELAYED] + IN: smudge test-delay11.a $S [OK] -- [DELAYED] + IN: smudge test-delay20.a $S [OK] -- [DELAYED] + IN: list_available_blobs test-delay10.a test-delay11.a [OK] + IN: smudge test-delay10.a 0 [OK] -- OUT: $S . [OK] + IN: smudge test-delay11.a 0 [OK] -- OUT: $S . [OK] + IN: list_available_blobs test-delay20.a [OK] + IN: smudge test-delay20.a 0 [OK] -- OUT: $S . [OK] + IN: list_available_blobs [OK] + STOP + EOF + cat >b.exp <<-EOF && + START + init handshake complete + IN: smudge test-delay10.b $S [OK] -- [DELAYED] + IN: list_available_blobs test-delay10.b [OK] + IN: smudge test-delay10.b 0 [OK] -- OUT: $S . [OK] + IN: list_available_blobs [OK] + STOP + EOF + + rm -rf repo-cloned && + filter_git clone repo repo-cloned && + test_cmp_count a.exp repo-cloned/a.log && + test_cmp_count b.exp repo-cloned/b.log && + + ( + cd repo-cloned && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay11.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay20.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.b && + + rm *.a *.b && + filter_git checkout . && + test_cmp_count ../a.exp a.log && + test_cmp_count ../b.exp b.log && + + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay11.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay20.a && + test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.b + ) +' + test_done diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl index 5e43faeec1..f0dc0aff4a 100644 --- a/t/t0021/rot13-filter.pl +++ b/t/t0021/rot13-filter.pl @@ -18,6 +18,11 @@ # operation then the filter signals that it cannot or does not want # to process the file and any file after that is processed with the # same command. +# (5) If data with a pathname that is a key in the DELAY hash is +# requested (e.g. 'test-delay10.a') then the filter responds with +# a "delay" status and sets the "requested" field in the DELAY hash. +# The filter will signal the availability of this object after +# "count" (field in DELAY hash) "list_available_blobs" commands. # use strict; @@ -30,6 +35,13 @@ my @capabilities = @ARGV; open my $debug, ">>", $log_file or die "cannot open log file: $!"; +my %DELAY = ( + 'test-delay10.a' => { "requested" => 0, "count" => 1 }, + 'test-delay11.a' => { "requested" => 0, "count" => 1 }, + 'test-delay20.a' => { "requested" => 0, "count" => 2 }, + 'test-delay10.b' => { "requested" => 0, "count" => 1 }, +); + sub rot13 { my $str = shift; $str =~ y/A-Za-z/N-ZA-Mn-za-m/; @@ -66,7 +78,7 @@ sub packet_bin_read { sub packet_txt_read { my ( $res, $buf ) = packet_bin_read(); - unless ( $buf =~ s/\n$// ) { + unless ( $buf eq '' or $buf =~ s/\n$// ) { die "A non-binary line MUST be terminated by an LF."; } return ( $res, $buf ); @@ -101,6 +113,7 @@ packet_flush(); ( packet_txt_read() eq ( 0, "capability=clean" ) ) || die "bad capability"; ( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability"; +( packet_txt_read() eq ( 0, "capability=delay" ) ) || die "bad capability"; ( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end"; foreach (@capabilities) { @@ -115,84 +128,132 @@ while (1) { print $debug "IN: $command"; $debug->flush(); - my ($pathname) = packet_txt_read() =~ /^pathname=(.+)$/; - print $debug " $pathname"; - $debug->flush(); - - if ( $pathname eq "" ) { - die "bad pathname '$pathname'"; - } + if ( $command eq "list_available_blobs" ) { + # Flush + packet_bin_read(); - # Flush - packet_bin_read(); - - my $input = ""; - { - binmode(STDIN); - my $buffer; - my $done = 0; - while ( !$done ) { - ( $done, $buffer ) = packet_bin_read(); - $input .= $buffer; + foreach my $pathname (sort keys %DELAY) { + if ( $DELAY{$pathname}{"requested"} >= 1 ) { + $DELAY{$pathname}{"count"} = $DELAY{$pathname}{"count"} - 1; + if ($DELAY{$pathname}{"count"} == 0 ) { + print $debug " $pathname"; + packet_txt_write("pathname=$pathname"); + } + } } - print $debug " " . length($input) . " [OK] -- "; - $debug->flush(); - } - - my $output; - if ( $pathname eq "error.r" or $pathname eq "abort.r" ) { - $output = ""; - } - elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) { - $output = rot13($input); - } - elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) { - $output = rot13($input); - } - else { - die "bad command '$command'"; - } - if ( $pathname eq "error.r" ) { - print $debug "[ERROR]\n"; - $debug->flush(); - packet_txt_write("status=error"); packet_flush(); - } - elsif ( $pathname eq "abort.r" ) { - print $debug "[ABORT]\n"; + + print $debug " [OK]\n"; $debug->flush(); - packet_txt_write("status=abort"); + packet_txt_write("status=success"); packet_flush(); } else { - packet_txt_write("status=success"); - packet_flush(); + my ($pathname) = packet_txt_read() =~ /^pathname=(.+)$/; + print $debug " $pathname"; + $debug->flush(); + + if ( $pathname eq "" ) { + die "bad pathname '$pathname'"; + } + + # Read until flush + my ( $done, $buffer ) = packet_txt_read(); + while ( $buffer ne '' ) { + if ( $buffer eq "can-delay=1" ) { + if ( exists $DELAY{$pathname} and $DELAY{$pathname}{"requested"} == 0 ) { + $DELAY{$pathname}{"requested"} = 1; + } + } else { + die "Unknown message '$buffer'"; + } - if ( $pathname eq "${command}-write-fail.r" ) { - print $debug "[WRITE FAIL]\n"; + ( $done, $buffer ) = packet_txt_read(); + } + + my $input = ""; + { + binmode(STDIN); + my $buffer; + my $done = 0; + while ( !$done ) { + ( $done, $buffer ) = packet_bin_read(); + $input .= $buffer; + } + print $debug " " . length($input) . " [OK] -- "; $debug->flush(); - die "${command} write error"; } - print $debug "OUT: " . length($output) . " "; - $debug->flush(); + my $output; + if ( exists $DELAY{$pathname} and exists $DELAY{$pathname}{"output"} ) { + $output = $DELAY{$pathname}{"output"} + } + elsif ( $pathname eq "error.r" or $pathname eq "abort.r" ) { + $output = ""; + } + elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) { + $output = rot13($input); + } + elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) { + $output = rot13($input); + } + else { + die "bad command '$command'"; + } + + if ( $pathname eq "error.r" ) { + print $debug "[ERROR]\n"; + $debug->flush(); + packet_txt_write("status=error"); + packet_flush(); + } + elsif ( $pathname eq "abort.r" ) { + print $debug "[ABORT]\n"; + $debug->flush(); + packet_txt_write("status=abort"); + packet_flush(); + } + elsif ( $command eq "smudge" and + exists $DELAY{$pathname} and + $DELAY{$pathname}{"requested"} == 1 + ) { + print $debug "[DELAYED]\n"; + $debug->flush(); + packet_txt_write("status=delayed"); + packet_flush(); + $DELAY{$pathname}{"requested"} = 2; + $DELAY{$pathname}{"output"} = $output; + } + else { + packet_txt_write("status=success"); + packet_flush(); - while ( length($output) > 0 ) { - my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE ); - packet_bin_write($packet); - # dots represent the number of packets - print $debug "."; - if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) { - $output = substr( $output, $MAX_PACKET_CONTENT_SIZE ); + if ( $pathname eq "${command}-write-fail.r" ) { + print $debug "[WRITE FAIL]\n"; + $debug->flush(); + die "${command} write error"; } - else { - $output = ""; + + print $debug "OUT: " . length($output) . " "; + $debug->flush(); + + while ( length($output) > 0 ) { + my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE ); + packet_bin_write($packet); + # dots represent the number of packets + print $debug "."; + if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) { + $output = substr( $output, $MAX_PACKET_CONTENT_SIZE ); + } + else { + $output = ""; + } } + packet_flush(); + print $debug " [OK]\n"; + $debug->flush(); + packet_flush(); } - packet_flush(); - print $debug " [OK]\n"; - $debug->flush(); - packet_flush(); } } diff --git a/unpack-trees.c b/unpack-trees.c index d38c37e38c..491a2562ad 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -379,6 +379,7 @@ static int check_updates(struct unpack_trees_options *o) if (should_update_submodules() && o->update && !o->dry_run) reload_gitmodules_file(index, &state); + enable_delayed_checkout(&state); for (i = 0; i < index->cache_nr; i++) { struct cache_entry *ce = index->cache[i]; @@ -393,6 +394,7 @@ static int check_updates(struct unpack_trees_options *o) } } } + errs |= finish_delayed_checkout(&state); stop_progress(&progress); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); -- 2.13.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol 2017-06-25 18:21 ` [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider @ 2017-06-26 19:02 ` Junio C Hamano 2017-06-26 21:28 ` Lars Schneider 0 siblings, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2017-06-26 19:02 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: > Some `clean` / `smudge` filters might require a significant amount of > time to process a single blob (e.g. the Git LFS smudge filter might > perform network requests). During this process the Git checkout > operation is blocked and Git needs to wait until the filter is done to > continue with the checkout. Good motivation. I'd say s/might/may/ to stress the fact that this is addressing a real-world problem, i.e. some filters incur network latencies. > Teach the filter process protocol (introduced in edcc858) to accept the When referring to an old commit, we recommend this format edcc8581 ("convert: add filter.<driver>.process option", 2016-10-16) (with or without dq around the title) which helps readers by telling them how old the thing is and what it was about. > @@ -512,12 +512,69 @@ the protocol then Git will stop the filter process and restart it > with the next file that needs to be processed. Depending on the > `filter.<driver>.required` flag Git will interpret that as error. > > -After the filter has processed a blob it is expected to wait for > -the next "key=value" list containing a command. Git will close > +After the filter has processed a command it is expected to wait for > +a "key=value" list containing the next command. Git will close Good generalization. > +Delay > +^^^^^ > + > +If the filter supports the "delay" capability, then Git can send the > +flag "can-delay" after the filter command and pathname. This flag > +denotes that the filter can delay filtering the current blob (e.g. to > +compensate network latencies) by responding with no content but with > +the status "delayed" and a flush packet. > +------------------------ > +packet: git> command=smudge > +packet: git> pathname=path/testfile.dat > +packet: git> can-delay=1 > +packet: git> 0000 > +packet: git> CONTENT > +packet: git> 0000 > +packet: git< status=delayed > +packet: git< 0000 > +------------------------ > + > +If the filter supports the "delay" capability then it must support the > +"list_available_blobs" command. If Git sends this command, then the > +filter is expected to return a list of pathnames of blobs that are > +available. The list must be terminated with a flush packet followed Is it correct to read the above "pathnames of blobs that are availble" as "pathnames, among which Git earlier requested to be filtered with "can-delay=1", for which the filtered result is ready"? I do not mean to suggest this awkward wording to replace yours, but I found yours a bit too fuzzy for first time readers. Perhaps my brain has rotten by hearing the "delayed/lazy transfer of large blobs", but it went "Huh?" upon seeing "...are available". > +by a "success" status that is also terminated with a flush packet. If > +no blobs for the delayed paths are available, yet, then the filter is > +expected to block the response until at least one blob becomes > +available. Ahh, this is better, at least you use "the delayed paths". What if the result never gets available (e.g. resulted in an error)? Or is it considered "we _now_ know the result is an error" and such a delayed path that failed to retrieve from LFS store "available" at that point? It may be too late to raise to a series that went to v6, but this smells more like "ready" not "available" to me. > diff --git a/builtin/checkout.c b/builtin/checkout.c > index a6b2af39d3..c1a256df8d 100644 > --- a/builtin/checkout.c > +++ b/builtin/checkout.c > @@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts, > state.force = 1; > state.refresh_cache = 1; > state.istate = &the_index; > + > + enable_delayed_checkout(&state); > for (pos = 0; pos < active_nr; pos++) { > struct cache_entry *ce = active_cache[pos]; > if (ce->ce_flags & CE_MATCHED) { > @@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts, > pos = skip_same_name(ce, pos) - 1; > } > } > + errs |= finish_delayed_checkout(&state); OK. > if (write_locked_index(&the_index, lock_file, COMMIT_LOCK)) > die(_("unable to write new index file")); > diff --git a/cache.h b/cache.h > index ae4c45d379..2ec12c4477 100644 > --- a/cache.h > +++ b/cache.h > @@ -1551,8 +1552,11 @@ struct checkout { > }; > #define CHECKOUT_INIT { NULL, "" } > > + > #define TEMPORARY_FILENAME_LENGTH 25 You probably do not need that new blank line. > extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath); > +extern void enable_delayed_checkout(struct checkout *state); > +extern int finish_delayed_checkout(struct checkout *state); OK. > diff --git a/convert.c b/convert.c > index e55c034d86..6452ab546a 100644 > --- a/convert.c > +++ b/convert.c > @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) > if (err) > goto done; > > - err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL); > + err = packet_writel(process->in, > + "capability=clean", "capability=smudge", "capability=delay", NULL); > > for (;;) { > cap_buf = packet_read_line(process->out, NULL); > @@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) > entry->supported_capabilities |= CAP_CLEAN; > } else if (!strcmp(cap_name, "smudge")) { > entry->supported_capabilities |= CAP_SMUDGE; > + } else if (!strcmp(cap_name, "delay")) { > + entry->supported_capabilities |= CAP_DELAY; > } else { > warning( > "external filter '%s' requested unsupported filter capability '%s'", The above makes me wonder if we want to introduce a table, e.g. static const struct { const *name; unsigned cap; } known_caps[] = { { "clean", CAP_CLEAN }, { "smudge", CAP_SMUDGE }, { "delay", CAP_DELAY }, }; and drive everything (i.e. capability announcement in hunk +534,8 and parsing of request in hunk +551,8) off of it. That would be overkill for two capabilities, but adding another to make it three is when such a refactoring starts to become plausible. > @@ -590,9 +594,11 @@ static void handle_filter_error(const struct strbuf *filter_status, > > static int apply_multi_file_filter(const char *path, const char *src, size_t len, > int fd, struct strbuf *dst, const char *cmd, > - const unsigned int wanted_capability) > + const unsigned int wanted_capability, > + struct delayed_checkout *dco) > { > int err; > + int can_delay = 0; > struct cmd2process *entry; > struct child_process *process; > struct strbuf nbuf = STRBUF_INIT; > @@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len > if (err) > goto done; > > + if ((entry->supported_capabilities & CAP_DELAY) && > + dco && dco->state == CE_CAN_DELAY) { > + can_delay = 1; > + err = packet_write_fmt_gently(process->in, "can-delay=1\n"); > + if (err) > + goto done; > + } > + > err = packet_flush_gently(process->in); > if (err) > goto done; > @@ -662,14 +676,74 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len > if (err) > goto done; > > - err = strcmp(filter_status.buf, "success"); > + if (can_delay && !strcmp(filter_status.buf, "delayed")) { > + dco->state = CE_DELAYED; > + string_list_insert(&dco->filters, cmd); > + string_list_insert(&dco->paths, path); The dco->state CE_CAN_DELAY is global to all the paths involved in this checkout, but filter_status.buf being "delayed" is per path (i.e. the filter can be told "can-delay=1" but still respond with success right away), and in such a case dco->state will not advance from CE_CAN_DELAY to CE_DELAYED (which in turn means we will show "can-delay=1" to the next path). On the other hand, once we are responded by "delayed" after sending "can-delay=1", we will go into CE_DELAYED state and will not say "can-delay=1" to any subsequent path. This feels somewhat unsatisfactory. From the protocol exchange description in the documentation part of this patch, I was sort of expecting that Git can selectively say "You can delay response to this path" and "I am willing to wait until you give the filtered result" for each command/pathname pair. But this code structure does not seem to allow that. I would understand it if CE_CAN_DELAY and CE_DELAYED are unified, and the assignment to dco->state here is removed. You may be using these two states in other parts of the code to optimize between the case where *no* delayed path exists (even if Git is capable of delaying) and *some* delayed path exist, but I think that can be done by checking if dco->paths is empty, or something like that. Such a change will allow us later to add more logic (perhaps driven by attributes) to the decision to send "can-delay=1" in hunk +653,14 above, to express "this won't be delayed". Alternatively, the semantics of "the entire checkout exchange is allowed to be delayed" can be kept but then the protocol looks too confusing---it shouldn't have "can-delay=1" after a command/pathname pair, as if it can be specified per path. > + } else { > + /* The filter got the blob and wants to send us a response. */ > + err = strcmp(filter_status.buf, "success"); > + if (err) > + goto done; > + > + err = read_packetized_to_strbuf(process->out, &nbuf) < 0; > + if (err) > + goto done; > + > + err = subprocess_read_status(process->out, &filter_status); > + if (err) > + goto done; > + > + err = strcmp(filter_status.buf, "success"); > + } > + > +done: > + sigchain_pop(SIGPIPE); > + > + if (err) > + handle_filter_error(&filter_status, entry, wanted_capability); > + else > + strbuf_swap(dst, &nbuf); > + strbuf_release(&nbuf); > + return !err; > +} OK. > +int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths) > +{ > + int err; > + char *line; > + struct cmd2process *entry; > + struct child_process *process; > + struct strbuf filter_status = STRBUF_INIT; > + > + assert(subprocess_map_initialized); > + entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd); > + if (!entry) { > + error("external filter '%s' is not available anymore although " > + "not all paths have been filtered", cmd); > + return 0; > + } > + process = &entry->subprocess.process; > + sigchain_push(SIGPIPE, SIG_IGN); > + > + err = packet_write_fmt_gently( > + process->in, "command=list_available_blobs\n"); > if (err) > goto done; > > - err = read_packetized_to_strbuf(process->out, &nbuf) < 0; > + err = packet_flush_gently(process->in); > if (err) > goto done; > > + while ((line = packet_read_line(process->out, NULL))) { > + const char *path; > + if (skip_prefix(line, "pathname=", &path)) > + string_list_insert(delayed_paths, xstrdup(path)); I am assuming that the caller passes an empty string list to delayed_paths variable (shouldn't it be called available_paths, or ready_paths if we follow the earlier discussion above, by the way?), and it will compare the resulting set with the set of paths that got "delayed" response in the apply_multi_file_filter() function earlier. I am wondering whose responsibility it will be to deal with a path "list-available" reports that are *not* asked by Git or Git got no "delayed" response. The list subtraction done by the caller is probably the logical place to do so. > diff --git a/entry.c b/entry.c > index d6b263f78e..c6d5cc01dc 100644 > --- a/entry.c > +++ b/entry.c > @@ -137,6 +137,85 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path, > ... > +static int remove_available_paths(struct string_list_item *item, void *cb_data) > +{ > + struct string_list *available_paths = cb_data; > + return !string_list_has_string(available_paths, item->string); > +} > + > +int finish_delayed_checkout(struct checkout *state) > +{ > + int errs = 0; > + struct string_list_item *filter, *path; > + struct delayed_checkout *dco = state->delayed_checkout; > + > + if (!state->delayed_checkout) > + return errs; > + > + dco->state = CE_RETRY; > + while (dco->filters.nr > 0) { > + for_each_string_list_item(filter, &dco->filters) { > + struct string_list available_paths; > + string_list_init(&available_paths, 0); STRING_LIST_INIT_NODUP? > + > + if (!async_query_available_blobs(filter->string, &available_paths)) { > + /* Filter reported an error */ > + errs = 1; > + filter->string = ""; > + continue; > + } > + if (available_paths.nr <= 0) { > + /* Filter responded with no entries. That means > + the filter is done and we can remove the > + filter from the list (see > + "string_list_remove_empty_items" call below). > + */ /* * Our multi-line comments * look like this. */ > + filter->string = ""; > + continue; > + } > + > + /* In dco->paths we store a list of all delayed paths. > + The filter just send us a list of available paths. > + Remove them from the list. > + */ > + filter_string_list(&dco->paths, 0, > + &remove_available_paths, &available_paths); We first remove from the outstanding request list (dco->paths) what are now ready... > + for_each_string_list_item(path, &available_paths) { ...and go over those paths that are now ready. > + struct cache_entry* ce = index_file_exists( > + state->istate, path->string, > + strlen(path->string), 0); > + assert(dco->state == CE_RETRY); > + errs |= (ce ? checkout_entry(ce, state, NULL) : 1); > + } But we never checked if the contents of this available_paths list is a subset of the set of paths we originally asked the external process to filter. This will allow the process to overwrite any random path that is not even involved in the checkout. > + } > + string_list_remove_empty_items(&dco->filters, 0); > + } > + string_list_clear(&dco->filters, 0); > + > + /* At this point we should not have any delayed paths anymore. */ > + errs |= dco->paths.nr; > + for_each_string_list_item(path, &dco->paths) { > + warning(_("%s was not filtered properly."), path->string); > + } > + string_list_clear(&dco->paths, 0); And "list-available" that says "path X is ready" when we never asked for X gets away free without detected as a bug, either. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol 2017-06-26 19:02 ` Junio C Hamano @ 2017-06-26 21:28 ` Lars Schneider 2017-06-26 22:13 ` Junio C Hamano 0 siblings, 1 reply; 24+ messages in thread From: Lars Schneider @ 2017-06-26 21:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, peff, tboegi, e, ttaylorr, peartben > On 26 Jun 2017, at 21:02, Junio C Hamano <gitster@pobox.com> wrote: > > Lars Schneider <larsxschneider@gmail.com> writes: > >> Some `clean` / `smudge` filters might require a significant amount of >> time to process a single blob (e.g. the Git LFS smudge filter might >> perform network requests). During this process the Git checkout >> operation is blocked and Git needs to wait until the filter is done to >> continue with the checkout. > > Good motivation. I'd say s/might/may/ to stress the fact that this > is addressing a real-world problem, i.e. some filters incur network > latencies. Agreed. >> Teach the filter process protocol (introduced in edcc858) to accept the > > When referring to an old commit, we recommend this format > > edcc8581 ("convert: add filter.<driver>.process option", 2016-10-16) > > (with or without dq around the title) which helps readers by telling > them how old the thing is and what it was about. Agreed. > ... > >> +Delay >> +^^^^^ >> + >> +If the filter supports the "delay" capability, then Git can send the >> +flag "can-delay" after the filter command and pathname. This flag >> +denotes that the filter can delay filtering the current blob (e.g. to >> +compensate network latencies) by responding with no content but with >> +the status "delayed" and a flush packet. >> +------------------------ >> +packet: git> command=smudge >> +packet: git> pathname=path/testfile.dat >> +packet: git> can-delay=1 >> +packet: git> 0000 >> +packet: git> CONTENT >> +packet: git> 0000 >> +packet: git< status=delayed >> +packet: git< 0000 >> +------------------------ >> + >> +If the filter supports the "delay" capability then it must support the >> +"list_available_blobs" command. If Git sends this command, then the >> +filter is expected to return a list of pathnames of blobs that are >> +available. The list must be terminated with a flush packet followed > > Is it correct to read the above "pathnames of blobs that are > availble" as "pathnames, among which Git earlier requested to be > filtered with "can-delay=1", for which the filtered result is > ready"? I do not mean to suggest this awkward wording to replace > yours, but I found yours a bit too fuzzy for first time readers. > > Perhaps my brain has rotten by hearing the "delayed/lazy transfer of > large blobs", but it went "Huh?" upon seeing "...are available". Maybe this? [...] If Git sends this command, then the filter is expected to return a list of pathnames representing blobs that have been delayed earlier and are now available. [...] >> +by a "success" status that is also terminated with a flush packet. If >> +no blobs for the delayed paths are available, yet, then the filter is >> +expected to block the response until at least one blob becomes >> +available. > > Ahh, this is better, at least you use "the delayed paths". > > What if the result never gets available (e.g. resulted in an error)? As soon as the filter responds with an empty list, Git stops asking. All blobs that Git has not received at this point are considered an error. Should I mention that explicitly? > Or is it considered "we _now_ know the result is an error" and such > a delayed path that failed to retrieve from LFS store "available" at > that point? No. "list_available_blobs" only returns blobs that are immediately available. Errors are not available. > It may be too late to raise to a series that went to v6, but this > smells more like "ready" not "available" to me. You mean you would call it "list_ready_blobs"? I am no native speaker but I feel "available" sounds better. I also contemplated "retrievable". I think I understand your reasoning, though. In the case of GitLFS all these blobs are "available". Only the ones that GitLFS has downloaded are ready, though. However, other filters might delay blobs for non-network related reasons and then "available" and "ready" would be the same, no? I would like to keep "available". > ... >> diff --git a/cache.h b/cache.h >> index ae4c45d379..2ec12c4477 100644 >> --- a/cache.h >> +++ b/cache.h >> @@ -1551,8 +1552,11 @@ struct checkout { >> }; >> #define CHECKOUT_INIT { NULL, "" } >> >> + >> #define TEMPORARY_FILENAME_LENGTH 25 > > You probably do not need that new blank line. Agreed! I have no idea why/how that got in. > ... >> diff --git a/convert.c b/convert.c >> index e55c034d86..6452ab546a 100644 >> --- a/convert.c >> +++ b/convert.c >> @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) >> if (err) >> goto done; >> >> - err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL); >> + err = packet_writel(process->in, >> + "capability=clean", "capability=smudge", "capability=delay", NULL); >> >> for (;;) { >> cap_buf = packet_read_line(process->out, NULL); >> @@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess) >> entry->supported_capabilities |= CAP_CLEAN; >> } else if (!strcmp(cap_name, "smudge")) { >> entry->supported_capabilities |= CAP_SMUDGE; >> + } else if (!strcmp(cap_name, "delay")) { >> + entry->supported_capabilities |= CAP_DELAY; >> } else { >> warning( >> "external filter '%s' requested unsupported filter capability '%s'", > > The above makes me wonder if we want to introduce a table, e.g. > > static const struct { > const *name; > unsigned cap; > } known_caps[] = { > { "clean", CAP_CLEAN }, > { "smudge", CAP_SMUDGE }, > { "delay", CAP_DELAY }, > }; > > and drive everything (i.e. capability announcement in hunk +534,8 > and parsing of request in hunk +551,8) off of it. > > That would be overkill for two capabilities, but adding another to > make it three is when such a refactoring starts to become plausible. I agree with this way for many capabilities. I'll check if it makes sense for 3. > >> @@ -590,9 +594,11 @@ static void handle_filter_error(const struct strbuf *filter_status, >> >> static int apply_multi_file_filter(const char *path, const char *src, size_t len, >> int fd, struct strbuf *dst, const char *cmd, >> - const unsigned int wanted_capability) >> + const unsigned int wanted_capability, >> + struct delayed_checkout *dco) >> { >> int err; >> + int can_delay = 0; >> struct cmd2process *entry; >> struct child_process *process; >> struct strbuf nbuf = STRBUF_INIT; >> @@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len >> if (err) >> goto done; >> >> + if ((entry->supported_capabilities & CAP_DELAY) && >> + dco && dco->state == CE_CAN_DELAY) { >> + can_delay = 1; >> + err = packet_write_fmt_gently(process->in, "can-delay=1\n"); >> + if (err) >> + goto done; >> + } >> + >> err = packet_flush_gently(process->in); >> if (err) >> goto done; >> @@ -662,14 +676,74 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len >> if (err) >> goto done; >> >> - err = strcmp(filter_status.buf, "success"); >> + if (can_delay && !strcmp(filter_status.buf, "delayed")) { >> + dco->state = CE_DELAYED; >> + string_list_insert(&dco->filters, cmd); >> + string_list_insert(&dco->paths, path); > > The dco->state CE_CAN_DELAY is global to all the paths involved in > this checkout, but filter_status.buf being "delayed" is per path > (i.e. the filter can be told "can-delay=1" but still respond with > success right away), and in such a case dco->state will not advance > from CE_CAN_DELAY to CE_DELAYED (which in turn means we will show > "can-delay=1" to the next path). On the other hand, once we are > responded by "delayed" after sending "can-delay=1", we will go into > CE_DELAYED state and will not say "can-delay=1" to any subsequent > path. I am not 100% happy with design of "dco->state". You are right, I use it as "global" flag to indicate if paths can be delayed. But I *also* use it as "flag" to indicate if a certain path has been delayed. Afterwards it is reset (see entry.c:write_entry). Peff stumbled over that as well [1]. I need to change that! Maybe with an additional flag "dco->delayed". This flag is reset before any "async_convert_to_working_tree" and evaluated after. [1] "Hmm. This "reset the state" bit at the end surprised me...." http://public-inbox.org/git/20170624141941.usy2pyhid3jrf3ku@sigill.intra.peff.net/ > This feels somewhat unsatisfactory. From the protocol exchange > description in the documentation part of this patch, I was sort of > expecting that Git can selectively say "You can delay response to > this path" and "I am willing to wait until you give the filtered > result" for each command/pathname pair. But this code structure > does not seem to allow that. > > I would understand it if CE_CAN_DELAY and CE_DELAYED are unified, > and the assignment to dco->state here is removed. You may be using > these two states in other parts of the code to optimize between the > case where *no* delayed path exists (even if Git is capable of > delaying) and *some* delayed path exist, but I think that can be > done by checking if dco->paths is empty, or something like that. > > Such a change will allow us later to add more logic (perhaps driven > by attributes) to the decision to send "can-delay=1" in hunk +653,14 > above, to express "this won't be delayed". > > Alternatively, the semantics of "the entire checkout exchange is > allowed to be delayed" can be kept but then the protocol looks too > confusing---it shouldn't have "can-delay=1" after a command/pathname > pair, as if it can be specified per path. Actually a delay "per path" is possible. I'll try to make this more clear with a little refactoring. > ... > >> +int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths) >> +{ >> + int err; >> + char *line; >> + struct cmd2process *entry; >> + struct child_process *process; >> + struct strbuf filter_status = STRBUF_INIT; >> + >> + assert(subprocess_map_initialized); >> + entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd); >> + if (!entry) { >> + error("external filter '%s' is not available anymore although " >> + "not all paths have been filtered", cmd); >> + return 0; >> + } >> + process = &entry->subprocess.process; >> + sigchain_push(SIGPIPE, SIG_IGN); >> + >> + err = packet_write_fmt_gently( >> + process->in, "command=list_available_blobs\n"); >> if (err) >> goto done; >> >> - err = read_packetized_to_strbuf(process->out, &nbuf) < 0; >> + err = packet_flush_gently(process->in); >> if (err) >> goto done; >> >> + while ((line = packet_read_line(process->out, NULL))) { >> + const char *path; >> + if (skip_prefix(line, "pathname=", &path)) >> + string_list_insert(delayed_paths, xstrdup(path)); > > I am assuming that the caller passes an empty string list to > delayed_paths variable (shouldn't it be called available_paths, or > ready_paths if we follow the earlier discussion above, by the way?), > and it will compare the resulting set with the set of paths that got > "delayed" response in the apply_multi_file_filter() function earlier. Correct. I also agree with the rename although it seems a bit odd to name the function "...available_blobs" and the variable "available_paths". The paths itself are not "available" anyways. They just represent the blobs that are available. Therefore it should be OK and it probably better than just "paths". > I am wondering whose responsibility it will be to deal with a path > "list-available" reports that are *not* asked by Git or Git got no > "delayed" response. The list subtraction done by the caller is > probably the logical place to do so. Correct. Git (the caller) will notice that not all "delayed" paths are listed by the filter and throw an error at the end. > >> diff --git a/entry.c b/entry.c >> index d6b263f78e..c6d5cc01dc 100644 >> --- a/entry.c >> +++ b/entry.c >> @@ -137,6 +137,85 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path, >> ... >> +static int remove_available_paths(struct string_list_item *item, void *cb_data) >> +{ >> + struct string_list *available_paths = cb_data; >> + return !string_list_has_string(available_paths, item->string); >> +} >> + >> +int finish_delayed_checkout(struct checkout *state) >> +{ >> + int errs = 0; >> + struct string_list_item *filter, *path; >> + struct delayed_checkout *dco = state->delayed_checkout; >> + >> + if (!state->delayed_checkout) >> + return errs; >> + >> + dco->state = CE_RETRY; >> + while (dco->filters.nr > 0) { >> + for_each_string_list_item(filter, &dco->filters) { >> + struct string_list available_paths; >> + string_list_init(&available_paths, 0); > > STRING_LIST_INIT_NODUP? Of course! > >> + >> + if (!async_query_available_blobs(filter->string, &available_paths)) { >> + /* Filter reported an error */ >> + errs = 1; >> + filter->string = ""; >> + continue; >> + } >> + if (available_paths.nr <= 0) { >> + /* Filter responded with no entries. That means >> + the filter is done and we can remove the >> + filter from the list (see >> + "string_list_remove_empty_items" call below). >> + */ > > /* > * Our multi-line comments > * look like this. > */ Oops. Will fix. BTW: In [1] you mentioned that you run checkpatch. Would checkpatch catch this kind of error? If yes, is your version publicly available? [1] http://public-inbox.org/git/xmqq1ste2zwu.fsf@gitster.mtv.corp.google.com/ >> + filter->string = ""; >> + continue; >> + } >> + >> + /* In dco->paths we store a list of all delayed paths. >> + The filter just send us a list of available paths. >> + Remove them from the list. >> + */ >> + filter_string_list(&dco->paths, 0, >> + &remove_available_paths, &available_paths); > > We first remove from the outstanding request list (dco->paths) what > are now ready... > >> + for_each_string_list_item(path, &available_paths) { > > ...and go over those paths that are now ready. > >> + struct cache_entry* ce = index_file_exists( >> + state->istate, path->string, >> + strlen(path->string), 0); >> + assert(dco->state == CE_RETRY); >> + errs |= (ce ? checkout_entry(ce, state, NULL) : 1); >> + } > > But we never checked if the contents of this available_paths list is > a subset of the set of paths we originally asked the external > process to filter. Correct. > This will allow the process to overwrite any > random path that is not even involved in the checkout. No, not "any random path". Only paths that are part of the checkout. There are three cases: (1) available_path is a path that was delayed before (= happy case!) (2) available_path is a path that was not delayed before, but filtered (= no problem, as filtering is a idempotent operation) (3) available_path is a path that was neither delayed nor filtered before (= if the filter returns the blob as-is then this would be no problem. otherwise we would indeed have a screwed checkout) Case 3 might introduce a problem if the filter is buggy. Would you be OK with this check to catch case 3? dco_path_count = dco->paths.nr; filter_string_list(&dco->paths, 0, &remove_available_paths, &available_paths); if (dco_path_count - dco->paths.nr != available_paths.nr) { /* The filter responded with entries that have not * been delay earlier. Do not ask the filter * for available blobs, again, as the filter is * likely buggy. This will generate an error at * the end as some files are not filtered properly. */ filter->string = ""; error(_("The external filter '%s' responded with " "available blobs which have not been delayed " "earlier."), filter->string); continue; } >> + } >> + string_list_remove_empty_items(&dco->filters, 0); >> + } >> + string_list_clear(&dco->filters, 0); >> + >> + /* At this point we should not have any delayed paths anymore. */ >> + errs |= dco->paths.nr; >> + for_each_string_list_item(path, &dco->paths) { >> + warning(_("%s was not filtered properly."), path->string); >> + } >> + string_list_clear(&dco->paths, 0); > > And "list-available" that says "path X is ready" when we never asked > for X gets away free without detected as a bug, either. With the addition above it would! Thanks for the review :-) - Lars ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol 2017-06-26 21:28 ` Lars Schneider @ 2017-06-26 22:13 ` Junio C Hamano 2017-06-26 22:38 ` Junio C Hamano 2017-06-27 12:07 ` Lars Schneider 0 siblings, 2 replies; 24+ messages in thread From: Junio C Hamano @ 2017-06-26 22:13 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Lars Schneider <larsxschneider@gmail.com> writes: > Maybe this? > [...] If Git sends this command, then the > filter is expected to return a list of pathnames representing blobs > that have been delayed earlier and are now available. [...] OK. >>> +by a "success" status that is also terminated with a flush packet. If >>> +no blobs for the delayed paths are available, yet, then the filter is >>> +expected to block the response until at least one blob becomes >>> +available. >> >> Ahh, this is better, at least you use "the delayed paths". >> >> What if the result never gets available (e.g. resulted in an error)? > > As soon as the filter responds with an empty list, Git stops asking. > All blobs that Git has not received at this point are considered an > error. > > Should I mention that explicitly? Otherwise I wouldn't have wondered "what if". >> I am wondering whose responsibility it will be to deal with a path >> "list-available" reports that are *not* asked by Git or Git got no >> "delayed" response. The list subtraction done by the caller is >> probably the logical place to do so. > > Correct. Git (the caller) will notice that not all "delayed" paths > are listed by the filter and throw an error at the end. I am wondering about the other case. Git didn't ask for a path to be filtered at all, but the filter sneaked in a path that happens to in the index in its response---Git should at least ignore it, and better yet, diagnose it as an error in the filter process. >>> + filter->string = ""; >>> + continue; >>> + } >>> + >>> + /* In dco->paths we store a list of all delayed paths. >>> + The filter just send us a list of available paths. >>> + Remove them from the list. >>> + */ >>> + filter_string_list(&dco->paths, 0, >>> + &remove_available_paths, &available_paths); >> >> We first remove from the outstanding request list (dco->paths) what >> are now ready... >> >>> + for_each_string_list_item(path, &available_paths) { >> >> ...and go over those paths that are now ready. >> >>> + struct cache_entry* ce = index_file_exists( >>> + state->istate, path->string, >>> + strlen(path->string), 0); >>> + assert(dco->state == CE_RETRY); >>> + errs |= (ce ? checkout_entry(ce, state, NULL) : 1); >>> + } >> >> But we never checked if the contents of this available_paths list is >> a subset of the set of paths we originally asked the external >> process to filter. > > Correct. > >> This will allow the process to overwrite any >> random path that is not even involved in the checkout. > > No, not "any random path". Only paths that are part of the checkout. Isn't it "any path that index_file_exists() returns a CE for". Did you filter out elements in available_paths that weren't part of dco->paths? I thought the filter-string-list you have are for the other way around (which is necessary to keep track of work to be done, but that filtering does not help rejecting rogue responses at all). > There are three cases: > > (1) available_path is a path that was delayed before (= happy case!) > (2) available_path is a path that was not delayed before, > but filtered (= no problem, as filtering is a idempotent operation) > (3) available_path is a path that was neither delayed nor filtered > before (= if the filter returns the blob as-is then this would > be no problem. otherwise we would indeed have a screwed checkout) > > Case 3 might introduce a problem if the filter is buggy. > Would you be OK with this check to catch case 3? I'd be very suspicious about anything you would do only with .nr field, without filtering the other way around. After all, we may have asked it for 3 paths to be filtered, and it may have answered with its own 3 different paths. > dco_path_count = dco->paths.nr; > filter_string_list(&dco->paths, 0, > &remove_available_paths, &available_paths); > > if (dco_path_count - dco->paths.nr != available_paths.nr) { > /* The filter responded with entries that have not > * been delay earlier. Do not ask the filter > * for available blobs, again, as the filter is > * likely buggy. This will generate an error at > * the end as some files are not filtered properly. > */ > filter->string = ""; > error(_("The external filter '%s' responded with " > "available blobs which have not been delayed " > "earlier."), filter->string); > continue; > } > > >>> + } >>> + string_list_remove_empty_items(&dco->filters, 0); >>> + } >>> + string_list_clear(&dco->filters, 0); >>> + >>> + /* At this point we should not have any delayed paths anymore. */ >>> + errs |= dco->paths.nr; >>> + for_each_string_list_item(path, &dco->paths) { >>> + warning(_("%s was not filtered properly."), path->string); >>> + } >>> + string_list_clear(&dco->paths, 0); >> >> And "list-available" that says "path X is ready" when we never asked >> for X gets away free without detected as a bug, either. > > With the addition above it would! > > > Thanks for the review :-) > > - Lars ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol 2017-06-26 22:13 ` Junio C Hamano @ 2017-06-26 22:38 ` Junio C Hamano 2017-06-27 12:07 ` Lars Schneider 1 sibling, 0 replies; 24+ messages in thread From: Junio C Hamano @ 2017-06-26 22:38 UTC (permalink / raw) To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben Junio C Hamano <gitster@pobox.com> writes: >>>> + filter->string = ""; >>>> + continue; >>>> + } >>>> + >>>> + /* In dco->paths we store a list of all delayed paths. >>>> + The filter just send us a list of available paths. >>>> + Remove them from the list. >>>> + */ >>>> + filter_string_list(&dco->paths, 0, >>>> + &remove_available_paths, &available_paths); >>> >>> We first remove from the outstanding request list (dco->paths) what >>> are now ready... >>> >>>> + for_each_string_list_item(path, &available_paths) { >>> >>> ...and go over those paths that are now ready. >>> >>>> + struct cache_entry* ce = index_file_exists( >>>> + state->istate, path->string, >>>> + strlen(path->string), 0); >>>> + assert(dco->state == CE_RETRY); >>>> + errs |= (ce ? checkout_entry(ce, state, NULL) : 1); >>>> + } >>> >>> But we never checked if the contents of this available_paths list is >>> a subset of the set of paths we originally asked the external >>> process to filter. >> >> Correct. >> >>> This will allow the process to overwrite any >>> random path that is not even involved in the checkout. >> >> No, not "any random path". Only paths that are part of the checkout. > > Isn't it "any path that index_file_exists() returns a CE for". Did > you filter out elements in available_paths that weren't part of > dco->paths? I thought the filter-string-list you have are for the > other way around (which is necessary to keep track of work to be > done, but that filtering does not help rejecting rogue responses at > all). That is, something along the lines of this on top of the series. When going over the list of delayed paths to see if any of them is answered, in remove_available_paths(), we mark the util field of the answered one. Later when we go over the list of answered one, we make sure that it was actually matched with something on dco->paths before calling checkout_entry(). entry.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/entry.c b/entry.c index c6d5cc01dc..f2af67e015 100644 --- a/entry.c +++ b/entry.c @@ -150,7 +150,12 @@ void enable_delayed_checkout(struct checkout *state) static int remove_available_paths(struct string_list_item *item, void *cb_data) { struct string_list *available_paths = cb_data; - return !string_list_has_string(available_paths, item->string); + struct string_list_item *available; + + available = string_list_lookup(available_paths, item->string); + if (available) + avaiable->util = (void *)item->string; + return !available; } int finish_delayed_checkout(struct checkout *state) @@ -192,9 +197,16 @@ int finish_delayed_checkout(struct checkout *state) &remove_available_paths, &available_paths); for_each_string_list_item(path, &available_paths) { - struct cache_entry* ce = index_file_exists( - state->istate, path->string, - strlen(path->string), 0); + struct cache_entry* ce; + + if (!path->util) { + error("filter gave '%s' which was unasked for", + path->string); + errs |= 1; + continue; + } + ce = index_file_exists(state->istate, path->string, + strlen(path->string), 0); assert(dco->state == CE_RETRY); errs |= (ce ? checkout_entry(ce, state, NULL) : 1); } ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol 2017-06-26 22:13 ` Junio C Hamano 2017-06-26 22:38 ` Junio C Hamano @ 2017-06-27 12:07 ` Lars Schneider 1 sibling, 0 replies; 24+ messages in thread From: Lars Schneider @ 2017-06-27 12:07 UTC (permalink / raw) To: Junio C Hamano Cc: Git Users, Jeff King, Torsten Bögershausen, Eric Wong, Taylor Blau, peartben > On 27 Jun 2017, at 00:13, Junio C Hamano <gitster@pobox.com> wrote: > > Lars Schneider <larsxschneider@gmail.com> writes: > >> ... > >>> I am wondering whose responsibility it will be to deal with a path >>> "list-available" reports that are *not* asked by Git or Git got no >>> "delayed" response. The list subtraction done by the caller is >>> probably the logical place to do so. >> >> Correct. Git (the caller) will notice that not all "delayed" paths >> are listed by the filter and throw an error at the end. > > I am wondering about the other case. Git didn't ask for a path to > be filtered at all, but the filter sneaked in a path that happens to > in the index in its response---Git should at least ignore it, and > better yet, diagnose it as an error in the filter process. Agreed. I've used your implementation suggestion [1] and added two test cases to ensure we signal a proper error in case of a buggy filter. Thanks, Lars [1] http://public-inbox.org/git/xmqqzicu2x4a.fsf@gitster.mtv.corp.google.com/ ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2017-06-27 12:07 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-25 18:21 [PATCH v6 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider 2017-06-25 18:21 ` [PATCH v6 1/6] t0021: keep filter log files on comparison Lars Schneider 2017-06-25 22:12 ` Junio C Hamano 2017-06-26 9:02 ` Lars Schneider 2017-06-26 17:31 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 2/6] t0021: make debug log file name configurable Lars Schneider 2017-06-25 22:15 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 3/6] t0021: write "OUT" only on success Lars Schneider 2017-06-25 22:17 ` Junio C Hamano 2017-06-26 9:26 ` Lars Schneider 2017-06-26 17:50 ` Junio C Hamano 2017-06-26 18:32 ` Lars Schneider 2017-06-25 18:21 ` [PATCH v6 4/6] convert: put the flags field before the flag itself for consistent style Lars Schneider 2017-06-25 22:19 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 5/6] convert: move multiple file filter error handling to separate function Lars Schneider 2017-06-26 17:56 ` Junio C Hamano 2017-06-27 2:51 ` Stefan Beller 2017-06-26 18:00 ` Junio C Hamano 2017-06-25 18:21 ` [PATCH v6 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider 2017-06-26 19:02 ` Junio C Hamano 2017-06-26 21:28 ` Lars Schneider 2017-06-26 22:13 ` Junio C Hamano 2017-06-26 22:38 ` Junio C Hamano 2017-06-27 12:07 ` Lars Schneider
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).