From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>,
"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend
Date: Mon, 12 Apr 2021 19:15:07 +0200 [thread overview]
Message-ID: <cover-00.22-00000000000-20210412T170457Z-avarab@gmail.com> (raw)
In-Reply-To: <20210216115801.4773-1-avarab@gmail.com>
This much-delayed re-roll of v2[2] is a test and refactoring change to
diffcore-pickaxe.c to allow an eventual removal of the kwset.[ch]
files and to add a PCRE backend.
This series is now based on my "test-lib.sh: new test_commit args,
simplification & fixes" series. The trivial dependency between the two
is using the new test_commit --printf option.
I'll not summarize the range-diff too much, but since v2 I addressed
all outstanding feedback from Junio. There's 2x new patches at the end
of the series to change the existing early return feature now using
"discard_hunk_line" to use a flag.
In v2 I had what I realized was a WIP migration of that to using
return values in the callback instead, but unlike the
xdiff_emit_line_fn I don't think that makes sense in
xdiff_emit_hunk_fn.
1. https://lore.kernel.org/git/20210216115801.4773-1-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-00.16-00000000000-20210412T110456Z-avarab@gmail.com/
Ævar Arnfjörð Bjarmason (22):
grep/pcre2 tests: reword comments referring to kwset
pickaxe tests: refactor to use test_commit --append --printf
pickaxe tests: add test for diffgrep_consume() internals
pickaxe tests: add test for "log -S" not being a regex
pickaxe tests: test for -G, -S and --find-object incompatibility
pickaxe tests: add missing test for --no-pickaxe-regex being an error
pickaxe: die when -G and --pickaxe-regex are combined
pickaxe: die when --find-object and --pickaxe-all are combined
diff.h: move pickaxe fields together again
pickaxe/style: consolidate declarations and assignments
perf: add performance test for pickaxe
pickaxe: refactor function selection in diffcore-pickaxe()
pickaxe: assert that we must have a needle under -G or -S
pickaxe -S: support content with NULs under --pickaxe-regex
pickaxe: rename variables in has_changes() for brevity
pickaxe -S: slightly optimize contains()
xdiff-interface: prepare for allowing early return
xdiff-interface: allow early return from xdiff_emit_line_fn
pickaxe -G: terminate early on matching lines
pickaxe -G: don't special-case create/delete
xdiff users: use designated initializers for out_line
xdiff-interface: replace discard_hunk_line() with a flag
builtin/merge-tree.c | 5 +-
builtin/rerere.c | 4 +-
combine-diff.c | 5 +-
diff.c | 39 +++++++----
diff.h | 7 +-
diffcore-pickaxe.c | 106 ++++++++++++++++++------------
range-diff.c | 3 +-
t/perf/p4209-pickaxe.sh | 70 ++++++++++++++++++++
t/t4209-log-pickaxe.sh | 114 ++++++++++++++++++++++++++++-----
t/t7816-grep-binary-pattern.sh | 4 +-
xdiff-interface.c | 27 ++++----
xdiff-interface.h | 31 ++++++---
xdiff/xdiff.h | 1 +
xdiff/xemit.c | 3 +-
14 files changed, 312 insertions(+), 107 deletions(-)
create mode 100755 t/perf/p4209-pickaxe.sh
Range-diff against v2:
1: 75bfc8eba13 = 1: cfe934d6081 grep/pcre2 tests: reword comments referring to kwset
2: cddb1f75f6c < -: ----------- test-lib-functions: document and test test_commit --no-tag
3: 44540f6039e < -: ----------- test-lib-functions: reword "test_commit --append" docs
4: 0e1f133476f < -: ----------- test-lib functions: add --printf option to test_commit
5: 2a814e8d53a ! 2: 413a330d3d6 pickaxe tests: refactor to use test_commit --append --printf
@@ Metadata
## Commit message ##
pickaxe tests: refactor to use test_commit --append --printf
- Refactor existing tests added in e0e7cb8080c (log -G: ignore binary
- files, 2018-12-14) to use the --append option I added in
+ Refactor the existing tests added in e0e7cb8080c (log -G: ignore
+ binary files, 2018-12-14) to use the --append option I added in
3373518cc8b (test-lib functions: add an --append option to
- test_commit, 2021-01-12) and the --printf option added in a preceding
- commit.
+ test_commit, 2021-01-12) and the --printf option added as part of an
+ in-flight topic of mine this commit depends on.
+
+ While I'm at it change some of the setup of the test to use a more
+ sensible pattern, e.g. setting up a temporary repo instead of creating
+ an orphan branch.
+
+ Since the -G and -S options will behave the same way with truncated
+ and removed content also change the "git rm" to emptying data.bin,
+ that's just catering to how test_commit works. The resulting test is
+ shorter.
See also f5d79bf7dd6 (tests: refactor a few tests to use "test_commit
--append", 2021-01-12) for prior similar refactoring.
@@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S --no-textconv (missing textc
- git commit -m "delete binary file" data.bin &&
- git log >full-log
+ test_create_repo GS-bin-txt &&
-+ test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
++ test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
+ test_commit -C GS-bin-txt --append --printf B data.bin "a\na\0a\n" &&
+ test_commit -C GS-bin-txt C data.bin "" &&
+ git -C GS-bin-txt log >full-log
6: f49eb6c95e5 ! 3: ddd2224836b pickaxe tests: add test for diffgrep_consume() internals
@@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S --no-textconv (missing textc
+
test_expect_success 'setup log -[GS] binary & --text' '
test_create_repo GS-bin-txt &&
- test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
+ test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
7: 80c62fb0448 = 4: ca6340c1fa7 pickaxe tests: add test for "log -S" not being a regex
8: 6d329e0c3b1 ! 5: 0c4657189a8 pickaxe tests: test for -G, -S and --find-object incompatibility
@@ t/t4209-log-pickaxe.sh: test_expect_success setup '
+test_expect_success 'usage' '
+ test_expect_code 128 git log -Gregex -Sstring 2>err &&
-+ test_i18ngrep "mutually exclusive" err &&
++ grep "mutually exclusive" err &&
+
+ test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
-+ test_i18ngrep "mutually exclusive" err &&
++ grep "mutually exclusive" err &&
+
-+ test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
-+ test_i18ngrep "mutually exclusive" err
++ test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
++ grep "mutually exclusive" err
+'
+
test_log expect_initial --grep initial
-: ----------- > 6: 1696076bb09 pickaxe tests: add missing test for --no-pickaxe-regex being an error
9: bd0c3b7e3b0 ! 7: 83e7b4793b6 pickaxe: die when -G and --pickaxe-regex are combined
@@ Commit message
matching regular expressions, 2006-03-29) only the -S option
existed. Then when -G was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23) neither the
- documentation for --pickaxe-regex was updater accordingly, nor was
+ documentation for --pickaxe-regex was updated accordingly, nor was
something like this assertion added.
Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
2013-05-31) we've claimed that --pickaxe-regex should only be used
- with -S, but have silently toileted combining it with -G, let's die
+ with -S, but have silently tolerated combining it with -G, let's die
instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
@@ diff.h: int git_config_rename(const char *var, const char *value);
## t/t4209-log-pickaxe.sh ##
@@ t/t4209-log-pickaxe.sh: test_expect_success 'usage' '
- test_expect_code 128 git log -Gregex -Sstring 2>err &&
- test_i18ngrep "mutually exclusive" err &&
+ grep "mutually exclusive" err
+ '
++test_expect_success 'usage: --pickaxe-regex' '
+ test_expect_code 128 git log -Gregex --pickaxe-regex 2>err &&
-+ test_i18ngrep "mutually exclusive" err &&
++ grep "mutually exclusive" err
++'
+
- test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
- test_i18ngrep "mutually exclusive" err &&
-
+ test_expect_success 'usage: --no-pickaxe-regex' '
+ cat >expect <<-\EOF &&
+ fatal: unrecognized argument: --no-pickaxe-regex
10: 72075874f5c ! 8: 749c3ca3f98 pickaxe: die when --find-object and --pickaxe-all are combined
@@ diff.h: int git_config_rename(const char *var, const char *value);
## t/t4209-log-pickaxe.sh ##
@@ t/t4209-log-pickaxe.sh: test_expect_success 'usage' '
- test_i18ngrep "mutually exclusive" err &&
+ grep "mutually exclusive" err &&
- test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
-+ test_i18ngrep "mutually exclusive" err &&
+ test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
++ grep "mutually exclusive" err &&
+
+ test_expect_code 128 git log --pickaxe-all --find-object=HEAD 2>err &&
- test_i18ngrep "mutually exclusive" err
+ grep "mutually exclusive" err
'
11: f8116a2b814 = 9: fe4e75c39d2 diff.h: move pickaxe fields together again
12: 4778357cfc7 = 10: afe70b163a2 pickaxe/style: consolidate declarations and assignments
13: 7449a59b104 = 11: 97616d741c7 perf: add performance test for pickaxe
14: 160f7d8b0f2 ! 12: c29deb428b1 pickaxe: refactor function selection in diffcore-pickaxe()
@@ diffcore-pickaxe.c: void diffcore_pickaxe(struct diff_options *o)
regcomp_or_die(®ex, needle, cflags);
regexp = ®ex;
+
-+ /* diff.c errors on -G and --pickaxe-regex for us */
+ if (opts & DIFF_PICKAXE_KIND_G)
+ fn = diff_grep;
+ else if (opts & DIFF_PICKAXE_REGEX)
+ fn = has_changes;
+ else
++ /*
++ * We don't need to check the combination of
++ * -G and --pickaxe-regex, by the time we get
++ * here diff.c has already died if they're
++ * combined. See the usage tests in
++ * t4209-log-pickaxe.sh.
++ */
+ BUG("unreachable");
} else if (opts & DIFF_PICKAXE_KIND_S) {
if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
15: 05fc57e54e1 ! 13: 115a369d067 pickaxe: assert that we must have a needle under -G or -S
@@ diffcore-pickaxe.c: void diffcore_pickaxe(struct diff_options *o)
kwset_t kws = NULL;
pickaxe_fn fn;
-+ if (opts & ~DIFF_PICKAXE_KIND_OBJFIND && !needle)
++ if (opts & ~DIFF_PICKAXE_KIND_OBJFIND &&
++ (!needle || !*needle))
+ BUG("should have needle under -G or -S");
if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
int cflags = REG_EXTENDED | REG_NEWLINE;
@@ t/t4209-log-pickaxe.sh: test_expect_success setup '
+ test_i18ngrep "switch.*requires a value" err &&
+
test_expect_code 128 git log -Gregex -Sstring 2>err &&
- test_i18ngrep "mutually exclusive" err &&
+ grep "mutually exclusive" err &&
16: 550620ec13b ! 14: a86032792b6 pickaxe -S: support content with NULs under --pickaxe-regex
@@ Commit message
for pickaxe matching regular expressions, 2006-03-29).
It wasn't needed anymore, and as the now-passing test shows, actively
- getting in our way.
+ getting in our way. Since we always require REG_STARTEND support we do
+ not need to stop at NULs. If we are dealing with a haystack with NUL
+ in it. The needle may be behind that NUL.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
@@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S looks into binary files' '
+ git -C GS-bin-txt log --pickaxe-regex -Sa >log &&
+ test_cmp log full-log &&
+
-+ git -C GS-bin-txt log --pickaxe-regex -S[a] >log &&
++ git -C GS-bin-txt log --pickaxe-regex -S"[a]" >log &&
+ test_cmp log full-log
+'
+
17: 985e077d561 = 15: 10f85edcff7 pickaxe: rename variables in has_changes() for brevity
18: 648e6e5f11b ! 16: ed83c3add89 pickaxe -S: slightly optimize contains()
@@ Commit message
pre-image and post-image of a change. As soon as we know we had e.g. 1
before and 2 now we can stop, we don't need to keep counting past 2.
+ With this change a diff between A and B may have different performance
+ characteristics than between B and A. That's OK in this case, since
+ we'll emit the same output, and the effect is to make one of them
+ better.
+
+ I'm picking a check of "one" first on the assumption that it's a more
+ common case to have files grow over time than not.
+
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## diffcore-pickaxe.c ##
19: b991660fbf5 ! 17: 62c306275c7 xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
+ xdiff-interface: prepare for allowing early return
- Change the function prototype of xdiff_emit_{line,hunk}_fn to return
- an int instead of void. This will allow for returning early from hunk
- & diff consumers that want some of the data, but not all of it.
+ Change the function prototype of xdiff_emit_line_fn to return an "int"
+ instead of "void". Change all of those functions to "return 0",
+ nothing checks those return values yet, and no behavior is being
+ changed.
- No behavior is being changed here, just replacing the equivalent of
- "return" with "return 0", nothing acts on the changed return values
- yet.
-
- There was some work in this area of xdiff-interface.[ch] recently with
- 3b40a090fd4 (diff: avoid generating unused hunk header lines,
- 2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
- 2018-11-02).
-
- In combination those two changes allow us to not do any work on the
- hunks and diff at all, but didn't change the status quo with regards
- to consumers that e.g. want the diff lines, but might want to abort
- early.
-
- Whereas soon we can abort e.g. on the first "-line" of a 1000 line
- diff if that's all we needed.
-
- This interface is rather scary as noted in the comment to
- xdiff-interface.h being added here, but it will be useful for
- diffcore-pickaxe.c in a subsequent commit. A future change could
- e.g. add more exit codes, and hack xdl_emit_diff() and friends to
- ignore or skip things more selectively as a result.
+ In subsequent commits the interface will be changed to allow early
+ return via this new return value.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## combine-diff.c ##
-@@ combine-diff.c: struct combine_diff_state {
- struct sline *lost_bucket;
- };
-
--static void consume_hunk(void *state_,
-+static int consume_hunk(void *state_,
- long ob, long on,
- long nb, long nn,
- const char *funcline, long funclen)
@@ combine-diff.c: static void consume_hunk(void *state_,
- state->sline[state->nb-1].p_lno =
- xcalloc(state->num_parent, sizeof(unsigned long));
state->sline[state->nb-1].p_lno[state->n] = state->ob;
-+
-+ return 0;
}
-static void consume_line(void *state_, char *line, unsigned long len)
@@ combine-diff.c: static void consume_line(void *state_, char *line, unsigned long
static void combine_diff(struct repository *r,
## diff.c ##
-@@ diff.c: static int color_words_output_graph_prefix(struct diff_words_data *diff_words)
- }
- }
-
--static void fn_out_diff_words_aux(void *priv,
-- long minus_first, long minus_len,
-- long plus_first, long plus_len,
-- const char *func, long funclen)
-+static int fn_out_diff_words_aux(void *priv,
-+ long minus_first, long minus_len,
-+ long plus_first, long plus_len,
-+ const char *func, long funclen)
- {
- struct diff_words_data *diff_words = priv;
- struct diff_words_style *style = diff_words->style;
-@@ diff.c: static void fn_out_diff_words_aux(void *priv,
-
- diff_words->current_plus = plus_end;
- diff_words->last_minus = minus_first;
-+
-+ return 0;
- }
-
- /* This function starts looking at *begin, and returns 0 iff a word was found. */
@@ diff.c: static void find_lno(const char *line, struct emit_callback *ecbdata)
ecbdata->lno_in_postimage = strtol(p + 1, NULL, 10);
}
@@ diff.c: static void diffstat_consume(void *priv, char *line, unsigned long len)
}
const char mime_boundary_leader[] = "------------";
-@@ diff.c: static int is_conflict_marker(const char *line, int marker_size, unsigned long l
- return 1;
- }
-
--static void checkdiff_consume_hunk(void *priv,
-+static int checkdiff_consume_hunk(void *priv,
- long ob, long on, long nb, long nn,
- const char *func, long funclen)
-
- {
- struct checkdiff_t *data = priv;
+@@ diff.c: static void checkdiff_consume_hunk(void *priv,
data->lineno = nb - 1;
-+ return 0;
}
-static void checkdiff_consume(void *priv, char *line, unsigned long len)
@@ range-diff.c: static void find_exact_matches(struct string_list *a, struct strin
+ return 0;
}
--static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
-- const char *funcline, long funclen)
-+static int diffsize_hunk(void *data, long ob, long on, long nb, long nn,
-+ const char *funcline, long funclen)
- {
- diffsize_consume(data, NULL, 0);
-+ return 0;
- }
-
- static int diffsize(const char *a, const char *b)
+ static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
## xdiff-interface.c ##
@@ xdiff-interface.c: static int xdiff_out_hunk(void *priv_,
@@ xdiff-interface.c: static void consume_one(void *priv_, char *s, unsigned long s
}
static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
-@@ xdiff-interface.c: int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
- return xdl_diff(&a, &b, xpp, xecfg, xecb);
- }
-
--void discard_hunk_line(void *priv,
-- long ob, long on, long nb, long nn,
-- const char *func, long funclen)
-+int discard_hunk_line(void *priv,
-+ long ob, long on, long nb, long nn,
-+ const char *func, long funclen)
- {
-+ return 0;
- }
-
- int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
## xdiff-interface.h ##
@@
@@ xdiff-interface.h
#define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
-typedef void (*xdiff_emit_line_fn)(void *, char *, unsigned long);
--typedef void (*xdiff_emit_hunk_fn)(void *data,
-- long old_begin, long old_nr,
-- long new_begin, long new_nr,
-- const char *func, long funclen);
-+/*
-+ * The xdiff_emit_{line,hunk}_fn consumers can return -1 to abort
-+ * early, or 0 to continue processing. Note that doing so is an
-+ * all-or-nothing affair, as returning -1 will return all the way to
-+ * the top-level, e.g. the xdi_diff_outf() call to generate the diff.
-+ *
-+ * Thus returning -1 from a hunk header callback means you won't be
-+ * getting any more hunks, or diffs, and likewise returning from a
-+ * line callback means you won't be getting anymore lines.
-+ *
-+ * We may extend the interface in the future to understand other more
-+ * granular return values, but for now use it carefully, or consider
-+ * e.g. using discard_hunk_line() if you say just don't care about
-+ * hunk headers.
-+ */
+typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
-+typedef int (*xdiff_emit_hunk_fn)(void *data,
-+ long old_begin, long old_nr,
-+ long new_begin, long new_nr,
-+ const char *func, long funclen);
-
- int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
- int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
-@@ xdiff-interface.h: extern int git_xmerge_style;
- * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
- * one just sends the hunk line to the line_fn callback).
- */
--void discard_hunk_line(void *priv,
-- long ob, long on, long nb, long nn,
-- const char *func, long funclen);
-+int discard_hunk_line(void *priv,
-+ long ob, long on, long nb, long nn,
-+ const char *func, long funclen);
-
- /*
- * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
+ typedef void (*xdiff_emit_hunk_fn)(void *data,
+ long old_begin, long old_nr,
+ long new_begin, long new_nr,
20: 69b061832b3 ! 18: 76d667f152f xdiff-interface: support early exit in xdiff_outf()
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- xdiff-interface: support early exit in xdiff_outf()
+ xdiff-interface: allow early return from xdiff_emit_line_fn
- Bridge the gap between the preceding "xdiff-interface: allow early
- return from xdiff_emit_{line,hunk}_fn" change and the public
- interface.
+ Finish the change started in the preceding commit and allow an early
+ return from "xdiff_emit_line_fn" callbacks, this will allows
+ diffcore-pickaxe.c to save itself redundant work.
- This change was split off from the preceding commit as it wasn't a
- purely mechanical addition of "return 0".
+ Our xdiff interface also had the limitation of not being able to abort
+ early since the beginning, see d9ea73e0564 (combine-diff: refactor
+ built-in xdiff interface., 2006-04-05). Although at that time
+ "xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
+ "xdiff_emit_hunk_fn" didn't exist yet.
- Here we want to be able to abort early, but do so in a way that
- doesn't skip the appropriate strbuf_reset() invocations.
+ There was some work in this area of xdiff-interface.[ch] recently with
+ 3b40a090fd4 (diff: avoid generating unused hunk header lines,
+ 2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
+ 2018-11-02).
+
+ In combination those two changes allow us to not do any work on the
+ hunks and diff at all, but didn't change the status quo with regards
+ to consumers that e.g. want the diff lines, but might want to abort
+ early.
+
+ Whereas now we can abort e.g. on the first "-line" of a 1000 line diff
+ if that's all we needed.
+
+ This interface is rather scary as noted in the comment to
+ xdiff-interface.h being added here, as noted there a future change
+ could add more exit codes, and hack xdl_emit_diff() and friends to
+ ignore or skip things more selectively as a result.
+
+ I did not see an inherent reason for why xdl_emit_{diffrec,record}()
+ could not be changed to ferry the "xdiff_emit_line_fn" error code
+ upwards instead of returning -1 on all "ret < 0".
- The use of -1 as a return value is consistent with the rest of the
- xdiff codebase, where doing so signals an abort or error that'll
- propagate up the stack.
+ But doing so would require corresponding changes in xdl_emit_diff(),
+ xdl_diff(). I didn't see any issue with narrowly doing that to
+ accomplish what I needed here, but it would leave xdiff's own return
+ values in an inconsistent state.
+
+ Instead I've left it at returning a more conventional (for git's own
+ codebase) 1 for an early return, and translating it (or rather, all
+ non-zero) to -1 for xdiff's consumption.
+
+ The reason for most of the "stop" complexity in xdiff_outf() is
+ because we want to be able to abort early, but do so in a way that
+ doesn't skip the appropriate strbuf_reset() invocations.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## xdiff-interface.c ##
@@ xdiff-interface.c: static int consume_one(void *priv_, char *s, unsigned long size)
+ char *ep;
+ while (size) {
unsigned long this_size;
++ int ret;
ep = memchr(s, '\n', size);
this_size = (ep == NULL) ? size : (ep - s + 1);
- priv->line_fn(priv->consume_callback_data, s, this_size);
-+ if (priv->line_fn(priv->consume_callback_data, s, this_size))
-+ return -1;
++ ret = priv->line_fn(priv->consume_callback_data, s, this_size);
++ if (ret)
++ return ret;
size -= this_size;
s += this_size;
}
@@ xdiff-interface.c: static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
for (i = 0; i < nbuf; i++) {
+ if (stop)
-+ return -1;
++ return 1;
if (mb[i].ptr[mb[i].size-1] != '\n') {
/* Incomplete line */
strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
@@ xdiff-interface.c: static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
return 0;
}
+
+ ## xdiff-interface.h ##
+@@
+ */
+ #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
+
++/**
++ * The `xdiff_emit_line_fn` function can return 1 to abort early, or 0
++ * to continue processing. Note that doing so is an all-or-nothing
++ * affair, as returning 1 will return all the way to the top-level,
++ * e.g. the xdi_diff_outf() call to generate the diff.
++ *
++ * Thus returning 1 means you won't be getting any more diff lines. If
++ * you need something in-between those two options you'll to use
++ * `xdl_emit_hunk_consume_func_t` and implement your own version of
++ * xdl_emit_diff().
++ *
++ * We may extend the interface in the future to understand other more
++ * granular return values. While you should return 1 to exit early,
++ * doing so will currently make your early return indistinguishable
++ * from an error internal to xdiff, xdiff itself will see that
++ * non-zero return and translate it to -1.
++ */
+ typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
+ typedef void (*xdiff_emit_hunk_fn)(void *data,
+ long old_begin, long old_nr,
21: fc0aa61d093 ! 19: 53e9405f849 pickaxe -G: terminate early on matching lines
@@ Commit message
Solve a long-standing item for "git log -Grx" of us e.g. finding "+
str" in the diff context and noting that we had a "hit", but xdiff
- diligently continuing to generate and spew the rest of the diff at us.
+ diligently continuing to generate and spew the rest of the diff at
+ us. This makes use of a new "early return" xdiff interface added by
+ preceding commits.
- The TODO item has been there since "git log -G" was implemented. See
- f506b8e8b5f (git log/diff: add -G<regexp> that greps in the patch
- text, 2010-08-23).
+ The TODO item (or, the NEEDSWORK comment) has been there since "git
+ log -G" was implemented. See f506b8e8b5f (git log/diff: add -G<regexp>
+ that greps in the patch text, 2010-08-23).
- Our xdiff interface also had the limitation of not being able to abort
- early since the beginning, see d9ea73e0564 (combine-diff: refactor
- built-in xdiff interface., 2006-04-05). Although at that time
- "xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
- "xdiff_emit_hunk_fn" didn't exist yet.
-
- But now with the support added in the preceding ""xdiff-interface:
- allow early return from xdiff_emit_{line,hunk}_fn" commit we can
- return early, and furthermore test the functionality of the new
- early-exit xdiff-interface by having a BUG() call here to die if it
- ever starts handing us needless work again.
+ But now with the support added in the preceding changes to the
+ xdiff-interface we can return early. Let's assert the behavior of that
+ new early-return xdiff-interface by having a BUG() call here to die if
+ it ever starts handing us needless work again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
@@ diffcore-pickaxe.c: static int diffgrep_consume(void *priv, char *line, unsigned
+ if (!regexec_buf(data->regexp, line + 1, len - 1, 1,
+ ®match, 0)) {
+ data->hit = 1;
-+ return -1;
++ return 1;
+ }
return 0;
}
@@ diffcore-pickaxe.c: static int diff_grep(mmfile_t *one, mmfile_t *two,
+
+ /*
+ * An xdiff error might be our "data->hit" from above. See the
-+ * comment for xdiff_emit_{line,hunk}_fn in xdiff-interface.h
-+ * for why.
++ * comment for xdiff_emit_line_fn in xdiff-interface.h
+ */
+ ret = xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
+ &ecbdata, &xpp, &xecfg);
@@ diffcore-pickaxe.c: static int diff_grep(mmfile_t *one, mmfile_t *two,
## xdiff-interface.h ##
@@
- * granular return values, but for now use it carefully, or consider
- * e.g. using discard_hunk_line() if you say just don't care about
- * hunk headers.
+ * doing so will currently make your early return indistinguishable
+ * from an error internal to xdiff, xdiff itself will see that
+ * non-zero return and translate it to -1.
+ *
-+ * Note that just returning -1 will make your early return
-+ * indistinguishable from an error internal to xdiff. See "diff_grep"
-+ * in diffcore-pickaxe.c for a trick to work around this, i.e. using
-+ * the "consume_callback_data" to note the desired early return.
++ * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
++ * this, i.e. using the "consume_callback_data" to note the desired
++ * early return.
*/
typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
- typedef int (*xdiff_emit_hunk_fn)(void *data,
+ typedef void (*xdiff_emit_hunk_fn)(void *data,
22: c81b18ca4c7 = 20: 76de6ebc8b8 pickaxe -G: don't special-case create/delete
-: ----------- > 21: 9bb7ac910f3 xdiff users: use designated initializers for out_line
-: ----------- > 22: 1178956fb3d xdiff-interface: replace discard_hunk_line() with a flag
--
2.31.1.639.g3d04783866f
next prev parent reply other threads:[~2021-04-12 17:15 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-03 3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 01/25] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 02/25] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 03/25] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 04/25] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 05/25] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 06/25] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 07/25] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 08/25] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 09/25] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 10/25] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 11/25] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 12/25] pickaxe tests: refactor to use test_commit --append Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 13/25] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop Ævar Arnfjörð Bjarmason
2021-02-04 16:16 ` René Scharfe
2021-02-04 17:56 ` Junio C Hamano
2021-02-04 21:13 ` Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 15/25] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 16/25] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 17/25] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 18/25] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 19/25] pickaxe -G: set -U0 for diff generation Ævar Arnfjörð Bjarmason
2021-02-03 14:26 ` Ævar Arnfjörð Bjarmason
2021-02-03 19:42 ` Junio C Hamano
2021-02-03 3:28 ` [PATCH 20/25] grep.h: make patmatch() a public function Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 21/25] pickaxe: use PCREv2 for -G and -S Ævar Arnfjörð Bjarmason
2021-02-03 20:44 ` Ævar Arnfjörð Bjarmason
2021-02-04 18:11 ` Junio C Hamano
2021-02-04 18:22 ` Junio C Hamano
2021-02-03 3:28 ` [PATCH 22/25] Remove unused kwset.[ch] Ævar Arnfjörð Bjarmason
[not found] ` <CAPUEspgBmuTBHVZWY9fRtjbHWBRr0zHravLL1Czepc6jmib4HA@mail.gmail.com>
2021-02-03 14:13 ` Ævar Arnfjörð Bjarmason
[not found] ` <CAPUEsphN7QuSVsC1Tr4xE8yQgPTtpF7wL7zbk1crQU3n-5g6JQ@mail.gmail.com>
2021-02-03 16:45 ` Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 23/25] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-04 18:16 ` Junio C Hamano
2021-02-03 3:28 ` [PATCH 25/25] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-02-03 12:38 ` [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
2021-02-16 22:23 ` Junio C Hamano
2021-02-17 1:19 ` Junio C Hamano
2021-04-12 17:15 ` Ævar Arnfjörð Bjarmason [this message]
2021-04-12 17:15 ` [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 02/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 03/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 04/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 05/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 06/22] pickaxe tests: add missing test for --no-pickaxe-regex being an error Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 07/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 08/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 09/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 10/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 11/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 12/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 13/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 14/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 15/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 16/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 17/22] xdiff-interface: prepare for allowing early return Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 18/22] xdiff-interface: allow early return from xdiff_emit_line_fn Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 19/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 20/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 21/22] xdiff users: use designated initializers for out_line Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 22/22] xdiff-interface: replace discard_hunk_line() with a flag Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag Ævar Arnfjörð Bjarmason
2021-03-30 23:14 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 03/22] test-lib-functions: reword "test_commit --append" docs Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 04/22] test-lib functions: add --printf option to test_commit Ævar Arnfjörð Bjarmason
2021-03-30 23:11 ` Junio C Hamano
2021-04-12 13:19 ` Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-03-30 23:26 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 06/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 07/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-03-30 23:32 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-03-30 23:36 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 10/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 11/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 12/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 13/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-03-30 23:45 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-03-30 23:50 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-03-30 23:54 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 17/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 18/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-03-30 23:58 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-03-31 0:04 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 20/22] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-16 11:58 ` [PATCH v2 21/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-03-31 0:11 ` Junio C Hamano
2021-02-16 11:58 ` [PATCH v2 22/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-03-31 0:14 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover-00.22-00000000000-20210412T170457Z-avarab@gmail.com \
--to=avarab@gmail.com \
--cc=carenas@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).