From: "Michał Kępień" <michal@isc.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes
Date: Wed, 7 Oct 2020 21:48:21 +0200 [thread overview]
Message-ID: <20201007194821.GA20549@larwa.hq.kempniu.pl> (raw)
In-Reply-To: <xmqqy2kpbye9.fsf@gitster.c.googlers.com>
> > Apart from adding a new field to struct diff_options, also define a new
> > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
> > This is done because the xpparam_t structure (which is used for passing
> > around the regular expression supplied to -I) is not zeroed out in all
> > call stacks involving xdl_diff() and thus only performing a NULL check
> > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
> > garbage on the stack as a regular expression. As the 'flags' field of
> > xpparam_t is initialized in all call stacks involving xdl_diff(), adding
> > a flag check avoids that problem.
>
> You mentioned in your cover letter about this, and I tend to agree
> that this feels like a hack, at least from the first glance. The
> next feature that wants to have an optional pointer in xpparam_t and
> have the code behave differently with the data pointed by it would
> need to waste another bit the same way.
Agreed.
> Do you already know (read:
> I am not asking you to dig to find out immediately---just asking if
> you already know the answer) if there is an inherent reason why they
> cannot be memset(0) before use? It seems like a much better
> approach to ensure that we clear the structure.
I do not know of any reason for xpparam_t structures to not be
zero-initialized. In fact, they _are_ zero-initialized most of the
time; AFAICT, there are just three places in the tree in which they are
not.
Would you like me to address that in a separate patch in this patch
series?
> Doesn't existing
> anchors array share the same issue (at least anchors_nr must be
> cleared if there is no interesting anchors) already? IOW, should
> "git grep anchors_nr" be a valid way to identify _all_ places where
> you need to clear the ignore_regex field?
Yes, the 'anchors' and 'anchors_nr' fields of xpparam_t are also
affected by the same problem, but the symptoms are more benign in their
case because these fields are only used in xdiff/xpatience.c, i.e. when
XDF_PATIENCE_DIFF is set in 'flags' - and as I already mentioned in
commit messages, that field is always initialized properly, so there is
currently no practical possibility of stack garbage being interpreted as
e.g. a pointer to the anchor array.
> > +-I<regex>::
> > + Ignore changes whose all lines match <regex>.
> > +
>
> The implementation seems to allow only one regex, but I suspect we'd
> want to mimic
>
> $ printf "%s\n" a a a >test_a
> $ printf "%s\n" b b b >test_b
> $ diff -Ia test_a test_b
> $ diff -Ib test_a test_b
> $ diff -Ia -Ib test_a test_b
Ah, sure. After skimming through various man pages available online, I
was not sure whether all standalone versions of diff which support -I
allow that switch to be used multiple times and I thought it would be
better to start with the simplest thing possible. If you would rather
have the above implemented immediately, I can sure try that in v2.
> and until that happens, the explanation needs to say something about
> earlier <regex> being discarded when this option is given more than
> once.
Sorry, where do I look for that explanation? I tried to make -I behave
similarly to -G and -S, each of which can be specified more than once,
but the last value provided overrides the earlier ones - and I could not
find any mention of that behavior in the docs. Please help?
> > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
> > return 0;
> > }
> >
> > +static int diff_opt_ignore_regex(const struct option *opt,
> > + const char *arg, int unset)
> > +{
> > + struct diff_options *options = opt->value;
> > +
> > + BUG_ON_OPT_NEG(unset);
> > + options->xdl_opts |= XDF_IGNORE_REGEX;
> > + options->ignore_regex = arg;
>
> When given twice or more, the earlier value gets lost (it does not
> leak, luckily, though).
Right, as I mentioned above, I just wanted to start with something
simple that resembles -G/-S. I will revise this part in v2 if you would
like to see support for multiple regular expressions in this patch
series.
> > + return 0;
> > +}
>
> If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not
> have to have this as a callback. Instead, it can be OPT_STRING with
> the current semantics of "only the last one is used", or we can use
> OPT_STRING_LIST to keep all the expressions.
I see, thanks for the hints!
> On the other hand, I wonder if it would be a valid approach to make
> the new field(s) in diff_options a "regex_t *ignore_regex" (and add
> an "int ignore_regex_nr" next to it, if we shoot for multiple -I
> options), instead of "const char *". For that, we would need a
> callback even without XDF_IGNORE_REGEX bit having to futz with
> xdl_opts field.
>
> Doing so would give us a chance to compile and notice a malformed
> regular expression in diff.c, before it hits xdiff/ layer, which may
> or may not be a good thing.
I have not thought about this approach before, but it sounds elegant to
me, at least at first glance - option parsing code sounds like the right
place for sanitizing user-provided parameters. Doubly so if we take the
multiple -I options approach. I will try this in v2.
> > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
> > }
> > }
>
> I agree with what you said in the cover letter that it would be a
> good idea to name the existing xdl_mark_ignorable() and the new one
> in such a way that they look similar and parallel, by renaming the
> former to xdl_mark_ignorable_lines or something.
Then I will do that in v2. Separate patch?
> > +static void
> > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> > + const char *ignore_regex)
>
> I know some coding standard do chomp line immediately before the
> function name (I grew up with one), but that is not what this
> project uses, and judging from the surrounding code, it is not what
> the upstream xdiff source we borrowed uses, either.
Oh, right, sorry - force of habit. I will fix this in v2.
> > +{
> > + xdchange_t *xch;
> > + regex_t regex;
> > +
> > + if (regcomp(®ex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
> > + die("invalid regex: %s", ignore_regex);
>
> If we compile in diff.c layer and pass regex_t down here, we won't
> have to fail here this deep in the callchain.
Agreed - I already responded to this remark above.
> > + for (xch = xscr; xch; xch = xch->next) {
> > + regmatch_t regmatch;
> > + xrecord_t **rec;
> > + int ignore = 1;
> > + long i;
> > +
> > + rec = &xe->xdf1.recs[xch->i1];
> > + for (i = 0; i < xch->chg1 && ignore; i++)
> > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size,
> > + 1, ®match, 0);
> > + rec = &xe->xdf2.recs[xch->i2];
> > + for (i = 0; i < xch->chg2 && ignore; i++)
> > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size,
> > + 1, ®match, 0);
> > +
> > + /*
> > + * Do not override --ignore-blank-lines.
> > + */
> > + xch->ignore |= ignore;
>
> Well, you could optimize out the whole regexp matching by adding
>
> if (xch->ignore)
> continue;
>
> before the two loops try to find a non-matching line, no?
Ah, of course, it looks so obvious now that you pointed it out :-) I
guess I was copy-past^W^W^Wtrying to make xdl_mark_ignorable_regex()
look similar to xdl_mark_ignorable(). I will use your suggestion in v2.
> > + }
> > +}
> > +
> > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> > xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
> > xdchange_t *xscr;
> > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> > if (xpp->flags & XDF_IGNORE_BLANK_LINES)
> > xdl_mark_ignorable(xscr, &xe, xpp->flags);
> >
> > + if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
> > + xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
> > +
> > if (ef(&xe, xscr, ecb, xecfg) < 0) {
> >
> > xdl_free_script(xscr);
>
> Thanks for an exciting read ;-)
My pleasure ;-) And thanks for taking a look. Sorry about the long
turnaround time, but unfortunately this is the best I can do at the
moment.
--
Best regards,
Michał Kępień
next prev parent reply other threads:[~2020-10-07 19:48 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
2020-10-01 18:21 ` Junio C Hamano
2020-10-07 19:48 ` Michał Kępień [this message]
2020-10-07 20:08 ` Junio C Hamano
2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień
2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano
2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień
2020-10-12 9:17 ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-12 11:14 ` Johannes Schindelin
2020-10-12 17:09 ` Junio C Hamano
2020-10-12 19:52 ` Junio C Hamano
2020-10-13 6:35 ` Michał Kępień
2020-10-12 9:17 ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-12 11:20 ` Johannes Schindelin
2020-10-12 20:00 ` Junio C Hamano
2020-10-12 20:39 ` Johannes Schindelin
2020-10-12 21:43 ` Junio C Hamano
2020-10-13 6:37 ` Michał Kępień
2020-10-13 15:49 ` Junio C Hamano
2020-10-13 6:36 ` Michał Kępień
2020-10-13 12:02 ` Johannes Schindelin
2020-10-13 15:53 ` Junio C Hamano
2020-10-13 18:45 ` Michał Kępień
2020-10-12 18:01 ` Junio C Hamano
2020-10-13 6:38 ` Michał Kępień
2020-10-12 20:04 ` Junio C Hamano
2020-10-13 6:38 ` Michał Kępień
2020-10-12 9:17 ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
2020-10-12 11:49 ` Johannes Schindelin
2020-10-13 6:38 ` Michał Kępień
2020-10-13 12:00 ` Johannes Schindelin
2020-10-13 16:00 ` Junio C Hamano
2020-10-13 19:01 ` Michał Kępień
2020-10-15 11:45 ` Johannes Schindelin
2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-15 7:24 ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-15 7:24 ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-16 15:32 ` Phillip Wood
2020-10-16 18:04 ` Junio C Hamano
2020-10-19 9:48 ` Michał Kępień
2020-10-16 18:16 ` Junio C Hamano
2020-10-19 9:55 ` Michał Kępień
2020-10-19 17:29 ` Junio C Hamano
2020-10-16 10:00 ` [PATCH v3 0/2] " Johannes Schindelin
2020-10-20 6:48 ` [PATCH v4 " Michał Kępień
2020-10-20 6:48 ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-20 6:48 ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2021-02-05 14:13 ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
2021-02-10 16:00 ` Johannes Schindelin
2021-02-11 3:00 ` Ævar Arnfjörð Bjarmason
2021-02-11 9:40 ` Johannes Schindelin
2021-02-11 10:21 ` Jeff King
2021-02-11 10:45 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason
2021-02-05 14:13 ` [PATCH " Ævar Arnfjörð Bjarmason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201007194821.GA20549@larwa.hq.kempniu.pl \
--to=michal@isc.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).