git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Michał Kępień" <michal@isc.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes
Date: Wed, 7 Oct 2020 21:48:21 +0200	[thread overview]
Message-ID: <20201007194821.GA20549@larwa.hq.kempniu.pl> (raw)
In-Reply-To: <xmqqy2kpbye9.fsf@gitster.c.googlers.com>

> > Apart from adding a new field to struct diff_options, also define a new
> > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
> > This is done because the xpparam_t structure (which is used for passing
> > around the regular expression supplied to -I) is not zeroed out in all
> > call stacks involving xdl_diff() and thus only performing a NULL check
> > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
> > garbage on the stack as a regular expression.  As the 'flags' field of
> > xpparam_t is initialized in all call stacks involving xdl_diff(), adding
> > a flag check avoids that problem.
> 
> You mentioned in your cover letter about this, and I tend to agree
> that this feels like a hack, at least from the first glance.  The
> next feature that wants to have an optional pointer in xpparam_t and
> have the code behave differently with the data pointed by it would
> need to waste another bit the same way.

Agreed.

> Do you already know (read:
> I am not asking you to dig to find out immediately---just asking if
> you already know the answer) if there is an inherent reason why they
> cannot be memset(0) before use?  It seems like a much better
> approach to ensure that we clear the structure.

I do not know of any reason for xpparam_t structures to not be
zero-initialized.  In fact, they _are_ zero-initialized most of the
time; AFAICT, there are just three places in the tree in which they are
not.

Would you like me to address that in a separate patch in this patch
series?

> Doesn't existing
> anchors array share the same issue (at least anchors_nr must be
> cleared if there is no interesting anchors) already?  IOW, should
> "git grep anchors_nr" be a valid way to identify _all_ places where
> you need to clear the ignore_regex field?

Yes, the 'anchors' and 'anchors_nr' fields of xpparam_t are also
affected by the same problem, but the symptoms are more benign in their
case because these fields are only used in xdiff/xpatience.c, i.e. when
XDF_PATIENCE_DIFF is set in 'flags' - and as I already mentioned in
commit messages, that field is always initialized properly, so there is
currently no practical possibility of stack garbage being interpreted as
e.g. a pointer to the anchor array.

> > +-I<regex>::
> > +	Ignore changes whose all lines match <regex>.
> > +
> 
> The implementation seems to allow only one regex, but I suspect we'd
> want to mimic
> 
>     $ printf "%s\n" a a a >test_a
>     $ printf "%s\n" b b b >test_b
>     $ diff -Ia     test_a test_b
>     $ diff     -Ib test_a test_b
>     $ diff -Ia -Ib test_a test_b

Ah, sure.  After skimming through various man pages available online, I
was not sure whether all standalone versions of diff which support -I
allow that switch to be used multiple times and I thought it would be
better to start with the simplest thing possible.  If you would rather
have the above implemented immediately, I can sure try that in v2.

> and until that happens, the explanation needs to say something about
> earlier <regex> being discarded when this option is given more than
> once.

Sorry, where do I look for that explanation?  I tried to make -I behave
similarly to -G and -S, each of which can be specified more than once,
but the last value provided overrides the earlier ones - and I could not
find any mention of that behavior in the docs.  Please help?

> > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
> >  	return 0;
> >  }
> >  
> > +static int diff_opt_ignore_regex(const struct option *opt,
> > +				 const char *arg, int unset)
> > +{
> > +	struct diff_options *options = opt->value;
> > +
> > +	BUG_ON_OPT_NEG(unset);
> > +	options->xdl_opts |= XDF_IGNORE_REGEX;
> > +	options->ignore_regex = arg;
> 
> When given twice or more, the earlier value gets lost (it does not
> leak, luckily, though).

Right, as I mentioned above, I just wanted to start with something
simple that resembles -G/-S.  I will revise this part in v2 if you would
like to see support for multiple regular expressions in this patch
series.

> > +	return 0;
> > +}
> 
> If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not
> have to have this as a callback.  Instead, it can be OPT_STRING with
> the current semantics of "only the last one is used", or we can use
> OPT_STRING_LIST to keep all the expressions.

I see, thanks for the hints!

> On the other hand, I wonder if it would be a valid approach to make
> the new field(s) in diff_options a "regex_t *ignore_regex" (and add
> an "int ignore_regex_nr" next to it, if we shoot for multiple -I
> options), instead of "const char *".  For that, we would need a
> callback even without XDF_IGNORE_REGEX bit having to futz with
> xdl_opts field.
> 
> Doing so would give us a chance to compile and notice a malformed
> regular expression in diff.c, before it hits xdiff/ layer, which may
> or may not be a good thing.

I have not thought about this approach before, but it sounds elegant to
me, at least at first glance - option parsing code sounds like the right
place for sanitizing user-provided parameters.  Doubly so if we take the
multiple -I options approach.  I will try this in v2.

> > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
> >  	}
> >  }
> 
> I agree with what you said in the cover letter that it would be a
> good idea to name the existing xdl_mark_ignorable() and the new one
> in such a way that they look similar and parallel, by renaming the
> former to xdl_mark_ignorable_lines or something.

Then I will do that in v2.  Separate patch?

> > +static void
> > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> > +			 const char *ignore_regex)
> 
> I know some coding standard do chomp line immediately before the
> function name (I grew up with one), but that is not what this
> project uses, and judging from the surrounding code, it is not what
> the upstream xdiff source we borrowed uses, either.

Oh, right, sorry - force of habit.  I will fix this in v2.

> > +{
> > +	xdchange_t *xch;
> > +	regex_t regex;
> > +
> > +	if (regcomp(&regex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
> > +		die("invalid regex: %s", ignore_regex);
> 
> If we compile in diff.c layer and pass regex_t down here, we won't
> have to fail here this deep in the callchain.

Agreed - I already responded to this remark above.

> > +	for (xch = xscr; xch; xch = xch->next) {
> > +		regmatch_t regmatch;
> > +		xrecord_t **rec;
> > +		int ignore = 1;
> > +		long i;
> > +
> > +		rec = &xe->xdf1.recs[xch->i1];
> > +		for (i = 0; i < xch->chg1 && ignore; i++)
> > +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> > +					      1, &regmatch, 0);
> > +		rec = &xe->xdf2.recs[xch->i2];
> > +		for (i = 0; i < xch->chg2 && ignore; i++)
> > +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> > +					      1, &regmatch, 0);
> > +
> > +		/*
> > +		 * Do not override --ignore-blank-lines.
> > +		 */
> > +		xch->ignore |= ignore;
> 
> Well, you could optimize out the whole regexp matching by adding
> 
> 		if (xch->ignore)
> 			continue;
> 
> before the two loops try to find a non-matching line, no?

Ah, of course, it looks so obvious now that you pointed it out :-)  I
guess I was copy-past^W^W^Wtrying to make xdl_mark_ignorable_regex()
look similar to xdl_mark_ignorable().  I will use your suggestion in v2.

> > +	}
> > +}
> > +
> >  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> >  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
> >  	xdchange_t *xscr;
> > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> >  		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
> >  			xdl_mark_ignorable(xscr, &xe, xpp->flags);
> >  
> > +		if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
> > +			xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
> > +
> >  		if (ef(&xe, xscr, ecb, xecfg) < 0) {
> >  
> >  			xdl_free_script(xscr);
> 
> Thanks for an exciting read ;-)

My pleasure ;-)  And thanks for taking a look.  Sorry about the long
turnaround time, but unfortunately this is the best I can do at the
moment.

-- 
Best regards,
Michał Kępień

  reply	other threads:[~2020-10-07 19:48 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
2020-10-01 18:21   ` Junio C Hamano
2020-10-07 19:48     ` Michał Kępień [this message]
2020-10-07 20:08       ` Junio C Hamano
2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień
2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano
2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-12 11:14     ` Johannes Schindelin
2020-10-12 17:09       ` Junio C Hamano
2020-10-12 19:52     ` Junio C Hamano
2020-10-13  6:35       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-12 11:20     ` Johannes Schindelin
2020-10-12 20:00       ` Junio C Hamano
2020-10-12 20:39         ` Johannes Schindelin
2020-10-12 21:43           ` Junio C Hamano
2020-10-13  6:37             ` Michał Kępień
2020-10-13 15:49               ` Junio C Hamano
2020-10-13  6:36       ` Michał Kępień
2020-10-13 12:02         ` Johannes Schindelin
2020-10-13 15:53           ` Junio C Hamano
2020-10-13 18:45           ` Michał Kępień
2020-10-12 18:01     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12 20:04     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
2020-10-12 11:49     ` Johannes Schindelin
2020-10-13  6:38       ` Michał Kępień
2020-10-13 12:00         ` Johannes Schindelin
2020-10-13 16:00           ` Junio C Hamano
2020-10-13 19:01           ` Michał Kępień
2020-10-15 11:45             ` Johannes Schindelin
2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-15  7:24     ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-16 15:32       ` Phillip Wood
2020-10-16 18:04         ` Junio C Hamano
2020-10-19  9:48           ` Michał Kępień
2020-10-16 18:16       ` Junio C Hamano
2020-10-19  9:55         ` Michał Kępień
2020-10-19 17:29           ` Junio C Hamano
2020-10-16 10:00     ` [PATCH v3 0/2] " Johannes Schindelin
2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
2020-10-20  6:48       ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-20  6:48       ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2021-02-05 14:13       ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
2021-02-10 16:00         ` Johannes Schindelin
2021-02-11  3:00           ` Ævar Arnfjörð Bjarmason
2021-02-11  9:40             ` Johannes Schindelin
2021-02-11 10:21               ` Jeff King
2021-02-11 10:45                 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason
2021-02-05 14:13       ` [PATCH " Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201007194821.GA20549@larwa.hq.kempniu.pl \
    --to=michal@isc.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).