git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Michał Kępień" <michal@isc.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes
Date: Thu, 01 Oct 2020 11:21:18 -0700	[thread overview]
Message-ID: <xmqqy2kpbye9.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <20201001120606.25773-2-michal@isc.org> ("Michał Kępień"'s message of "Thu, 1 Oct 2020 14:06:05 +0200")

Michał Kępień <michal@isc.org> writes:

> Apart from adding a new field to struct diff_options, also define a new
> flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
> This is done because the xpparam_t structure (which is used for passing
> around the regular expression supplied to -I) is not zeroed out in all
> call stacks involving xdl_diff() and thus only performing a NULL check
> on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
> garbage on the stack as a regular expression.  As the 'flags' field of
> xpparam_t is initialized in all call stacks involving xdl_diff(), adding
> a flag check avoids that problem.

You mentioned in your cover letter about this, and I tend to agree
that this feels like a hack, at least from the first glance.  The
next feature that wants to have an optional pointer in xpparam_t and
have the code behave differently with the data pointed by it would
need to waste another bit the same way.  Do you already know (read:
I am not asking you to dig to find out immediately---just asking if
you already know the answer) if there is an inherent reason why they
cannot be memset(0) before use?  It seems like a much better
approach to ensure that we clear the structure.  Doesn't existing
anchors array share the same issue (at least anchors_nr must be
cleared if there is no interesting anchors) already?  IOW, should
"git grep anchors_nr" be a valid way to identify _all_ places where
you need to clear the ignore_regex field?

> +-I<regex>::
> +	Ignore changes whose all lines match <regex>.
> +

The implementation seems to allow only one regex, but I suspect we'd
want to mimic

    $ printf "%s\n" a a a >test_a
    $ printf "%s\n" b b b >test_b
    $ diff -Ia     test_a test_b
    $ diff     -Ib test_a test_b
    $ diff -Ia -Ib test_a test_b

and until that happens, the explanation needs to say something about
earlier <regex> being discarded when this option is given more than
once.

> @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
>  	return 0;
>  }
>  
> +static int diff_opt_ignore_regex(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	options->xdl_opts |= XDF_IGNORE_REGEX;
> +	options->ignore_regex = arg;

When given twice or more, the earlier value gets lost (it does not
leak, luckily, though).

> +	return 0;
> +}

If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not
have to have this as a callback.  Instead, it can be OPT_STRING with
the current semantics of "only the last one is used", or we can use
OPT_STRING_LIST to keep all the expressions.

On the other hand, I wonder if it would be a valid approach to make
the new field(s) in diff_options a "regex_t *ignore_regex" (and add
an "int ignore_regex_nr" next to it, if we shoot for multiple -I
options), instead of "const char *".  For that, we would need a
callback even without XDF_IGNORE_REGEX bit having to futz with
xdl_opts field.

Doing so would give us a chance to compile and notice a malformed
regular expression in diff.c, before it hits xdiff/ layer, which may
or may not be a good thing.

> @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
>  	}
>  }

I agree with what you said in the cover letter that it would be a
good idea to name the existing xdl_mark_ignorable() and the new one
in such a way that they look similar and parallel, by renaming the
former to xdl_mark_ignorable_lines or something.

> +static void
> +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> +			 const char *ignore_regex)

I know some coding standard do chomp line immediately before the
function name (I grew up with one), but that is not what this
project uses, and judging from the surrounding code, it is not what
the upstream xdiff source we borrowed uses, either.

> +{
> +	xdchange_t *xch;
> +	regex_t regex;
> +
> +	if (regcomp(&regex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
> +		die("invalid regex: %s", ignore_regex);

If we compile in diff.c layer and pass regex_t down here, we won't
have to fail here this deep in the callchain.

> +	for (xch = xscr; xch; xch = xch->next) {
> +		regmatch_t regmatch;
> +		xrecord_t **rec;
> +		int ignore = 1;
> +		long i;
> +
> +		rec = &xe->xdf1.recs[xch->i1];
> +		for (i = 0; i < xch->chg1 && ignore; i++)
> +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> +					      1, &regmatch, 0);
> +		rec = &xe->xdf2.recs[xch->i2];
> +		for (i = 0; i < xch->chg2 && ignore; i++)
> +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> +					      1, &regmatch, 0);
> +
> +		/*
> +		 * Do not override --ignore-blank-lines.
> +		 */
> +		xch->ignore |= ignore;

Well, you could optimize out the whole regexp matching by adding

		if (xch->ignore)
			continue;

before the two loops try to find a non-matching line, no?

> +	}
> +}
> +
>  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>  	xdchange_t *xscr;
> @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
>  			xdl_mark_ignorable(xscr, &xe, xpp->flags);
>  
> +		if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
> +			xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
> +
>  		if (ef(&xe, xscr, ecb, xecfg) < 0) {
>  
>  			xdl_free_script(xscr);

Thanks for an exciting read ;-)


  reply	other threads:[~2020-10-01 18:21 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
2020-10-01 18:21   ` Junio C Hamano [this message]
2020-10-07 19:48     ` Michał Kępień
2020-10-07 20:08       ` Junio C Hamano
2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień
2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano
2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-12 11:14     ` Johannes Schindelin
2020-10-12 17:09       ` Junio C Hamano
2020-10-12 19:52     ` Junio C Hamano
2020-10-13  6:35       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-12 11:20     ` Johannes Schindelin
2020-10-12 20:00       ` Junio C Hamano
2020-10-12 20:39         ` Johannes Schindelin
2020-10-12 21:43           ` Junio C Hamano
2020-10-13  6:37             ` Michał Kępień
2020-10-13 15:49               ` Junio C Hamano
2020-10-13  6:36       ` Michał Kępień
2020-10-13 12:02         ` Johannes Schindelin
2020-10-13 15:53           ` Junio C Hamano
2020-10-13 18:45           ` Michał Kępień
2020-10-12 18:01     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12 20:04     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
2020-10-12 11:49     ` Johannes Schindelin
2020-10-13  6:38       ` Michał Kępień
2020-10-13 12:00         ` Johannes Schindelin
2020-10-13 16:00           ` Junio C Hamano
2020-10-13 19:01           ` Michał Kępień
2020-10-15 11:45             ` Johannes Schindelin
2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-15  7:24     ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-16 15:32       ` Phillip Wood
2020-10-16 18:04         ` Junio C Hamano
2020-10-19  9:48           ` Michał Kępień
2020-10-16 18:16       ` Junio C Hamano
2020-10-19  9:55         ` Michał Kępień
2020-10-19 17:29           ` Junio C Hamano
2020-10-16 10:00     ` [PATCH v3 0/2] " Johannes Schindelin
2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
2020-10-20  6:48       ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-20  6:48       ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2021-02-05 14:13       ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
2021-02-10 16:00         ` Johannes Schindelin
2021-02-11  3:00           ` Ævar Arnfjörð Bjarmason
2021-02-11  9:40             ` Johannes Schindelin
2021-02-11 10:21               ` Jeff King
2021-02-11 10:45                 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason
2021-02-05 14:13       ` [PATCH " Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy2kpbye9.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=michal@isc.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).