git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block
Date: Mon, 14 Aug 2017 15:46:37 -0700	[thread overview]
Message-ID: <CAGZ79kbpdPcyCO9L0o27xOrSEFNy6tn-QFsxrkXJoXzQEbCcBQ@mail.gmail.com> (raw)
In-Reply-To: <288c6c29d212088175b13074dba23c9dbdaa2c67.1502745892.git.jonathantanmy@google.com>

On Mon, Aug 14, 2017 at 2:31 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> The existing documentation states "If there are fewer than 3 adjacent
> moved lines, they are not marked up as moved", which is ambiguous as to
> whether "adjacent moved lines" must be adjacent both at the source and
> at the destination, or be adjacent merely at the source or merely at the
> destination. The behavior of the current code takes the latter
> interpretation, but the behavior of blocks being conceptually painted as
> blocks and then "unpainted" as lines is confusing to me.
>
> Therefore, clarify the ambiguity in the documentation in the stricter
> direction - a block is completely painted or not at all - and update the
> code accordingly.
>
> This requires a change in the test "detect malicious moved code, inside
> file" in that the malicious change is now marked without the move
> colors (because the blocks involved are too small), contrasting with
> the subsequent test where the non-malicious change is marked with move
> colors.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>

I wonder if these changes ought to be a new mode
(C.f. "mountain zebra" and "imperial zebra" for slight
changes in coloring ;) or if we can settle on one true way.

The 3 lines heuristic is a bad heuristic IMHO (it works reasonable well
for little effort but the fact that we discuss this patch makes it a bad
heuristic as we discuss corner cases that are not relevant. The heuristic
originally wanted to filter out stray single braces that were "moved",
it did not want to suppress small original moved pieces of code),
which this covers up a bit.

Maybe we'll cook this in next for a while to see how people
react to it?

> ---
>  Documentation/diff-options.txt |  6 ++--
>  diff.c                         |  6 +++-
>  t/t4015-diff-whitespace.sh     | 71 +++++++++++++++++++++++++++++++++---------
>  3 files changed, 65 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index bc52bd0b9..1ee3ca3f6 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -257,10 +257,10 @@ zebra::
>         Blocks of moved code are detected greedily. The detected blocks are
>         painted using either the 'color.diff.{old,new}Moved' color or
>         'color.diff.{old,new}MovedAlternative'. The change between
> -       the two colors indicates that a new block was detected. If there
> -       are fewer than 3 adjacent moved lines, they are not marked up
> +       the two colors indicates that a new block was detected. If a block
> +       has fewer than 3 adjacent moved lines, it is not marked up
>         as moved, but the regular colors 'color.diff.{old,new}' will be
> -       used.
> +       used instead.

Thanks for clarifying the docs.

> diff --git a/diff.c b/diff.c
> index f598d8a3a..20b784736 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -923,7 +923,6 @@ static void mark_color_as_moved(struct diff_options *o,
>                 }
>
>                 l->flags |= DIFF_SYMBOL_MOVED_LINE;
> -               block_length++;
>
>                 if (o->color_moved == COLOR_MOVED_PLAIN)
>                         continue;
> @@ -953,8 +952,13 @@ static void mark_color_as_moved(struct diff_options *o,
>                         }
>
>                         flipped_block = (flipped_block + 1) % 2;
> +
> +                       adjust_last_block(o, n, block_length);
> +                       block_length = 0;
>                 }
>
> +               block_length++;
> +
>                 if (flipped_block)
>                         l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>         }

This changes the algorithm in a non-obvious way.
When the min-length heuristic is strictly bound to each block,
the function can be simplified more than adding on these tweaks,

1) remove variable block_length, needing to count in the adjust function

2) assign DIFF_SYMBOL_MOVED_LINE either in
    COLOR_MOVED_PLAIN case (and continue) or later (where
    block_length is increased in this patch)

No need to do these, just as thoughts on how to reduce complexity.

> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 6f7758e5c..d0613a189 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1097,13 +1097,13 @@ test_expect_success 'detect malicious moved code, inside file' '

This test would not 'detect malicious moved code, inside file'  any more,
I think instead we'd rather want to have a more realistic test case,
which has more lines in it? (This test is about the block detection
not about the omission of short blocks, which was an after thought)

> +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' '

Thanks for providing a test here! For testing MIN_BLOCK_LENGTH
for each block I would have imagined the tests would have a block of
length (1,)2,3(,4) lines and then we'd see that the blocks are
highlighted or not.
This only has length=1 blocks?

  reply	other threads:[~2017-08-14 22:46 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-14 17:13   ` Stefan Beller
2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-14 17:17   ` Stefan Beller
2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
2017-08-14 17:22   ` Stefan Beller
2017-08-12  0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano
2017-08-14 17:29   ` Stefan Beller
2017-08-14 19:37     ` Junio C Hamano
2017-08-14 19:51       ` Stefan Beller
2017-08-15 17:07         ` Junio C Hamano
2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan
2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-14 21:31 ` [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
2017-08-14 22:46   ` Stefan Beller [this message]
2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan
2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-14 23:57 ` [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan
2017-08-15  2:29   ` Stefan Beller
2017-08-15 19:54   ` Junio C Hamano
2017-08-15 20:06     ` Stefan Beller
2017-08-15 20:53       ` Junio C Hamano
2017-08-16  1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan
2017-08-16  5:55   ` Stefan Beller
2017-08-16  1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-16  1:27 ` [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-16  1:27 ` [PATCH v4 3/3] diff: define block by number of alphanumeric chars Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kbpdPcyCO9L0o27xOrSEFNy6tn-QFsxrkXJoXzQEbCcBQ@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).