git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Stefan Beller <sbeller@google.com>
Cc: jacob.keller@gmail.com, simon@ruderich.org, git@vger.kernel.org
Subject: Re: [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option
Date: Mon, 2 Apr 2018 17:31:35 -0700	[thread overview]
Message-ID: <20180402173135.a6ea2bc1154781abac4dc21b@google.com> (raw)
In-Reply-To: <20180402224854.86922-7-sbeller@google.com>

On Mon,  2 Apr 2018 15:48:53 -0700
Stefan Beller <sbeller@google.com> wrote:

> In the original implementation of the move detection logic we assumed that
> the choice for ignoring white space changes is the same for the move
> detection as it is for the generic diff.  It turns out this is wrong.

I don't think we can say that this is wrong - just that we decided that
it would be useful to allow different whitespace handling methods when
calculating the diff and when detecting moves.

> There are a couple of things where the user wants to input their
> decision into the diff machinery:
> 
> * Whether or not to ignore white space for the general diff detection.
>   This is covered with the -w, -b options already.
> * Whether the move detection ought to pay attention to white spaces
>   in general. And if so, how are white spaces handled for the block
>   detection.
> 
> One example would be reviewing a current patch under discussion, that moves
> code around.  In that case, you may want to have the general diff machinery
> not ignore the white spaces (i.e. exact matching), as that will show you
> the diff as the patch sent. The code moved however may have another
> indentation level, such that you want to ignore white spaces for the move
> detection. The code may be in python, such that spaces matter for program
> flow, though. That is why you'd want each block to have the same change
> in white space.
> 
> This patch decouples white space treatment in the general diff machinery
> from the white space treatment in the move detection.
> 
> This adds all the the options for ignoring white space that the regular
> diff machinery has to the move detection, such that we can cover all the
> cases that were introduced in 7a55427094 (Merge branch
> 'jk/diff-color-moved-fix', 2017-11-06).

I would just write the above as follows:

  Allow the user to specify that whitespace should be ignored
  differently during detection of moved lines than during generation of
  added and removed lines. This is done by providing analogs to the
  --ignore-space-at-eol, -b, and -w options (namely,
  --color-moved-[no-]ignore-space-at-eol <fill in the rest>) that affect
  only the color of the output, and making the existing
  --ignore-space-at-eol, -b, and -w options no longer affect the color
  of the output.

(And if the above is not clear enough that this is a
backwards-incompatible change, we might need to call it out.)

> The example from above (different lines in the diff with -w for the regular
> diff) is covered in a test. Convert the existing tests to be more explicit
> on their choice of white space behavior in the move detection as the tests
> should not fail if the default is changed.

This statement is confusing to me - of course the tests should fail,
since we changed the defaults. I think it is sufficient to just mention
that whitespace handling has to be explicitly specified for the move
detection part.

> +--color-moved-[no-]ignore-all-space::
> +	Ignore whitespace when comparing lines when performing the move
> +	detection for --color-moved.  This ignores differences even if
> +	one line has whitespace where the other line has none.
> +--color-moved-[no-]ignore-space-change::
> +	Ignore changes in amount of whitespace when performing the move
> +	detection for --color-moved.  This ignores whitespace
> +	at line end, and considers all other sequences of one or
> +	more whitespace characters to be equivalent.
> +--color-moved-[no-]ignore-space-at-eol::
> +	Ignore changes in whitespace at EOL when performing the move
> +	detection for --color-moved.

Optional: I would reorder this to be in the same order as the analogous
options (--ignore-space-at-eol first, then -b, then -w).

> @@ -720,7 +720,7 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>  
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
> -				    diffopt->xdl_opts);
> +				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
>  }
>  
>  static struct moved_entry *prepare_entry(struct diff_options *o,
> @@ -728,8 +728,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
>  {
>  	struct moved_entry *ret = xmalloc(sizeof(*ret));
>  	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
> +	unsigned flags = o->color_moved & XDF_WHITESPACE_FLAGS;
>  
> -	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
> +	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
>  	ret->es = l;
>  	ret->next_line = NULL;

These 2 changes looked suspicious to me at first, but then I saw that
moved_entry_cmp() and prepare_entry() are only used during move
detection.

> @@ -1419,7 +1422,10 @@ test_expect_success 'move detection ignoring whitespace ' '
>  	EOF
>  	test_cmp expected actual &&
>  
> -	git diff HEAD --no-renames -w --color-moved --color |
> +	git diff HEAD --no-renames --color-moved --color \
> +		--color-moved-ignore-all-space \
> +		--color-moved-no-ignore-space-change \
> +		--color-moved-no-ignore-space-at-eol |
>  		grep -v "index" |
>  		test_decode_color >actual &&
>  	cat <<-\EOF >expected &&

This change (removal of -w) looked suspicious to me at first, but then I
saw that the "-w" was only used to trigger whitespace insensitivity
during the move detection phase. (The line in question was moved, so "-"
and "+" lines would have been generated in the diff anyway regardless of
whether whitespace was ignored.) Same as to the other test
modifications.

> +test_expect_success 'move detection only ignores white spaces' '

This seems to be "only move detection ignores white spaces", not as
written. I would title this "--color-moved-ignore-all-space only ignores
whitespace during move detection".

> +	q_to_tab <<-\EOF >function.c &&
> +	int func()
> +	{
> +	Qif (foo) {
> +	QQ// this part of the function
> +	QQ// function will be very long
> +	QQ// indeed. We must exceed both
> +	QQ// per-line and number of line
> +	QQ// minimums
> +	QQ;
> +	Q}
> +	Qbaz();
> +	Qbar();
> +	Q// more unrelated stuff
> +	}
> +	EOF

Could the sample data be more concise? In particular, a few lines should
be sufficient for before:

  a long line to exceed per-line minimum
  another long line to exceed per-line minimum
  original file

and after:

  Qa long line to exceed per-line minimum
  Qanother long line to exceed per-line minimum
  new file

Then the ordinary diff (with -w) will only have "-" and "+" for
"original file" and "new file", and the one with only
--color-moved-ignore-all-space would have "-" and "+" for all lines.

> +	# Make sure we get a different diff using -w ("moved function header")

-w is not "moved function header", maybe reword.

  reply	other threads:[~2018-04-03  0:31 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-04-06 20:04   ` Jeff King
2018-04-06 20:41     ` Stefan Beller
2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
2018-04-02 23:51   ` Jonathan Tan
2018-04-03 18:59     ` Stefan Beller
2018-04-06 21:28     ` Stefan Beller
2018-04-06 22:27       ` Jonathan Tan
2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
2018-04-03 19:49     ` Stefan Beller
2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
2018-04-03 21:05         ` [PATCH] diff: add a blocks mode for moved code detection Stefan Beller
2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
2018-04-03  0:31   ` Jonathan Tan [this message]
2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-03  0:41   ` Jonathan Tan
2018-04-03 19:22     ` Stefan Beller
2018-04-03 20:38       ` Jonathan Tan
2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
2018-04-03  0:03   ` Jacob Keller
2018-04-03 19:00     ` Stefan Beller
2018-04-03 19:55       ` Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180402173135.a6ea2bc1154781abac4dc21b@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=jacob.keller@gmail.com \
    --cc=sbeller@google.com \
    --cc=simon@ruderich.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).