git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Stefan Beller <sbeller@google.com>
Cc: Jacob Keller <jacob.keller@gmail.com>,
	Simon Ruderich <simon@ruderich.org>, git <git@vger.kernel.org>
Subject: Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
Date: Tue, 3 Apr 2018 13:38:14 -0700	[thread overview]
Message-ID: <20180403133814.4845c4cc8cc2c9c749e2ada5@google.com> (raw)
In-Reply-To: <CAGZ79kYPx=OB92yWFw8W=zJDoJT5BjBd3Q+gt_WS_KuUw299vQ@mail.gmail.com>

On Tue, 3 Apr 2018 12:22:32 -0700
Stefan Beller <sbeller@google.com> wrote:

> On Mon, Apr 2, 2018 at 5:41 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> > On Mon,  2 Apr 2018 15:48:54 -0700
> > Stefan Beller <sbeller@google.com> wrote:
> >
> >> +struct ws_delta {
> >> +     int deltachars;
> >> +     char firstchar;
> >> +};
> >
> > I'll just make some overall design comments.
> >
> > Shouldn't this be a string of characters (or a char* and len) and
> > whether it was added or removed? If you're only checking the first
> > character, this would not work if the other characters were different.
> 
> I considered diving into this, but it seemed to be too complicated for
> >95 % of the use cases, which can be approximated in change of the
> first character.

It's true that most use cases can be approximated this way, but I don't
think that it's worth the approximation.

> Because if we take a string of characters, we'd also need to take care of
> tricky conversions (e.g. Are 8 white spaces equal to a tab, and if so do
> we break blocks if one line converts 8 ws to a tab?)

No conversions - spaces are spaces and tabs are tabs.

> So I would definitely pursue the string instead of change of first
> character, but what are all the heuristics to put in?

No heuristics - a few lines make a block if the same prefix (which
consists of all whitespace) was added or removed.

> Just to be clear: The string would contain only the change in
> white space up front, or would we also somehow store white space
> in other parts?

Only change in white space at the start of the line - this option only
handles space at the start of the line, right?

> - # This is a sample comment
> - # across multiple lines
> - # maybe even a license header
> + #     This is a sample comment
> + #     across multiple lines
> + #     maybe even a license header
> 
> How about this?

My understanding is that this patch does not handle this case.

> >> @@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
> >>       const struct diff_options *diffopt = hashmap_cmp_fn_data;
> >>       const struct moved_entry *a = entry;
> >>       const struct moved_entry *b = entry_or_key;
> >> +     unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
> >> +
> >> +     if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
> >> +             /*
> >> +              * As there is not specific white space config given,
> >> +              * we'd need to check for a new block, so ignore all
> >> +              * white space. The setup of the white space
> >> +              * configuration for the next block is done else where
> >> +              */
> >> +             flags |= XDF_IGNORE_WHITESPACE;
> >>
> >>       return !xdiff_compare_lines(a->es->line, a->es->len,
> >>                                   b->es->line, b->es->len,
> >> -                                 diffopt->color_moved & XDF_WHITESPACE_FLAGS);
> >> +                                 flags);
> >>  }
> >
> > I think we should just prohibit combining this with any of the
> > whitespace ignoring flags except for the space-at-eol one. They seem to
> > contradict anyway.
> 
> ok, we can narrow this one down to ignore all white space.

What do you mean? The rationale for my comment is that I saw that you
need to specify a special flag to xdiff_compare_lines if
COLOR_MOVED_DELTA_WHITESPACES is set, which could conflict with other
flags that the user has explicitly set. So avoiding that case entirely
seems like a good idea, especially since it is logical to do so.

> >> +test_expect_success 'compare whitespace delta across moved blocks' '
> >> +
> >> +     git reset --hard &&
> >> +     q_to_tab <<-\EOF >text.txt &&
> >> +     QIndented
> >> +     QText
> >> +     Qacross
> >> +     Qfive
> >> +     Qlines
> >> +     QBut!
> >> +     Qthis
> >> +     QQone
> >> +     Qline
> >> +     QQdid
> >> +     Qnot
> >> +     QQadjust
> >> +     EOF
> >
> > Do we need 5 lines? I thought 2 would suffice. (It's the ALNUM_COUNT
> > that matters, as far as I know.) This makes it hard to see that the
> > "But!" line is the one that counts.
> 
> I did not want to go with the bare minimum as then adjusting the minimum
> would be a pain as these unrelated (to the minimum) test cases would
> break.

That is true, but it makes the test case harder to read now. If you're
worried about bumping into the minimum if we do adjust the minimum,
making the lines longer should be sufficient.

> >> +test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '
> 
> >> +     EOF
> >
> > If the objective it just to show that the functions f and g are treated
> > as one unit despite their lines being of multiple indentation levels,
> > the test file could be much shorter.
> 
> yeah, I noticed that we already test that in the test above where we
> have that test after the "But!", where lines ziggy-zag. Will drop this test.

OK, sounds good.

  reply	other threads:[~2018-04-03 20:38 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-04-06 20:04   ` Jeff King
2018-04-06 20:41     ` Stefan Beller
2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
2018-04-02 23:51   ` Jonathan Tan
2018-04-03 18:59     ` Stefan Beller
2018-04-06 21:28     ` Stefan Beller
2018-04-06 22:27       ` Jonathan Tan
2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
2018-04-03 19:49     ` Stefan Beller
2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
2018-04-03 21:05         ` [PATCH] diff: add a blocks mode for moved code detection Stefan Beller
2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
2018-04-03  0:31   ` Jonathan Tan
2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-03  0:41   ` Jonathan Tan
2018-04-03 19:22     ` Stefan Beller
2018-04-03 20:38       ` Jonathan Tan [this message]
2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
2018-04-03  0:03   ` Jacob Keller
2018-04-03 19:00     ` Stefan Beller
2018-04-03 19:55       ` Jacob Keller
  -- strict thread matches above, loose matches on Subject: below --
2018-04-24 21:03 [PATCHv2 " Stefan Beller
2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-24 22:35   ` Jonathan Tan
     [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
2018-04-24 23:23       ` Stefan Beller
2018-04-25  0:11         ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180403133814.4845c4cc8cc2c9c749e2ada5@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=jacob.keller@gmail.com \
    --cc=sbeller@google.com \
    --cc=simon@ruderich.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).