git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Stefan Beller <sbeller@google.com>
Cc: git <git@vger.kernel.org>, "Simon Ruderich" <simon@ruderich.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jacob Keller" <jacob.keller@gmail.com>
Subject: Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
Date: Tue, 24 Apr 2018 17:11:23 -0700
Message-ID: <20180424171123.7092788b94498908c25eccf0@google.com> (raw)
In-Reply-To: <CAGZ79kbQUWq_pfvwLWotqCxc1-Y=ctJ4SqgqR+EvJ5wkJkp5kQ@mail.gmail.com>

On Tue, 24 Apr 2018 16:23:28 -0700
Stefan Beller <sbeller@google.com> wrote:

> >> s/missmatch/mismatch/
> >> Also, what is this used for?
> >
> > The mismatch should be used for (thanks for catching!)
> > checking if the remainder of a line is the same, although a boolean
> > may be not the correct choice. We know that the two strings
> > passed into compute_ws_delta come from a complete white space
> > agnostic comparison, so consider:
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more TAB more
> > - SP text TAB text
> >
> > which would mark this as a moved block. This is the feature
> > working as intended, but what about
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more SP more
> > - SP text TAB text
> >
> > Note how the length of the strings is the same, hence the current
> > code of
> >
> >     compute_ws_delta(...) {
> >         int d = longer->len - shorter->len;
> >         out->string = xmemdupz(longer->line, d);
> >     }
> >
> > would work fine and not notice the "change in the remainder".
> > That ought to be caught by the mismatch variable, that
> > is assigned, but not used.
> >
> > The compare_ws_delta(a, b) needs to be extended to
> >
> >   !a->mismatch && !b->mismatch && existing_condition
> >
> > Ideally the example from above would be different depending
> > on whether the other white space flags are given or not.

Thanks - this gives me food for thought.

I'm starting to think that it is impossible to avoid creating our own
string comparison function that:
 - seeks to the first non-whitespace character in both strings
 - checks that both strings, from that first non-whitespace characters,
   are equal for some definition of equal (whether through strcmp or
   xdiff_compare_lines)
 - walks backwards from that first non-whitespace characters to look for
   the first non-matching whitespace character between the 2 strings

The existing diff whitespace modes (to be passed to xdiff_compare_lines)
do not give the exact result we want. For example, if
XDF_IGNORE_WHITESPACE is used (as is in this patch), lines like "a b"
and "ab " would match even though they shouldn't.

This comparison function can be used both in moved_entry_cmp() and when
constructing the ws_delta (in which case it should be made to output
whatever information is needed as out parameters or similar).

> >>> +     if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
> >>> +             /*
> >>> +              * As there is not specific white space config given,
> >>> +              * we'd need to check for a new block, so ignore all
> >>> +              * white space. The setup of the white space
> >>> +              * configuration for the next block is done else where
> >>> +              */
> >>> +             flags |= XDF_IGNORE_WHITESPACE;
> >>> +
> >>>       return !xdiff_compare_lines(a->es->line, a->es->len,
> >>>                                   b->es->line, b->es->len,
> >>>                                   flags);

We can't add XDF_IGNORE_WHITESPACE here - as far as I can tell, this
means that more lines will be treated as moved than the user wants (if
the user did not set --color-moved-ignore-all-space).

> >> It seems like pmb and wsd are parallel arrays - could each wsd be
> >> embedded into the corresponding entry of pmb instead?
> >
> > I'll explore that. It sounds like a good idea for code hygiene.
> > Although if you do not intend to use this feature, then keeping it separate
> > would allow for a smaller footprint in memory.

If you're worried about memory, wsd can be embedded as a pointer.

> > The command is missing a --color-moved, as the --color-moved-whitespace-settings
> > do not imply --color-moved, yet(?)

Maybe one should imply the other (or at least a warning).

  reply index

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
2018-04-24 21:03 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-04-24 21:03 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-04-24 21:03 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-04-24 21:03 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-04-24 21:03 ` [PATCH 5/7] diff.c: add a blocks mode for moved code detection Stefan Beller
2018-04-24 21:50   ` Jonathan Tan
2018-04-24 22:37     ` Stefan Beller
2018-04-24 22:59       ` Jonathan Tan
2018-04-24 21:03 ` [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
2018-04-24 22:00   ` Jonathan Tan
2018-04-24 22:19     ` Stefan Beller
2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-24 22:35   ` Jonathan Tan
     [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
2018-04-24 23:23       ` Stefan Beller
2018-04-25  0:11         ` Jonathan Tan [this message]
2018-04-24 22:37 ` [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
2018-04-24 22:58   ` Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2018-04-02 22:48 [RFC PATCH " Stefan Beller
2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-03  0:41   ` Jonathan Tan
2018-04-03 19:22     ` Stefan Beller
2018-04-03 20:38       ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180424171123.7092788b94498908c25eccf0@google.com \
    --to=jonathantanmy@google.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jacob.keller@gmail.com \
    --cc=sbeller@google.com \
    --cc=simon@ruderich.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git