git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [RFC/PATCH 0/2] Color moved code differently
Date: Sun, 4 Sep 2016 01:28:58 -0700	[thread overview]
Message-ID: <CAGZ79kZp1EeDYDNiFGVLYbth8gRAe8=WLP=i1YeUwM-1TX4B9g@mail.gmail.com> (raw)
In-Reply-To: <xmqqpookqi8k.fsf@gitster.mtv.corp.google.com>

On Sat, Sep 3, 2016 at 11:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>>  * On 2/2, doing it at xdiff.c level may be limiting this good idea
>>>    to flourish to its full potential, as the interface is fed only
>>>    one diff_filepair at a time.
>>
>> I realized that after I implemented it. I agree we would want to have
>> it function cross file.
>>
>> So from my current understanding of the code,
>> * diffcore_std would call a new function diffcore_detect_moved(void)
>>    just before diffcore_apply_filter is called.
>> * The new function diffcore_detect_moved would then check if the
>>    diff is a valid textual diff (i.e. real files, not submodules, but
>>    deletion/creation of one file is allowed)
>>    If so we generate the diff internally and as in 2/2 would
>>    hash all added/removed lines with context and store it.
>
> I do not think you should step outside diff_flush().

After looking further into it, I think we need to step outside
or at the very least deep down in the xdl cellar from there,
because we need the context around the line both in the
pre and after image.


>  Only when
> producing textual diff, you would have to run the textual diff
> twice by going over the q twice:
>
>  * The first pass would run diff_flush_patch(), which would call
>    into xdiff the usual way, but the callback from xdiff would
>    capture the removed lines and the added lines without making any
>    output.

The callback doesn't have the context around one line as easily
accessible as the xdl emit function, so rather we'd pass in a flag,
i.e. the hashmap where to store the added/removed lines or NULL.

Both passes (the first to find and the second enhanced pass that
outputs the lines) need to have access to the context of the respective
file.

>
>  * The second pass would run diff_flush_patch(), but the callback
>    from xdiff would be called with additional information, namely,
>    the removed and the added lines captured in the first pass.
>
>  * I suspect that the fn_out_consume() function that is used for a
>    normal case (i.e. when we are not doing this more expensive
>    "moved to/moved from" coloring) can be used for the second pass
>    above (i.e. the "priv" aka "ecbdata" may need to be extended so
>    that it can tell which mode of operation it is asked to perform),
>    but if there is not enough similarity between the second pass of
>    this "moved from/moved to" mode and the normal mode of output, it
>    is also OK to have two different callback functions, i.e. the
>    original one to be used in the normal mode, the second one that
>    knows the "these are moved without modification" coloring.  The
>    callback for the first pass is sufficiently different and I think
>    it is better to invent a new callback function to be used in the
>    first pass, instead of reusing fn_out_consume().

The callback doesn't have easy accessible data IMHO,
e.g. we may get these lines in the callback:

(function context not required:)
---8<---
 {
        int i;
        for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
-               unsigned int val;
-               /*
-                * hex[1]=='\0' is caught when val is checked below,
-                * but if hex[0] is NUL we have to avoid reading
-                * past the end of the string:
-                */
-               if (!hex[0])
-                       return -1;
-               val = (hexval(hex[0]) << 4) | hexval(hex[1]);
-               if (val & ~0xff)
+               int val = hex2chr(hex);
+               if (val < 0)
                        return -1;
                *sha1++ = val;
                hex += 2;
---8<---

And for the first + line (int val = hex2chr(hex);) we need
to hash that line and 2 (made up threshold) before and after:

---8<---
        int i;
        for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
               int val = hex2chr(hex);
               if (val < 0)
                        return -1;
---8<---

and this is hard to reconstruct from the above diff fed into the
callback. So we have to do it in the xdl emit phase.

So I think the wrong part of the initial patch 2/2 is the too
late recording of the moved lines.
So rather whenever we call diff_flush_patch(), we have to first
call diff_prepare_moved_line_detection() first for all file pairs.

>
>    The fn_out_consume() function working in the "second pass of
>    moved from/moved to mode" would inspect line[] and see if it is
>    an added or a removed line, and then:
>
>    - if it is an added line, and it appears as a removed line
>      elsewhere in the patchset (you obtained the information in the
>      first pass), you show it as "this was moved from elsewhere".
>
>    - if it is a removed line, and it appears as an added line
>      elsewhere in the patchset (you obtained the information in the
>      first pass), you show it as "this was moved to elsewhere".
>
> Or something like that.

  reply	other threads:[~2016-09-04  8:29 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-03  3:31 [RFC/PATCH 0/2] Color moved code differently Stefan Beller
2016-09-03  3:31 ` [PATCH 1/2] diff.c: emit duplicate lines with a different color Stefan Beller
2016-09-03  3:31 ` [RFC/PATCH 2/2] WIP xdiff: markup duplicates differently Stefan Beller
2016-09-03 12:25   ` Jakub Narębski
2016-09-04  5:31     ` Stefan Beller
2016-09-04 10:35       ` Jakub Narębski
2016-09-04  6:48     ` Junio C Hamano
2016-09-03  7:00 ` [RFC/PATCH 0/2] Color moved code differently Junio C Hamano
2016-09-04  5:23   ` Stefan Beller
2016-09-04  6:41     ` Junio C Hamano
2016-09-04  8:28       ` Stefan Beller [this message]
2016-09-04 22:19       ` Junio C Hamano
2016-09-04  9:57 ` Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZp1EeDYDNiFGVLYbth8gRAe8=WLP=i1YeUwM-1TX4B9g@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).