git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Fwd: possible Improving diff algoritm
       [not found] <CAO54GHC4AXQO1MbU2qXMdcDO5mtUFhrXfXND5evc93kQhNfCrw@mail.gmail.com>
@ 2012-12-12 15:03 ` Kevin
  2012-12-12 18:29   ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin @ 2012-12-12 15:03 UTC (permalink / raw)
  To: git

Regularly I notice that the diffs that are provided (through diff, or
add -p) tend to disconnect changes that belong to each other and
report lines being changed that are not changed.

An example for this is:

     /**
+     * Default parent
+     *
+     * @var int
+     * @access protected
+     * @index
+     */
+    protected $defaultParent;
+
+    /**

I understand this is a valid view of what is changed, but not a very
logical view from the point of the user.

I wondered if there is a way to improve this, or would that have other
consequences.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 15:03 ` Fwd: possible Improving diff algoritm Kevin
@ 2012-12-12 18:29   ` Junio C Hamano
  2012-12-12 18:48     ` Brian J. Murrell
                       ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Junio C Hamano @ 2012-12-12 18:29 UTC (permalink / raw)
  To: Kevin; +Cc: git

Kevin <ikke@ikke.info> writes:

> Regularly I notice that the diffs that are provided (through diff, or
> add -p) tend to disconnect changes that belong to each other and
> report lines being changed that are not changed.
>
> An example for this is:
>
>      /**
> +     * Default parent
> +     *
> +     * @var int
> +     * @access protected
> +     * @index
> +     */
> +    protected $defaultParent;
> +
> +    /**
>
> I understand this is a valid view of what is changed, but not a very
> logical view from the point of the user.
>
> I wondered if there is a way to improve this, or would that have other
> consequences.

I think your example shows a case where the end of the pre-context
matches the end of the added text in the hunk, and it appears it may
produce a better result if you shift the hunk up.  But I think that
works only half the time.  Imagine:

   @@ -K,L +M,N @@
    }
   
   +void new_function(void)
   +{
   +  printf("hello, world.\n");
   +}
   +
    void existing_one(void)
    {
      printf("goodbye, world.\n");

Here the end of the pre-context matches the end of the added lines,
but it will produce worse result if you blindly apply the "shift the
hunk up" trick:

     ... what was before the } we saw in the precontext ...
   +}
   +
   +void new_function(void)
   +{
   +  printf("hello, world.\n");
    }
    
    void existing_one(void)

So I think with s/Regularly/About half the time/, your observation
above is correct.

I think the reason you perceived this as "Regularly" is that you do
not notice nor appreciate it when things go right (half the time),
but you tend to notice and remember only when a wrong side happened
to have been picked (the other half).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 18:29   ` Junio C Hamano
@ 2012-12-12 18:48     ` Brian J. Murrell
  2012-12-12 19:30     ` Kevin
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Brian J. Murrell @ 2012-12-12 18:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin, git

[-- Attachment #1: Type: text/plain, Size: 489 bytes --]

On 12-12-12 01:29 PM, Junio C Hamano wrote:
> 
> Here the end of the pre-context matches the end of the added lines,
> but it will produce worse result if you blindly apply the "shift the
> hunk up" trick:

Yeah.  I would not think a blind shift would be appropriate.  But I
wonder if diff can take whitespace hints about when to shift and when
not to.

It would still never be perfect but might be more accurate than it is
now.  Can't imagine it would be any worse.

b.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 261 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 18:29   ` Junio C Hamano
  2012-12-12 18:48     ` Brian J. Murrell
@ 2012-12-12 19:30     ` Kevin
  2012-12-12 20:29     ` Junio C Hamano
  2012-12-12 21:40     ` Morten Welinder
  3 siblings, 0 replies; 18+ messages in thread
From: Kevin @ 2012-12-12 19:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Yeah, I didn't mention it, but I didn't think it was doing this wrong
in a systematic way. I only wondered if there was some kind of
heuristic that could improve the cases where it goes wrong, without
affecting the cases where it would do it right.

I know this is not an easy problem, lest it would already been fixed.

On Wed, Dec 12, 2012 at 7:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Kevin <ikke@ikke.info> writes:
>
>> Regularly I notice that the diffs that are provided (through diff, or
>> add -p) tend to disconnect changes that belong to each other and
>> report lines being changed that are not changed.
>>
>> An example for this is:
>>
>>      /**
>> +     * Default parent
>> +     *
>> +     * @var int
>> +     * @access protected
>> +     * @index
>> +     */
>> +    protected $defaultParent;
>> +
>> +    /**
>>
>> I understand this is a valid view of what is changed, but not a very
>> logical view from the point of the user.
>>
>> I wondered if there is a way to improve this, or would that have other
>> consequences.
>
> I think your example shows a case where the end of the pre-context
> matches the end of the added text in the hunk, and it appears it may
> produce a better result if you shift the hunk up.  But I think that
> works only half the time.  Imagine:
>
>    @@ -K,L +M,N @@
>     }
>
>    +void new_function(void)
>    +{
>    +  printf("hello, world.\n");
>    +}
>    +
>     void existing_one(void)
>     {
>       printf("goodbye, world.\n");
>
> Here the end of the pre-context matches the end of the added lines,
> but it will produce worse result if you blindly apply the "shift the
> hunk up" trick:
>
>      ... what was before the } we saw in the precontext ...
>    +}
>    +
>    +void new_function(void)
>    +{
>    +  printf("hello, world.\n");
>     }
>
>     void existing_one(void)
>
> So I think with s/Regularly/About half the time/, your observation
> above is correct.
>
> I think the reason you perceived this as "Regularly" is that you do
> not notice nor appreciate it when things go right (half the time),
> but you tend to notice and remember only when a wrong side happened
> to have been picked (the other half).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 18:29   ` Junio C Hamano
  2012-12-12 18:48     ` Brian J. Murrell
  2012-12-12 19:30     ` Kevin
@ 2012-12-12 20:29     ` Junio C Hamano
  2012-12-12 21:40     ` Morten Welinder
  3 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2012-12-12 20:29 UTC (permalink / raw)
  To: Kevin; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Kevin <ikke@ikke.info> writes:
>
>> Regularly I notice that the diffs that are provided (through diff, or
>> add -p) tend to disconnect changes that belong to each other and
>> report lines being changed that are not changed.
>>
>> An example for this is:
>>
>>      /**
>> +     * Default parent
>> +     *
>> +     * @var int
>> +     * @access protected
>> +     * @index
>> +     */
>> +    protected $defaultParent;
>> +
>> +    /**
>>
>> I understand this is a valid view of what is changed, but not a very
>> logical view from the point of the user.
>>
>> I wondered if there is a way to improve this, or would that have other
>> consequences.

I forgot to mention consequences.  Changing it obviously changes the
shape of the diff, hence changes the patch id.  Anything that caches
output from "git cherry" to match up "identical patches" will need
to discard and repopulate its cache.  Your "rerere" database will go
stale.

Also "kup" tool used at k.org allows an uploader to pretend to
upload an incremental diff between two known commits by only sending
the GPG signature of the diff the uploader generates.  The actual
diff is generated on the k.org machine locally and deposited next to
the GPG signature file, with the expectation that the signture
matches the diff.  Changing the output from diff between two
versions will break the optimization and force the uploader to
upload the diff over the wire.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 18:29   ` Junio C Hamano
                       ` (2 preceding siblings ...)
  2012-12-12 20:29     ` Junio C Hamano
@ 2012-12-12 21:40     ` Morten Welinder
  2012-12-12 21:53       ` Junio C Hamano
  3 siblings, 1 reply; 18+ messages in thread
From: Morten Welinder @ 2012-12-12 21:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin, git

> So I think with s/Regularly/About half the time/, your observation
> above is correct.
>
> I think the reason you perceived this as "Regularly" is that you do
> not notice nor appreciate it when things go right (half the time),
> but you tend to notice and remember only when a wrong side happened
> to have been picked (the other half).

Is there a reason why picking among the choices in a sliding window
must be contents neutral?

I see these "illogical" (== stylistically not matching user intent) diffs
quite often.  C comments (as in the example given) and #ifdef blocks
are typical cases.

Purely anecdotically, I have seen more trouble applying "illogical"
diffs than I would have expected from the corresponding "logical" diffs.

Morten

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 21:40     ` Morten Welinder
@ 2012-12-12 21:53       ` Junio C Hamano
  2012-12-12 22:34         ` Andrew Ardill
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Junio C Hamano @ 2012-12-12 21:53 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Kevin, git

Morten Welinder <mwelinder@gmail.com> writes:

> Is there a reason why picking among the choices in a sliding window
> must be contents neutral?

Sorry, you might be getting at something interesting but I do not
understand the question.  I have no idea what you mean by "contents
neutral".

Picking between these two choices

         /**                         +    /**                         
    +     * Default parent           +     * Default parent           
    +     *                          +     *                          
    +     * @var int                 +     * @var int                 
    +     * @access protected        +     * @access protected        
    +     * @index                   +     * @index                   
    +     */                         +     */                         
    +    protected $defaultParent;   +    protected $defaultParent;   
    +                                +                                
    +    /**                              /**                         

would not affect the correctness of the patch.  You may pick
whatever you deem the most desirable, but your answer must be a
correct patch (the definition of "correct" here is "applying that
patch to the preimage produces the intended postimage").

And I think if you inserted a block of text B after a context C
where the tail of B matches the tail of C like the above, you can
shift what you treat as "inserted" up and still come up with a
correct patch.

The output being "a correct patch" is not the only thing we need to
consider, though, as I mentioned in another response to Kevin
regarding the "consequences".

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 21:53       ` Junio C Hamano
@ 2012-12-12 22:34         ` Andrew Ardill
  2012-12-12 23:32           ` Javier Domingo
  2012-12-13  0:00         ` Michael Haggerty
  2012-12-13  1:55         ` Morten Welinder
  2 siblings, 1 reply; 18+ messages in thread
From: Andrew Ardill @ 2012-12-12 22:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Morten Welinder, Kevin, git

On 13 December 2012 08:53, Junio C Hamano <gitster@pobox.com> wrote:
> The output being "a correct patch" is not the only thing we need to
> consider, though, as I mentioned in another response to Kevin
> regarding the "consequences".

The main benefit of picking a more 'natural' diff is a usability one.
I know that when a chunk begins and ends one line after the logical
break point (typically with braces in my experience) mentally parsing
the diff becomes significantly harder. If there was a way to teach git
where it should try and break out a chunk (potentially per filetype?)
this is a good thing for readability, and I think would outweigh any
temporary pain with regards to cached rerere and diff data.

Regards,

Andrew Ardill

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 22:34         ` Andrew Ardill
@ 2012-12-12 23:32           ` Javier Domingo
  2012-12-12 23:43             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Javier Domingo @ 2012-12-12 23:32 UTC (permalink / raw)
  To: Andrew Ardill; +Cc: Junio C Hamano, Morten Welinder, Kevin, git

I must say it is _quite_ helpfull having the diffs well done (natural
diffs as here named), just because when you want to review a patch on
the fly, this sort of things are annoying.

I just wanted to say my opinion. No idea on how to fix that, nor why
does it happen.

Javier Domingo


2012/12/12 Andrew Ardill <andrew.ardill@gmail.com>:
> On 13 December 2012 08:53, Junio C Hamano <gitster@pobox.com> wrote:
>> The output being "a correct patch" is not the only thing we need to
>> consider, though, as I mentioned in another response to Kevin
>> regarding the "consequences".
>
> The main benefit of picking a more 'natural' diff is a usability one.
> I know that when a chunk begins and ends one line after the logical
> break point (typically with braces in my experience) mentally parsing
> the diff becomes significantly harder. If there was a way to teach git
> where it should try and break out a chunk (potentially per filetype?)
> this is a good thing for readability, and I think would outweigh any
> temporary pain with regards to cached rerere and diff data.
>
> Regards,
>
> Andrew Ardill
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 23:32           ` Javier Domingo
@ 2012-12-12 23:43             ` Junio C Hamano
  2012-12-12 23:49               ` Javier Domingo
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2012-12-12 23:43 UTC (permalink / raw)
  To: Javier Domingo; +Cc: Andrew Ardill, Morten Welinder, Kevin, git

Javier Domingo <javierdo1@gmail.com> writes:

> I must say it is _quite_ helpfull having the diffs well done (natural
> diffs as here named), just because when you want to review a patch on
> the fly, this sort of things are annoying.

I do not think anybody is arguing that it would not help the human
users to shift the hunk boundaries in the case Kevin's original
message demonstrated.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 23:43             ` Junio C Hamano
@ 2012-12-12 23:49               ` Javier Domingo
  0 siblings, 0 replies; 18+ messages in thread
From: Javier Domingo @ 2012-12-12 23:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andrew Ardill, Morten Welinder, Kevin, git

So, how can it be fixed? or is diff command's problem?
Javier Domingo


2012/12/13 Junio C Hamano <gitster@pobox.com>:
> Javier Domingo <javierdo1@gmail.com> writes:
>
>> I must say it is _quite_ helpfull having the diffs well done (natural
>> diffs as here named), just because when you want to review a patch on
>> the fly, this sort of things are annoying.
>
> I do not think anybody is arguing that it would not help the human
> users to shift the hunk boundaries in the case Kevin's original
> message demonstrated.
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 21:53       ` Junio C Hamano
  2012-12-12 22:34         ` Andrew Ardill
@ 2012-12-13  0:00         ` Michael Haggerty
  2012-12-13  1:55         ` Morten Welinder
  2 siblings, 0 replies; 18+ messages in thread
From: Michael Haggerty @ 2012-12-13  0:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Morten Welinder, Kevin, git

On 12/12/2012 10:53 PM, Junio C Hamano wrote:
> Morten Welinder <mwelinder@gmail.com> writes:
> 
>> Is there a reason why picking among the choices in a sliding window
>> must be contents neutral?
> 
> Sorry, you might be getting at something interesting but I do not
> understand the question.  I have no idea what you mean by "contents
> neutral".
> 
> Picking between these two choices
> 
>          /**                         +    /**                         
>     +     * Default parent           +     * Default parent           
>     +     *                          +     *                          
>     +     * @var int                 +     * @var int                 
>     +     * @access protected        +     * @access protected        
>     +     * @index                   +     * @index                   
>     +     */                         +     */                         
>     +    protected $defaultParent;   +    protected $defaultParent;   
>     +                                +                                
>     +    /**                              /**                         
> 
> would not affect the correctness of the patch.  You may pick
> whatever you deem the most desirable, but your answer must be a
> correct patch (the definition of "correct" here is "applying that
> patch to the preimage produces the intended postimage").
> 
> And I think if you inserted a block of text B after a context C
> where the tail of B matches the tail of C like the above, you can
> shift what you treat as "inserted" up and still come up with a
> correct patch.

I have the feeling that a few crude heuristics would go a long way
towards improving diffs like this.  For example:

* Prefer to have an add/remove block that has balanced begin/end pairs
(where begin/end pairs might be opening and closing parentheses,
brackets, braces, and angle brackets, "/*" and "*/", and perhaps a
couple of other things.  For SGML-like text begin and end tags could be
matched up.

It would be possible to read these begin/end pairs from a
filetype-specific table or configuration setting, though this would add
complication and would also make it possible that diffs generated by two
different people are not identical if their configurations differ.

* Prefer to have a block where the first non-blank line of the block and
the first non-blank line after the block are indented by the same amount.

* Prefer to have a block with trailing (as opposed to leading or
embedded) blank lines--the more the better.

The beautiful thing is that even if the heuristics sometimes fail, the
correctness of the patch (in the sense that you have defined) is not
compromised.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: possible Improving diff algoritm
  2012-12-12 21:53       ` Junio C Hamano
  2012-12-12 22:34         ` Andrew Ardill
  2012-12-13  0:00         ` Michael Haggerty
@ 2012-12-13  1:55         ` Morten Welinder
  2012-12-13  4:58           ` Geert Bosch
  2 siblings, 1 reply; 18+ messages in thread
From: Morten Welinder @ 2012-12-13  1:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin, git

>> Is there a reason why picking among the choices in a sliding window
>> must be contents neutral?
>
> Sorry, you might be getting at something interesting but I do not
> understand the question.  I have no idea what you mean by "contents
> neutral".

I was merely asking if an algorithm to pick between the
2+ choices was allowed to look at the contents of the
lines.

I.e., an algorithm would look at the C comment
example and determine that the choice starting containing
a full inserted comment is preferable over the one that
appears to close one comment and open a new.

And the in inserted-function case it would prefer the one
where the matching { and } are in correct order.

Morten

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: possible Improving diff algoritm
  2012-12-13  1:55         ` Morten Welinder
@ 2012-12-13  4:58           ` Geert Bosch
  2012-12-13  6:26             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Geert Bosch @ 2012-12-13  4:58 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Junio C Hamano, Kevin, git


On Dec 12, 2012, at 20:55, Morten Welinder <mwelinder@gmail.com> wrote:
> I was merely asking if an algorithm to pick between the
> 2+ choices was allowed to look at the contents of the
> lines.
> 
> I.e., an algorithm would look at the C comment
> example and determine that the choice starting containing
> a full inserted comment is preferable over the one that
> appears to close one comment and open a new.
> 
> And the in inserted-function case it would prefer the one
> where the matching { and } are in correct order.

        /**                         +    /**                         
   +     * Default parent           +     * Default parent           
   +     *                          +     *                          
   +     * @var int                 +     * @var int                 
   +     * @access protected        +     * @access protected        
   +     * @index                   +     * @index                   
   +     */                         +     */                         
   +    protected $defaultParent;   +    protected $defaultParent;   
   +                                +                                
   +    /**                              /**                         

It would seem that just looking at the line length (stripped) of
the last line, might be sufficient for cost function to minimize.
Here the some would be 3 vs 0. In case of ties, use the last
possibility with minimum cost.

I think it would be nice if the cost function we choose does not
depend on file type, as that is something that is very dependent
on the exact local configuration and might hinder comparison of
patches. If something really simple gets us 90% there, that would
be preferable over extra complexity.

  -Geert

Junio's other example:

   }

  +void new_function(void)
  +{
  +  printf("hello, world.\n");
  +}
  +
   void existing_one(void)
   {
     printf("goodbye, world.\n");

=> Cost 0

  +}
  +
  +void new_function(void)
  +{
  +  printf("hello, world.\n");
   }
=> Cost 27

Kevin's example:
    /**
+     * Default parent
+     *
+     * @var int
+     * @access protected
+     * @index
+     */
+    protected $defaultParent;
+
+    /**

=> Cost 3

+   /**
+     * Default parent
+     *
+     * @var int
+     * @access protected
+     * @index
+     */
+    protected $defaultParent;
+
     /**
=> cost 0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: possible Improving diff algoritm
  2012-12-13  4:58           ` Geert Bosch
@ 2012-12-13  6:26             ` Junio C Hamano
  2012-12-14 12:20               ` Javier Domingo
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2012-12-13  6:26 UTC (permalink / raw)
  To: Geert Bosch; +Cc: Morten Welinder, Kevin, git

Geert Bosch <bosch@adacore.com> writes:

> It would seem that just looking at the line length (stripped) of
> the last line, might be sufficient for cost function to minimize.
> Here the some would be 3 vs 0. In case of ties, use the last
> possibility with minimum cost.

-- 8< --
#ifdef A

some stuff
about A

#endif
#ifdef Z

some more stuff
about Z

#endif
-- >8 --

If you insert a block for M following the existing formatting
convention in the middle, your heuristics will pick the blank line
after "about A" as having minimum cost, no?

You inherently have to know the nature of the payload, as your eyes
that judge the result use that knowledge when doing so, I am afraid.
I think your "define a function that gives a good score to lines
that are likely to be good breaking points" idea has merit, but I
think that should be tied to the content type, most likely via the
attribute mechanism.

In any case, I consider this as a low-impact (as Michael Haggerty
noted, it is impossible to introduce a bug that subtly break the
output; your result is either totally borked or is correct) and
low-hanging fruit (it can be done as a postprocessing phase after
the xdiff machinery has done the heavy-lifting of computing LCA), if
somebody wants to experiment and implement one.  As long as the new
heuristics is hidden behind an explicit command line option to avoid
other "consequences", I wouldn't discourage interested parties from
working on it.  It is not just my itch, though.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: possible Improving diff algoritm
  2012-12-13  6:26             ` Junio C Hamano
@ 2012-12-14 12:20               ` Javier Domingo
  2012-12-14 22:29                 ` Bernhard R. Link
  0 siblings, 1 reply; 18+ messages in thread
From: Javier Domingo @ 2012-12-14 12:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Geert Bosch, Morten Welinder, Kevin, git

I think the idea of being preferable to have a blank line at the end
of the added/deleted block is key in this case.

Javier Domingo


2012/12/13 Junio C Hamano <gitster@pobox.com>:
> Geert Bosch <bosch@adacore.com> writes:
>
>> It would seem that just looking at the line length (stripped) of
>> the last line, might be sufficient for cost function to minimize.
>> Here the some would be 3 vs 0. In case of ties, use the last
>> possibility with minimum cost.
>
> -- 8< --
> #ifdef A
>
> some stuff
> about A
>
> #endif
> #ifdef Z
>
> some more stuff
> about Z
>
> #endif
> -- >8 --
>
> If you insert a block for M following the existing formatting
> convention in the middle, your heuristics will pick the blank line
> after "about A" as having minimum cost, no?
>
> You inherently have to know the nature of the payload, as your eyes
> that judge the result use that knowledge when doing so, I am afraid.
> I think your "define a function that gives a good score to lines
> that are likely to be good breaking points" idea has merit, but I
> think that should be tied to the content type, most likely via the
> attribute mechanism.
>
> In any case, I consider this as a low-impact (as Michael Haggerty
> noted, it is impossible to introduce a bug that subtly break the
> output; your result is either totally borked or is correct) and
> low-hanging fruit (it can be done as a postprocessing phase after
> the xdiff machinery has done the heavy-lifting of computing LCA), if
> somebody wants to experiment and implement one.  As long as the new
> heuristics is hidden behind an explicit command line option to avoid
> other "consequences", I wouldn't discourage interested parties from
> working on it.  It is not just my itch, though.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: possible Improving diff algoritm
  2012-12-14 12:20               ` Javier Domingo
@ 2012-12-14 22:29                 ` Bernhard R. Link
  2012-12-15 12:16                   ` Javier Domingo
  0 siblings, 1 reply; 18+ messages in thread
From: Bernhard R. Link @ 2012-12-14 22:29 UTC (permalink / raw)
  To: Javier Domingo; +Cc: Junio C Hamano, Geert Bosch, Morten Welinder, Kevin, git

* Javier Domingo <javierdo1@gmail.com> [121214 13:20]:
> I think the idea of being preferable to have a blank line at the end
> of the added/deleted block is key in this case.

For symmetry I'd suggest to make it preferable to have blank lines
at the end or the beginning.

  {
  old
+ }
+
+ {
+ new
  }

vs

  {
  old
  }
+
+ {
+ new
+ }

is just the same case in blue.
(Although empty lines alone feels not quite optimal, it is at least a
good start).

        Bernhard R. Link

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: possible Improving diff algoritm
  2012-12-14 22:29                 ` Bernhard R. Link
@ 2012-12-15 12:16                   ` Javier Domingo
  0 siblings, 0 replies; 18+ messages in thread
From: Javier Domingo @ 2012-12-15 12:16 UTC (permalink / raw)
  To: Bernhard R. Link; +Cc: Junio C Hamano, Geert Bosch, Morten Welinder, Kevin, git

If just empty lines alone, then it is forced to say there is a new
line. That is beyond this use-case.
Javier Domingo


2012/12/14 Bernhard R. Link <brl+git@mail.brlink.eu>:
> * Javier Domingo <javierdo1@gmail.com> [121214 13:20]:
>> I think the idea of being preferable to have a blank line at the end
>> of the added/deleted block is key in this case.
>
> For symmetry I'd suggest to make it preferable to have blank lines
> at the end or the beginning.
>
>   {
>   old
> + }
> +
> + {
> + new
>   }
>
> vs
>
>   {
>   old
>   }
> +
> + {
> + new
> + }
>
> is just the same case in blue.
> (Although empty lines alone feels not quite optimal, it is at least a
> good start).
>
>         Bernhard R. Link

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-12-15 12:18 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAO54GHC4AXQO1MbU2qXMdcDO5mtUFhrXfXND5evc93kQhNfCrw@mail.gmail.com>
2012-12-12 15:03 ` Fwd: possible Improving diff algoritm Kevin
2012-12-12 18:29   ` Junio C Hamano
2012-12-12 18:48     ` Brian J. Murrell
2012-12-12 19:30     ` Kevin
2012-12-12 20:29     ` Junio C Hamano
2012-12-12 21:40     ` Morten Welinder
2012-12-12 21:53       ` Junio C Hamano
2012-12-12 22:34         ` Andrew Ardill
2012-12-12 23:32           ` Javier Domingo
2012-12-12 23:43             ` Junio C Hamano
2012-12-12 23:49               ` Javier Domingo
2012-12-13  0:00         ` Michael Haggerty
2012-12-13  1:55         ` Morten Welinder
2012-12-13  4:58           ` Geert Bosch
2012-12-13  6:26             ` Junio C Hamano
2012-12-14 12:20               ` Javier Domingo
2012-12-14 22:29                 ` Bernhard R. Link
2012-12-15 12:16                   ` Javier Domingo

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).