git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Consequences of CRLF in index?
@ 2017-10-24 17:48 Lars Schneider
  2017-10-24 18:14 ` Jonathan Nieder
  2017-10-24 21:04 ` Johannes Sixt
  0 siblings, 2 replies; 44+ messages in thread
From: Lars Schneider @ 2017-10-24 17:48 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen, Jeff King, Johannes Schindelin

Hi,

I've migrated a large repo (110k+ files) with a lot of history (177k commits) 
and a lot of users (200+) to Git. Unfortunately, all text files in the index
of the repo have CRLF line endings. In general this seems not to be a problem 
as the project is developed exclusively on Windows.

However, I wonder if there are any "hidden consequences" of this setup?
If there are consequences, then I see two options. Either I rebase the repo 
and fix the line endings for all commits or I add a single commit that fixes 
the line endings for all files. Both actions require coordination with the 
users to avoid repo trouble/merge conflicts. The "single fixup commit" options 
would also make diffs into the past look bad. Would a single large commit have
any impact on the performance of standard Git operations?

Thanks,
Lars



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-24 17:48 Consequences of CRLF in index? Lars Schneider
@ 2017-10-24 18:14 ` Jonathan Nieder
  2017-10-24 19:02   ` Torsten Bögershausen
                     ` (2 more replies)
  2017-10-24 21:04 ` Johannes Sixt
  1 sibling, 3 replies; 44+ messages in thread
From: Jonathan Nieder @ 2017-10-24 18:14 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Torsten Bögershausen, Jeff King, Johannes Schindelin

Hi,

Lars Schneider wrote:

> I've migrated a large repo (110k+ files) with a lot of history (177k commits)
> and a lot of users (200+) to Git. Unfortunately, all text files in the index
> of the repo have CRLF line endings. In general this seems not to be a problem
> as the project is developed exclusively on Windows.

Sounds good.

> However, I wonder if there are any "hidden consequences" of this setup?
> If there are consequences, then I see two options. Either I rebase the repo
> and fix the line endings for all commits or I add a single commit that fixes
> the line endings for all files. Both actions require coordination with the
> users to avoid repo trouble/merge conflicts. The "single fixup commit" options
> would also make diffs into the past look bad. Would a single large commit have
> any impact on the performance of standard Git operations?

There are no hidden consequences that I'm aware of.  If you later
decide that you want to become a cross-platform project, then you may
want to switch to LF endings, in which case I suggest the "single
fixup commit" strategy.

In any event, you also probably want to declare what you're doing
using .gitattributes.  By checking in the files as CRLF, you are
declaring that you do *not* want Git to treat them as text files
(i.e., you do not want Git to change the line endings), so something
as simple as

	* -text

should do the trick.  See gitattributes(5) for details.

I'd be interested to hear what happens when diff-ing across a line
ending fixup commit.  Is this an area where Git needs some
improvement?  "git merge" knows an -Xrenormalize option to deal with a
related problem --- it's possible that "git diff" needs to learn a
similar trick.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-24 18:14 ` Jonathan Nieder
@ 2017-10-24 19:02   ` Torsten Bögershausen
  2017-10-25 12:13     ` Johannes Schindelin
  2017-10-25  1:51   ` Junio C Hamano
  2017-10-25 17:04   ` Consequences of CRLF in index? Lars Schneider
  2 siblings, 1 reply; 44+ messages in thread
From: Torsten Bögershausen @ 2017-10-24 19:02 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Lars Schneider, git, Jeff King, Johannes Schindelin

On Tue, Oct 24, 2017 at 11:14:15AM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Lars Schneider wrote:
> 
> > I've migrated a large repo (110k+ files) with a lot of history (177k commits)
> > and a lot of users (200+) to Git. Unfortunately, all text files in the index
> > of the repo have CRLF line endings. In general this seems not to be a problem
> > as the project is developed exclusively on Windows.
> 
> Sounds good.
> 
> > However, I wonder if there are any "hidden consequences" of this setup?
> > If there are consequences, then I see two options. Either I rebase the repo
> > and fix the line endings for all commits or I add a single commit that fixes
> > the line endings for all files. Both actions require coordination with the
> > users to avoid repo trouble/merge conflicts. The "single fixup commit" options
> > would also make diffs into the past look bad. Would a single large commit have
> > any impact on the performance of standard Git operations?
> 
> There are no hidden consequences that I'm aware of.  If you later
> decide that you want to become a cross-platform project, then you may
> want to switch to LF endings, in which case I suggest the "single
> fixup commit" strategy.
> 
> In any event, you also probably want to declare what you're doing
> using .gitattributes.  By checking in the files as CRLF, you are
> declaring that you do *not* want Git to treat them as text files
> (i.e., you do not want Git to change the line endings), so something
> as simple as
> 
> 	* -text
> 
> should do the trick.  See gitattributes(5) for details.

Exactly.
The "hidden consequence" may be the other way around:
If you don't specify .gitattributes, then all people who have core.autocrlf=true
will suffer from a runtime penalty.
core.autocrlf=true means "auto".
At each checkout Git needs to figure out that the file has CRLF in the repo,
so that there is no conversion done.
The same happens on "git add" or "git commit" (and sometimes on "git status").

The penalty may be low, but Dscho once reported that it is measurable & painful
on a "big repo".

The other advantage of "* -text" in .gitattributes is that new files are handled
consistantly accross developers.

Those who have "core.autocrlf=false" would produce commits with CRLF for new
files, and those developpers who have core.autocrlf=true would produce files
with LF in the index and CRLF in the worktree.
This may (most probably will) cause confusion later, when things are pushed and
pulled.

> 
> I'd be interested to hear what happens when diff-ing across a line
> ending fixup commit.  Is this an area where Git needs some
> improvement?  "git merge" knows an -Xrenormalize option to deal with a
> related problem --- it's possible that "git diff" needs to learn a
> similar trick.

That is a tricky thing.
Sometimes you want to see the CLRF - LF as a diff, (represented as "^M"),
and sometimes not.

All in all "* -text" is a good choice for Windows-only projects.

> 
> Thanks and hope that helps,
> Jonathan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-24 17:48 Consequences of CRLF in index? Lars Schneider
  2017-10-24 18:14 ` Jonathan Nieder
@ 2017-10-24 21:04 ` Johannes Sixt
  2017-10-25 12:19   ` Johannes Schindelin
  1 sibling, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2017-10-24 21:04 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Torsten Bögershausen, Jeff King, Johannes Schindelin

Am 24.10.2017 um 19:48 schrieb Lars Schneider:
> I've migrated a large repo (110k+ files) with a lot of history (177k commits)
> and a lot of users (200+) to Git. Unfortunately, all text files in the index
> of the repo have CRLF line endings. In general this seems not to be a problem
> as the project is developed exclusively on Windows.
> 
> However, I wonder if there are any "hidden consequences" of this setup?

I've been working on a project with CRLF in every source file for a 
decade now. It's C++ source, and it isn't even Windows-only: when 
checked out on Linux, there are CRs in the files, with no bad 
consequences so far. GCC is happy with them.

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-24 18:14 ` Jonathan Nieder
  2017-10-24 19:02   ` Torsten Bögershausen
@ 2017-10-25  1:51   ` Junio C Hamano
  2017-10-25  4:53     ` Junio C Hamano
  2017-10-25 17:04   ` Consequences of CRLF in index? Lars Schneider
  2 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2017-10-25  1:51 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Lars Schneider, git, Torsten Bögershausen, Jeff King,
	Johannes Schindelin

Jonathan Nieder <jrnieder@gmail.com> writes:

> I'd be interested to hear what happens when diff-ing across a line
> ending fixup commit.  Is this an area where Git needs some
> improvement?  "git merge" knows an -Xrenormalize option to deal with a
> related problem --- it's possible that "git diff" needs to learn a
> similar trick.

My knee-jerk reaction is that (1) the end user definitely wants to
see preimage and postimage lines are different in such a commit by
default, one side has and the other side lacks ^M at the end., and
(2) one of the "whitespace ignoring" options (namely, "ignore space
at eol") may suffice, but if not, it should be easy to invent a new
one "ignore CR at eol".

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-25  1:51   ` Junio C Hamano
@ 2017-10-25  4:53     ` Junio C Hamano
  2017-10-25 16:44       ` Stefan Beller
  2017-11-07  6:40       ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Junio C Hamano
  0 siblings, 2 replies; 44+ messages in thread
From: Junio C Hamano @ 2017-10-25  4:53 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Lars Schneider, git, Torsten Bögershausen, Jeff King,
	Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>> I'd be interested to hear what happens when diff-ing across a line
>> ending fixup commit.  Is this an area where Git needs some
>> improvement?  "git merge" knows an -Xrenormalize option to deal with a
>> related problem --- it's possible that "git diff" needs to learn a
>> similar trick.
>
> My knee-jerk reaction is that (1) the end user definitely wants to
> see preimage and postimage lines are different in such a commit by
> default, one side has and the other side lacks ^M at the end., and
> (2) one of the "whitespace ignoring" options (namely, "ignore space
> at eol") may suffice, but if not, it should be easy to invent a new
> one "ignore CR at eol".

Here is a lunch-time hack to add that option.  As you can see from
the lack of docs, tests and a proper log message, I haven't played
with it long enough to know how buggy it is, though.  I am not all
that interested in polishing it to completion myself and prefer to
leave it as #leftoverbits ;-)

Also I didn't bother teaching this to Stefan's color-moved thing, as
the line comparator it uses will hopefully be unified with the one I
am touching in xdiff/ with this patch.

Have fun.

 diff.c            |  5 ++++-
 merge-recursive.c |  2 ++
 xdiff/xdiff.h     |  3 ++-
 xdiff/xutils.c    | 34 ++++++++++++++++++++++++++++++++--
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 6fd288420b..eeca0fd3b8 100644
--- a/diff.c
+++ b/diff.c
@@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
 
 	if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
 	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
-	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
+	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
+	    DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))
 		DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
 	else
 		DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);
@@ -4659,6 +4660,8 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_CHANGE);
 	else if (!strcmp(arg, "--ignore-space-at-eol"))
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
+	else if (!strcmp(arg, "--ignore-cr-at-eol"))
+		DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
 	else if (!strcmp(arg, "--indent-heuristic"))
diff --git a/merge-recursive.c b/merge-recursive.c
index 1d3f8f0d22..4a49ec93b1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2251,6 +2251,8 @@ int parse_merge_opt(struct merge_options *o, const char *s)
 		DIFF_XDL_SET(o, IGNORE_WHITESPACE);
 	else if (!strcmp(s, "ignore-space-at-eol"))
 		DIFF_XDL_SET(o, IGNORE_WHITESPACE_AT_EOL);
+	else if (!strcmp(s, "ignore-cr-at-eol"))
+		DIFF_XDL_SET(o, IGNORE_CR_AT_EOL);
 	else if (!strcmp(s, "renormalize"))
 		o->renormalize = 1;
 	else if (!strcmp(s, "no-renormalize"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index b090ad8eac..ff16243da9 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,7 +32,8 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
-#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
+#define XDF_IGNORE_CR_AT_EOL (1 << 5)
+#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL | XDF_IGNORE_CR_AT_EOL)
 
 #define XDF_PATIENCE_DIFF (1 << 5)
 #define XDF_HISTOGRAM_DIFF (1 << 6)
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 04d7b32e4e..8720e272b9 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long flags)
 	return (i == size);
 }
 
+/*
+ * Have we eaten everything on the line, except for an optional
+ * CR at the very end?
+ */
+static int ends_with_optional_cr(const char *l, long s, long i)
+{
+	if (s && l[s-1] == '\n')
+		s--;
+	if (s == i)
+		return 1;
+	if (s == i + 1 && l[i] == '\r')
+		return 1;
+	return 0;
+}
+
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 {
 	int i1, i2;
@@ -170,7 +185,8 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 
 	/*
 	 * -w matches everything that matches with -b, and -b in turn
-	 * matches everything that matches with --ignore-space-at-eol.
+	 * matches everything that matches with --ignore-space-at-eol,
+	 * which in turn matches everything that matches with --ignore-cr-at-eol.
 	 *
 	 * Each flavor of ignoring needs different logic to skip whitespaces
 	 * while we have both sides to compare.
@@ -204,6 +220,14 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 			i1++;
 			i2++;
 		}
+	} else if (flags & XDF_IGNORE_CR_AT_EOL) {
+		/* Find the first difference and see how the line ends */
+		while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
+			i1++;
+			i2++;
+		}
+		return (ends_with_optional_cr(l1, s1, i1) &&
+			ends_with_optional_cr(l2, s2, i2));
 	}
 
 	/*
@@ -230,9 +254,15 @@ static unsigned long xdl_hash_record_with_whitespace(char const **data,
 		char const *top, long flags) {
 	unsigned long ha = 5381;
 	char const *ptr = *data;
+	int cr_at_eol_only = (flags & XDF_WHITESPACE_FLAGS) == XDF_IGNORE_CR_AT_EOL;
 
 	for (; ptr < top && *ptr != '\n'; ptr++) {
-		if (XDL_ISSPACE(*ptr)) {
+		if (cr_at_eol_only) {
+			if (*ptr == '\r' &&
+			    (top <= ptr + 1 || ptr[1] == '\n'))
+				continue;
+		}
+		else if (XDL_ISSPACE(*ptr)) {
 			const char *ptr2 = ptr;
 			int at_eol;
 			while (ptr + 1 < top && XDL_ISSPACE(ptr[1])



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-24 19:02   ` Torsten Bögershausen
@ 2017-10-25 12:13     ` Johannes Schindelin
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Schindelin @ 2017-10-25 12:13 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Jonathan Nieder, Lars Schneider, git, Jeff King

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

Hi,

On Tue, 24 Oct 2017, Torsten Bögershausen wrote:

> The penalty may be low, but Dscho once reported that it [line endings
> conversion] is measurable & painful on a "big repo".

Yes, I do not remember the details, but it must have been around 5-15%
speed improvement to prevent the autoCRLF stuff from doing its thing.

At work, we always switch it off, for that reason.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-24 21:04 ` Johannes Sixt
@ 2017-10-25 12:19   ` Johannes Schindelin
  2017-10-26  7:09     ` Johannes Sixt
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Schindelin @ 2017-10-25 12:19 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Lars Schneider, git, Torsten Bögershausen, Jeff King

Hi Hannes,

On Tue, 24 Oct 2017, Johannes Sixt wrote:

> Am 24.10.2017 um 19:48 schrieb Lars Schneider:
> > I've migrated a large repo (110k+ files) with a lot of history (177k
> > commits)
> > and a lot of users (200+) to Git. Unfortunately, all text files in the index
> > of the repo have CRLF line endings. In general this seems not to be a
> > problem
> > as the project is developed exclusively on Windows.
> > 
> > However, I wonder if there are any "hidden consequences" of this setup?
> 
> I've been working on a project with CRLF in every source file for a decade
> now. It's C++ source, and it isn't even Windows-only: when checked out on
> Linux, there are CRs in the files, with no bad consequences so far. GCC is
> happy with them.

I envy you for the blessing of such a clean C++ source that you do not
have any, say, Unix shell script in it. Try this, and weep:

	$ printf 'echo \\\r\n\t123\r\n' >a1

	$ sh a1

	a1: 2: a1: 123: not found

For the same reason (Unix shell not handling CR/LF gracefull), I went
through that painful work that finally landed as 00ddc9d13ca (Fix build
with core.autocrlf=true, 2017-05-09).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-25  4:53     ` Junio C Hamano
@ 2017-10-25 16:44       ` Stefan Beller
  2017-10-26  5:54         ` Junio C Hamano
  2017-11-07  6:40       ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Junio C Hamano
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2017-10-25 16:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Lars Schneider, git, Torsten Bögershausen,
	Jeff King, Johannes Schindelin

On Tue, Oct 24, 2017 at 9:53 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Here is a lunch-time hack to add that option.  As you can see from
> the lack of docs, tests and a proper log message, I haven't played
> with it long enough to know how buggy it is, though.  I am not all
> that interested in polishing it to completion myself and prefer to
> leave it as #leftoverbits ;-)

Ok, nevertheless a review pointing out a couple things would be
useful for those who pick it up later, I assume.

> Also I didn't bother teaching this to Stefan's color-moved thing, as
> the line comparator it uses will hopefully be unified with the one I
> am touching in xdiff/ with this patch.

which will be rerolled shortly fixing just the parameter names as Eric
mentioned.

>  diff.c            |  5 ++++-
>  merge-recursive.c |  2 ++
>  xdiff/xdiff.h     |  3 ++-
>  xdiff/xutils.c    | 34 ++++++++++++++++++++++++++++++++--
>  4 files changed, 40 insertions(+), 4 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..eeca0fd3b8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
>
>         if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
>             DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
> -           DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
> +           DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
> +           DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))

This highlights another part of the flag macros, that could be made nicer.
All these flags combined are XDF_WHITESPACE_FLAGS, so this
if could be written without the macros as

  if (options->xdl_ops & XDF_WHITESPACE_FLAGS)

which we might want to hide in a macro

  DIFF_XDL_MASK_TST(options, mask)

or such?


>  #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
> -#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
> +#define XDF_IGNORE_CR_AT_EOL (1 << 5)
> +#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL | XDF_IGNORE_CR_AT_EOL)
>
>  #define XDF_PATIENCE_DIFF (1 << 5)

(1<<5) is taken twice now. Currently there is only one
unused free bit (but that was used once upon a time);
so we have to think how we revamp the flag system to
support more than 32 bits.

See also the answers to
https://public-inbox.org/git/20171024000931.14814-1-sbeller@google.com/
as that started this discussion already.

>  #define XDF_HISTOGRAM_DIFF (1 << 6)
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index 04d7b32e4e..8720e272b9 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long flags)
>         return (i == size);
>  }
>
> +/*
> + * Have we eaten everything on the line, except for an optional
> + * CR at the very end?
> + */
> +static int ends_with_optional_cr(const char *l, long s, long i)
> +{
> +       if (s && l[s-1] == '\n')
> +               s--;

so first we cut off the '\n',

> +       if (s == i)
> +               return 1;

then we either have an ending without

> +       if (s == i + 1 && l[i] == '\r')
> +               return 1;

or with a '\r' before.

That seems correct after some thought; I might offer
another easier to understand (for me) solution,
which is
{
       /* cut of ending LF */
       if (s && l[s-1] == '\n')
               s--;
      /* do we only need to cut LF? */
      if (i == s)
        return 1;

       /* cut of ending CR */
       if (s && l[s-1] == '\r')
               s--;
      /* was this everything to cut? */
      return i == s
}

Though this seems even more complicated
after having it written down.

>          * Each flavor of ignoring needs different logic to skip whitespaces
>          * while we have both sides to compare.
> @@ -204,6 +220,14 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
>                         i1++;
>                         i2++;
>                 }
> +       } else if (flags & XDF_IGNORE_CR_AT_EOL) {
> +               /* Find the first difference and see how the line ends */
> +               while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
> +                       i1++;
> +                       i2++;
> +               }
> +               return (ends_with_optional_cr(l1, s1, i1) &&
> +                       ends_with_optional_cr(l2, s2, i2));
>         }
>
>         /*
> @@ -230,9 +254,15 @@ static unsigned long xdl_hash_record_with_whitespace(char const **data,
>                 char const *top, long flags) {
>         unsigned long ha = 5381;
>         char const *ptr = *data;
> +       int cr_at_eol_only = (flags & XDF_WHITESPACE_FLAGS) == XDF_IGNORE_CR_AT_EOL;
>
>         for (; ptr < top && *ptr != '\n'; ptr++) {
> -               if (XDL_ISSPACE(*ptr)) {
> +               if (cr_at_eol_only) {
> +                       if (*ptr == '\r' &&
> +                           (top <= ptr + 1 || ptr[1] == '\n'))
> +                               continue;
> +               }
> +               else if (XDL_ISSPACE(*ptr)) {
>                         const char *ptr2 = ptr;
>                         int at_eol;
>                         while (ptr + 1 < top && XDL_ISSPACE(ptr[1])
>
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-24 18:14 ` Jonathan Nieder
  2017-10-24 19:02   ` Torsten Bögershausen
  2017-10-25  1:51   ` Junio C Hamano
@ 2017-10-25 17:04   ` Lars Schneider
  2017-10-25 17:13     ` Jonathan Nieder
  2 siblings, 1 reply; 44+ messages in thread
From: Lars Schneider @ 2017-10-25 17:04 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Torsten Bögershausen, Jeff King, Johannes Schindelin


> On 24 Oct 2017, at 20:14, Jonathan Nieder <jrnieder@gmail.com> wrote:
> 
> Hi,
> 
> Lars Schneider wrote:
> 
>> I've migrated a large repo (110k+ files) with a lot of history (177k commits)
>> and a lot of users (200+) to Git. Unfortunately, all text files in the index
>> of the repo have CRLF line endings. In general this seems not to be a problem
>> as the project is developed exclusively on Windows.
> 
> Sounds good.
> 
>> However, I wonder if there are any "hidden consequences" of this setup?
>> If there are consequences, then I see two options. Either I rebase the repo
>> and fix the line endings for all commits or I add a single commit that fixes
>> the line endings for all files. Both actions require coordination with the
>> users to avoid repo trouble/merge conflicts. The "single fixup commit" options
>> would also make diffs into the past look bad. Would a single large commit have
>> any impact on the performance of standard Git operations?
> 
> There are no hidden consequences that I'm aware of.  If you later
> decide that you want to become a cross-platform project, then you may
> want to switch to LF endings, in which case I suggest the "single
> fixup commit" strategy.
> 
> In any event, you also probably want to declare what you're doing
> using .gitattributes.  By checking in the files as CRLF, you are
> declaring that you do *not* want Git to treat them as text files
> (i.e., you do not want Git to change the line endings), so something
> as simple as
> 
> 	* -text

That's sounds good. Does "-text" have any other implications?
For whatever reason I always thought this is the way to tell
Git that a particular file is binary with the implication that
Git should not attempt to diff it.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-25 17:04   ` Consequences of CRLF in index? Lars Schneider
@ 2017-10-25 17:13     ` Jonathan Nieder
  2017-10-26 11:06       ` Lars Schneider
  2017-10-26 19:15       ` Torsten Bögershausen
  0 siblings, 2 replies; 44+ messages in thread
From: Jonathan Nieder @ 2017-10-25 17:13 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Torsten Bögershausen, Jeff King, Johannes Schindelin

Hi again,

Lars Schneider wrote:
>> On 24 Oct 2017, at 20:14, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> In any event, you also probably want to declare what you're doing
>> using .gitattributes.  By checking in the files as CRLF, you are
>> declaring that you do *not* want Git to treat them as text files
>> (i.e., you do not want Git to change the line endings), so something
>> as simple as
>>
>> 	* -text
>
> That's sounds good. Does "-text" have any other implications?
> For whatever reason I always thought this is the way to tell
> Git that a particular file is binary with the implication that
> Git should not attempt to diff it.

No other implications.  You're thinking of "-diff".  There is also a
shortcut "binary" which simply means "-text -diff".

Ideas for wording improvements to gitattributes(5) on this subject?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-25 16:44       ` Stefan Beller
@ 2017-10-26  5:54         ` Junio C Hamano
  2017-10-27  6:13           ` Re* " Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2017-10-26  5:54 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jonathan Nieder, Lars Schneider, git, Torsten Bögershausen,
	Jeff King, Johannes Schindelin

Stefan Beller <sbeller@google.com> writes:

>> diff --git a/diff.c b/diff.c
>> index 6fd288420b..eeca0fd3b8 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
>>
>>         if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
>>             DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
>> -           DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
>> +           DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
>> +           DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))
>
> This highlights another part of the flag macros, that could be made nicer.
> All these flags combined are XDF_WHITESPACE_FLAGS, so this
> if could be written without the macros as
>
>   if (options->xdl_ops & XDF_WHITESPACE_FLAGS)

Yes, and I think the codepath that matters most already uses that
form.  Perhaps it is a good idea to do the rewrite without a macro
(XDF_WHITESPACE_FLAGS is already a macro enough).

> (1<<5) is taken twice now.

Good eyes.  I think we use bits #1-#8 now (bit #0 is vacant, so are
#9-#31).

>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 04d7b32e4e..8720e272b9 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long flags)
>>         return (i == size);
>>  }
>>
>> +/*
>> + * Have we eaten everything on the line, except for an optional
>> + * CR at the very end?
>> + */
>> +static int ends_with_optional_cr(const char *l, long s, long i)
>> +{
>> +       if (s && l[s-1] == '\n')
>> +               s--;
>
> so first we cut off the '\n',

> That seems correct after some thought;

I added the "trim the LF at the end" to the beginning of the
function at the last minute to cheat, and it is debatable if it is
entirely correct on an incomplete line.

The byte at the end of line, i.e. l[s-1], could be either '\n' or
something else, and the latter is an incompete line at the end of
the file.  When we trimmed the LF and decremented s, CR at l[s-1]
is the CR in CRLF, which we do want to ignore.  If we didn't, then
what is CR sitting at l[s-1]?  It is a lone CR at the end of file,
not a part of CRLF.  Do we really want to ignore it?

If we take the name of the option "ignore-cr-at-eol" literally, yes,
it is a CR sitting at the end of a line, which happens to be an
incomplete one, so we do want to ignore.  But if we think about the
reason why we are adding the option (i.e. to help conversion between
CRLF and LF), it is somewhat iffy.  The lone CR at the end of file
cannot be something that came from CRLF<->LF conversion, and ignoring
it would hide possible problems in conversion or the original data.

> I might offer
> another easier to understand (for me) solution,
> ...
> Though this seems even more complicated
> after having it written down.

This happens to me quite often and my solution to it is to remove
the alternative I tried to formulate after convincing myself that it
is not that much of an improvement ;-).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-25 12:19   ` Johannes Schindelin
@ 2017-10-26  7:09     ` Johannes Sixt
  2017-10-26 11:01       ` Lars Schneider
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2017-10-26  7:09 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Lars Schneider, git, Torsten Bögershausen, Jeff King

Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
> I envy you for the blessing of such a clean C++ source that you do not
> have any, say, Unix shell script in it. Try this, and weep:
> 
> 	$ printf 'echo \\\r\n\t123\r\n' >a1
> 
> 	$ sh a1
> 
> 	a1: 2: a1: 123: not found

I was bitten by that, too. For this reason, I ensure that shell scripts 
and Makefiles begin their life on Linux. Fortunately, modern editors on 
Windows, includ^Wand vi, do not force CRLF line breaks, and such files 
can be edited on Windows, too.

Of course, I do not set core.autocrlf anywhere to avoid any changes 
behind my back.

> For the same reason (Unix shell not handling CR/LF gracefull), I went
> through that painful work that finally landed as 00ddc9d13ca (Fix build
> with core.autocrlf=true, 2017-05-09).

That's much appreciated!

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26  7:09     ` Johannes Sixt
@ 2017-10-26 11:01       ` Lars Schneider
  2017-10-26 19:22         ` Torsten Bögershausen
                           ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Lars Schneider @ 2017-10-26 11:01 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Johannes Schindelin, git, Torsten Bögershausen, Jeff King


> On 26 Oct 2017, at 09:09, Johannes Sixt <j6t@kdbg.org> wrote:
> 
> Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
>> I envy you for the blessing of such a clean C++ source that you do not
>> have any, say, Unix shell script in it. Try this, and weep:
>> 	$ printf 'echo \\\r\n\t123\r\n' >a1
>> 	$ sh a1
>> 	a1: 2: a1: 123: not found
> 
> I was bitten by that, too. For this reason, I ensure that shell scripts and Makefiles begin their life on Linux. Fortunately, modern editors on Windows, includ^Wand vi, do not force CRLF line breaks, and such files can be edited on Windows, too.

Wouldn't this kind of .gitattributes setup solve the problem?

*     -text
*.sh   text eol=lf

Thanks,
Lars


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-25 17:13     ` Jonathan Nieder
@ 2017-10-26 11:06       ` Lars Schneider
  2017-10-26 19:15       ` Torsten Bögershausen
  1 sibling, 0 replies; 44+ messages in thread
From: Lars Schneider @ 2017-10-26 11:06 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Torsten Bögershausen, Jeff King, Johannes Schindelin


> On 25 Oct 2017, at 19:13, Jonathan Nieder <jrnieder@gmail.com> wrote:
> 
> Hi again,
> 
> Lars Schneider wrote:
>>> On 24 Oct 2017, at 20:14, Jonathan Nieder <jrnieder@gmail.com> wrote:
> 
>>> In any event, you also probably want to declare what you're doing
>>> using .gitattributes.  By checking in the files as CRLF, you are
>>> declaring that you do *not* want Git to treat them as text files
>>> (i.e., you do not want Git to change the line endings), so something
>>> as simple as
>>> 
>>> 	* -text
>> 
>> That's sounds good. Does "-text" have any other implications?
>> For whatever reason I always thought this is the way to tell
>> Git that a particular file is binary with the implication that
>> Git should not attempt to diff it.
> 
> No other implications.  You're thinking of "-diff".  There is also a
> shortcut "binary" which simply means "-text -diff".

Yeah. Well, when I read "-text" then I think "no text" and that makes
me think "is binary".


> Ideas for wording improvements to gitattributes(5) on this subject?

I think the wording in the docs is good. It is just the "text" keyword
that confused me. Maybe this could have been names "eolnorm" and
"-eolnorm" or something. But it is too late for that now I guess :-)

Thanks,
Lars

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-25 17:13     ` Jonathan Nieder
  2017-10-26 11:06       ` Lars Schneider
@ 2017-10-26 19:15       ` Torsten Bögershausen
  1 sibling, 0 replies; 44+ messages in thread
From: Torsten Bögershausen @ 2017-10-26 19:15 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Lars Schneider, git, Jeff King, Johannes Schindelin

On Wed, Oct 25, 2017 at 10:13:57AM -0700, Jonathan Nieder wrote:
> Hi again,
> 
> Lars Schneider wrote:
> >> On 24 Oct 2017, at 20:14, Jonathan Nieder <jrnieder@gmail.com> wrote:
> 
> >> In any event, you also probably want to declare what you're doing
> >> using .gitattributes.  By checking in the files as CRLF, you are
> >> declaring that you do *not* want Git to treat them as text files
> >> (i.e., you do not want Git to change the line endings), so something
> >> as simple as
> >>
> >> 	* -text
> >
> > That's sounds good. Does "-text" have any other implications?
> > For whatever reason I always thought this is the way to tell
> > Git that a particular file is binary with the implication that
> > Git should not attempt to diff it.
> 
> No other implications.  You're thinking of "-diff".  There is also a
> shortcut "binary" which simply means "-text -diff".

Not 100% the same, as far as I know.
"binary" means: Don't convert line endings, and there is now way to
do a readable diff.
The only thing to tell the user is: The binary blobs are different.

Then we have "text". The "old" version of "text" was "crlf", which
for some people was more intuitive, and less intuitive for others.
"* crlf" is the same as "* text" and means please convert line endings.
And yes, the file is still line oriented.
"* -crlf" means don't touch the line endings, the file is
line-orinted and diff and  merge will work.
"* -text" is the same as "* -crlf"

> 
> Ideas for wording improvements to gitattributes(5) on this subject?

None from me at the moment.

> 
> Thanks,
> Jonathan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26 11:01       ` Lars Schneider
@ 2017-10-26 19:22         ` Torsten Bögershausen
  2017-10-26 20:20         ` Johannes Sixt
  2017-10-27 15:18         ` Johannes Schindelin
  2 siblings, 0 replies; 44+ messages in thread
From: Torsten Bögershausen @ 2017-10-26 19:22 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Johannes Sixt, Johannes Schindelin, git, Jeff King

On Thu, Oct 26, 2017 at 01:01:25PM +0200, Lars Schneider wrote:
> 
> > On 26 Oct 2017, at 09:09, Johannes Sixt <j6t@kdbg.org> wrote:
> > 
> > Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
> >> I envy you for the blessing of such a clean C++ source that you do not
> >> have any, say, Unix shell script in it. Try this, and weep:
> >> 	$ printf 'echo \\\r\n\t123\r\n' >a1
> >> 	$ sh a1
> >> 	a1: 2: a1: 123: not found
> > 
> > I was bitten by that, too. For this reason, I ensure that shell scripts and Makefiles begin their life on Linux. Fortunately, modern editors on Windows, includ^Wand vi, do not force CRLF line breaks, and such files can be edited on Windows, too.
> 
> Wouldn't this kind of .gitattributes setup solve the problem?
> 
> *     -text
> *.sh   text eol=lf
> 

Yes, exactly. and for the snake-lovers:

*.py   text eol=lf

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26 11:01       ` Lars Schneider
  2017-10-26 19:22         ` Torsten Bögershausen
@ 2017-10-26 20:20         ` Johannes Sixt
  2017-10-26 20:30           ` Jonathan Nieder
  2017-10-27 15:18         ` Johannes Schindelin
  2 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2017-10-26 20:20 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Johannes Schindelin, git, Torsten Bögershausen, Jeff King

Am 26.10.2017 um 13:01 schrieb Lars Schneider:
>> On 26 Oct 2017, at 09:09, Johannes Sixt <j6t@kdbg.org> wrote:
>> Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
>>> I envy you for the blessing of such a clean C++ source that you do not
>>> have any, say, Unix shell script in it. Try this, and weep:
>>> 	$ printf 'echo \\\r\n\t123\r\n' >a1
>>> 	$ sh a1
>>> 	a1: 2: a1: 123: not found
>>
>> I was bitten by that, too. For this reason, I ensure that shell scripts and Makefiles begin their life on Linux. Fortunately, modern editors on Windows, includ^Wand vi, do not force CRLF line breaks, and such files can be edited on Windows, too.
> 
> Wouldn't this kind of .gitattributes setup solve the problem?

But there is no problem. Don't have CRs in shell scripts and be done 
with it.

> 
> *     -text
> *.sh   text eol=lf

Why would that be necessary? I cannot have CRLF in shell scripts etc., 
not even on Windows. (And in addition I do not mind CR in C++ source 
code.) All I want is: Git, please, by all means, do not mess with my 
files, ever. Isn't the simplest way to achieve that to set *nothing* at 
all? Not even core.autocrlf?

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26 20:20         ` Johannes Sixt
@ 2017-10-26 20:30           ` Jonathan Nieder
  2017-10-26 20:51             ` Johannes Sixt
  0 siblings, 1 reply; 44+ messages in thread
From: Jonathan Nieder @ 2017-10-26 20:30 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Lars Schneider, Johannes Schindelin, git,
	Torsten Bögershausen, Jeff King

Johannes Sixt wrote:
> Am 26.10.2017 um 13:01 schrieb Lars Schneider:

>> *     -text
>> *.sh   text eol=lf
>
> Why would that be necessary? I cannot have CRLF in shell scripts etc., not
> even on Windows. (And in addition I do not mind CR in C++ source code.) All
> I want is: Git, please, by all means, do not mess with my files, ever. Isn't
> the simplest way to achieve that to set *nothing* at all? Not even
> core.autocrlf?

That's why Lars suggests "* -text" in .gitattributes.  That way, if
some user of the repository *does* set core.autocrlf, you still get
the "do not mess with my files" semantics you want.

If you control the configuration of all users of the repository, you
don't need that, but it doesn't hurt, either.

With that "* -text", you do not need "*.sh text eol=lf", but maybe
you'd want it in order to get some warnings when someone changes the
line endings by mistake without running tests.  (Better to have a
culture of running tests, you might say.  Fair enough, but as with the
other .gitattributes line, it doesn't hurt.)

Jonathan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26 20:30           ` Jonathan Nieder
@ 2017-10-26 20:51             ` Johannes Sixt
  2017-10-26 22:27               ` Ross Kabus
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2017-10-26 20:51 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Lars Schneider, Johannes Schindelin, git,
	Torsten Bögershausen, Jeff King

Thank you for the clarification, it's much appreciated.

-- Hannes


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26 20:51             ` Johannes Sixt
@ 2017-10-26 22:27               ` Ross Kabus
  2017-10-27  1:05                 ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Ross Kabus @ 2017-10-26 22:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Johannes Sixt, Lars Schneider,
	Johannes Schindelin, Torsten Bögershausen, Jeff King

Is "* -text" in any way different than "-text" (without the * asterisk)? All
of my .gitattributes files have "-text" (no * asterisk) and I haven't noticed
any difference but could I be missing something subtle?

~Ross

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26 22:27               ` Ross Kabus
@ 2017-10-27  1:05                 ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2017-10-27  1:05 UTC (permalink / raw)
  To: Ross Kabus
  Cc: git, Jonathan Nieder, Johannes Sixt, Lars Schneider,
	Johannes Schindelin, Torsten Bögershausen, Jeff King

Ross Kabus <rkabus@aerotech.com> writes:

> Is "* -text" in any way different than "-text" (without the * asterisk)? All
> of my .gitattributes files have "-text" (no * asterisk) and I haven't noticed
> any difference but could I be missing something subtle?
>
> ~Ross

A line in the .gitattibute file is of the form

	<pattern>	<attribute definition>...

i.e. a pattern to match paths, with a list of attribute definitions.
The asterisk they are showing in their description is the <pattern>
part, i.e. "apply the '-text' thing to paths that match '*'", which
is equivalent to saying "set text attribute to false for all paths".


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re* Consequences of CRLF in index?
  2017-10-26  5:54         ` Junio C Hamano
@ 2017-10-27  6:13           ` Junio C Hamano
  2017-10-27 17:06             ` Stefan Beller
  2017-10-27 17:07             ` [PATCH 0/2] " Stefan Beller
  0 siblings, 2 replies; 44+ messages in thread
From: Junio C Hamano @ 2017-10-27  6:13 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jonathan Nieder, Lars Schneider, git, Torsten Bögershausen,
	Jeff King, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
>
>> (1<<5) is taken twice now.
>
> Good eyes.  I think we use bits #1-#8 now (bit #0 is vacant, so are
> #9-#31).

Let's do this bit-shuffling as a preliminary clean-up.

-- >8 --
Subject: [PATCH] xdiff: reassign xpparm_t.flags bits

We have packed the bits too tightly in such a way that it is not
easy to add a new type of whitespace ignoring option, a new type
of LCS algorithm, or a new type of post-cleanup heuristics.

Reorder bits a bit to give room for these three classes of options
to grow.

While at it, add a comment in front of the bit definitions to
clarify in which structure these defined bits may appear.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 xdiff/xdiff.h | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index b090ad8eac..457cac32d8 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -27,22 +27,24 @@
 extern "C" {
 #endif /* #ifdef __cplusplus */
 
+/* xpparm_t.flags */
+#define XDF_NEED_MINIMAL (1 << 0)
 
-#define XDF_NEED_MINIMAL (1 << 1)
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
-#define XDF_PATIENCE_DIFF (1 << 5)
-#define XDF_HISTOGRAM_DIFF (1 << 6)
+#define XDF_IGNORE_BLANK_LINES (1 << 7)
+
+#define XDF_PATIENCE_DIFF (1 << 14)
+#define XDF_HISTOGRAM_DIFF (1 << 15)
 #define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)
 #define XDF_DIFF_ALG(x) ((x) & XDF_DIFF_ALGORITHM_MASK)
 
-#define XDF_IGNORE_BLANK_LINES (1 << 7)
-
-#define XDF_INDENT_HEURISTIC (1 << 8)
+#define XDF_INDENT_HEURISTIC (1 << 23)
 
+/* xdemitconf_t.flags */
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-- 
2.15.0-rc2-266-g8f92d095f4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: Consequences of CRLF in index?
  2017-10-26 11:01       ` Lars Schneider
  2017-10-26 19:22         ` Torsten Bögershausen
  2017-10-26 20:20         ` Johannes Sixt
@ 2017-10-27 15:18         ` Johannes Schindelin
  2 siblings, 0 replies; 44+ messages in thread
From: Johannes Schindelin @ 2017-10-27 15:18 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Johannes Sixt, git, Torsten Bögershausen, Jeff King

Hi Lars,

On Thu, 26 Oct 2017, Lars Schneider wrote:

> > On 26 Oct 2017, at 09:09, Johannes Sixt <j6t@kdbg.org> wrote:
> > 
> > Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
> >> I envy you for the blessing of such a clean C++ source that you do not
> >> have any, say, Unix shell script in it. Try this, and weep:
> >> 	$ printf 'echo \\\r\n\t123\r\n' >a1
> >> 	$ sh a1
> >> 	a1: 2: a1: 123: not found
> > 
> > I was bitten by that, too. For this reason, I ensure that shell
> > scripts and Makefiles begin their life on Linux. Fortunately, modern
> > editors on Windows, includ^Wand vi, do not force CRLF line breaks, and

                        ^^^^^^^^^^^^^^

This put a well-needed smile on my face. Thanks.

> > such files can be edited on Windows, too.
> 
> Wouldn't this kind of .gitattributes setup solve the problem?
> 
> *     -text
> *.sh   text eol=lf

If you look at the commit I mentioned, you will see examples where it
breaks down: when using Unix shell scripts *without* .sh file extension.
Most notable offender: GIT-VERSION-GEN.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Re* Consequences of CRLF in index?
  2017-10-27  6:13           ` Re* " Junio C Hamano
@ 2017-10-27 17:06             ` Stefan Beller
  2017-10-27 17:07             ` [PATCH 0/2] " Stefan Beller
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Beller @ 2017-10-27 17:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Lars Schneider, git, Torsten Bögershausen,
	Jeff King, Johannes Schindelin

On Thu, Oct 26, 2017 at 11:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Subject: [PATCH] xdiff: reassign xpparm_t.flags bits
>
> We have packed the bits too tightly in such a way that it is not
> easy to add a new type of whitespace ignoring option, a new type
> of LCS algorithm, or a new type of post-cleanup heuristics.
>
> Reorder bits a bit to give room for these three classes of options
> to grow.
>
> While at it, add a comment in front of the bit definitions to
> clarify in which structure these defined bits may appear.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Reviewed-by: Stefan Beller <sbeller@google.com>

> ---
>  xdiff/xdiff.h | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index b090ad8eac..457cac32d8 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -27,22 +27,24 @@
>  extern "C" {
>  #endif /* #ifdef __cplusplus */
>
> +/* xpparm_t.flags */
> +#define XDF_NEED_MINIMAL (1 << 0)
>
> -#define XDF_NEED_MINIMAL (1 << 1)

The whitespace flags are also stored in xpparm_t, though
these flags can also come in from other sources (e.g. we
just pass it in manually via the interface).

>  #define XDF_IGNORE_WHITESPACE (1 << 2)
>  #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
>  #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
>  #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
>
> -#define XDF_PATIENCE_DIFF (1 << 5)
> -#define XDF_HISTOGRAM_DIFF (1 << 6)
> +#define XDF_IGNORE_BLANK_LINES (1 << 7)

Is XDF_IGNORE_BLANK_LINES a whitespace option, just
on the lines level instead of the inside one line? I think
it still makes sense to keep it here, slightly separated from
the IGNORE flags.

> +#define XDF_PATIENCE_DIFF (1 << 14)
> +#define XDF_HISTOGRAM_DIFF (1 << 15)
>  #define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)
>  #define XDF_DIFF_ALG(x) ((x) & XDF_DIFF_ALGORITHM_MASK)
>
> -#define XDF_IGNORE_BLANK_LINES (1 << 7)
> -
> -#define XDF_INDENT_HEURISTIC (1 << 8)
> +#define XDF_INDENT_HEURISTIC (1 << 23)
>
> +/* xdemitconf_t.flags */
>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)

This looked like it was carefully crafted to avoid accidental bit overlap
with XDF_NEED_MINIMAL; but these are in different flag fields, this
should be fine.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 0/2] Re* Consequences of CRLF in index?
  2017-10-27  6:13           ` Re* " Junio C Hamano
  2017-10-27 17:06             ` Stefan Beller
@ 2017-10-27 17:07             ` Stefan Beller
  2017-10-27 17:07               ` [PATCH 1/2] xdiff/xdiff.h: remove unused flags Stefan Beller
                                 ` (2 more replies)
  1 sibling, 3 replies; 44+ messages in thread
From: Stefan Beller @ 2017-10-27 17:07 UTC (permalink / raw)
  To: gitster
  Cc: git, johannes.schindelin, jrnieder, larsxschneider, peff, sbeller,
	tboegi

> Let's do this bit-shuffling as a preliminary clean-up.

These 2 patches can go on top of that as well.

Thanks,
Stefan

Stefan Beller (2):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations

 xdiff/xdiff.h  |  8 --------
 xdiff/xdiffi.c | 17 -----------------
 2 files changed, 25 deletions(-)

-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/2] xdiff/xdiff.h: remove unused flags
  2017-10-27 17:07             ` [PATCH 0/2] " Stefan Beller
@ 2017-10-27 17:07               ` Stefan Beller
  2017-10-27 17:07               ` [PATCH 2/2] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
  2017-10-30 17:20               ` [PATCH 0/2] Re* Consequences of CRLF in index? Stefan Beller
  2 siblings, 0 replies; 44+ messages in thread
From: Stefan Beller @ 2017-10-27 17:07 UTC (permalink / raw)
  To: gitster
  Cc: git, johannes.schindelin, jrnieder, larsxschneider, peff, sbeller,
	tboegi

These flags were there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24), but were never used. Remove them.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiff.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 457cac32d8..0a41f5e500 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -48,14 +48,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/2] xdiff/xdiffi.c: remove unneeded function declarations
  2017-10-27 17:07             ` [PATCH 0/2] " Stefan Beller
  2017-10-27 17:07               ` [PATCH 1/2] xdiff/xdiff.h: remove unused flags Stefan Beller
@ 2017-10-27 17:07               ` Stefan Beller
  2017-10-30 17:20               ` [PATCH 0/2] Re* Consequences of CRLF in index? Stefan Beller
  2 siblings, 0 replies; 44+ messages in thread
From: Stefan Beller @ 2017-10-27 17:07 UTC (permalink / raw)
  To: gitster
  Cc: git, johannes.schindelin, jrnieder, larsxschneider, peff, sbeller,
	tboegi

There is no need to forward-declare these functions, as they are used
after their implementation only.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiffi.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 93a65680a1..0a4ea948f9 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
 	long i1, i2;
 	int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
-		      unsigned long const *ha2, long off2, long lim2,
-		      long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
-		      xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/2] Re* Consequences of CRLF in index?
  2017-10-27 17:07             ` [PATCH 0/2] " Stefan Beller
  2017-10-27 17:07               ` [PATCH 1/2] xdiff/xdiff.h: remove unused flags Stefan Beller
  2017-10-27 17:07               ` [PATCH 2/2] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
@ 2017-10-30 17:20               ` Stefan Beller
  2017-10-31  2:44                 ` Junio C Hamano
  2 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2017-10-30 17:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Jonathan Nieder, Lars Schneider,
	Jeff King, Stefan Beller, Torsten Bögershausen

On Fri, Oct 27, 2017 at 10:07 AM, Stefan Beller <sbeller@google.com> wrote:
>> Let's do this bit-shuffling as a preliminary clean-up.
>
> These 2 patches can go on top of that as well.

Actually these textually do not conflict with your patch,
and they can be picked independently, e.g. they could
go on top of sb/diff-color-moved-use-xdl-recmatch
as a diff cleanup.

(I note this as you regard your patches as a lunch time hack
in the cooking email; I am serious about these patches though.)

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/2] Re* Consequences of CRLF in index?
  2017-10-30 17:20               ` [PATCH 0/2] Re* Consequences of CRLF in index? Stefan Beller
@ 2017-10-31  2:44                 ` Junio C Hamano
  2017-10-31 16:41                   ` Stefan Beller
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2017-10-31  2:44 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Johannes Schindelin, Jonathan Nieder, Lars Schneider,
	Jeff King, Torsten Bögershausen

Stefan Beller <sbeller@google.com> writes:

> (I note this as you regard your patches as a lunch time hack
> in the cooking email; I am serious about these patches though.)

We do not want to touch borrowed code unnecessarily.  Are these
lines and bits hampering further progress we are actively trying to
make right now?


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/2] Re* Consequences of CRLF in index?
  2017-10-31  2:44                 ` Junio C Hamano
@ 2017-10-31 16:41                   ` Stefan Beller
  2017-10-31 17:01                     ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2017-10-31 16:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Jonathan Nieder, Lars Schneider,
	Jeff King, Torsten Bögershausen

On Mon, Oct 30, 2017 at 7:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> (I note this as you regard your patches as a lunch time hack
>> in the cooking email; I am serious about these patches though.)
>
> We do not want to touch borrowed code unnecessarily.  Are these
> lines and bits hampering further progress we are actively trying to
> make right now?

No they are not, you are correct.

I differ in opinion on 'borrowed code'. The latest release of xdiff
(v0.23) was in Nov 13, 2008 according to http://freecode.com/projects/xdiff-lib
or on March 23, 2006 according to https://directory.fsf.org/wiki/LibXDiff
and given that we incorporated so many changes already to xdiff,
I would argue it is sufficiently different from the original, we'll probably
never import another upstream version (if there will be a release at all).

So the code was rather taken and now we are the bag holders in
maintaining it, so we can make it pretty even only for the sake of
pleasing ourselves (or rather: not confusing ourselves with too
many unused flags).

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/2] Re* Consequences of CRLF in index?
  2017-10-31 16:41                   ` Stefan Beller
@ 2017-10-31 17:01                     ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2017-10-31 17:01 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, git, Johannes Schindelin, Jonathan Nieder,
	Lars Schneider, Torsten Bögershausen

On Tue, Oct 31, 2017 at 09:41:25AM -0700, Stefan Beller wrote:

> On Mon, Oct 30, 2017 at 7:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Stefan Beller <sbeller@google.com> writes:
> >
> >> (I note this as you regard your patches as a lunch time hack
> >> in the cooking email; I am serious about these patches though.)
> >
> > We do not want to touch borrowed code unnecessarily.  Are these
> > lines and bits hampering further progress we are actively trying to
> > make right now?
> 
> No they are not, you are correct.
> 
> I differ in opinion on 'borrowed code'. The latest release of xdiff
> (v0.23) was in Nov 13, 2008 according to http://freecode.com/projects/xdiff-lib
> or on March 23, 2006 according to https://directory.fsf.org/wiki/LibXDiff
> and given that we incorporated so many changes already to xdiff,
> I would argue it is sufficiently different from the original, we'll probably
> never import another upstream version (if there will be a release at all).
> 
> So the code was rather taken and now we are the bag holders in
> maintaining it, so we can make it pretty even only for the sake of
> pleasing ourselves (or rather: not confusing ourselves with too
> many unused flags).

Yes, I'd agree. For what it's worth both sides of this argument played
out in my head when I saw your patches, and I ended up at the same "we
are the bag holders" place. And there's certainly precedent for touching
that code and cleaning it up to make it easier to work with (just look
at at "git log -p xdiff").

I don't know that I would support a full-scale rewrite (for the same
reason that I wouldn't on the rest of the code base -- avoiding churn).
But deleting unused bits seems like an easy win for readability.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL
  2017-10-25  4:53     ` Junio C Hamano
  2017-10-25 16:44       ` Stefan Beller
@ 2017-11-07  6:40       ` Junio C Hamano
  2017-11-07  6:40         ` [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits Junio C Hamano
                           ` (2 more replies)
  1 sibling, 3 replies; 44+ messages in thread
From: Junio C Hamano @ 2017-11-07  6:40 UTC (permalink / raw)
  To: git

This time with doc updates and tests.  The previous one is at 

https://public-inbox.org/git/xmqqshe7j0af.fsf@gitster.mtv.corp.google.com

A change that is supposed to only change the end-of-line convention
between LF <-> CRLF and nothing else can be verified with "diff -b"
or "diff --ignore-space-at-eol" in practice, but these hide changes
in whitespaces other than the carriage return at the end of the lines.

These two patches introduce a new --ignore-cr-at-eol option so that
only CR at eol and no other whitespace changes are ignored while
making a comparison, to allow a stricter validation.

I am not sure how much practical value the new option has.  The
preliminary clean-up patch to shuffle the bis assignment for flags
does have values, though ;-).

Junio C Hamano (2):
  xdiff: reassign xpparm_t.flags bits
  diff: --ignore-cr-at-eol

 Documentation/diff-options.txt     |  3 +++
 Documentation/merge-strategies.txt |  5 +++--
 diff.c                             |  6 +++---
 merge-recursive.c                  |  2 ++
 t/t4015-diff-whitespace.sh         | 28 ++++++++++++++++++++++++++++
 xdiff/xdiff.h                      | 26 ++++++++++++++++----------
 xdiff/xutils.c                     | 38 ++++++++++++++++++++++++++++++++++++--
 7 files changed, 91 insertions(+), 17 deletions(-)

-- 
2.15.0-263-g47cc852023


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits
  2017-11-07  6:40       ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Junio C Hamano
@ 2017-11-07  6:40         ` Junio C Hamano
  2017-11-07 12:44           ` Johannes Schindelin
  2017-11-07  6:40         ` [PATCH v2 2/2] diff: --ignore-cr-at-eol Junio C Hamano
  2017-11-07 12:30         ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Johannes Schindelin
  2 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2017-11-07  6:40 UTC (permalink / raw)
  To: git

We have packed the bits too tightly in such a way that it is not
easy to add a new type of whitespace ignoring option, a new type
of LCS algorithm, or a new type of post-cleanup heuristics.

Reorder bits a bit to give room for these three classes of options
to grow.  Also make use of XDF_WHITESPACE_FLAGS macro where we check
any of these bits are on, instead of using DIFF_XDL_TST() macro on
individual possibilities.  That way, the "is any of the bits on?"
code does not have to change when we add more ways to ignore
whitespaces.

While at it, add a comment in front of the bit definitions to
clarify in which structure these defined bits may appear.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c        |  4 +---
 xdiff/xdiff.h | 24 ++++++++++++++----------
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..790250fe86 100644
--- a/diff.c
+++ b/diff.c
@@ -3434,9 +3434,7 @@ void diff_setup_done(struct diff_options *options)
 	 * inside contents.
 	 */
 
-	if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
-	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
-	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
+	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS))
 		DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
 	else
 		DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index b090ad8eac..cbf5d8e166 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -27,22 +27,26 @@
 extern "C" {
 #endif /* #ifdef __cplusplus */
 
+/* xpparm_t.flags */
+#define XDF_NEED_MINIMAL (1 << 0)
 
-#define XDF_NEED_MINIMAL (1 << 1)
-#define XDF_IGNORE_WHITESPACE (1 << 2)
-#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
-#define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
-#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
+#define XDF_IGNORE_WHITESPACE (1 << 1)
+#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 2)
+#define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 3)
+#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | \
+			      XDF_IGNORE_WHITESPACE_CHANGE | \
+			      XDF_IGNORE_WHITESPACE_AT_EOL)
 
-#define XDF_PATIENCE_DIFF (1 << 5)
-#define XDF_HISTOGRAM_DIFF (1 << 6)
+#define XDF_IGNORE_BLANK_LINES (1 << 7)
+
+#define XDF_PATIENCE_DIFF (1 << 14)
+#define XDF_HISTOGRAM_DIFF (1 << 15)
 #define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)
 #define XDF_DIFF_ALG(x) ((x) & XDF_DIFF_ALGORITHM_MASK)
 
-#define XDF_IGNORE_BLANK_LINES (1 << 7)
-
-#define XDF_INDENT_HEURISTIC (1 << 8)
+#define XDF_INDENT_HEURISTIC (1 << 23)
 
+/* xdemitconf_t.flags */
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-- 
2.15.0-263-g47cc852023


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 2/2] diff: --ignore-cr-at-eol
  2017-11-07  6:40       ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Junio C Hamano
  2017-11-07  6:40         ` [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits Junio C Hamano
@ 2017-11-07  6:40         ` Junio C Hamano
  2017-11-07 13:23           ` Johannes Schindelin
  2017-11-07 12:30         ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Johannes Schindelin
  2 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2017-11-07  6:40 UTC (permalink / raw)
  To: git

A new option --ignore-cr-at-eol tells the diff machinery to treat a
carriage-return at the end of a (complete) line as if it does not
exist.

This would make it easier to review a change whose only effect is to
turn line endings from CRLF to LF or the other way around.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/diff-options.txt     |  3 +++
 Documentation/merge-strategies.txt |  5 +++--
 diff.c                             |  2 ++
 merge-recursive.c                  |  2 ++
 t/t4015-diff-whitespace.sh         | 28 ++++++++++++++++++++++++++++
 xdiff/xdiff.h                      |  4 +++-
 xdiff/xutils.c                     | 38 ++++++++++++++++++++++++++++++++++++--
 7 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 89cc0f48de..aa2c0ff74d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -519,6 +519,9 @@ endif::git-format-patch[]
 --text::
 	Treat all files as text.
 
+--ignore-cr-at-eol::
+	Ignore carrige-return at the end of line when doing a comparison.
+
 --ignore-space-at-eol::
 	Ignore changes in whitespace at EOL.
 
diff --git a/Documentation/merge-strategies.txt b/Documentation/merge-strategies.txt
index 2eb92b9327..030744910e 100644
--- a/Documentation/merge-strategies.txt
+++ b/Documentation/merge-strategies.txt
@@ -57,11 +57,12 @@ diff-algorithm=[patience|minimal|histogram|myers];;
 ignore-space-change;;
 ignore-all-space;;
 ignore-space-at-eol;;
+ignore-cr-at-eol;;
 	Treats lines with the indicated type of whitespace change as
 	unchanged for the sake of a three-way merge.  Whitespace
 	changes mixed with other changes to a line are not ignored.
-	See also linkgit:git-diff[1] `-b`, `-w`, and
-	`--ignore-space-at-eol`.
+	See also linkgit:git-diff[1] `-b`, `-w`,
+	`--ignore-space-at-eol`, and `--ignore-cr-at-eol`.
 +
 * If 'their' version only introduces whitespace changes to a line,
   'our' version is used;
diff --git a/diff.c b/diff.c
index 790250fe86..dd14e4190c 100644
--- a/diff.c
+++ b/diff.c
@@ -3888,6 +3888,8 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_CHANGE);
 	else if (!strcmp(arg, "--ignore-space-at-eol"))
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
+	else if (!strcmp(arg, "--ignore-cr-at-eol"))
+		DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
 	else if (!strcmp(arg, "--indent-heuristic"))
diff --git a/merge-recursive.c b/merge-recursive.c
index 7a7d55aabe..006b94baf2 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2214,6 +2214,8 @@ int parse_merge_opt(struct merge_options *o, const char *s)
 		DIFF_XDL_SET(o, IGNORE_WHITESPACE);
 	else if (!strcmp(s, "ignore-space-at-eol"))
 		DIFF_XDL_SET(o, IGNORE_WHITESPACE_AT_EOL);
+	else if (!strcmp(s, "ignore-cr-at-eol"))
+		DIFF_XDL_SET(o, IGNORE_CR_AT_EOL);
 	else if (!strcmp(s, "renormalize"))
 		o->renormalize = 1;
 	else if (!strcmp(s, "no-renormalize"))
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 289806d0c7..32dd54c21d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -106,6 +106,8 @@ test_expect_success 'another test, without options' '
 	git diff -w -b --ignore-space-at-eol >out &&
 	test_cmp expect out &&
 
+	git diff -w --ignore-cr-at-eol >out &&
+	test_cmp expect out &&
 
 	tr "Q_" "\015 " <<-\EOF >expect &&
 	diff --git a/x b/x
@@ -128,6 +130,9 @@ test_expect_success 'another test, without options' '
 	git diff -b --ignore-space-at-eol >out &&
 	test_cmp expect out &&
 
+	git diff -b --ignore-cr-at-eol >out &&
+	test_cmp expect out &&
+
 	tr "Q_" "\015 " <<-\EOF >expect &&
 	diff --git a/x b/x
 	index d99af23..22d9f73 100644
@@ -145,6 +150,29 @@ test_expect_success 'another test, without options' '
 	 CR at end
 	EOF
 	git diff --ignore-space-at-eol >out &&
+	test_cmp expect out &&
+
+	git diff --ignore-space-at-eol --ignore-cr-at-eol >out &&
+	test_cmp expect out &&
+
+	tr "Q_" "\015 " <<-\EOF >expect &&
+	diff --git a/x b/x
+	index_d99af23..22d9f73 100644
+	--- a/x
+	+++ b/x
+	@@ -1,6 +1,6 @@
+	-whitespace at beginning
+	-whitespace change
+	-whitespace in the middle
+	-whitespace at end
+	+_	whitespace at beginning
+	+whitespace_	_change
+	+white space in the middle
+	+whitespace at end__
+	 unchanged line
+	 CR at end
+	EOF
+	git diff --ignore-cr-at-eol >out &&
 	test_cmp expect out
 '
 
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index cbf5d8e166..51f8926215 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -33,9 +33,11 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 1)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 3)
+#define XDF_IGNORE_CR_AT_EOL (1 << 4)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | \
 			      XDF_IGNORE_WHITESPACE_CHANGE | \
-			      XDF_IGNORE_WHITESPACE_AT_EOL)
+			      XDF_IGNORE_WHITESPACE_AT_EOL | \
+			      XDF_IGNORE_CR_AT_EOL)
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 04d7b32e4e..b2cbcc818f 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -156,6 +156,24 @@ int xdl_blankline(const char *line, long size, long flags)
 	return (i == size);
 }
 
+/*
+ * Have we eaten everything on the line, except for an optional
+ * CR at the very end?
+ */
+static int ends_with_optional_cr(const char *l, long s, long i)
+{
+	int complete = s && l[s-1] == '\n';
+
+	if (complete)
+		s--;
+	if (s == i)
+		return 1;
+	/* do not ignore CR at the end of an incomplete line */
+	if (complete && s == i + 1 && l[i] == '\r')
+		return 1;
+	return 0;
+}
+
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 {
 	int i1, i2;
@@ -170,7 +188,8 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 
 	/*
 	 * -w matches everything that matches with -b, and -b in turn
-	 * matches everything that matches with --ignore-space-at-eol.
+	 * matches everything that matches with --ignore-space-at-eol,
+	 * which in turn matches everything that matches with --ignore-cr-at-eol.
 	 *
 	 * Each flavor of ignoring needs different logic to skip whitespaces
 	 * while we have both sides to compare.
@@ -204,6 +223,14 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 			i1++;
 			i2++;
 		}
+	} else if (flags & XDF_IGNORE_CR_AT_EOL) {
+		/* Find the first difference and see how the line ends */
+		while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
+			i1++;
+			i2++;
+		}
+		return (ends_with_optional_cr(l1, s1, i1) &&
+			ends_with_optional_cr(l2, s2, i2));
 	}
 
 	/*
@@ -230,9 +257,16 @@ static unsigned long xdl_hash_record_with_whitespace(char const **data,
 		char const *top, long flags) {
 	unsigned long ha = 5381;
 	char const *ptr = *data;
+	int cr_at_eol_only = (flags & XDF_WHITESPACE_FLAGS) == XDF_IGNORE_CR_AT_EOL;
 
 	for (; ptr < top && *ptr != '\n'; ptr++) {
-		if (XDL_ISSPACE(*ptr)) {
+		if (cr_at_eol_only) {
+			/* do not ignore CR at the end of an incomplete line */
+			if (*ptr == '\r' &&
+			    (ptr + 1 < top && ptr[1] == '\n'))
+				continue;
+		}
+		else if (XDL_ISSPACE(*ptr)) {
 			const char *ptr2 = ptr;
 			int at_eol;
 			while (ptr + 1 < top && XDL_ISSPACE(ptr[1])
-- 
2.15.0-263-g47cc852023


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL
  2017-11-07  6:40       ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Junio C Hamano
  2017-11-07  6:40         ` [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits Junio C Hamano
  2017-11-07  6:40         ` [PATCH v2 2/2] diff: --ignore-cr-at-eol Junio C Hamano
@ 2017-11-07 12:30         ` Johannes Schindelin
  2017-11-07 15:12           ` Junio C Hamano
  2 siblings, 1 reply; 44+ messages in thread
From: Johannes Schindelin @ 2017-11-07 12:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Tue, 7 Nov 2017, Junio C Hamano wrote:

> This time with doc updates and tests.  The previous one is at 
> 
> https://public-inbox.org/git/xmqqshe7j0af.fsf@gitster.mtv.corp.google.com
> 
> A change that is supposed to only change the end-of-line convention
> between LF <-> CRLF and nothing else can be verified with "diff -b"
> or "diff --ignore-space-at-eol" in practice, but these hide changes
> in whitespaces other than the carriage return at the end of the lines.

Good. I was wishing for such a feature in the past.

However, the short and sweet `-b` or `-w` switches are really, really
nice. `--ignore-cr-at-eol` is just very cumbersome to type out. So I think
you will want to add this patch to your patch series:

-- snipsnap --
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 0e16f017a41..b7a45e8df29 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1400,7 +1400,7 @@ __git_diff_common_options="--stat --numstat --shortstat --summary
 			--patch-with-stat --name-only --name-status --color
 			--no-color --color-words --no-renames --check
 			--full-index --binary --abbrev --diff-filter=
-			--find-copies-harder
+			--find-copies-harder --ignore-cr-at-eol
 			--text --ignore-space-at-eol --ignore-space-change
 			--ignore-all-space --ignore-blank-lines --exit-code
 			--quiet --ext-diff --no-ext-diff

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits
  2017-11-07  6:40         ` [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits Junio C Hamano
@ 2017-11-07 12:44           ` Johannes Schindelin
  2017-11-07 15:02             ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Schindelin @ 2017-11-07 12:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Tue, 7 Nov 2017, Junio C Hamano wrote:

> We have packed the bits too tightly in such a way that it is not
> easy to add a new type of whitespace ignoring option, a new type
> of LCS algorithm, or a new type of post-cleanup heuristics.
> 
> Reorder bits a bit to give room for these three classes of options
> to grow.  Also make use of XDF_WHITESPACE_FLAGS macro where we check
> any of these bits are on, instead of using DIFF_XDL_TST() macro on
> individual possibilities.  That way, the "is any of the bits on?"
> code does not have to change when we add more ways to ignore
> whitespaces.
> 
> While at it, add a comment in front of the bit definitions to
> clarify in which structure these defined bits may appear.

Makes sense.

> diff --git a/diff.c b/diff.c
> index 74283d9001..790250fe86 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3434,9 +3434,7 @@ void diff_setup_done(struct diff_options *options)
>  	 * inside contents.
>  	 */
>  
> -	if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
> -	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
> -	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
> +	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS))

Not only shorter now, but a lot clearer, too: nobody needs to wonder now
whether one whitespace flag was excluded on purpose.

>  		DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
>  	else
>  		DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index b090ad8eac..cbf5d8e166 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -27,22 +27,26 @@
>  extern "C" {
>  #endif /* #ifdef __cplusplus */
>  
> +/* xpparm_t.flags */
> +#define XDF_NEED_MINIMAL (1 << 0)
>  
> -#define XDF_NEED_MINIMAL (1 << 1)

This change makes me wonder whether the least significant bit was omitted
on purpose originally. You probably looked at that? May be worth
mentioning in the commit message.

> -#define XDF_IGNORE_WHITESPACE (1 << 2)
> -#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
> -#define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
> -#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
> +#define XDF_IGNORE_WHITESPACE (1 << 1)
> +#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 2)
> +#define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 3)
> +#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | \
> +			      XDF_IGNORE_WHITESPACE_CHANGE | \
> +			      XDF_IGNORE_WHITESPACE_AT_EOL)
>  
> -#define XDF_PATIENCE_DIFF (1 << 5)
> -#define XDF_HISTOGRAM_DIFF (1 << 6)
> +#define XDF_IGNORE_BLANK_LINES (1 << 7)
> +
> +#define XDF_PATIENCE_DIFF (1 << 14)
> +#define XDF_HISTOGRAM_DIFF (1 << 15)
>  #define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)
>  #define XDF_DIFF_ALG(x) ((x) & XDF_DIFF_ALGORITHM_MASK)
>  
> -#define XDF_IGNORE_BLANK_LINES (1 << 7)
> -
> -#define XDF_INDENT_HEURISTIC (1 << 8)
> +#define XDF_INDENT_HEURISTIC (1 << 23)
>  
> +/* xdemitconf_t.flags */
>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)

It is a pity that this diff is not easier to review, but it shows how much
it was in need of cleaning up. Looks much nicer now.

I wonder, however, what your guiding principle was in determining the
gaps? I would have expected consecutive bits, except for the one gap to
make room for the upcoming flag, of course.

Future patch series could always start by making room for new flags, as
needed, after all.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/2] diff: --ignore-cr-at-eol
  2017-11-07  6:40         ` [PATCH v2 2/2] diff: --ignore-cr-at-eol Junio C Hamano
@ 2017-11-07 13:23           ` Johannes Schindelin
  2017-11-08  0:43             ` Junio C Hamano
  2017-11-15  4:28             ` Junio C Hamano
  0 siblings, 2 replies; 44+ messages in thread
From: Johannes Schindelin @ 2017-11-07 13:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Tue, 7 Nov 2017, Junio C Hamano wrote:

> A new option --ignore-cr-at-eol tells the diff machinery to treat a
> carriage-return at the end of a (complete) line as if it does not
> exist.
> 
> This would make it easier to review a change whose only effect is to
> turn line endings from CRLF to LF or the other way around.

If the goal is to make CR/LF -> LF conversions easier to review (or for
that matter, LF -> CR/LF), then this option may not be *completely*
satisfactory, as it would hide mixed changes (i.e. where some lines are
converted from CR/LF to LF and others are converted in the other direction
*in the same patch*).

I wonder whether it would make this patch series even more useful if you
would instead introduce --hide-crlf-to-lf and --hide-lf-to-crlf options
(not even necessarily mutually exclusive, in case the reviewer really
wants to ignore all line ending conversions).

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 89cc0f48de..aa2c0ff74d 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -519,6 +519,9 @@ endif::git-format-patch[]
>  --text::
>  	Treat all files as text.
>  
> +--ignore-cr-at-eol::
> +	Ignore carrige-return at the end of line when doing a comparison.

I am not a native speaker, either, yet I have the impression that "do a
comparison" may be more colloquial than not. Also, it is a carriage-return
(as in Sinatra's famous song about Love and Marriage) not a carrige-return.

How about "Hide changed line endings"?

> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index 04d7b32e4e..b2cbcc818f 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -156,6 +156,24 @@ int xdl_blankline(const char *line, long size, long flags)
>  	return (i == size);
>  }
>  
> +/*
> + * Have we eaten everything on the line, except for an optional
> + * CR at the very end?
> + */
> +static int ends_with_optional_cr(const char *l, long s, long i)
> +{
> +	int complete = s && l[s-1] == '\n';
> +
> +	if (complete)
> +		s--;
> +	if (s == i)
> +		return 1;

What is the role of `s`, what of `i`? Maybe `length` and `current_offset`?

> +	/* do not ignore CR at the end of an incomplete line */
> +	if (complete && s == i + 1 && l[i] == '\r')
> +		return 1;

This made me scratch my head: too many negations. The comment may better
read "ignore CR only at the end of a complete line".

And now I understand even less why `1` is returned if `s == i`? Is this
not an empty line (complete or incomplete) *without* a CR?

> @@ -204,6 +223,14 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
>  			i1++;
>  			i2++;
>  		}
> +	} else if (flags & XDF_IGNORE_CR_AT_EOL) {
> +		/* Find the first difference and see how the line ends */
> +		while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
> +			i1++;
> +			i2++;
> +		}
> +		return (ends_with_optional_cr(l1, s1, i1) &&
> +			ends_with_optional_cr(l2, s2, i2));

There are extra parentheses around the `return` expression.

To accommodate the tentative --hide-crlf-to-lf and --hide-lf-to-crlf
options that I suggested earlier, this would simply become something like
this:

	} else if (flags & (XDF_IGNORE_CRLF_TO_LF | XDF_IGNORE_LF_TO_CRLF)) {
		/* Early exit: length must be equal or differ by 1 */
		if (s1 - i1 != s2 - i2 &&
		    s1 - i1 != s2 + 1 - i2 && s1 + 1 - i1 != s2 - i2)
			return 0;

		/* Early exit: skip incomplete lines */
		if (!s1 || !s2 || l1[s1-1] != '\n' || l2[s2-1] != '\n')
			return 0;

		/* Find the first difference and see how the line ends */
		while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
			i1++;
			i2++;
		}

		/* Lines must be identical or have the indicated EOL change */
		return ((i1 == s1 && i2 == s2) ||
			((flags & XDF_IGNORE_CRLF_TO_LF) &&
			 i1 + 2 == s1 && l1[i1] == '\r' && i2 + 1 == s2) ||
			((flags & XDF_IGNORE_LF_TO_CRLF) &&
			 i1 + 1 == s1 && i2 + 2 == s2 && l2[i2] == '\r');

Note: I do not even know whether the code in this function has to assume
that the lines can be byte-wise identical or not, I just erred on the side
of caution. I also do not know whether the return value 0 indicates a
mistmatch, I just assumed it did. I really wish that I could have reviewed
the real code, not a patch in a mail program that lacks the context.

Caveat emptor: my proposed code change has been written in the mail
program (if we had a more code-centric review process, you would have a PR
with a suggested alternative already, I am really sorry for the
inconvenience).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits
  2017-11-07 12:44           ` Johannes Schindelin
@ 2017-11-07 15:02             ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2017-11-07 15:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
>> index b090ad8eac..cbf5d8e166 100644
>> --- a/xdiff/xdiff.h
>> +++ b/xdiff/xdiff.h
>> @@ -27,22 +27,26 @@
>>  extern "C" {
>>  #endif /* #ifdef __cplusplus */
>>  
>> +/* xpparm_t.flags */
>> +#define XDF_NEED_MINIMAL (1 << 0)
>>  
>> -#define XDF_NEED_MINIMAL (1 << 1)
>
> This change makes me wonder whether the least significant bit was omitted
> on purpose originally. You probably looked at that? May be worth
> mentioning in the commit message.
>
>> -#define XDF_IGNORE_WHITESPACE (1 << 2)
>> -#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
>> -#define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
>> -#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
>> +#define XDF_IGNORE_WHITESPACE (1 << 1)
>> +#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 2)
>> +#define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 3)
>> +#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | \
>> +			      XDF_IGNORE_WHITESPACE_CHANGE | \
>> +			      XDF_IGNORE_WHITESPACE_AT_EOL)
>>  
>> -#define XDF_PATIENCE_DIFF (1 << 5)
>> -#define XDF_HISTOGRAM_DIFF (1 << 6)
>> +#define XDF_IGNORE_BLANK_LINES (1 << 7)
>> +
>> +#define XDF_PATIENCE_DIFF (1 << 14)
>> +#define XDF_HISTOGRAM_DIFF (1 << 15)
>>  #define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)
>>  #define XDF_DIFF_ALG(x) ((x) & XDF_DIFF_ALGORITHM_MASK)
>>  
>> -#define XDF_IGNORE_BLANK_LINES (1 << 7)
>> -
>> -#define XDF_INDENT_HEURISTIC (1 << 8)
>> +#define XDF_INDENT_HEURISTIC (1 << 23)
>>  
>> +/* xdemitconf_t.flags */
>>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)
>
> It is a pity that this diff is not easier to review, but it shows how much
> it was in need of cleaning up. Looks much nicer now.
>
> I wonder, however, what your guiding principle was in determining the
> gaps? I would have expected consecutive bits, except for the one gap to
> make room for the upcoming flag, of course.

There wasn't that deep a thought went into it, actually.  

I wanted to give room for each "groupsof bits" to grow, and gave
each group a byte.  Whitespace bits sits in the first byte (but
ignore-blank-lines work quite different so it grabs the bit from the
other end), fundamental choices of LCS algos is another group that
sits in the second byte, then hunk shifting and other heuristics as
another group.  I guess that the "need minimal" thing could have
been thrown into the last group, but somehow I didn't think it would
grow that much, so I left it at the lowest place.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL
  2017-11-07 12:30         ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Johannes Schindelin
@ 2017-11-07 15:12           ` Junio C Hamano
  2017-11-07 17:42             ` Stefan Beller
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2017-11-07 15:12 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Good. I was wishing for such a feature in the past.
>
> However, the short and sweet `-b` or `-w` switches are really, really
> nice. `--ignore-cr-at-eol` is just very cumbersome to type out. So I think
> you will want to add this patch to your patch series:

Yeah, I should probably have added that to 2/2 from the beginning.

> -- snipsnap --
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index 0e16f017a41..b7a45e8df29 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1400,7 +1400,7 @@ __git_diff_common_options="--stat --numstat --shortstat --summary
>  			--patch-with-stat --name-only --name-status --color
>  			--no-color --color-words --no-renames --check
>  			--full-index --binary --abbrev --diff-filter=
> -			--find-copies-harder
> +			--find-copies-harder --ignore-cr-at-eol
>  			--text --ignore-space-at-eol --ignore-space-change
>  			--ignore-all-space --ignore-blank-lines --exit-code
>  			--quiet --ext-diff --no-ext-diff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL
  2017-11-07 15:12           ` Junio C Hamano
@ 2017-11-07 17:42             ` Stefan Beller
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Beller @ 2017-11-07 17:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Tue, Nov 7, 2017 at 7:12 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Good. I was wishing for such a feature in the past.
>>
>> However, the short and sweet `-b` or `-w` switches are really, really
>> nice. `--ignore-cr-at-eol` is just very cumbersome to type out. So I think
>> you will want to add this patch to your patch series:
>
> Yeah, I should probably have added that to 2/2 from the beginning.
>
>> -- snipsnap --
>> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
>> index 0e16f017a41..b7a45e8df29 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -1400,7 +1400,7 @@ __git_diff_common_options="--stat --numstat --shortstat --summary
>>                       --patch-with-stat --name-only --name-status --color
>>                       --no-color --color-words --no-renames --check
>>                       --full-index --binary --abbrev --diff-filter=
>> -                     --find-copies-harder
>> +                     --find-copies-harder --ignore-cr-at-eol
>>                       --text --ignore-space-at-eol --ignore-space-change
>>                       --ignore-all-space --ignore-blank-lines --exit-code
>>                       --quiet --ext-diff --no-ext-diff

That series, with this squash, looks good to me.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/2] diff: --ignore-cr-at-eol
  2017-11-07 13:23           ` Johannes Schindelin
@ 2017-11-08  0:43             ` Junio C Hamano
  2017-11-08  0:49               ` Junio C Hamano
  2017-11-15  4:28             ` Junio C Hamano
  1 sibling, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2017-11-08  0:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Tue, 7 Nov 2017, Junio C Hamano wrote:
>
>> A new option --ignore-cr-at-eol tells the diff machinery to treat a
>> carriage-return at the end of a (complete) line as if it does not
>> exist.
>> 
>> This would make it easier to review a change whose only effect is to
>> turn line endings from CRLF to LF or the other way around.
>
> If the goal is to make CR/LF -> LF conversions easier to review (or for
> that matter, LF -> CR/LF), then this option may not be *completely*
> satisfactory, as it would hide mixed changes (i.e. where some lines are
> converted from CR/LF to LF and others are converted in the other direction
> *in the same patch*).

You are 100% right.

This feature is not about helping to review a patch that wanted to
do CRLF-to-LF (or the other way around) conversion at all.  Just
like the --ignore-space-at-eol is not a feature to make sure that
the only thing you did was to remove trailing whitespaces---it will
also ignore lines you added trailing whitespaces as irrelevant and
uninteresting.

In general, selling these "--ignore-*" whitespace options as a tool
for such a verification is incorrect.

These "--ignore-*" whitespace options are to help reviewing _other_
changes without getting distracted by the class of changes these
options represent.  I guess I may have to update the log message (I
do not think I wrote anything like that in the documentation update).

Thanks for pointing it out.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/2] diff: --ignore-cr-at-eol
  2017-11-08  0:43             ` Junio C Hamano
@ 2017-11-08  0:49               ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2017-11-08  0:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> These "--ignore-*" whitespace options are to help reviewing _other_
> changes without getting distracted by the class of changes these
> options represent.  I guess I may have to update the log message (I
> do not think I wrote anything like that in the documentation update).

Here is what I just did with "commit --amend" (there is no change
in the contents of the patch).

    diff: --ignore-cr-at-eol
    
    A new option --ignore-cr-at-eol tells the diff machinery to treat a
    carriage-return at the end of a (complete) line as if it does not
    exist.
    
    Just like other "--ignore-*" options to ignore various kinds of
    whitespace differences, this will help reviewing the real changes
    you made without getting distracted by spurious CRLF<->LF conversion
    made by your editor program.
    
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/2] diff: --ignore-cr-at-eol
  2017-11-07 13:23           ` Johannes Schindelin
  2017-11-08  0:43             ` Junio C Hamano
@ 2017-11-15  4:28             ` Junio C Hamano
  1 sibling, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2017-11-15  4:28 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

I notice that I left a few things unanswered even after giving
answers to the most important part (i.e. "what is this for was
sold incorrectly").  Here are the leftover bits.

>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index 89cc0f48de..aa2c0ff74d 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -519,6 +519,9 @@ endif::git-format-patch[]
>>  --text::
>>  	Treat all files as text.
>>  
>> +--ignore-cr-at-eol::
>> +	Ignore carrige-return at the end of line when doing a comparison.
>
> I am not a native speaker, either, yet I have the impression that "do a
> comparison" may be more colloquial than not. Also, it is a carriage-return
> (as in Sinatra's famous song about Love and Marriage) not a carrige-return.
>
> How about "Hide changed line endings"?

That felt like a good suggestion when I saw your reaction,
especially with only the parts visible in the patch and its context,
but after reviewing descriptions of other --ignore-* options, I no
longer think so.  The existing description of "--ignore-*" matches
manpages of GNU diff and they do not say "hide", either.

>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 04d7b32e4e..b2cbcc818f 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -156,6 +156,24 @@ int xdl_blankline(const char *line, long size, long flags)
>>  	return (i == size);
>>  }
>>  
>> +/*
>> + * Have we eaten everything on the line, except for an optional
>> + * CR at the very end?
>> + */
>> +static int ends_with_optional_cr(const char *l, long s, long i)
>> +{
>> +	int complete = s && l[s-1] == '\n';
>> +
>> +	if (complete)
>> +		s--;
>> +	if (s == i)
>> +		return 1;
>
> What is the role of `s`, what of `i`? Maybe `length` and `current_offset`?

I'd agree with that sentiment if this file were not borrowed code
but our own, but after looking at xdiff/ code, I think the names of
these variables follow the convention used there better, which
consistently names the variable for lines they deal with l1, l2, etc.,
their sizes s1, s2, etc., and the indices into the line i1, i2, etc.

>> +	/* do not ignore CR at the end of an incomplete line */
>> +	if (complete && s == i + 1 && l[i] == '\r')
>> +		return 1;
>
> This made me scratch my head: too many negations. The comment may better
> read "ignore CR only at the end of a complete line".

Perhaps.  "incomplete line" is a term with a specific definition
(and I think by "complete line" you mean a line that is not an
incomplete line), so I do not see the above comment as having too
many negations, though.  If you feel strongly about it, you could
"fix" it with a follow-up patch.

>> @@ -204,6 +223,14 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
>>  			i1++;
>>  			i2++;
>>  		}
>> +	} else if (flags & XDF_IGNORE_CR_AT_EOL) {
>> +		/* Find the first difference and see how the line ends */
>> +		while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
>> +			i1++;
>> +			i2++;
>> +		}
>> +		return (ends_with_optional_cr(l1, s1, i1) &&
>> +			ends_with_optional_cr(l2, s2, i2));
>
> There are extra parentheses around the `return` expression.

Yes, everybody knows that return is not a function that needs a
parentheses around its parameter.  I would drop them if this
expression were not split into two lines, but because the expression
is split at &&, I think it reads better with the extra parens.  So
I'll leave them as-is.

Thanks.


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2017-11-15  4:28 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-24 17:48 Consequences of CRLF in index? Lars Schneider
2017-10-24 18:14 ` Jonathan Nieder
2017-10-24 19:02   ` Torsten Bögershausen
2017-10-25 12:13     ` Johannes Schindelin
2017-10-25  1:51   ` Junio C Hamano
2017-10-25  4:53     ` Junio C Hamano
2017-10-25 16:44       ` Stefan Beller
2017-10-26  5:54         ` Junio C Hamano
2017-10-27  6:13           ` Re* " Junio C Hamano
2017-10-27 17:06             ` Stefan Beller
2017-10-27 17:07             ` [PATCH 0/2] " Stefan Beller
2017-10-27 17:07               ` [PATCH 1/2] xdiff/xdiff.h: remove unused flags Stefan Beller
2017-10-27 17:07               ` [PATCH 2/2] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2017-10-30 17:20               ` [PATCH 0/2] Re* Consequences of CRLF in index? Stefan Beller
2017-10-31  2:44                 ` Junio C Hamano
2017-10-31 16:41                   ` Stefan Beller
2017-10-31 17:01                     ` Jeff King
2017-11-07  6:40       ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Junio C Hamano
2017-11-07  6:40         ` [PATCH v2 1/2] xdiff: reassign xpparm_t.flags bits Junio C Hamano
2017-11-07 12:44           ` Johannes Schindelin
2017-11-07 15:02             ` Junio C Hamano
2017-11-07  6:40         ` [PATCH v2 2/2] diff: --ignore-cr-at-eol Junio C Hamano
2017-11-07 13:23           ` Johannes Schindelin
2017-11-08  0:43             ` Junio C Hamano
2017-11-08  0:49               ` Junio C Hamano
2017-11-15  4:28             ` Junio C Hamano
2017-11-07 12:30         ` [PATCH v2 0/2] Teach "diff" to ignore only CR at EOL Johannes Schindelin
2017-11-07 15:12           ` Junio C Hamano
2017-11-07 17:42             ` Stefan Beller
2017-10-25 17:04   ` Consequences of CRLF in index? Lars Schneider
2017-10-25 17:13     ` Jonathan Nieder
2017-10-26 11:06       ` Lars Schneider
2017-10-26 19:15       ` Torsten Bögershausen
2017-10-24 21:04 ` Johannes Sixt
2017-10-25 12:19   ` Johannes Schindelin
2017-10-26  7:09     ` Johannes Sixt
2017-10-26 11:01       ` Lars Schneider
2017-10-26 19:22         ` Torsten Bögershausen
2017-10-26 20:20         ` Johannes Sixt
2017-10-26 20:30           ` Jonathan Nieder
2017-10-26 20:51             ` Johannes Sixt
2017-10-26 22:27               ` Ross Kabus
2017-10-27  1:05                 ` Junio C Hamano
2017-10-27 15:18         ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).