git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Torsten Bögershausen" <tboegi@web.de>
To: Junio C Hamano <gitster@pobox.com>, tboegi@web.de
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/3] git reset --hard gives clean working tree
Date: Sat, 5 Mar 2016 08:23:22 +0100	[thread overview]
Message-ID: <56DA896A.3050201@web.de> (raw)
In-Reply-To: <xmqqy4arw089.fsf@gitster.mtv.corp.google.com>



On 11.02.16 19:49, Junio C Hamano wrote:
> tboegi@web.de writes:
>
>> From: Torsten Bögershausen <tboegi@web.de>
>>
>> We define the working tree file is clean if either:
>>
>>   * the result of running convert_to_git() on the working tree
>>     contents matches what is in the index (because that would mean
>>     doing another "git add" on the path is a no-op); OR
>>
>>   * the result of running convert_to_working_tree() on the content
>>     in the index matches what is in the working tree (because that
>>     would mean doing another "git checkout -f" on the path is a
>>     no-op).
>>
>> Add an extra check in ce_compare_data() in read_cache.c, and adjust
>> the test cases in t0025:
>> When a file has CRLF in the index, and is checked out into the working tree,
>> but left unchabged, it is not normalized at the next commit.
>> Whenever the file is changed in the working tree, a line is added/deleted
>> or dos2unix is run, it may be normalized at the next commit,
>> depending on .gitattributes.
>>
>> This patch is a result of a longer discussion on the mailing list,
>> how to fix the flaky t0025.
> Currently, the codepaths that want to stop if it would lose
> information from the index and/or the working tree for the path run
> an equivalent of "diff-files path" to see there is any difference.
> This indeed is overly strict for a path that has contents in the
> index that wouldn't have been created by "clean" conversion (I am
> using this word to mean anything convert_to_git() does, not limited
> to the "clean" filter).
>
> And it is sensible to allow them to proceed if the contents in the
> working tree file for the path match what would be created by
> "smudge" conversion of the contents in the index.
>
> But breaking "diff-files" is not the right way to do so.  Teaching
> "Am I safe to proceed" callers that paths that do not pass
> "diff-files" test may still be safe to work on is.
>
> I did not continue the approach I illustrated because I realized and
> finally convinced myself that touching ce_compare_data() is a wrong
> solution--it breaks "diff-files".
>
> Imagine if you have contents in the index that wouldn't have been
> left by a "clean" conversion of what is in the working tree.  You
> then run "git checkout -f".  Now the contents in the working tree
> will still not convert back to what is in the index with another
> "clean" conversion, but it should pass the "Am I safe to proceed"
> check, namely, it matches what convert_to_worktree() would give.
>
> But imagine further what would happen when you add an extra blank
> line at the end of the file in the working tree (i.e. "echo >>file")
> and then run "diff-files -p".
>
> The illustration patch I gave broke "diff-files" in such a way that
> before such an addition of an extra blank line, it would have said
> "No changes".  And if you run "diff-files" after adding that extra
> blank line, you will see whole bunch of changes, not just the extra
> blank line at the end.
>
> This is sufficient to convince me that the approach is broken.
[]
Would something like this make sense?
(The main part is in diff.c, the rest needs some polishing)

commit e494c31fd2f0f8a638ff14d1b8ae3f3a6d56a107
Author: Torsten Bögershausen <tboegi@web.de>
Date:   Sat Mar 5 07:51:08 2016 +0100

    Text files needs to be normalized before diffing
   
    Whenever a text file is stored with CRLF in the index, Git changes
    CRLF into LF at the next commit.
    (text file means the attribute "text" or "crlf" of "eol" is set).
   
    "git diff" reports that all lines with CRLF in the index as changed to LF.
    After cloning a repo, the work tree is not considered clean by Git, even if
    the user didn't change a single line.
   
    Avoid to report lines as changed by converting the content from the index into
    LF before running diff.

diff --git a/convert.c b/convert.c
index f524b8d..af8248d 100644
--- a/convert.c
+++ b/convert.c
@@ -231,9 +231,9 @@ static int has_cr_in_index(const char *path)
     return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
-               struct strbuf *buf,
-               enum crlf_action crlf_action, enum safe_crlf checksafe)
+static int crlf_to_git_internal(const char *path, const char *src, size_t len,
+                struct strbuf *buf,
+                enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
     struct text_stat stats;
     char *dst;
@@ -852,7 +852,17 @@ const char *get_convert_attr_ascii(const char *path)
     return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
+int convert_crlf_to_git(const char *path, const char *src, size_t len,
+            struct strbuf *buf,
+            enum safe_crlf checksafe)
+{
+    struct conv_attrs ca;
+    convert_attrs(&ca, path);
+    return crlf_to_git_internal(path, src, len, buf,
+                    ca.crlf_action, checksafe);
+
+}
+    int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
     int ret = 0;
@@ -874,7 +884,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
         src = dst->buf;
         len = dst->len;
     }
-    ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+    ret |= crlf_to_git_internal(path, src, len, dst, ca.crlf_action, checksafe);
     if (ret && dst) {
         src = dst->buf;
         len = dst->len;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
     if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
         die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-    crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+    crlf_to_git_internal(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
     ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..11eff02 100644
--- a/convert.h
+++ b/convert.h
@@ -52,6 +52,9 @@ extern void convert_to_git_filter_fd(const char *path, int fd,
                      struct strbuf *dst,
                      enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
+extern int convert_crlf_to_git(const char *path, const char *src, size_t len,
+                   struct strbuf *buf,
+                   enum safe_crlf checksafe);
 
 /*****************************************************************
  *
diff --git a/diff.c b/diff.c
index 059123c..8710a36 100644
--- a/diff.c
+++ b/diff.c
@@ -2730,6 +2730,7 @@ static int diff_populate_gitlink(struct diff_filespec *s, int size_only)
  */
 int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 {
+    struct strbuf buf = STRBUF_INIT;
     int size_only = flags & CHECK_SIZE_ONLY;
     int err = 0;
     /*
@@ -2756,7 +2757,6 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 
     if (!s->sha1_valid ||
         reuse_worktree_file(s->path, s->sha1, 0)) {
-        struct strbuf buf = STRBUF_INIT;
         struct stat st;
         int fd;
 
@@ -2826,6 +2826,12 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
         if (!s->data)
             die("unable to read %s", sha1_to_hex(s->sha1));
         s->should_free = 1;
+        if (convert_crlf_to_git(s->path, s->data, s->size,
+                    &buf, SAFE_CRLF_FALSE)) {
+            size_t size = 0;
+            s->data = strbuf_detach(&buf, &size);
+            s->size = size;
+        }
     }
     return 0;
 }
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index c164b46..dd1645b 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -39,7 +39,7 @@ test_expect_success 'default settings cause no changes' '
     test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'crlf=true causes a CRLF file to be normalized' '
+test_expect_success 'crlf=true on a CRLF file git diff is empty' '
 
     # Backwards compatibility check
     rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
@@ -49,19 +49,18 @@ test_expect_success 'crlf=true causes a CRLF file to be normalized' '
     # Note, "normalized" means that git will normalize it if added
     has_cr CRLFonly &&
     CRLFonlydiff=$(git diff CRLFonly) &&
-    test -n "$CRLFonlydiff"
+    test -z "$CRLFonlydiff"
 '
 
-test_expect_success 'text=true causes a CRLF file to be normalized' '
+test_expect_success 'eol=crlf gives CRLF with no diff' '
 
     rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
-    echo "CRLFonly text" > .gitattributes &&
+    echo "CRLFonly text eol=crlf" > .gitattributes &&
     git read-tree --reset -u HEAD &&
-
-    # Note, "normalized" means that git will normalize it if added
-    has_cr CRLFonly &&
-    CRLFonlydiff=$(git diff CRLFonly) &&
-    test -n "$CRLFonlydiff"
+    >expect &&
+    git diff CRLFonly | tr "\015" Q >actual &&
+    test_cmp expect actual &&
+    has_cr CRLFonly
 '
 
 test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=false' '
@@ -114,7 +113,7 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
     test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'text=auto, autocrlf=true git diff is empty' '
 
     rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
     git config core.autocrlf true &&
@@ -126,7 +125,7 @@ test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
     LFonlydiff=$(git diff LFonly) &&
     CRLFonlydiff=$(git diff CRLFonly) &&
     LFwithNULdiff=$(git diff LFwithNUL) &&
-    test -z "$LFonlydiff" -a -n "$CRLFonlydiff" -a -z "$LFwithNULdiff"
+    test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
 test_expect_success 'text=auto, autocrlf=true does not normalize binary files' '

  reply	other threads:[~2016-03-05  7:23 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Message-Id=xmqqio26nqk8.fsf@gitster.mtv.corp.google.com>
2016-02-11 16:16 ` [PATCH 1/3] git reset --hard gives clean working tree tboegi
2016-02-11 18:49   ` Junio C Hamano
2016-03-05  7:23     ` Torsten Bögershausen [this message]
2016-03-05  8:05       ` Junio C Hamano
2016-03-05  8:27         ` Torsten Bögershausen
2016-03-05 21:18           ` Junio C Hamano
2016-03-07  8:14             ` Junio C Hamano
2016-03-07  8:51               ` Junio C Hamano
2016-03-07  8:58                 ` Torsten Bögershausen
2016-03-07 22:34                   ` Junio C Hamano
2016-03-29 13:25                     ` [PATCH v1 1/7] Make it possible to get sha1 for a path from the index tboegi
2016-03-29 13:28                       ` Duy Nguyen
2016-03-29 13:31                         ` Duy Nguyen
2016-03-29 15:05                           ` Torsten Bögershausen
2016-03-29 19:32                       ` Eric Sunshine
2016-03-29 13:25                     ` [PATCH v1 2/7] convert.c: stream and early out tboegi
2016-03-29 13:25                     ` [PATCH v1 3/7] Allow core.autocrlf=input and core.eol=crlf tboegi
2016-03-29 13:25                     ` [PATCH v1 4/7] t0027: TC for combined attributes tboegi
2016-03-29 13:25                     ` [PATCH v1 5/7] CRLF: unify the "auto" handling tboegi
2016-03-29 19:42                       ` Eric Sunshine
2016-03-29 13:25                     ` [PATCH v1 6/7] correct blame for files commited with CRLF tboegi
2016-03-29 17:21                       ` Junio C Hamano
2016-03-29 19:51                         ` Torsten Bögershausen
2016-03-29 19:58                           ` Junio C Hamano
2016-03-29 20:25                           ` Junio C Hamano
2016-03-29 20:32                             ` Junio C Hamano
2016-03-29 20:50                               ` Junio C Hamano
2016-03-30 17:48                                 ` Torsten Bögershausen
2016-03-29 13:25                     ` [PATCH v1 7/7] convert.c: more safer crlf handling with text attribute tboegi
2016-03-29 18:37                       ` Junio C Hamano
2016-04-01 16:08                     ` [PATCH v2 1/7] Make it possible to get sha1 for a path from the index tboegi
2016-04-01 16:08                     ` [PATCH v2 2/7] convert.c: stream and early out tboegi
2016-04-01 16:08                     ` [PATCH v2 3/7] Allow core.autocrlf=input and core.eol=crlf tboegi
2016-04-01 22:20                       ` Junio C Hamano
2016-04-01 16:08                     ` [PATCH v2 4/7] t0027: TC for combined attributes tboegi
2016-04-01 22:22                       ` Junio C Hamano
2016-04-01 16:08                     ` [PATCH v2 5/7] CRLF: unify the "auto" handling tboegi
2016-04-01 22:25                       ` Junio C Hamano
2016-04-01 16:08                     ` [PATCH v2 6/7] correct blame for files commited with CRLF tboegi
2016-04-01 22:29                       ` Junio C Hamano
2016-04-03  9:29                         ` Torsten Bögershausen
2016-04-01 16:08                     ` [PATCH v2 7/7] convert.c: more safer crlf handling with text attribute tboegi
2016-04-05 19:23                     ` [PATCH v1] correct blame for files commited with CRLF tboegi
2016-04-05 20:57                       ` Junio C Hamano
2016-04-05 21:12                       ` Junio C Hamano
2016-04-06  4:17                         ` Torsten Bögershausen
2016-04-19 13:24                     ` [PATCH v5 1/4] t0027: Make more reliable tboegi
2016-04-19 13:26                     ` [PATCH v5 2/4] convert: allow core.autocrlf=input and core.eol=crlf tboegi
2016-04-19 13:26                     ` [PATCH v5 3/4] t0027: test cases for combined attributes tboegi
2016-04-19 21:32                       ` Junio C Hamano
2016-04-20 15:52                         ` Torsten Bögershausen
2016-04-19 13:26                     ` [PATCH v5 4/4] convert.c: ident + core.autocrlf didn't work tboegi
2016-04-20 22:27                       ` Junio C Hamano
2016-04-22 14:38                     ` [PATCH v6 01/10] t0027: Make more reliable tboegi
2016-04-22 22:03                       ` Junio C Hamano
2016-04-24  3:45                         ` Torsten Bögershausen
2016-04-22 14:53                     ` [PATCH v6 02/10] convert: allow core.autocrlf=input and core.eol=crlf tboegi
2016-04-22 14:53                     ` [PATCH v6 03/10] t0027: test cases for combined attributes tboegi
2016-04-22 14:53                     ` [PATCH v6 04/10] convert.c: ident + core.autocrlf didn't work tboegi
2016-04-22 14:53                     ` [PATCH v6 05/10] read-cache: factor out get_sha1_from_index() helper tboegi
2016-04-22 14:53                     ` [PATCH v6 06/10] convert.c: stream and early out tboegi
2016-04-22 14:53                     ` [PATCH v6 07/10] convert: unify the "auto" handling of CRLF tboegi
2016-04-22 14:53                     ` [PATCH v6 08/10] convert.c: more safer crlf handling with text attribute tboegi
2016-04-22 14:53                     ` [PATCH v6 09/10] t6038; use crlf on all platforms tboegi
2016-04-22 14:53                     ` [PATCH v6 10/10] ce_compare_data() did not respect conversion tboegi
2016-04-24 15:10                     ` [PATCH v6b 01/10] t0027: Make commit_chk_wrnNNO() reliable tboegi
2016-04-24 15:11                     ` [PATCH v6b 02/10] convert: allow core.autocrlf=input and core.eol=crlf tboegi
2016-04-24 15:11                     ` [PATCH v6b 03/10] t0027: test cases for combined attributes tboegi
2016-04-24 15:11                     ` [PATCH v6b 04/10] convert.c: ident + core.autocrlf didn't work tboegi
2016-04-24 15:11                     ` [PATCH v6b 05/10] read-cache: factor out get_sha1_from_index() helper tboegi
2016-04-24 15:11                     ` [PATCH v6b 06/10] convert.c: stream and early out tboegi
2016-04-24 15:11                     ` [PATCH v6b 07/10] convert: unify the "auto" handling of CRLF tboegi
2016-04-24 15:11                     ` [PATCH v6b 08/10] convert.c: more safer crlf handling with text attribute tboegi
2016-04-24 15:11                     ` [PATCH v6b 09/10] t6038; use crlf on all platforms tboegi
2016-04-24 15:11                     ` [PATCH v6b 10/10] ce_compare_data() did not respect conversion tboegi
2016-04-25 16:56                     ` [PATCH v7 01/10] t0027: Make commit_chk_wrnNNO() reliable tboegi
2016-04-25 19:15                       ` Junio C Hamano
2016-04-25 16:56                     ` [PATCH v7 02/10] convert: allow core.autocrlf=input and core.eol=crlf tboegi
2016-04-25 16:56                     ` [PATCH v7 03/10] t0027: test cases for combined attributes tboegi
2016-04-25 16:56                     ` [PATCH v7 04/10] convert.c: ident + core.autocrlf didn't work tboegi
2016-04-25 16:56                     ` [PATCH v7 05/10] read-cache: factor out get_sha1_from_index() helper tboegi
2016-04-25 16:56                     ` [PATCH v7 06/10] convert.c: stream and early out tboegi
2016-04-25 16:56                     ` [PATCH v7 07/10] convert: unify the "auto" handling of CRLF tboegi
2016-04-25 19:37                       ` Junio C Hamano
2016-04-26 16:33                         ` Torsten Bögershausen
2016-04-26 17:42                           ` Junio C Hamano
2016-04-25 16:56                     ` [PATCH v7 08/10] convert.c: more safer crlf handling with text attribute tboegi
2016-04-25 16:56                     ` [PATCH v7 09/10] t6038; use crlf on all platforms tboegi
2016-04-25 16:56                     ` [PATCH v7 10/10] ce_compare_data() did not respect conversion tboegi
2016-04-29 15:01                     ` [PATCH v8 01/10] t0027: make commit_chk_wrnNNO() reliable tboegi
2016-04-29 15:01                     ` [PATCH v8 02/10] convert: allow core.autocrlf=input and core.eol=crlf tboegi
2016-04-29 15:01                     ` [PATCH v8 03/10] t0027: test cases for combined attributes tboegi
2016-04-29 15:01                     ` [PATCH v8 04/10] convert.c: ident + core.autocrlf didn't work tboegi
2016-04-29 15:02                     ` [PATCH v8 05/10] read-cache: factor out get_sha1_from_index() helper tboegi
2016-04-29 15:02                     ` [PATCH v8 06/10] convert.c: stream and early out tboegi
2016-04-29 15:02                     ` [PATCH v8 07/10] convert: unify the "auto" handling of CRLF tboegi
2016-11-25 15:48                       ` Torsten Bögershausen
2016-11-27 16:22                         ` [PATCH/RFC v1 1/1] New way to normalize the line endings tboegi
2016-11-29 19:15                           ` Junio C Hamano
2017-04-12 11:48                         ` [PATCH v2 1/1] Document how " tboegi
2016-04-29 15:02                     ` [PATCH v8 08/10] convert.c: more safer crlf handling with text attribute tboegi
2016-04-29 15:02                     ` [PATCH v8 09/10] t6038; use crlf on all platforms tboegi
2016-04-29 15:02                     ` [PATCH v8 10/10] ce_compare_data() did not respect conversion tboegi
2016-04-29 18:20                       ` Junio C Hamano
2016-04-29 21:09                       ` Junio C Hamano
2016-05-01 16:27                         ` Torsten Bögershausen
2016-05-02 18:16                           ` Junio C Hamano
2016-05-02 19:33                             ` Junio C Hamano
2016-05-03 16:02                               ` Torsten Bögershausen
2016-05-03 18:31                                 ` Junio C Hamano
2016-05-04  4:07                                   ` Torsten Bögershausen
2016-05-04  7:23                                     ` Junio C Hamano
2016-05-06  8:54                                       ` Torsten Bögershausen
2016-05-06 17:11                                         ` Junio C Hamano
2016-05-07  6:10                     ` [PATCH v9 0/6] convert-eol-autocrlf, old 5..10 now 1..6 tboegi
2016-05-07  6:10                     ` [PATCH v9 1/6] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-09 19:54                       ` Junio C Hamano
2016-05-07  6:11                     ` [PATCH v9 2/6] convert.c: stream and early out tboegi
2016-05-09 20:29                       ` Junio C Hamano
2016-05-11  4:30                         ` Torsten Bögershausen
2016-05-07  6:11                     ` [PATCH v9 3/6] convert: unify the "auto" handling of CRLF tboegi
2016-05-07  6:11                     ` [PATCH v9 4/6] convert.c: more safer crlf handling with text attribute tboegi
2016-05-07  6:11                     ` [PATCH v9 5/6] t6038; use crlf on all platforms tboegi
2016-05-07  6:11                     ` [PATCH v9 6/6] convert: ce_compare_data() checks for a sha1 of a path tboegi
2016-02-11 16:16 ` [PATCH 2/3] Factor out convert_cmp_checkout() into convert.c tboegi
2016-02-11 16:16 ` [PATCH 3/3] convert.c: Optimize convert_cmp_checkout() for changed file len tboegi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56DA896A.3050201@web.de \
    --to=tboegi@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).