From: "Torsten Bögershausen" <tboegi@web.de>
To: "Junio C Hamano" <gitster@pobox.com>,
"Torsten Bögershausen" <tboegi@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge
Date: Thu, 7 Jul 2016 19:16:09 +0200 [thread overview]
Message-ID: <c36fe487-b8dc-9767-7fae-bee513dac0b2@web.de> (raw)
In-Reply-To: <xmqqa8huvmpv.fsf@gitster.mtv.corp.google.com>
On 2016-07-06 16.57, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
>> At some point inside the merge, Git calls read-cache.c/ce_compare_data(),
>> to check if the path named "file" is clean.
>> According to the tree information, the path "file" has the sha1 99b633.
>> #Note:
>> #sha1 99b633 is "first line\nsame line\n"
>
> I thought your scenario was that our side had CRLF both in the blob
> and in the working tree. So this is a different example? Our side
> has LF in the blob that is checked out with CRLF in the working tree,
> and their side has CRLF in the blob?
>
That was probably to confuse myself, and the rest of the world,
sorry for confusion.
This is the callstack:
merge-recursive.c/add_cacheinfo(unsigned int mode, const unsigned char *sha1,
const char *path, int stage, int refresh, int options)
{
struct cache_entry *ce;
ce = make_cache_entry
if (!ce)
return error(_("addinfo_cache failed for path '%s'"), path);
return add_cache_entry(ce, options);
}
#Calls
read-cache.c/make_cache_entry(path=file sha1=0x99b633 stage=0)
struct cache_entry *make_cache_entry(unsigned int mode,
const unsigned char *sha1, const char *path, int stage,
unsigned int refresh_options)
{
[snip]
ret = refresh_cache_entry(ce, refresh_options);
if (ret != ce)
free(ce);
return ret;
}
#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);
#Calls
ie_modified()
#Calls
read-cache.c/ie_match_stat()
#Calls
changed |= ce_modified_check_fs(ce, st);
#Calls
ce_compare_data(path=file sha1=0x99b633)
#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here, the parameter sha is the output.
Deep down, has_cr_in_index(path) will look at ad55e2 instead,
which is wrong.
>> ce_compare_data() starts the work:
>> OK, lets run index_fd(...,ce->name,...)
>> # index_fd will simulate a "git add" and return the sha1 (via the sha1 pointer)
>> # after the hashing.
>>
>> # ce_compare_data() will later compare ce->sha1 with the result stored in
>> # the local sha1. That's why a sha1 is in the parameter list.
>> # To return the resulting hash:
>>
>> ce_compare_data() calls index_fd(sha1, ..., ce->name, ...)
>>
>> #Down in index_fd():
>>
>> sha1_file.c/index_fd() calls index_core() (after consulting the attributes)
>> index_core() reads the file from the working tree into memory using
>> read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
>> to calculate the sha1.
>>
>> Before the hashing is done, index_mem() runs
>> convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
>> to convert "blobs to git internal format".
>>
>>
>> Here, convert_to_git() consults the .gitattributes (again) to find out that
>> crlf_action is CRLF_AUTO in our case.
>> The "new safer autocrlf handling" says that if a blob as any CR, don't convert
>> CRLFs at "git add".
>>
>> convert.c/crlf_to_git() starts to do it's job:
>> - look at buf (It has CRLF, conversion may be needed)
>> - consult blob_has_cr()
>> # Note: Before this patch, has_cr_in_index(path)) was used
>>
>> #Again, before this patch,
>> has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the
>> blob into memory.
>>
>> Now read_blob_data_from_cache() is a macro, and we end up in
>> read_blob_data_from_index(&the_index, (path), (sz))
>>
>> read-cache.c/read_blob_data_from_index() starts its work:
>> pos = index_name_pos(istate, path, len);
>> if (pos < 0) {
>> /*
>> * We might be in the middle of a merge, in which
>> * case we would read stage #2 (ours).
>> */
>>
>> # And here, and this is important to notice, "ours" is sha1 ad55e2,
>> # which corresponds to "first line\r\nsame line\r\n"
>
> Where did 99b633 come from then? There still is something missing
> in this description.
>
> Puzzled...
This is an unfinished attempt for a commit message:
--------------------------------------------------
correct ce_compare_data() at the end of a merge
The following didn't work as expected:
- At the end of a merge
- merge.renormalize is true,
- .gitattributes = "* text=auto"
- core.eol = crlf
Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob
with LF "first line\nsame line\n".
The expected result of the merge is "first line\nsame line\n".
The content in the working tree is "first line\r\nsame line\r\n",
and ce_compare_data() should find that the content is clean and return 0.
The following callstack does not work:
merge-recursive.c/add_cacheinfo(path=file sha1=0x99b633)
#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);
#Calls
ie_modified()
#Calls
read-cache.c/ie_match_stat()
#Calls
changed |= ce_modified_check_fs(ce, st);
#Calls
ce_compare_data(path=file sha1=0x99b633)
#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here.
#Calls
index_core()
index_core() reads the file from the working tree into memory using
read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
to calculate the sha1.
Before the hashing is done, index_mem() runs
convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
to convert "blobs to git internal format".
Here, convert_to_git() consults the .gitattributes (again) to find out that
crlf_action is CRLF_AUTO in our case.
The "new safer autocrlf handling" says that if a blob as any CR, don't convert
CRLFs at "git add".
convert.c/crlf_to_git() starts to do it's job:
- look at buf (It has CRLF, conversion may be needed)
- consult blob_has_cr()
# Note: Before this patch, has_cr_in_index(path)) was used
#Before this patch,
has_cr_in_index(path)
#Calls
read_blob_data_from_cache(path, &sz) to read the blob into memory.
Now read_blob_data_from_cache() is a macro, and we end up in
read_blob_data_from_index(&the_index, (path), (sz))
read-cache.c/read_blob_data_from_index() starts its work:
pos = index_name_pos(istate, path, len);
if (pos < 0) {
/*
* We might be in the middle of a merge, in which
* case we would read stage #2 (ours).
*/
# And here, and this is important to notice, "ours" is sha1 ad55e2,
# which corresponds to "first line\r\nsame line\r\n"
The result of the check is that the blob 99b633 doesn't seem
to match the working tree, and the whold merge is aborted:
error: addinfo_cache failed for path 'file'
Solution:
Make sure that ce_compare_data() forwards the source sha1 into crlf_to_git().
Replace has_cr_in_index(path) with blob_has_cr(sha1), and forward
the source sha1 from ce_compare_data() into blob_has_cr(sha1).
While at it, rename has_cr_in_index() into blob_has_cr()
and replace 0 with SAFE_CRLF_FALSE.
Add a TC in t6038 to have a test coverage under Linux.
next prev parent reply other threads:[~2016-07-07 17:20 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-27 23:21 What's cooking in git.git (Jun 2016, #09; Mon, 27) Junio C Hamano
2016-06-28 8:01 ` [PATCH v3 0/3] unified auto CRLF handling, V3 tboegi
2016-06-28 8:01 ` [PATCH v3 1/3] convert: unify the "auto" handling of CRLF tboegi
2016-06-28 8:01 ` [PATCH v3 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
2016-06-28 8:01 ` [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge tboegi
2016-06-29 16:14 ` Junio C Hamano
2016-06-30 16:52 ` Torsten Bögershausen
2016-07-01 22:11 ` Junio C Hamano
2016-07-02 18:41 ` Torsten Bögershausen
2016-07-06 14:57 ` Junio C Hamano
2016-07-07 17:16 ` Torsten Bögershausen [this message]
2016-07-07 18:43 ` Junio C Hamano
2016-07-07 22:19 ` Junio C Hamano
2016-07-08 7:52 ` Torsten Bögershausen
2016-07-08 16:36 ` Junio C Hamano
2016-07-08 17:13 ` Torsten Bögershausen
2016-07-08 17:25 ` Junio C Hamano
2016-07-08 17:59 ` Junio C Hamano
2016-07-08 19:01 ` Junio C Hamano
2016-07-08 20:50 ` Junio C Hamano
2016-07-11 20:07 ` Junio C Hamano
2016-07-12 2:23 ` Torsten Bögershausen
2016-07-12 19:54 ` Junio C Hamano
2016-07-08 15:00 ` Torsten Bögershausen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c36fe487-b8dc-9767-7fae-bee513dac0b2@web.de \
--to=tboegi@web.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).