git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Torsten Bögershausen" <tboegi@web.de>
To: "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge
Date: Sat, 2 Jul 2016 20:41:33 +0200	[thread overview]
Message-ID: <574692d1-c8ae-9c2f-6b99-a01545b15051@telia.com> (raw)
In-Reply-To: <xmqq8txlvwip.fsf@gitster.mtv.corp.google.com>

On 2016-07-02 00.11, Junio C Hamano wrote:
[snip]
> diff --git a/read-cache.c b/read-cache.c
> index a3ef967..c109b6d 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
>
>  	if (fd >= 0) {
>  		unsigned char sha1[20];
> -		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
> +		unsigned flags = HASH_USE_SHA_NOT_PATH;
> +		memcpy(sha1, ce->sha1, sizeof(sha1));
> +		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
>  			match = hashcmp(sha1, ce->sha1);
>  		/* index_fd() closed the file descriptor already */
>  	}
>
> I do not quite see how "Don't use ce->sha1, but use sha1 given to
> you" flag affects this call, when sha1 and ce->sha1 are the same due
> to memcpy() just before the call?
>

Lets step back a second. and try to explain whats going on.

Do a merge (and renormalize the files).

At some point inside the merge, Git calls read-cache.c/ce_compare_data(),
   to check if the path named "file" is clean.
   According to the tree information, the path "file" has the sha1 99b633.
   #Note:
   #sha1 99b633 is "first line\nsame line\n"

ce_compare_data() starts the work:
OK, lets run index_fd(...,ce->name,...)
# index_fd will simulate a "git add"  and return the sha1 (via the sha1 pointer)
# after the hashing.

# ce_compare_data() will later compare ce->sha1 with the result stored in
# the local sha1. That's why a sha1 is in the parameter list.
# To return the resulting hash:

ce_compare_data() calls index_fd(sha1, ..., ce->name, ...)

#Down in index_fd():

sha1_file.c/index_fd() calls index_core() (after consulting the attributes)
index_core() reads the file from the working tree into memory using
read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
to calculate the sha1.

Before the hashing is done, index_mem() runs
convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
to convert  "blobs to git internal format".


Here, convert_to_git() consults the .gitattributes (again) to find out that
crlf_action is CRLF_AUTO in our case.
The "new safer autocrlf handling" says that if a blob as any CR, don't convert
CRLFs at "git add".

convert.c/crlf_to_git() starts to do it's job:
- look at buf (It has CRLF, conversion may be needed)
- consult blob_has_cr()
   # Note: Before this patch, has_cr_in_index(path)) was used

#Again, before this patch,
has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the
blob into memory.

Now read_blob_data_from_cache() is a macro, and we end up in
read_blob_data_from_index(&the_index, (path), (sz))

read-cache.c/read_blob_data_from_index() starts its work:
	pos = index_name_pos(istate, path, len);
	if (pos < 0) {
		/*
		 * We might be in the middle of a merge, in which
		 * case we would read stage #2 (ours).
		 */

# And here, and this is important to notice, "ours" is sha1 ad55e2,
# which corresponds to "first line\r\nsame line\r\n"
# On our way in the callstack the information what to check had been lost:
# blob 99b633 and nothing else.
# read_blob_data_from_index() doesn't know the sha, but makes a guess,
# derived from path.
# The guess works OK for most callers: to take "ours" in a middle
# of a merge, but not here.

#Back to crlf_to_git():
The (wrong) blob has CRs in it, so crlf_to_git() decides not to convert CRLFs,
(and this is wrong).

# And now we go all the way back in the call chain:

The content in the worktree which is "first line\r\nsame line\r\n" is hashed:
hash_sha1_file(buf, size, typename(type), sha1);
#and the result is stored in the return pointer "sha1": ad55e2

#Back to ce_compare_data():
match = hashcmp(sha1, ce->sha1);

#And here sha1=ad55e2 != ce->sha199b633

The whole merge is aborted:

    error: addinfo_cache failed for path 'file'
     file: unmerged (cbd69ec7cd12dd0989e853923867d94c8519aa52)
     file: unmerged (ad55e240aeb42e0d9a0e18d6d8b02dd82ee3e527)
     file: unmerged (99b633103c15c20cebebf821133ab526b0ff90b2)
     fatal: git write-tree failed to write a tree
#Side note: cbd69ec is another file in the working tree.

#######################

With this patch,
ce_compare_data() feeds the sha1 99b633 into
blob_has_cr(const unsigned char *index_blob_sha1).

crlf_to_git() says:
"first line\r\nsame line\r\n" in the worktree needs to be converted
into
"first line\nsame line\n" as the "new" blob.

The new (converted) blob "first line\nsame line\n"
is hashed into 99b633, so ce_compare_data() says
fine, converting "first line\r\nsame line\r\n" will give sha1 99b633,
lets continue with the merge, and do the normalization.

# Note1:
# The renormalization is outside what this patch fixes.
# But I didn't manage to construct a test case, which triggered
# "fatal: git write-tree failed to write a tree" without using
# merge.renormalize

# Note2:
# ce_compare_data() has (if I understand things right)
# no knowledge that a merge is going on
# ce_compare_data() has now knowledge that a merge
# with a merge with merge.renormalize=true is going on either.

#Note3:
# I hope that this explanation makes it more clear,
# why ce->sha1 must be forwarded into blob_has_cr() (to check the right blob)
# and that ce->name must be forwarded as path into crlf_to_git()
# (to check the .gitattributes)


  reply	other threads:[~2016-07-02 18:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-27 23:21 What's cooking in git.git (Jun 2016, #09; Mon, 27) Junio C Hamano
2016-06-28  8:01 ` [PATCH v3 0/3] unified auto CRLF handling, V3 tboegi
2016-06-28  8:01 ` [PATCH v3 1/3] convert: unify the "auto" handling of CRLF tboegi
2016-06-28  8:01 ` [PATCH v3 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
2016-06-28  8:01 ` [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge tboegi
2016-06-29 16:14   ` Junio C Hamano
2016-06-30 16:52     ` Torsten Bögershausen
2016-07-01 22:11       ` Junio C Hamano
2016-07-02 18:41         ` Torsten Bögershausen [this message]
2016-07-06 14:57           ` Junio C Hamano
2016-07-07 17:16             ` Torsten Bögershausen
2016-07-07 18:43               ` Junio C Hamano
2016-07-07 22:19                 ` Junio C Hamano
2016-07-08  7:52                 ` Torsten Bögershausen
2016-07-08 16:36                   ` Junio C Hamano
2016-07-08 17:13                     ` Torsten Bögershausen
2016-07-08 17:25                       ` Junio C Hamano
2016-07-08 17:59                         ` Junio C Hamano
2016-07-08 19:01                       ` Junio C Hamano
2016-07-08 20:50                         ` Junio C Hamano
2016-07-11 20:07                           ` Junio C Hamano
2016-07-12  2:23                             ` Torsten Bögershausen
2016-07-12 19:54                               ` Junio C Hamano
2016-07-08 15:00                 ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=574692d1-c8ae-9c2f-6b99-a01545b15051@telia.com \
    --to=tboegi@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).