git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC] ll-merge: Normalize files before merging
@ 2010-06-10 20:48 Eyvind Bernhardsen
  2010-06-11  5:49 ` Johannes Sixt
  0 siblings, 1 reply; 8+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-10 20:48 UTC (permalink / raw)
  To: git

Currently, merging across changes in line ending normalization is
painful since all lines containing CRLF will conflict uselessly.

Fix ll-merge so that the "base", "theirs" and "ours" files are passed
through convert_to_git() before a three-way merge.  This prevents
differences that can be normalized away from blocking an automatic
merge.
---
I have a repository that has been normalized using "* text=auto" and
need to merge un-normalized changes (ie, branches based on commits from
before the normalization) into it.  Unfortunately, this doesn't work:
every line containing a CRLF causes a conflict.  Fortunately, there is
an easy fix in the merge code.

This patch has already been useful to me, but I'm not sure it is the
best possible solution to the problem (especially in terms of
efficiency), hence the RFC.

Note that clean and ident filters will also be run, which might be a
good thing.  Also, the tests require my crlf/text series from pu.
-- 
Eyvind



 ll-merge.c                 |   12 +++++++++
 t/t6038-merge-text-auto.sh |   54 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 0 deletions(-)
 create mode 100755 t/t6038-merge-text-auto.sh

diff --git a/ll-merge.c b/ll-merge.c
index f9b3d85..63835af 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -321,6 +321,15 @@ static int git_path_check_merge(const char *path, struct git_attr_check check[2]
 	return git_checkattr(path, 2, check);
 }
 
+static void normalize_file(mmfile_t *mm, const char *path) {
+	struct strbuf strbuf = STRBUF_INIT;
+	if (convert_to_git(path, mm->ptr, mm->size, &strbuf, 0)) {
+		free(mm->ptr);
+		mm->size = strbuf.len;
+		mm->ptr = strbuf_detach(&strbuf, NULL);
+	}
+}
+
 int ll_merge(mmbuffer_t *result_buf,
 	     const char *path,
 	     mmfile_t *ancestor, const char *ancestor_label,
@@ -334,6 +343,9 @@ int ll_merge(mmbuffer_t *result_buf,
 	const struct ll_merge_driver *driver;
 	int virtual_ancestor = flag & 01;
 
+	normalize_file(ancestor, path);
+	normalize_file(ours, path);
+	normalize_file(theirs, path);
 	if (!git_path_check_merge(path, check)) {
 		ll_driver_name = check[0].value;
 		if (check[1].value) {
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
new file mode 100755
index 0000000..6af2c41
--- /dev/null
+++ b/t/t6038-merge-text-auto.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+
+test_description='CRLF merge conflict across text=auto change'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	git config core.autocrlf false &&
+	echo first line | append_cr >file &&
+	git add file &&
+	git commit -m "Initial" &&
+	git tag initial &&
+	git branch side &&
+	echo "* text=auto" >.gitattributes &&
+	git add .gitattributes &&
+	echo same line | append_cr >>file &&
+	git add file &&
+	git commit -m "add line from a" &&
+	git tag a &&
+	git rm .gitattributes &&
+	rm file &&
+	git checkout file &&
+	git commit -m "remove .gitattributes" &&
+	git tag c &&
+	git checkout side &&
+	echo same line | append_cr >>file &&
+	git commit -m "add line from b" file &&
+	git tag b &&
+	git checkout master
+'
+
+test_expect_success 'Check merging after setting text=auto' '
+	git reset --hard a &&
+	git merge b &&
+	cat file | remove_cr >file.temp &&
+	test_cmp file file.temp
+'
+
+test_expect_success 'Check merging addition of text=auto' '
+	git reset --hard b &&
+	git merge a &&
+	cat file | remove_cr >file.temp &&
+	test_cmp file file.temp
+'
+
+# Not sure if this deserves to be fixed
+test_expect_failure 'Check merging removal of text=auto' '
+	git reset --hard b &&
+	git merge c &&
+	cat file | remove_cr >file.temp &&
+	test_cmp file file.temp
+'
+
+test_done
-- 
1.7.1.5.g0ed10.dirty

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC] ll-merge: Normalize files before merging
  2010-06-10 20:48 [RFC] ll-merge: Normalize files before merging Eyvind Bernhardsen
@ 2010-06-11  5:49 ` Johannes Sixt
  2010-06-11  7:34   ` Eyvind Bernhardsen
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Sixt @ 2010-06-11  5:49 UTC (permalink / raw)
  To: Eyvind Bernhardsen; +Cc: git

Am 6/10/2010 22:48, schrieb Eyvind Bernhardsen:
> Currently, merging across changes in line ending normalization is
> painful since all lines containing CRLF will conflict uselessly.
> 
> Fix ll-merge so that the "base", "theirs" and "ours" files are passed
> through convert_to_git() before a three-way merge.  This prevents
> differences that can be normalized away from blocking an automatic
> merge.

I think you are going overboard here. Normalization should only happen
only for data that moves from the worktree to the database. But during a
merge, at most one part can come from the worktree, methinks; you are
normalizing all three of them, though.

> This patch has already been useful to me, but I'm not sure it is the
> best possible solution to the problem (especially in terms of
> efficiency), hence the RFC.
> 
> Note that clean and ident filters will also be run, which might be a
> good thing.  Also, the tests require my crlf/text series from pu.
> --
> Eyvind

Please do not put a dash-dash-blank line before the patch; Thunderbird
takes it as the beginning of the signature and truncates the message in
the reply.

-- Hannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] ll-merge: Normalize files before merging
  2010-06-11  5:49 ` Johannes Sixt
@ 2010-06-11  7:34   ` Eyvind Bernhardsen
  2010-06-11  7:51     ` Johannes Sixt
  0 siblings, 1 reply; 8+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-11  7:34 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On 11. juni 2010 07:49, Johannes Sixt wrote:
> Am 6/10/2010 22:48, schrieb Eyvind Bernhardsen:
>> Currently, merging across changes in line ending normalization is
>> painful since all lines containing CRLF will conflict uselessly.
>>
>> Fix ll-merge so that the "base", "theirs" and "ours" files are passed
>> through convert_to_git() before a three-way merge.  This prevents
>> differences that can be normalized away from blocking an automatic
>> merge.
>
> I think you are going overboard here. Normalization should only happen
> only for data that moves from the worktree to the database. But during a
> merge, at most one part can come from the worktree, methinks; you are
> normalizing all three of them, though.

Well, that's sort of the point.  All three are normalized to (hopefully) 
minimize the differences between them, increasing the chance of a 
successful merge.

The problem I'm trying to solve is that "base" and "theirs" were not 
normalized when the files were added to the database, and this causes 
unnecessary conflicts with "ours".  Normalizing "ours" allows merging 
the other way to work, too.

It's a brute-force method, and there may be a smarter way, but it works 
for me.

[...]

> Please do not put a dash-dash-blank line before the patch; Thunderbird
> takes it as the beginning of the signature and truncates the message in
> the reply.

Oops, sorry!
-- 
Eyvind

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] ll-merge: Normalize files before merging
  2010-06-11  7:34   ` Eyvind Bernhardsen
@ 2010-06-11  7:51     ` Johannes Sixt
  2010-06-11  8:36       ` Finn Arne Gangstad
  2010-06-11 19:44       ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Johannes Sixt @ 2010-06-11  7:51 UTC (permalink / raw)
  To: Eyvind Bernhardsen; +Cc: git

Am 6/11/2010 9:34, schrieb Eyvind Bernhardsen:
> On 11. juni 2010 07:49, Johannes Sixt wrote:
>> I think you are going overboard here. Normalization should only happen
>> only for data that moves from the worktree to the database. But during a
>> merge, at most one part can come from the worktree, methinks; you are
>> normalizing all three of them, though.
> 
> Well, that's sort of the point.  All three are normalized to (hopefully)
> minimize the differences between them, increasing the chance of a
> successful merge.

I know what your point is. It is still inappropriate to call
normalize_file() on data that comes from the repository. It is not the
task of a merge procedure to blindly normalize data.

-- Hannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] ll-merge: Normalize files before merging
  2010-06-11  7:51     ` Johannes Sixt
@ 2010-06-11  8:36       ` Finn Arne Gangstad
  2010-06-11  8:47         ` Eyvind Bernhardsen
  2010-06-11 19:44       ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Finn Arne Gangstad @ 2010-06-11  8:36 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Eyvind Bernhardsen, git

On Fri, Jun 11, 2010 at 09:51:41AM +0200, Johannes Sixt wrote:
> Am 6/11/2010 9:34, schrieb Eyvind Bernhardsen:
> > On 11. juni 2010 07:49, Johannes Sixt wrote:
> >> I think you are going overboard here. Normalization should only happen
> >> only for data that moves from the worktree to the database. But during a
> >> merge, at most one part can come from the worktree, methinks; you are
> >> normalizing all three of them, though.
> > 
> > Well, that's sort of the point.  All three are normalized to (hopefully)
> > minimize the differences between them, increasing the chance of a
> > successful merge.
> 
> I know what your point is. It is still inappropriate to call
> normalize_file() on data that comes from the repository. It is not the
> task of a merge procedure to blindly normalize data.

I don't think this argument holds. If git doesn't call
convert_to_git() automatically and you get a merge conflict, git WILL
call convert_to_git() on the result anyway when you add it, so it
can't be that horrible that this happens automatically for you?

If you add something to .gitattributes that causes the repository
representation of a file to change, and then try to merge an older
branch, isn't it more helpful if it works than if it fails miserably?

It would probably be more correct to do

conert_to_git(convert_to_work_tree(x)) for the parts taken from base
and theirs, in theory I guess that this can be true:

convert_to_git(x) != convert_to_git(convert_to_work_tree(x))

- Finn Arne

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] ll-merge: Normalize files before merging
  2010-06-11  8:36       ` Finn Arne Gangstad
@ 2010-06-11  8:47         ` Eyvind Bernhardsen
  0 siblings, 0 replies; 8+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-11  8:47 UTC (permalink / raw)
  To: Finn Arne Gangstad; +Cc: Johannes Sixt, git

On 11. juni 2010 10:36, Finn Arne Gangstad wrote:

> It would probably be more correct to do
>
> conert_to_git(convert_to_work_tree(x)) for the parts taken from base
> and theirs, in theory I guess that this can be true:
>
> convert_to_git(x) != convert_to_git(convert_to_work_tree(x))

Interesting idea, might be a bit slow perhaps?  I'll do some 
benchmarketing.  Incidentally, it has to convert_to_work_tree() "ours" 
as well, since the merge code doesn't read anything from the worktree.

I think convert_to_git(convert_to_work_tree(x)) is actually less likely 
to cause trouble than convert_to_git(convert_to_git(x)), which is what 
blind normalization would end up doing for at least one side of the merge.

- Eyvind

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] ll-merge: Normalize files before merging
  2010-06-11  7:51     ` Johannes Sixt
  2010-06-11  8:36       ` Finn Arne Gangstad
@ 2010-06-11 19:44       ` Junio C Hamano
  2010-06-11 20:56         ` Eyvind Bernhardsen
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2010-06-11 19:44 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Eyvind Bernhardsen, git

Johannes Sixt <j.sixt@viscovery.net> writes:

> Am 6/11/2010 9:34, schrieb Eyvind Bernhardsen:
>> On 11. juni 2010 07:49, Johannes Sixt wrote:
>>> I think you are going overboard here. Normalization should only happen
>>> only for data that moves from the worktree to the database. But during a
>>> merge, at most one part can come from the worktree, methinks; you are
>>> normalizing all three of them, though.
>> 
>> Well, that's sort of the point.  All three are normalized to (hopefully)
>> minimize the differences between them, increasing the chance of a
>> successful merge.
>
> I know what your point is. It is still inappropriate to call
> normalize_file() on data that comes from the repository. It is not the
> task of a merge procedure to blindly normalize data.

It is not "blindly", but "running normalization one _extra time_, as the
repository data is supposed to be canonical already", which is utterly
wrong.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] ll-merge: Normalize files before merging
  2010-06-11 19:44       ` Junio C Hamano
@ 2010-06-11 20:56         ` Eyvind Bernhardsen
  0 siblings, 0 replies; 8+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-11 20:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git

On 11. juni 2010, at 21.44, Junio C Hamano wrote:

> Johannes Sixt <j.sixt@viscovery.net> writes:
> 
>> I know what your point is. It is still inappropriate to call
>> normalize_file() on data that comes from the repository. It is not the
>> task of a merge procedure to blindly normalize data.
> 
> It is not "blindly", but "running normalization one _extra time_, as the
> repository data is supposed to be canonical already", which is utterly
> wrong.

I agree that double normalization is evil, but the repository data isn't necessarily canonical if the configuration has changed since the data was added.

How do you feel about Finn Arne's idea of first convert_to_work_tree()-ing the data, then convert_to_git()ing it back?  That gets rid of the double normalization at the cost of some performance and memory usage (especially with CRLF output enabled).  I'm going to do some benchmarks to go along with my next stab at this.

- Eyvind

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-06-11 20:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-10 20:48 [RFC] ll-merge: Normalize files before merging Eyvind Bernhardsen
2010-06-11  5:49 ` Johannes Sixt
2010-06-11  7:34   ` Eyvind Bernhardsen
2010-06-11  7:51     ` Johannes Sixt
2010-06-11  8:36       ` Finn Arne Gangstad
2010-06-11  8:47         ` Eyvind Bernhardsen
2010-06-11 19:44       ` Junio C Hamano
2010-06-11 20:56         ` Eyvind Bernhardsen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).