git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Clemens Buchacher <drizzd@aon.at>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	git@vger.kernel.org, "Jakub Narebski" <jnareb@gmail.com>,
	"Ted Ts'o" <tytso@mit.edu>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH 2/5] add object-cache infrastructure
Date: Tue, 12 Jul 2011 23:07:16 +0200	[thread overview]
Message-ID: <20110712210716.GB17322@toss.lan> (raw)
In-Reply-To: <20110712194540.GA21180@sigill.intra.peff.net>

On Tue, Jul 12, 2011 at 03:45:40PM -0400, Jeff King wrote:
>
> It has been a long time since I've looked at darcs, but from my
> recollection, it will only work with specific patch types. That is, it
> works if B and C are commutative. For text patches that touch the same
> area, that is not the case. But if "B" were a token-renaming patch, for
> example, I think it might work.

If they were commutative, we would not have a problem in git
either.

> Anyway, that is not really relevant to git. I think we decided long ago
> that being simple and stupid about the content changes (i.e., blob A
> became blob B) is better in general, even when there are a few corner
> cases that might have been better off the other way.

Yes, but that only applies to git merge. When we talk about
rebasing we are looking at individual patches rather than a single
global merge. For rebase I think "patch algebra" is very relevant,
and we have already implemented a simple patch algebra with
patch-id's.

> > It does have better granularity when detecting changes. For
> > example, it will recognize the changes of B' in B, even if B
> > contains non-conflicting hunks on top of the changes in B'. Git
> > only recognizes identical commits, and this is something where we
> > could improve without too much difficulty (think per-hunk
> > patch-ids).
> 
> I'd be curious to see an example worked out. In my experience, even if
> something like patch-ids don't match, it's not a big deal for the hunks
> that do match, because when we get to the actual content merge, we will
> realize that both sides made the same change to that hunk.  So it's not
> like you are getting unrelated conflicts; whatever small part of the
> diff made the patch-id different will be the part where you get the
> conflict, and the should merge cleanly.

I am reading that last part as "they should not merge cleanly". And
in general I agree. We have to resolve the conflict manually and
it's just a question of how the conflict is presented and resolved.
This is already being discussed some in a different branch of this
thread.

> Having said something so general, I'm sure there is probably some corner
> case that proves me wrong.

Exactly. The case I am talking about is where the patch-id's are
different but there are no conflicts. I have worked out an example
for git and darcs. Below are two scripts to demonstrate. In the
example, the patch-id is different because upstream changes the
patch in a way that does not conflict with the original patch. It
simply adds another change that goes into a different hunk. Git
fails to merge cleanly because the patch-id's are different. It
presents the user with an awkward conflict that looks like a
revert. Darcs, on the other hand, merges cleanly. It recognizes the
fact that all changes from the original patch are contained
upstream and do not conflict with the upstream version. The fact
that more changes are added on top does not bother darcs.

Now, one might argue that this is a corner case. But it's actually
very common. In the example, the patch-id changes because of an
extra change in a different text area. That is indeed unlikely.
However, the same problem will occur in a much more common case.
Let's say we have a patch with 10 hunks. The patch is applied
upstream, with only one difference in one of the hunks.
Subsequently, text areas affected by any of the other hunks change
upstream. When the original patch is rebased on top of that, it
will conflict with the one hunk that was changed in the upstream
version of that patch. And that's ok. Git should not decide which
version is correct. But in addition to that conflict there will
also be conflicts for all the other hunks, which the upstream patch
did _not_ modify. And all of those conflicts will look like
reverts.

I believe that is the main reason why rebase is so painful all the
time.

But I am not saying that we necessarily need a finer granularity of
patch-id's. If we rebase the patch on top of its upstream version
before rebasing it to the upstream head, as was already suggested
elsewhere, the problem described here will also go away on its own.

Clemens

---

#!/bin/sh
#
# Darcs recognizes matching upstream changes
#

testdir=test-darcs

mkdir "$testdir" || exit 1

(
	cd "$testdir"
	mkdir master
	cd master
	darcs init

	for line in $(seq 20)
	do
		echo $line >>file
	done

	darcs add file
	darcs record -a -m initial

	cd ..
	darcs get master side
	cd side
	sed -i '5 s/^.*$/original change/' file
	darcs record -a -m 'original change'

	cd ../master
	sed -i '5 s/.*/original change/' file
	sed -i '15 s/^.*$/with an extra hunk/' file
	darcs record -a -m 'original change'

	sed -i '5 s/.*/modified change/' file
	darcs record -a -m 'modified change'

	darcs pull -a ../side
)

#!/bin/sh
#
# Git does not recognize matching upstream changes
#

testdir=test-darcs

mkdir "$testdir" || exit 1

(
	cd "$testdir"
	mkdir master
	cd master
	darcs init

	for line in $(seq 20)
	do
		echo $line >>file
	done

	darcs add file
	darcs record -a -m initial

	cd ..
	darcs get master side
	cd side
	sed -i '5 s/^.*$/original change/' file
	darcs record -a -m 'original change'

	cd ../master
	sed -i '5 s/.*/original change/' file
	sed -i '15 s/^.*$/with an extra hunk/' file
	darcs record -a -m 'original change'

	sed -i '5 s/.*/modified change/' file
	darcs record -a -m 'modified change'

	darcs pull -a ../side
)

  reply	other threads:[~2011-07-12 21:07 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-11 16:13 [RFC/PATCH 0/5] generation numbers for faster traversals Jeff King
2011-07-11 16:16 ` [PATCH 1/5] decorate: allow storing values instead of pointers Jeff King
2011-07-11 16:39   ` Jeff King
2011-07-11 19:06   ` Junio C Hamano
2011-07-11 21:20     ` Jeff King
2011-07-11 16:17 ` [PATCH 2/5] add object-cache infrastructure Jeff King
2011-07-11 16:46   ` Jeff King
2011-07-11 16:58     ` Shawn Pearce
2011-07-11 19:17   ` Junio C Hamano
2011-07-11 22:01     ` Jeff King
2011-07-11 23:21       ` Junio C Hamano
2011-07-11 23:42         ` Junio C Hamano
2011-07-12  0:03         ` Jeff King
2011-07-12 19:38           ` Clemens Buchacher
2011-07-12 19:45             ` Jeff King
2011-07-12 21:07               ` Clemens Buchacher [this message]
2011-07-12 21:15                 ` Clemens Buchacher
2011-07-12 21:36                 ` Jeff King
2011-07-14  8:04                   ` Clemens Buchacher
2011-07-14 16:26                     ` Illia Bobyr
2011-07-13  1:33                 ` John Szakmeister
2011-07-12  0:14         ` Illia Bobyr
2011-07-12  5:35           ` Jeff King
2011-07-12 21:52             ` Illia Bobyr
2011-07-12  6:36       ` Miles Bader
2011-07-12 10:41   ` Jakub Narebski
2011-07-12 17:57     ` Jeff King
2011-07-12 18:41       ` Junio C Hamano
2011-07-13  6:37         ` Jeff King
2011-07-13 17:49           ` Junio C Hamano
2011-07-11 16:18 ` [PATCH 3/5] commit: add commit_generation function Jeff King
2011-07-11 17:57   ` Clemens Buchacher
2011-07-11 21:10     ` Jeff King
2011-07-11 16:18 ` [PATCH 4/5] pretty: support %G to show the generation number of a commit Jeff King
2011-07-11 16:18 ` [PATCH 5/5] limit "contains" traversals based on commit generation Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110712210716.GB17322@toss.lan \
    --to=drizzd@aon.at \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).