From mboxrd@z Thu Jan 1 00:00:00 1970 From: Clemens Buchacher Subject: Re: [PATCH 2/5] add object-cache infrastructure Date: Tue, 12 Jul 2011 23:07:16 +0200 Message-ID: <20110712210716.GB17322@toss.lan> References: <20110711161332.GA10057@sigill.intra.peff.net> <20110711161754.GB10418@sigill.intra.peff.net> <7vliw4d1hu.fsf@alter.siamese.dyndns.org> <20110711220107.GC30155@sigill.intra.peff.net> <7vk4bo9ze5.fsf@alter.siamese.dyndns.org> <20110712000304.GA32276@sigill.intra.peff.net> <20110712193844.GA17322@toss.lan> <20110712194540.GA21180@sigill.intra.peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Junio C Hamano , git@vger.kernel.org, Jakub Narebski , Ted Ts'o , Jonathan Nieder , =?iso-8859-1?Q?=C6var_Arnfj=F6r=F0?= Bjarmason To: Jeff King X-From: git-owner@vger.kernel.org Tue Jul 12 23:07:40 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1QgkB9-0002QE-AI for gcvg-git-2@lo.gmane.org; Tue, 12 Jul 2011 23:07:39 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755619Ab1GLVH0 (ORCPT ); Tue, 12 Jul 2011 17:07:26 -0400 Received: from bsmtp3.bon.at ([213.33.87.17]:44519 "EHLO bsmtp.bon.at" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1755613Ab1GLVHW (ORCPT ); Tue, 12 Jul 2011 17:07:22 -0400 Received: from localhost (unknown [80.123.242.182]) by bsmtp.bon.at (Postfix) with ESMTP id 0F592130050; Tue, 12 Jul 2011 23:07:17 +0200 (CEST) Content-Disposition: inline In-Reply-To: <20110712194540.GA21180@sigill.intra.peff.net> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, Jul 12, 2011 at 03:45:40PM -0400, Jeff King wrote: > > It has been a long time since I've looked at darcs, but from my > recollection, it will only work with specific patch types. That is, it > works if B and C are commutative. For text patches that touch the same > area, that is not the case. But if "B" were a token-renaming patch, for > example, I think it might work. If they were commutative, we would not have a problem in git either. > Anyway, that is not really relevant to git. I think we decided long ago > that being simple and stupid about the content changes (i.e., blob A > became blob B) is better in general, even when there are a few corner > cases that might have been better off the other way. Yes, but that only applies to git merge. When we talk about rebasing we are looking at individual patches rather than a single global merge. For rebase I think "patch algebra" is very relevant, and we have already implemented a simple patch algebra with patch-id's. > > It does have better granularity when detecting changes. For > > example, it will recognize the changes of B' in B, even if B > > contains non-conflicting hunks on top of the changes in B'. Git > > only recognizes identical commits, and this is something where we > > could improve without too much difficulty (think per-hunk > > patch-ids). > > I'd be curious to see an example worked out. In my experience, even if > something like patch-ids don't match, it's not a big deal for the hunks > that do match, because when we get to the actual content merge, we will > realize that both sides made the same change to that hunk. So it's not > like you are getting unrelated conflicts; whatever small part of the > diff made the patch-id different will be the part where you get the > conflict, and the should merge cleanly. I am reading that last part as "they should not merge cleanly". And in general I agree. We have to resolve the conflict manually and it's just a question of how the conflict is presented and resolved. This is already being discussed some in a different branch of this thread. > Having said something so general, I'm sure there is probably some corner > case that proves me wrong. Exactly. The case I am talking about is where the patch-id's are different but there are no conflicts. I have worked out an example for git and darcs. Below are two scripts to demonstrate. In the example, the patch-id is different because upstream changes the patch in a way that does not conflict with the original patch. It simply adds another change that goes into a different hunk. Git fails to merge cleanly because the patch-id's are different. It presents the user with an awkward conflict that looks like a revert. Darcs, on the other hand, merges cleanly. It recognizes the fact that all changes from the original patch are contained upstream and do not conflict with the upstream version. The fact that more changes are added on top does not bother darcs. Now, one might argue that this is a corner case. But it's actually very common. In the example, the patch-id changes because of an extra change in a different text area. That is indeed unlikely. However, the same problem will occur in a much more common case. Let's say we have a patch with 10 hunks. The patch is applied upstream, with only one difference in one of the hunks. Subsequently, text areas affected by any of the other hunks change upstream. When the original patch is rebased on top of that, it will conflict with the one hunk that was changed in the upstream version of that patch. And that's ok. Git should not decide which version is correct. But in addition to that conflict there will also be conflicts for all the other hunks, which the upstream patch did _not_ modify. And all of those conflicts will look like reverts. I believe that is the main reason why rebase is so painful all the time. But I am not saying that we necessarily need a finer granularity of patch-id's. If we rebase the patch on top of its upstream version before rebasing it to the upstream head, as was already suggested elsewhere, the problem described here will also go away on its own. Clemens --- #!/bin/sh # # Darcs recognizes matching upstream changes # testdir=test-darcs mkdir "$testdir" || exit 1 ( cd "$testdir" mkdir master cd master darcs init for line in $(seq 20) do echo $line >>file done darcs add file darcs record -a -m initial cd .. darcs get master side cd side sed -i '5 s/^.*$/original change/' file darcs record -a -m 'original change' cd ../master sed -i '5 s/.*/original change/' file sed -i '15 s/^.*$/with an extra hunk/' file darcs record -a -m 'original change' sed -i '5 s/.*/modified change/' file darcs record -a -m 'modified change' darcs pull -a ../side ) #!/bin/sh # # Git does not recognize matching upstream changes # testdir=test-darcs mkdir "$testdir" || exit 1 ( cd "$testdir" mkdir master cd master darcs init for line in $(seq 20) do echo $line >>file done darcs add file darcs record -a -m initial cd .. darcs get master side cd side sed -i '5 s/^.*$/original change/' file darcs record -a -m 'original change' cd ../master sed -i '5 s/.*/original change/' file sed -i '15 s/^.*$/with an extra hunk/' file darcs record -a -m 'original change' sed -i '5 s/.*/modified change/' file darcs record -a -m 'modified change' darcs pull -a ../side )