From: Junio C Hamano
Subject: Re: Merge with git-pasky II.
Date: Fri, 15 Apr 2005 03:22:26 -0700
Message-ID: <7vwtr4ibkt.fsf@assigned-by-dhcp.cox.net>
To: Petr Baudis
Cc: Linus Torvalds, git@vger.kernel.org
In-Reply-To: <20050414233159.GX22699@pasky.ji.cz> (Petr Baudis's message of "Fri, 15 Apr 2005 01:31:59 +0200")

After I re-read [*R1*], in which Linus talks about dircache, especially this section:

    - The "current directory cache" describes some baseline. In particular, note the "some" part.
    It's not tied to any special baseline, and you can change your baseline any way you please. So it does NOT have to track any particular state in either the object database _or_ in your actual current working tree.

    In fact, all real interactions with "git" are really about updating this staging area one way or the other: you might check out the state from it into your working area (partially or fully), you can push your working area into the staging area (again, partially or fully). And if you want to, you can write the thing that the staging area represents as a "tree" into the object database, or you can merge a tree from the object database into the staging area.

    In other words: the staging area aka "current directory cache" is really how all interaction takes place. The object database never interacts directly with your working directory contents. ALL interactions go through the current directory cache.

I started to have more doubts about the approach of *not* performing the merge in the dircache I set up specifically for merging, which is the direction in which you are pushing if I understand you correctly.

Maybe I completely misunderstand what you want. This message is long but I need a clear understanding of what is expected to be useful to you, so please bear with me.

PB> merge-tree.pl -b $base $(tree-id) $merged | parse-your-output

Please help me understand this example you gave earlier. Here is my understanding of your assumptions when the above pipeline takes place. Correct me if I am mistaken.

 * The user is in a working directory $W. It is controlled by git-tools and there are a $W/.git/ directory and a $W/.git/index dircache.

 * The dircache $W/.git/index started its life as a read-tree from some commit. git-tools keeps track of which commit it is somewhere, presumably in the $W/.git/ directory. Let's call it $C (commit).

   ? Question. Is the $(tree-id) in your example the same as $C above?
 * The user has run [*1*] (see Footnote below) checkout-cache on $W/.git/index some time in the past and $W is full of working files. Some of them may or may not have been modified. There may be some additions or deletions. So the contents of the working directory may not match the tree associated with $C.

 * The user may or may not have run [*1*] update-cache in $W. The contents of the dircache $W/.git/index may not match the tree associated with $C.

   ? Question. Are you forbidding the user to run update-cache by hand, and keeping track of the changes yourself, to be applied all at once at "git commit" time, thereby guaranteeing that $W/.git/index matches the tree associated with $C at all times? From the description in the "GIT toolkit" section of the README, it is not clear to me which parts of his repository an end user is not supposed to muck with himself.

 * Now the user has some changes in his working directory and notices that upstream or a side branch has notable changes that are desirable to pick up. So he runs some git-tools command to cause the above quoted pipeline to run.

   ? Question. Does $merged in your example mean such an upstream or side branch? Is $base in your example the common ancestor between $C and $merged?

Assuming that my above understanding of your model is correct, here is my "thinking aloud".

 - "merge-trees $base $C $merged" looks only at the git object database for those three trees named. The data structure of the git object database is optimized to distinguish differences in those recorded trees (and hence the recorded blobs they point at) without unpacking most of the files if the changes are small, because all the blobs involved are already hashed. It is not very good at comparing things in the git object store with working files in random states, which would involve unpacking blobs and comparing, so "merge-trees" does not bother.
 - What can come out from merge-trees is therefore one of the following for each path from the union of paths contained in $base, $C, and $merged:

   (a) Neither $C nor $merged changed it --- merge result is what is in $C.

   (b) $C changed it but $merged did not --- merge result is what is in $C.

   (c) Both $C and $merged changed it in the same way --- merge result is what is in $C.

   (d) $C did not change it but $merged did --- merge result is what is in $merged.

   (e) Both $C and $merged changed it differently --- merge is needed and automatically succeeds between $C and $merged.

   (f) Both $C and $merged changed it differently --- merge is needed but has conflicts.

 - Assuming we are dealing with the case where working files are dirty and do not match what is in $C, among the above, (a)-(c) can be ignored by SCM. What the user has in his working files is exactly what he would have got if he started working from the merge result, although in reality the work was started from $C.

   Handling (d), (e) and (f) from SCM's point of view would be the same. They all involve 3-way merges between the file in the working directory and the file from $merged, pivoting on the file from $base. In order to help SCM, merge-trees therefore should output the SHA1 of the blobs for such a file from $base and $merged and expect SCM to run "cat-file blob" on them and then merge or diff3. Up to the point of giving those two SHA1 out is the business of merge-trees and after that it is up to SCM.

   That would work. So I should base the design of output from merge-trees on the above analysis, which probably needs to be extended to cover differences between creation, modification, and deletion.

 - However, the above is quite different from the way Linus envisioned initially, on which my current implementation is based [*3*].

   My current implementation is to record the merge outcome in the temporary dircache $W/,,merge/.git/index for cases (a)-(e).
   The last case (f) is problematic and needs human validation [*2*], so it is not recorded in that temporary dircache, but the files to be merged are left in that temporary directory and merge-trees stops there. It is expected that the end-user or SCM would merge the resulting file and run update-cache to update $W/,,merge/.git/index.

   After that happens, $W/,,merge/.git/index has the tree representing the desired result of the merge. It is expected that the end-user or SCM would write-tree, commit-tree there in the temporary directory, creating a new commit $C1.

   Then, it is expected that the SCM would make a patch file between $C and the user working directory, check out $C1 (either in the user's working directory or another temporary directory; at this point merge-trees does not care because it has already done its job and exited), and apply that patch to bring the user edits over to $C1. Then that directory would contain the desired merge of user edits.

That is my understanding of how Linus originally wanted the tool he does his kernel work with to work.

My hesitation about your suggestion to change it so that it does not keep its own merge dircache comes from here. Not doing what I am currently doing to the $W/,,merge/.git/index dircache would mean that SCM would have to do more, not less, to arrive at $C1 (the result of the clean merge of $merged and $C pivoted at $base), where the real SCM merge begins.

Although I suspect I am misunderstanding what you want, your messages so far suggest that what you want might be quite different from what Linus wants. Please do not misunderstand what I mean by saying this. I am not saying that Linus is always right [*4*] and therefore you are wrong for wanting something else. It is just that, if what I started writing needs to support both of those quite different needs, I need to know what they are. I think I understand what Linus wants well enough [*5*], but I am not certain about yours.
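To make the per-path cases (a)-(f) above concrete, here is a minimal sketch in Python. It is not the merge-trees implementation. It assumes each tree has already been flattened into a hypothetical dictionary from path to blob SHA1, and the names classify, base, ours, and theirs, as well as the short fake SHA1 strings, are illustrative only. Cases (e) and (f) are reported together as "e/f", because telling them apart requires actually merging the file contents.

```python
from typing import Dict

# Hypothetical flattened tree: path -> blob SHA1. A missing path means the
# file does not exist in that tree, so creation and deletion count as changes.
Tree = Dict[str, str]


def classify(base: Tree, ours: Tree, theirs: Tree) -> Dict[str, str]:
    """Label every path in the union of the three trees with its case."""
    result = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, c, m = base.get(path), ours.get(path), theirs.get(path)
        if c == b and m == b:
            case = "a"    # neither side changed it: result is what is in $C
        elif m == b:
            case = "b"    # only $C changed it: result is what is in $C
        elif c == m:
            case = "c"    # both changed it the same way: result is $C
        elif c == b:
            case = "d"    # only $merged changed it: result is $merged
        else:
            case = "e/f"  # both changed it differently: content merge needed
        result[path] = case
    return result


# Illustrative data only; these are not real blob SHA1s.
base = {"a.c": "s1", "b.c": "s2", "c.c": "s3", "d.c": "s4", "e.c": "s5"}
ours = {"a.c": "s1", "b.c": "s2x", "c.c": "s3x", "d.c": "s4", "e.c": "s5x"}
theirs = {"a.c": "s1", "b.c": "s2", "c.c": "s3x", "d.c": "s4y",
          "e.c": "s5y", "new.c": "s6"}

for path, case in classify(base, ours, theirs).items():
    print(path, case)
```

Only SHA1s are compared here and no blob is unpacked, which is the property of the object database that the analysis above relies on. A path that exists only in theirs falls under (d), which hints at how the creation and deletion variants could be layered on top.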

[Footnotes]

*1* By "run" I mean that either the user used the low-level plumbing command directly himself, or used git-tools to cause such a command to run.

*2* Strictly speaking, case (e) needs human validation as well, because a successful textual merge does not guarantee a sensible semantic merge.

*3* See [*R2*] for a description of the way Linus wanted merge in git to happen, especially from "5) At this point you need to MERGE" onwards. The current implementation handles (or attempts to handle) the `your working directory was fully committed' case described there.

*4* According to Linus himself, he is always right ;-). [*R3*]

*5* I consider [*R1*] and [*R2*] essential reading for anybody wanting to understand the merging operation in the git object model (I am saying this for others; not for Pasky --- it would be like preaching to the choir ;-)).

[References]

*R1* http://marc.theaimsgroup.com/?i=%3CPine.LNX.4.58.0504110928360.1267%20()%20ppc970%20!%20osdl%20!%20org%3E

*R2* http://marc.theaimsgroup.com/?i=%3CPine.LNX.4.58.0504121606580.4501%20()%20ppc970%20!%20osdl%20!%20org%3E

*R3* http://www.uwsg.indiana.edu/hypermail/linux/kernel/0008.3/0555.html