git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] A new merge algorithm, take 3
@ 2005-09-07 16:47 Fredrik Kuivinen
  2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
                   ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-07 16:47 UTC (permalink / raw)
  To: git; +Cc: junkio

[-- Attachment #1: Type: text/plain, Size: 1562 bytes --]

Hi,

Here is the new version of the merge algorithm patch.

The major changes compared to the previous patch are:

* No more messing around with merge-cache. git-ls-files used to get
  the unmerged entries instead.
* The python code is now contained in two files, git-merge-script and
  gitMergeCommon.py.
* The user interface is identical to the interface provided by
  git-resolve-script
* In the non-clean case the unmerged cache entries will not be
  removed from the cache.

I have also attached a test script which can redo every merge in a
repository with both git-resolve-script and git-merge-script. It will
report any non-clean merges and non-identical results for clean
merges. Do _not_ use this script in repositories you care about. It
calls 'git reset --hard' repeatedly and will probably not leave the
repository in its original state when it's done.

Of the 500 merge commits that currently exists in the kernel
repository 19 produces non-clean merges with git-merge-script. The
four merge cases listed in
<20050827014009.GB18880@c165.ib.student.liu.se> are cleanly merged by
git-merge-script. Every merge commit which is cleanly merged by
git-resolve-script is also cleanly merged by git-merge-script,
furthermore the results are identical. There are currently two merges
in the kernel repository which are not cleanly merged by
git-resolve-script but are cleanly merged by git-merge-script.

I guess the need for this has decreased with Daniel's new read-tree
code. Is there any chance of getting this code merged into mainline
git?

- Fredrik

[-- Attachment #2: testRepo.py --]
[-- Type: text/x-python, Size: 4684 bytes --]

#!/usr/bin/env python

import sys, math, random, os, re, signal, tempfile, time
from heapq import heappush, heappop
from sets import Set
from gitMergeCommon import *

def mergeMerge(a, b):
    print 'Running merge-script HEAD', b.sha, '...'
    [out, code] = runProgram(['git-merge-script', 'HEAD', b.sha, 'merge message'],
                             returnCode=True, pipeOutput=False)
    if code == 0:
        return True
    else:
        return False
    
def gitResolveMerge(a, b):
    print 'Running git resolve HEAD', b.sha, '...'
    [out, code] = runProgram(['git', 'resolve', 'HEAD', b.sha, 'merge message'],
                             returnCode=True, pipeOutput=False)

    if code == 0:
        return True
    else:
        return False

def doWork(graph, commits):
    print 'commits:', repr(commits)
    result = []
    totalMergeTime = 0
    totalResolveTime = 0
    numCommits = 0
    try:
        for commit in graph.commits:
            if len(commits) > 0 and not (commit.sha in commits):
                continue

            if len(commit.parents) > 1:
                res = commit.sha + ' : '
                if len(commit.parents) == 2:
                    numCommits += 1
                    print '---------------------------------------'
                    print 'Testing commit', commit.sha, '(tree)', commit.tree()
                    a = commit.parents[0]
                    b = commit.parents[1]

                    runProgram(['git-reset-script', '--hard', a.sha])
                    print 'Running git resolve...'
                    stdout.flush()
                    startTime = time.time()
                    resResolve = gitResolveMerge(a, b)
                    timeResolve = time.time() - startTime
                    totalResolveTime += timeResolve
                    
                    if resResolve:
                        resolveHead = Commit(runProgram(['git-rev-parse', '--verify', 'HEAD']).rstrip(), [a, b])

                    runProgram(['git-reset-script', '--hard', a.sha])
                    print 'Running merge...'
                    stdout.flush()
                    startTime = time.time()
                    resMerge = mergeMerge(a, b)
                    timeMerge = time.time() - startTime
                    totalMergeTime += timeMerge
                    
                    if resMerge:
                        mergeHead = Commit(runProgram(['git-rev-parse', '--verify', 'HEAD']).rstrip(), [a, b])

                    res += 'time r: ' + str(int(timeResolve)) + ' m: ' + str(int(timeMerge)) + '\t'
                    if resResolve and resMerge:
                        if resolveHead.tree() == mergeHead.tree():
                            res += 'Identical result'
                        else:
                            res += 'Non-identical results! resolve: ' + resolveHead.sha + \
                                   ' merge: ' + mergeHead.sha
                    else:
                        if resResolve:
                            res += 'resolve succeeded (' + resolveHead.sha + '), '
                        else:
                            res += 'resolve failed, '

                        if resMerge:
                            res += 'merge succeeded (' + mergeHead.sha + ')'
                        else:
                            res += 'merge failed'
                else:
                    res += 'Ignoring octupus merge'

                print res
                result.append(res)
                stdout.flush()
    finally:
        print '\n\n\nResults:'
        for r in result:
            print r
        print 'Avg resolve time:', float(totalResolveTime) / numCommits
        print 'Avg merge time:', float(totalMergeTime) / numCommits

def writeHead(head, sha):
    if sha[-1] != '\n':
        sha += '\n'

    try:
        f = open(os.environ['GIT_DIR'] + '/' + head, 'w')
        f.write(sha)
        f.close()
    except IOError, e:
        print 'Failed to write to', os.environ['GIT_DIR'] + '/' + head + ':', e.strerror
        sys.exit(1)
    return True

stdout = sys.stdout
setupEnvironment()
repoValid()

head = runProgram(['git-rev-parse', '--verify', 'HEAD^0']).rstrip()
print 'Building graph...'
stdout.flush()
graph = buildGraph([head])
print 'Graph building done.'
stdout.flush()
print 'Processing', len(graph.commits), 'commits (' + \
      str(len([x for x in graph.commits if len(x.parents) > 1])) + ' merge commits)...'

originalHead = open('.git/HEAD', 'r').read()
print 'Original head:', originalHead
stdout.flush()

writeHead('original-head', originalHead)

try:
    doWork(graph, sys.argv[1:])
finally:
    writeHead('HEAD', originalHead)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory.
  2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
@ 2005-09-07 16:50 ` Fredrik Kuivinen
  2005-09-11  2:54   ` Unified merge driver pushed out to "master" branch Junio C Hamano
  2005-09-07 16:51 ` [PATCH 2/2] A new merge algorithm Fredrik Kuivinen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-07 16:50 UTC (permalink / raw)
  To: git; +Cc: junkio

This will be used by the merge code.

Signed-off-by: Fredrik Kuivinen <freku@ida.liu.se>


---

 read-tree.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

12d226103dc278c5d5aaf6b2dc776af3170ed233
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -7,6 +7,7 @@
 
 static int stage = 0;
 static int update = 0;
+static int ignore_working_dir = 0;
 
 static int unpack_tree(unsigned char *sha1)
 {
@@ -80,7 +81,10 @@ static void verify_uptodate(struct cache
 {
 	struct stat st;
 
-	if (!lstat(ce->name, &st)) {
+        if (ignore_working_dir)
+            return;
+
+        if (!lstat(ce->name, &st)) {
 		unsigned changed = ce_match_stat(ce, &st);
 		if (!changed)
 			return;
@@ -510,7 +514,7 @@ static int read_cache_unmerged(void)
 	return deleted;
 }
 
-static const char read_tree_usage[] = "git-read-tree (<sha> | -m [-u] <sha1> [<sha2> [<sha3>]])";
+static const char read_tree_usage[] = "git-read-tree (<sha> | -m [-u] [-i] <sha1> [<sha2> [<sha3>]])";
 
 static struct cache_file cache_file;
 
@@ -535,6 +539,11 @@ int main(int argc, char **argv)
 			continue;
 		}
 
+                if (!strcmp(arg, "-i")) {
+                        ignore_working_dir = 1;
+                        continue;
+                }
+
 		/* This differs from "-m" in that we'll silently ignore unmerged entries */
 		if (!strcmp(arg, "--reset")) {
 			if (stage || merge || emu23)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 2/2] A new merge algorithm
  2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
  2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
@ 2005-09-07 16:51 ` Fredrik Kuivinen
  2005-09-07 18:33 ` [PATCH 0/2] A new merge algorithm, take 3 Daniel Barkalow
  2005-09-07 18:36 ` Junio C Hamano
  3 siblings, 0 replies; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-07 16:51 UTC (permalink / raw)
  To: git; +Cc: junkio

The new algorithm handles multiple common ancestors in a better way.

Signed-off-by: Fredrik Kuivinen <freku@ida.liu.se>


---

 Makefile          |    3 
 git-merge-script  |  491 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 gitMergeCommon.py |  286 +++++++++++++++++++++++++++++++
 3 files changed, 779 insertions(+), 1 deletions(-)
 create mode 100644 git-merge-script
 create mode 100644 gitMergeCommon.py

4c86af6fac9d0f527c277c88d7ae3d200dd24f61
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -66,7 +66,8 @@ SCRIPTS=git git-merge-one-file-script gi
 	git-format-patch-script git-sh-setup-script git-push-script \
 	git-branch-script git-parse-remote-script git-verify-tag-script \
 	git-ls-remote-script git-rename-script \
-	git-request-pull-script git-bisect-script
+	git-request-pull-script git-bisect-script gitMergeCommon.py \
+	git-merge-script
 
 SCRIPTS += git-count-objects-script
 SCRIPTS += git-revert-script
diff --git a/git-merge-script b/git-merge-script
new file mode 100755
--- /dev/null
+++ b/git-merge-script
@@ -0,0 +1,491 @@
+#!/usr/bin/env python
+
+import sys, math, random, os, re, signal, tempfile, stat, errno
+from heapq import heappush, heappop
+from sets import Set
+from gitMergeCommon import *
+
+alwaysWriteTree = False
+
+# The actual merge code
+# ---------------------
+
+def merge(h1, h2, branch1Name, branch2Name, graph, callDepth=0):
+    '''Merge the commits h1 and h2, return the resulting virtual
+    commit object and a flag indicating the cleaness of the merge.'''
+    assert(isinstance(h1, Commit) and isinstance(h2, Commit))
+    assert(isinstance(graph, Graph))
+
+    def infoMsg(*args):
+        sys.stdout.write('  '*callDepth)
+        printList(args)
+    infoMsg('Merging:')
+    infoMsg(h1)
+    infoMsg(h2)
+    sys.stdout.flush()
+
+    ca = getCommonAncestors(graph, h1, h2)
+    infoMsg('found', len(ca), 'common ancestor(s):')
+    for x in ca:
+        infoMsg(x)
+    sys.stdout.flush()
+
+    Ms = ca[0]
+    for h in ca[1:]:
+        [Ms, ignore] = merge(Ms, h,
+                             'Temporary shared merge branch 1',
+                             'Temporary shared merge branch 2',
+                             graph, callDepth+1)
+        assert(isinstance(Ms, Commit))
+
+    if callDepth == 0:
+        if len(ca) > 1:
+            runProgram(['git-read-tree', h1.tree()])
+            runProgram(['git-update-cache', '-q', '--refresh'])
+        # Use the original index if we only have one common ancestor
+        
+        updateWd = True
+        if alwaysWriteTree:
+            cleanCache = True
+        else:
+            cleanCache = False
+    else:
+        runProgram(['git-read-tree', h1.tree()])
+        updateWd = False
+        cleanCache = True
+
+    [shaRes, clean] = mergeTrees(h1.tree(), h2.tree(), Ms.tree(),
+                                 branch1Name, branch2Name,
+                                 cleanCache, updateWd)
+
+    if clean or alwaysWriteTree:
+        res = Commit(None, [h1, h2], tree=shaRes)
+        graph.addNode(res)
+    else:
+        res = None
+
+    return [res, clean]
+
+getFilesRE = re.compile('([0-9]+) ([a-z0-9]+) ([0-9a-f]{40})\t(.*)')
+def getFilesAndDirs(tree):
+    files = Set()
+    dirs = Set()
+    out = runProgram(['git-ls-tree', '-r', '-z', tree])
+    for l in out.split('\0'):
+        m = getFilesRE.match(l)
+        if m:
+            if m.group(2) == 'tree':
+                dirs.add(m.group(4))
+            elif m.group(2) == 'blob':
+                files.add(m.group(4))
+
+    return [files, dirs]
+
+class CacheEntry:
+    def __init__(self, path):
+        class Stage:
+            def __init__(self):
+                self.sha1 = None
+                self.mode = None
+        
+        self.stages = [Stage(), Stage(), Stage()]
+        self.path = path
+
+unmergedRE = re.compile('^([0-9]+) ([0-9a-f]{40}) ([1-3])\t(.*)$')
+def unmergedCacheEntries():
+    '''Create a dictionary mapping file names to CacheEntry
+    objects. The dictionary contains one entry for every path with a
+    non-zero stage entry.'''
+
+    lines = runProgram(['git-ls-files', '-z', '--unmerged']).split('\0')
+    lines.pop()
+
+    res = {}
+    for l in lines:
+        m = unmergedRE.match(l)
+        if m:
+            mode = int(m.group(1), 8)
+            sha1 = m.group(2)
+            stage = int(m.group(3)) - 1
+            path = m.group(4)
+
+            if res.has_key(path):
+                e = res[path]
+            else:
+                e = CacheEntry(path)
+                res[path] = e
+                
+            e.stages[stage].mode = mode
+            e.stages[stage].sha1 = sha1
+        else:
+            print 'Error: Merge program failed: Unexpected output from', \
+                  'git-ls-files:', l
+            sys.exit(1)
+    return res
+
+def mergeTrees(head, merge, common, branch1Name, branch2Name,
+               cleanCache, updateWd):
+    '''Merge the trees 'head' and 'merge' with the common ancestor
+    'common'. The name of the head branch is 'branch1Name' and the name of
+    the merge branch is 'branch2Name'. Return a tuple (tree, cleanMerge)
+    where tree is the resulting tree and cleanMerge is True iff the
+    merge was clean.'''
+    
+    assert(isSha(head) and isSha(merge) and isSha(common))
+
+    if common == merge:
+        print 'Already uptodate!'
+        return [head, True]
+
+    if updateWd:
+        updateArg = '-u'
+    else:
+        updateArg = '-i'
+    runProgram(['git-read-tree', updateArg, '-m', common, head, merge])
+    cleanMerge = True
+
+    [tree, code] = runProgram('git-write-tree', returnCode=True)
+    tree = tree.rstrip()
+    if code != 0:
+        [files, dirs] = getFilesAndDirs(head)
+        [filesM, dirsM] = getFilesAndDirs(merge)
+        files.union_update(filesM)
+        dirs.union_update(dirsM)
+        
+        cleanMerge = True
+        entries = unmergedCacheEntries()
+        for name in entries:
+            if not processEntry(entries[name], branch1Name, branch2Name,
+                                files, dirs, cleanCache, updateWd):
+                cleanMerge = False
+                
+        if cleanMerge or cleanCache:
+            tree = runProgram('git-write-tree').rstrip()
+        else:
+            tree = None
+    else:
+        cleanMerge = True
+
+    return [tree, cleanMerge]
+
+def processEntry(entry, branch1Name, branch2Name, files, dirs,
+                 cleanCache, updateWd):
+    '''Merge one cache entry. 'files' is a Set with the files in both of
+    the heads that we are going to merge. 'dirs' contains the
+    corresponding data for directories. If 'cleanCache' is True no
+    non-zero stages will be left in the cache for the path
+    corresponding to the entry 'entry'.'''
+
+# cleanCache == True  => Don't leave any non-stage 0 entries in the cache.
+#               False => Leave unmerged entries
+
+# updateWd  == True  => Update the working directory to correspond to the cache
+#              False => Leave the working directory unchanged
+
+# clean     == True  => non-conflict case
+#              False => conflict case
+
+# If cleanCache == False then the cache shouldn't be updated if clean == False
+
+    def updateFile(clean, sha, mode, path):
+        if cleanCache or (not cleanCache and clean):
+            runProgram(['git-update-cache', '--add', '--cacheinfo',
+                        '0%o' % mode, sha, path])
+
+        if updateWd:
+            prog = ['git-cat-file', 'blob', sha]
+            if stat.S_ISREG(mode):
+                try:
+                    os.unlink(path)
+                except OSError:
+                    pass
+                if mode & 0100:
+                    mode = 0777
+                else:
+                    mode = 0666
+                fd = os.open(path, os.O_WRONLY | os.O_TRUNC | os.O_CREAT, mode)
+                proc = subprocess.Popen(prog, stdout=fd)
+                proc.wait()
+                os.close(fd)
+            elif stat.S_ISLNK(mode):
+                linkTarget = runProgram(prog)
+                os.symlink(linkTarget, path)
+            else:
+                assert(False)
+            runProgram(['git-update-cache', '--', path])
+
+    def removeFile(clean, path):
+        if cleanCache or (not cleanCache and clean):
+            runProgram(['git-update-cache', '--force-remove', '--', path])
+
+        if updateWd:
+            try:
+                os.unlink(path)
+            except OSError, e:
+                if e.errno != errno.ENOENT and e.errno != errno.EISDIR:
+                    raise
+
+    def uniquePath(path, branch):
+        newPath = path + '_' + branch
+        suffix = 0
+        while newPath in files or newPath in dirs:
+            suffix += 1
+            newPath = path + '_' + branch + '_' + str(suffix)
+        files.add(newPath)
+        return newPath
+
+    debug('processing', entry.path, 'clean cache:', cleanCache,
+          'wd:', updateWd)
+
+    cleanMerge = True
+
+    path = entry.path
+    oSha = entry.stages[0].sha1
+    oMode = entry.stages[0].mode
+    aSha = entry.stages[1].sha1
+    aMode = entry.stages[1].mode
+    bSha = entry.stages[2].sha1
+    bMode = entry.stages[2].mode
+
+    assert(oSha == None or isSha(oSha))
+    assert(aSha == None or isSha(aSha))
+    assert(bSha == None or isSha(bSha))
+
+    assert(oMode == None or type(oMode) is int)
+    assert(aMode == None or type(aMode) is int)
+    assert(bMode == None or type(bMode) is int)
+
+    if (oSha and (not aSha or not bSha)):
+    #
+    # Case A: Deleted in one
+    #
+        if (not aSha     and not bSha) or \
+           (aSha == oSha and not bSha) or \
+           (not aSha     and bSha == oSha):
+    # Deleted in both or deleted in one and unchanged in the other
+            if aSha:
+                print 'Removing ' + path
+            removeFile(True, path)
+        else:
+    # Deleted in one and changed in the other
+            cleanMerge = False
+            if not aSha:
+                print 'CONFLICT (del/mod): "' + path + '" deleted in', \
+                      branch1Name, 'and modified in', branch2Name, \
+                      '. Version', branch2Name, ' of "' + path + \
+                      '" left in tree'
+                mode = bMode
+                sha = bSha
+            else:
+                print 'CONFLICT (mod/del): "' + path + '" deleted in', \
+                      branch2Name, 'and modified in', branch1Name + \
+                      '. Version', branch1Name, 'of "' + path + \
+                      '" left in tree'
+                mode = aMode
+                sha = aSha
+
+            updateFile(False, sha, mode, path)
+    
+    elif (not oSha and aSha     and not bSha) or \
+         (not oSha and not aSha and bSha):
+    #
+    # Case B: Added in one.
+    #
+        if aSha:
+            addBranch = branch1Name
+            otherBranch = branch2Name
+            mode = aMode
+            sha = aSha
+            conf = 'file/dir'
+        else:
+            addBranch = branch2Name
+            otherBranch = branch1Name
+            mode = bMode
+            sha = bSha
+            conf = 'dir/file'
+    
+        if path in dirs:
+            cleanMerge = False
+            newPath = uniquePath(path, addBranch)
+            print 'CONFLICT (' + conf + \
+                  '): There is a directory with name "' + path + '" in', \
+                  otherBranch + '. Adding "' + path + '" as "' + newPath + '"'
+
+            removeFile(False, path)
+            path = newPath
+        else:
+            print 'Adding "' + path + '"'
+
+        updateFile(True, sha, mode, path)
+    
+    elif not oSha and aSha and bSha:
+    #
+    # Case C: Added in both (check for same permissions).
+    #
+        if aSha == bSha:
+            if aMode != bMode:
+                cleanMerge = False
+                print 'CONFLICT: File "' + path + \
+                      '" added identically in both branches,'
+                print 'CONFLICT: but permissions conflict', '0%o' % aMode, \
+                      '->', '0%o' % bMode
+                print 'CONFLICT: adding with permission:', '0%o' % aMode
+
+                updateFile(False, aSha, aMode, path)
+            else:
+                # This case is handled by git-read-tree
+                assert(False)
+        else:
+            cleanMerge = False
+            newPath1 = uniquePath(path, branch1Name)
+            newPath2 = uniquePath(path, branch2Name)
+            print 'CONFLICT (add/add): File "' + path + \
+                  '" added non-identically in both branches.', \
+                  'Adding "' + newPath1 + '" and "' + newPath2 + '" instead.'
+            removeFile(False, path)
+            updateFile(False, aSha, aMode, newPath1)
+            updateFile(False, bSha, bMode, newPath2)
+
+    elif oSha and aSha and bSha:
+    #
+    # case D: Modified in both, but differently.
+    #
+        print 'Auto-merging', path 
+        orig = runProgram(['git-unpack-file', oSha]).rstrip()
+        src1 = runProgram(['git-unpack-file', aSha]).rstrip()
+        src2 = runProgram(['git-unpack-file', bSha]).rstrip()
+        [out, ret] = runProgram(['merge',
+                                 '-L', branch1Name + '/' + path,
+                                 '-L', 'orig/' + path,
+                                 '-L', branch2Name + '/' + path,
+                                 src1, orig, src2], returnCode=True)
+
+        if aMode == oMode:
+            mode = bMode
+        else:
+            mode = aMode
+
+        sha = runProgram(['git-hash-object', '-t', 'blob', '-w',
+                          src1]).rstrip()
+
+        if ret != 0:
+            cleanMerge = False
+            print 'CONFLICT (content): Merge conflict in "' + path + '".'
+            updateFile(False, sha, mode, path)
+        else:
+            updateFile(True, sha, mode, path)
+
+        os.unlink(orig)
+        os.unlink(src1)
+        os.unlink(src2)
+    else:
+        print 'ERROR: Fatal merge failure.'
+        print "ERROR: Shouldn't happen"
+        sys.exit(1)
+
+    return cleanMerge
+
+def dropHeads():
+    try:
+        os.unlink(os.environ['GIT_DIR'] + '/MERGE_HEAD')
+    except OSError:
+        pass
+
+    try:
+        os.unlink(os.environ['GIT_DIR'] + '/LAST_MERGE')
+    except OSError:
+        pass
+
+def writeHead(head, sha):
+    if sha[-1] != '\n':
+        sha += '\n'
+
+    try:
+        f = open(os.environ['GIT_DIR'] + '/' + head, 'w')
+        f.write(sha)
+        f.close()
+    except IOError, e:
+        print 'Failed to write to', os.environ['GIT_DIR'] + '/' + head + ':', \
+              e.strerror
+        sys.exit(1)
+    return True
+
+def checkCleanTree():
+    [ignore, code] = runProgram(['git-update-cache', '--refresh'],
+                                returnCode = True)
+    if code != 0:
+        return False
+    out = runProgram(['git-diff-cache', '--name-only', '--cached', 'HEAD'])
+    if len(out) > 0:
+        return False
+    return True
+
+def doCommit(msg, p1, p2, tree):
+    assert(isSha(tree))
+    commit = runProgram(['git-commit-tree', tree, '-p', p1, '-p', p2],
+                        msg + '\n').rstrip()
+    assert(isSha(commit))
+    writeHead('HEAD', commit)
+    return commit.rstrip()
+
+def usage():
+    print 'Usage:', sys.argv[0], ' <head> <remote> <merge-message>'
+    sys.exit(1)
+
+if not checkCleanTree():
+    print 'Either the cache is out of sync or the working tree does not match HEAD.'
+    print 'Aborting merge.'
+    sys.exit(1)
+
+setupEnvironment()
+repoValid()
+
+try:
+    nextArg = 1
+    if sys.argv[nextArg] == '--write-tree':
+        alwaysWriteTree = True
+        nextArg += 1
+    else:
+        alwaysWriteTree = False
+
+    h1 = firstBranch = sys.argv[nextArg]
+    nextArg += 1
+    h2 = secondBranch = sys.argv[nextArg]
+    nextArg += 1
+    commitMessage = sys.argv[nextArg]
+    nextArg += 1
+except IndexError:
+    usage()
+    
+if len(sys.argv) != nextArg:
+    usage()
+
+print 'Merging', h1, 'with', h2
+h1 = runProgram(['git-rev-parse', '--verify', h1 + '^0']).rstrip()
+h2 = runProgram(['git-rev-parse', '--verify', h2 + '^0']).rstrip()
+
+writeHead('ORIG_HEAD', h1)
+writeHead('LAST_MERGE', h2)
+
+graph = buildGraph([h1, h2])
+
+[res, clean] = merge(graph.shaMap[h1], graph.shaMap[h2],
+                     firstBranch, secondBranch, graph)
+
+print ''
+
+if clean:
+    cmit = doCommit(commitMessage, h1, h2, res.tree())
+    print 'Committed merge', cmit
+    print runProgram('git-diff-tree -p ORIG_HEAD ' + cmit + \
+                     ' | git-apply --stat')
+    dropHeads()
+else:
+    print 'Automatic merge failed, fix up by hand'
+    writeHead('MERGE_HEAD', h2)
+
+if alwaysWriteTree:
+    print res.tree()
+
+if not clean:
+    sys.exit(1)
diff --git a/gitMergeCommon.py b/gitMergeCommon.py
new file mode 100644
--- /dev/null
+++ b/gitMergeCommon.py
@@ -0,0 +1,286 @@
+import sys, re, os, traceback
+from sets import Set
+
+if sys.version_info[0] < 2 or \
+       (sys.version_info[0] == 2 and sys.version_info[1] < 4):
+    print 'Python version 2.4 required, found', \
+          str(sys.version_info[0])+'.'+str(sys.version_info[1])+'.'+ \
+          str(sys.version_info[2])
+    sys.exit(1)
+
+import subprocess
+
+# Debugging machinery
+# -------------------
+
+DEBUG = 0
+functionsToDebug = Set()
+
+def addDebug(func):
+    if type(func) == str:
+        functionsToDebug.add(func)
+    else:
+        functionsToDebug.add(func.func_name)
+
+def debug(*args):
+    if DEBUG:
+        funcName = traceback.extract_stack()[-2][2]
+        if funcName in functionsToDebug:
+            printList(args)
+
+def printList(list):
+    for x in list:
+        sys.stdout.write(str(x))
+        sys.stdout.write(' ')
+    sys.stdout.write('\n')
+
+# Program execution
+# -----------------
+
+class ProgramError(Exception):
+    def __init__(self, progStr, error):
+        self.progStr = progStr
+        self.error = error
+
+addDebug('runProgram')
+def runProgram(prog, input=None, returnCode=False, env=None, pipeOutput=True):
+    debug('runProgram prog:', str(prog), 'input:', str(input))
+    if type(prog) is str:
+        progStr = prog
+    else:
+        progStr = ' '.join(prog)
+    
+    try:
+        if pipeOutput:
+            stderr = subprocess.STDOUT
+            stdout = subprocess.PIPE
+        else:
+            stderr = None
+            stdout = None
+        pop = subprocess.Popen(prog,
+                               shell = type(prog) is str,
+                               stderr=stderr,
+                               stdout=stdout,
+                               stdin=subprocess.PIPE,
+                               env=env)
+    except OSError, e:
+        debug('strerror:', e.strerror)
+        raise ProgramError(progStr, e.strerror)
+
+    if input != None:
+        pop.stdin.write(input)
+    pop.stdin.close()
+
+    if pipeOutput:
+        out = pop.stdout.read()
+    else:
+        out = ''
+
+    code = pop.wait()
+    if returnCode:
+        ret = [out, code]
+    else:
+        ret = out
+    if code != 0 and not returnCode:
+        debug('error output:', out)
+        debug('prog:', prog)
+        raise ProgramError(progStr, out)
+#    debug('output:', out.replace('\0', '\n'))
+    return ret
+
+# Git repository validation and initialization
+# --------------------------------------------
+
+def setupEnvironment():
+    if 'GIT_DIR' not in os.environ:
+        os.environ['GIT_DIR'] = '.git'
+
+    if not os.environ.has_key('GIT_OBJECT_DIRECTORY'):
+        os.environ['GIT_OBJECT_DIRECTORY'] = os.environ['GIT_DIR'] + '/objects'
+
+def repoValid():
+    if not (os.path.exists(os.environ['GIT_DIR']) and
+            os.path.exists(os.environ['GIT_DIR'] + '/refs') and
+            os.path.exists(os.environ['GIT_OBJECT_DIRECTORY']) and
+            os.path.exists(os.environ['GIT_OBJECT_DIRECTORY'] + '/00')):
+        print "Not a Git archive"
+        sys.exit(1)
+
+# Code for computing common ancestors
+# -----------------------------------
+
+currentId = 0
+def getUniqueId():
+    global currentId
+    currentId += 1
+    return currentId
+
+# The 'virtual' commit objects have SHAs which are integers
+shaRE = re.compile('^[0-9a-f]{40}$')
+def isSha(obj):
+    return (type(obj) is str and bool(shaRE.match(obj))) or \
+           (type(obj) is int and obj >= 1)
+
+class Commit:
+    def __init__(self, sha, parents, tree=None):
+        self.parents = parents
+        self.firstLineMsg = None
+        self.children = []
+
+        if tree:
+            tree = tree.rstrip()
+            assert(isSha(tree))
+        self._tree = tree
+
+        if not sha:
+            self.sha = getUniqueId()
+            self.virtual = True
+            self.firstLineMsg = 'virtual commit'
+            assert(isSha(tree))
+        else:
+            self.virtual = False
+            self.sha = sha.rstrip()
+        assert(isSha(self.sha))
+
+    def tree(self):
+        self.getInfo()
+        assert(self._tree != None)
+        return self._tree
+
+    def shortInfo(self):
+        self.getInfo()
+        return str(self.sha) + ' ' + self.firstLineMsg
+
+    def __str__(self):
+        return self.shortInfo()
+
+    def getInfo(self):
+        if self.virtual or self.firstLineMsg != None:
+            return
+        else:
+            info = runProgram(['git-cat-file', 'commit', self.sha])
+            info = info.split('\n')
+            msg = False
+            for l in info:
+                if msg:
+                    self.firstLineMsg = l
+                    break
+                else:
+                    if l.startswith('tree'):
+                        self._tree = l[5:].rstrip()
+                    elif l == '':
+                        msg = True
+
+class Graph:
+    def __init__(self):
+        self.commits = []
+        self.shaMap = {}
+
+    def addNode(self, node):
+        assert(isinstance(node, Commit))
+        self.shaMap[node.sha] = node
+        self.commits.append(node)
+        for p in node.parents:
+            p.children.append(node)
+        return node
+
+    def reachableNodes(self, n1, n2):
+        res = {}
+        def traverse(n):
+            res[n] = True
+            for p in n.parents:
+                traverse(p)
+
+        traverse(n1)
+        traverse(n2)
+        return res
+
+    def fixParents(self, node):
+        for x in range(0, len(node.parents)):
+            node.parents[x] = self.shaMap[node.parents[x]]
+
+# addDebug('buildGraph')
+def buildGraph(heads):
+    debug('buildGraph heads:', heads)
+    for h in heads:
+        assert(isSha(h))
+
+    g = Graph()
+
+    out = runProgram(['git-rev-list', '--parents'] + heads)
+    for l in out.split('\n'):
+        if l == '':
+            continue
+        shas = l.split(' ')
+
+        # This is a hack, we temporarily use the 'parents' attribute
+        # to contain a list of SHA1:s. They are later replaced by proper
+        # Commit objects.
+        c = Commit(shas[0], shas[1:])
+
+        g.commits.append(c)
+        g.shaMap[c.sha] = c
+
+    for c in g.commits:
+        g.fixParents(c)
+
+    for c in g.commits:
+        for p in c.parents:
+            p.children.append(c)
+    return g
+
+# Write the empty tree to the object database and return its SHA1
+def writeEmptyTree():
+    tmpIndex = os.environ['GIT_DIR'] + '/merge-tmp-index'
+    def delTmpIndex():
+        try:
+            os.unlink(tmpIndex)
+        except OSError:
+            pass
+    delTmpIndex()
+    newEnv = os.environ.copy()
+    newEnv['GIT_INDEX_FILE'] = tmpIndex
+    res = runProgram(['git-write-tree'], env=newEnv).rstrip()
+    delTmpIndex()
+    return res
+
+def addCommonRoot(graph):
+    roots = []
+    for c in graph.commits:
+        if len(c.parents) == 0:
+            roots.append(c)
+
+    superRoot = Commit(sha=None, parents=[], tree=writeEmptyTree())
+    graph.addNode(superRoot)
+    for r in roots:
+        r.parents = [superRoot]
+    superRoot.children = roots
+    return superRoot
+
+def getCommonAncestors(graph, commit1, commit2):
+    '''Find the common ancestors for commit1 and commit2'''
+    assert(isinstance(commit1, Commit) and isinstance(commit2, Commit))
+
+    def traverse(start, set):
+        stack = [start]
+        while len(stack) > 0:
+            el = stack.pop()
+            set.add(el)
+            for p in el.parents:
+                if p not in set:
+                    stack.append(p)
+    h1Set = Set()
+    h2Set = Set()
+    traverse(commit1, h1Set)
+    traverse(commit2, h2Set)
+    shared = h1Set.intersection(h2Set)
+
+    if len(shared) == 0:
+        shared = [addCommonRoot(graph)]
+        
+    res = Set()
+
+    for s in shared:
+        if len([c for c in s.children if c in shared]) == 0:
+            res.add(s)
+    return list(res)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
  2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
  2005-09-07 16:51 ` [PATCH 2/2] A new merge algorithm Fredrik Kuivinen
@ 2005-09-07 18:33 ` Daniel Barkalow
  2005-09-08  6:06   ` Fredrik Kuivinen
  2005-09-07 18:36 ` Junio C Hamano
  3 siblings, 1 reply; 40+ messages in thread
From: Daniel Barkalow @ 2005-09-07 18:33 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git, junkio

On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:

> Of the 500 merge commits that currently exists in the kernel
> repository 19 produces non-clean merges with git-merge-script. The
> four merge cases listed in
> <20050827014009.GB18880@c165.ib.student.liu.se> are cleanly merged by
> git-merge-script. Every merge commit which is cleanly merged by
> git-resolve-script is also cleanly merged by git-merge-script,
> furthermore the results are identical. There are currently two merges
> in the kernel repository which are not cleanly merged by
> git-resolve-script but are cleanly merged by git-merge-script.

If you use my read-tree and change git-resolve-script to pass all of the 
ancestors to it, how does it do? I expect you'll still be slightly ahead, 
because we don't (yet) have content merge with multiple ancestors. You 
should also check the merge that Tony Luck reported, which undid a revert, 
as well as the one that Len Brown reported around the same time which had 
similar problems. I think maintainer trees are a much better test for a 
merge algorithm, because the kernel repository is relatively linear, while 
maintainers tend more to merge things back and forth.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
                   ` (2 preceding siblings ...)
  2005-09-07 18:33 ` [PATCH 0/2] A new merge algorithm, take 3 Daniel Barkalow
@ 2005-09-07 18:36 ` Junio C Hamano
       [not found]   ` <431F34FF.5050301@citi.umich.edu>
  2005-09-08 20:54   ` [PATCH 0/2] A new merge algorithm, take 3 Junio C Hamano
  3 siblings, 2 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-07 18:36 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git

Fredrik Kuivinen <freku045@student.liu.se> writes:

> I guess the need for this has decreased with Daniel's new read-tree
> code. Is there any chance of getting this code merged into mainline
> git?

I do not think Daniel's code decreased anything.  The two
algorithms are solving the same problem but they do it
differently.  There would be cases where multi-base merge would
give better results over your base-on-merge-bases merge; and in
other cases yours would perform better.  I'd like to leave what
merge strategy to use as an user option, and leave the door open
for other merge strategies to emerge later.  I know Pasky wants
to look into pcdv merge and other alternatives.

This is still off-the-top-of-my-head, but the top-level merge
entry point for the end user would be just:

    git merge <head> <remote> <merge-message>

and by default 'git-merge-script' (sorry, I am taking over the
name of your script for this 'generic driver for underlying
merge strategy scripts') would do something like:

 - Run 'git-merge-base -a' to find common ancestors.

 - If there is one single common ancestor, do what the current
   'git-resolve-script' does: if fast-forward, declare success
   and exit without touching anything but $GIT_DIR/HEAD;
   otherwise, run read-tree -m 3-way followed by merge-cache,
   and commit with the given message if the merge was clean and
   exit.  If the merge was not clean, go on to the next step.

 - User preference (maybe $HOME/.git/merge-preference, maybe
   command line option to 'git merge', maybe interactive
   prompting) can specify which complex merge strategies to use.
   we probably would want to be able to say "try this, that and
   that-over-there in this order" in the preference. One of the
   strategies could be 'I give up, just exit and sort it out by
   hand'.

 - The chosen merge backend, be it the one that uses Daniel's
   multi-base merge or your base-on-merge-bases merge, takes
   over.  In any case, the backend should attempt to resolve the
   two heads, and stop at the point just before committing.

 - Evaluate how good the merge is by looking at the result from
   the backend at this point, and if it is not good enough,
   reset the working tree back to the pre-merge state and try
   the next backend.  The looping code needs a termination
   clause that picks the best result after exhausting the list
   of backends.

 - Then, if the merge backend auto-resolved the heads, we
   commit with <merge-message>; otherwise we exit with
   non-zero status and have the user sort it out.

The above assumes that (1) we are already doing a reasonable
thing in simple one-common-ancestor case; (2) it is in the fast
path and we want the multiple backend code out of it.  You could
think of the current 'git-resolve-script' code equivalent of
having a single 'complex merge backend' that just 'gives up'.

If the above is a good user model, we need a couple of interface
convention between merge backends and the above driver:

 - Some merge backends (like yours) would require that the
   starting tree is clean while others may not care as long as
   the paths involved are clean (i.e. index matches <head> and
   the working file matches index -- the traditional code and
   Daniel's even allow such a working file being different from
   index if it happens to match the merge result).  They need to
   be able to say "declined to merge even though I might do a
   better job than others if given a clean working tree".  Then
   the user can retry the same merge after getting the working
   tree in a clean state.  I personally feel that we do not need
   this complexity and it is reasonable to always require the
   starting tree to be clean when the merge is not a
   fast-forward, though.

 - When a merge backend finishes, it may leave the working tree
   in a failed merge state.  If we were to try different
   backends in the loop to find the best result, we would need
   some way to assign scores to them.  The score should be able
   to tell us if the result can be auto-committable or not, and
   if not how bad it is.  I think exit status from the backend
   can be used to tell us the former (i.e. exit non-zero if your
   result has conflicts and you do not want your result to be
   committed immediately), and number of cache-dirty entries can
   be used as an indication of how bad it is (i.e. leave the
   paths you know have not been cleanly merged cache-dirty -- do
   not run git-update-cache on them).  This is essentially the
   same convention used by the current 'git-resolve-script'.

I think the 'renaming merge heuristics' Linus outlined in
another thread will fall naturally into this picture as a merge
backend.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-07 18:33 ` [PATCH 0/2] A new merge algorithm, take 3 Daniel Barkalow
@ 2005-09-08  6:06   ` Fredrik Kuivinen
  2005-09-08 15:27     ` Daniel Barkalow
  0 siblings, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-08  6:06 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Fredrik Kuivinen, git, junkio

On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
> On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
> 
> > Of the 500 merge commits that currently exists in the kernel
> > repository 19 produces non-clean merges with git-merge-script. The
> > four merge cases listed in
> > <20050827014009.GB18880@c165.ib.student.liu.se> are cleanly merged by
> > git-merge-script. Every merge commit which is cleanly merged by
> > git-resolve-script is also cleanly merged by git-merge-script,
> > furthermore the results are identical. There are currently two merges
> > in the kernel repository which are not cleanly merged by
> > git-resolve-script but are cleanly merged by git-merge-script.
> 
> If you use my read-tree and change git-resolve-script to pass all of the 
> ancestors to it, how does it do? I expect you'll still be slightly ahead, 
> because we don't (yet) have content merge with multiple ancestors. You 
> should also check the merge that Tony Luck reported, which undid a revert, 
> as well as the one that Len Brown reported around the same time which had 
> similar problems. I think maintainer trees are a much better test for a 
> merge algorithm, because the kernel repository is relatively linear, while 
> maintainers tend more to merge things back and forth.

Junio tested some of the multiple common ancestor cases with your
version of read-tree and reported his results in
<7virxeycod.fsf@assigned-by-dhcp.cox.net>.

The two cases my algorithm merges cleanly and git-resolve-script do
not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
ancestors. The second one have, as far as I know, not been tested with
your read-tree.

The merge cases reported by Tony Luck and Len Brown are both cleanly
merged by my code.

You are probably right about the maintainer trees. I should have a
look at some of them. Do you know any specific repositories with
interesting merge cases?

- Fredrik

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
       [not found]     ` <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>
@ 2005-09-08 15:06       ` Chuck Lever
  2005-09-08 16:51         ` Junio C Hamano
  0 siblings, 1 reply; 40+ messages in thread
From: Chuck Lever @ 2005-09-08 15:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]

Junio C Hamano wrote:
> Chuck Lever <cel@citi.umich.edu> writes:
> 
> 
>>for the past two weeks i've been attempting to replace the active_cache 
>>array with an abstract data type (linked list just as a prototype) in 
>>order to eliminate the need to use memmove() during insertions and 
>>deletions.
>>
>>one big win is that the merge algorithm no longer needs to overwrite the 
>>active_cache.  you can keep two lists and simply move elements from one 
>>to the other as you do the merge, and then hook the new, merged list 
>>back to the cache head when you're done.
>>
>>anyway, i'm at a point where i could use some help and advice if this is 
>>something you think could be useful in the long run.
> 
> 
> This is interesting.
> 
> For something like a kernel tree that has around 18k entries, a
> single memmove will at the worst case move that many pointers,
> so naturally a list based implementation would perform well for
> many insertions and deletions.   On the other hand, the list
> based implementation would make it very expensive to find an
> entry read-only, I suspect, unlike the array based
> implementation which lets us do binary search.

using a list was an easy prototype to see what issues there are when 
converting from an array into an abstract data type.  i've created a few 
helpers that allow me to limit the list implementation changes to 
cache.h, read-cache.c, and read-tree.c; but having these helpers means 
it should be fairly simple to switch to any reasonable data structure. 
(the helper parts are already working and i can post them to the list if 
folks are interested).

once the list implementation is working well, i plan to go back and 
replace it with something more sophisticated which can perform 
insertion, deletion, and searching efficiently.

in the long run, i see using a balanced tree.  that would make 
insertions and searches quick, and prevent the tree from degenerating 
into a linked list due to a series of in-order insertions.  a splay tree 
might be even more entertaining, as it always remains balanced, and a 
search operation places the found node or insertion point right at the 
root of the tree.

each node in the tree would represent a unique path, and have references 
to the entries of all four stages, contained in an array.  that would 
simplify several routines in read-cache.c and probably make the merge 
logic even easier to understand and more efficient.

> I haven't timed it like you did, but my gut feeling is that the
> most wastage during the merge is coming from having to move
> entries because we "insert into" or "remove from" the same
> active-cache array.

the merge actually replaces entries in place, so it is already fairly 
efficient.  the wastage in the merge case arises from the many list 
insertions done by read_cache, all of which involve moving some of the 
active cache array.

> If we allocate a new array and populate it
> from scratch as we iterate through the paths being merged and
> replace the active_cache pointer at the very end, we would do
> much better without going to a linked list implementation, which
> would penalize the normal read-only case.

i can already tell you that using a separate source and destination data 
structure during the merge simplifies the logic and makes it easier to 
understand.  i imagine it will also be easier to maintain and enhance in 
the long run.

> I think what Daniel
> did to read-tree recently, still in the proposed updates branch,
> would make this kind of change far easier to implement.

as i'm new to git development, i wasn't aware of the proposed branch.  i 
will see if i can take a look (or send me a pointer to the list archives 
if he has posted them here).

> I wanted to cc this message to Daniel but your message to me was
> somehow private so I did not.  Please feel free to bring this
> issue up either with him directly or on the list.

i wanted to assess whether this was a reasonable idea or if there was 
already work in progress in this area.  thanks for your response!

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08  6:06   ` Fredrik Kuivinen
@ 2005-09-08 15:27     ` Daniel Barkalow
  2005-09-08 20:05       ` Fredrik Kuivinen
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Barkalow @ 2005-09-08 15:27 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git, junkio

On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

> On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
> > On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
> > 
> > > Of the 500 merge commits that currently exists in the kernel
> > > repository 19 produces non-clean merges with git-merge-script. The
> > > four merge cases listed in
> > > <20050827014009.GB18880@c165.ib.student.liu.se> are cleanly merged by
> > > git-merge-script. Every merge commit which is cleanly merged by
> > > git-resolve-script is also cleanly merged by git-merge-script,
> > > furthermore the results are identical. There are currently two merges
> > > in the kernel repository which are not cleanly merged by
> > > git-resolve-script but are cleanly merged by git-merge-script.
> > 
> > If you use my read-tree and change git-resolve-script to pass all of the 
> > ancestors to it, how does it do? I expect you'll still be slightly ahead, 
> > because we don't (yet) have content merge with multiple ancestors. You 
> > should also check the merge that Tony Luck reported, which undid a revert, 
> > as well as the one that Len Brown reported around the same time which had 
> > similar problems. I think maintainer trees are a much better test for a 
> > merge algorithm, because the kernel repository is relatively linear, while 
> > maintainers tend more to merge things back and forth.
> 
> Junio tested some of the multiple common ancestor cases with your
> version of read-tree and reported his results in
> <7virxeycod.fsf@assigned-by-dhcp.cox.net>.

Oh, right. I'm clearly not paying enough attention here.

> The two cases my algorithm merges cleanly and git-resolve-script do
> not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
> 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
> ancestors. The second one have, as far as I know, not been tested with
> your read-tree.

Okay, I'll have to check whether the result I get seems right. I take it 
your result agrees with what the users actually produced by hand?

> The merge cases reported by Tony Luck and Len Brown are both cleanly
> merged by my code.

Do they come out correctly? Both of those have cases which cannot be 
decided correctly with only the ancestor trees, due to one branch 
reverting a patch that was only in one ancestor. The correct result is to 
revert that patch, but figuring out that requires looking at more trees. I 
think your algorithm should work for this case, but it would be good to 
have verification. (IIRC, Len got the correct result while Tony got the 
wrong result and then corrected it later.)

> You are probably right about the maintainer trees. I should have a
> look at some of them. Do you know any specific repositories with
> interesting merge cases?

Not especially, except that I would guess that people who have reported 
hitting bad cases would be more likely to have other interesting merges in 
their trees. You might also try merging maintainer trees with each other, 
since it's relatively likely that there would be complicating overlap that 
only doesn't cause confusion because things get rearranged in -mm. For 
that matter, I bet you'd get plenty of test cases out of trying to 
replicate -mm as a git tree.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 15:06       ` Chuck Lever
@ 2005-09-08 16:51         ` Junio C Hamano
  2005-09-08 17:19           ` Linus Torvalds
  0 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2005-09-08 16:51 UTC (permalink / raw)
  To: cel; +Cc: git

Chuck Lever <cel@citi.umich.edu> writes:

> Junio C Hamano wrote:
>> Chuck Lever <cel@citi.umich.edu> writes:
>>
> once the list implementation is working well, i plan to go back and 
> replace it with something more sophisticated which can perform 
> insertion, deletion, and searching efficiently.

Well, what I wanted to see was if we can get away without
introducing more sophisticated data structures, which would add
complexity.

>> I haven't timed it like you did, but my gut feeling is that the
>> most wastage during the merge is coming from having to move
>> entries because we "insert into" or "remove from" the same
>> active-cache array.
>
> the merge actually replaces entries in place, so it is already fairly 
> efficient.  the wastage in the merge case arises from the many list 
> insertions done by read_cache, all of which involve moving some of the 
> active cache array.

Yes, the reading of three trees upfront is probably the culprit
in your case, and I think that is something Daniel's read-tree
rewrite can help.  I still see some remove_cache_entry_at()
there because it works on the active_cache in place, but the
structure of the new code should be much easier to make the kind
of modification we are talking about.

>> I think what Daniel
>> did to read-tree recently, still in the proposed updates branch,
>> would make this kind of change far easier to implement.
>
> as i'm new to git development, i wasn't aware of the proposed branch.  i 
> will see if i can take a look (or send me a pointer to the list archives 
> if he has posted them here).

If you already have a local copy of git.git repository you work in:

	git fetch http://kernel.org/pub/scm/git/git.git +pu:refs/heads/ko-pu
        git checkout -f ko-pu

Otherwise

	git clone http://kernel.org/pub/scm/git/git.git peek
        cd peek
        git checkout -f pu

The interesting change from the mainline is in read-tree.c

You can browse http://kernel.org/git/ and see commitdiffs in the
"pu" branch if you are not interested in slurping the whole
proposed updates branch.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 16:51         ` Junio C Hamano
@ 2005-09-08 17:19           ` Linus Torvalds
  2005-09-08 17:51             ` Junio C Hamano
  2005-09-08 18:16             ` Chuck Lever
  0 siblings, 2 replies; 40+ messages in thread
From: Linus Torvalds @ 2005-09-08 17:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: cel, git



On Thu, 8 Sep 2005, Junio C Hamano wrote:
> 
> Yes, the reading of three trees upfront is probably the culprit
> in your case

However, note that _most_ tree reading just reads one.

Merges may take half a second, and yes, when I did it, the fact that we 
move things around in the array is by far the highest cost. But the thing 
is, if merges take half a second, that's still not only damn good, it's 
not even the most common operation.

Yes, the active_cache[] layout as one big array is inconvenient for 
read_tree(), which tends to want to interleave different trees in the 
array and thus forces things to be moved around.

But remember what the most common use for the index is - it's the "single 
tree" case from read_cache(). That's _so_ much more common than 
read_tree() that it's not even funny.

So the data structure is optimized for a different case than reading in 
trees. Big deal. That optimization is definitely worth it: it allows us to 
do the read_cache() with the actual index entries being totally read-only 
(a linked list would have to add a "next" pointer to the cache entries and 
not allow the in-place thing that read_cache() does).

		Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 17:19           ` Linus Torvalds
@ 2005-09-08 17:51             ` Junio C Hamano
  2005-09-08 18:16             ` Chuck Lever
  1 sibling, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-08 17:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: cel, git

Linus Torvalds <torvalds@osdl.org> writes:

> So the data structure is optimized for a different case than reading in 
> trees. Big deal. That optimization is definitely worth it: it allows us to 
> do the read_cache() with the actual index entries being totally read-only 
> (a linked list would have to add a "next" pointer to the cache entries and 
> not allow the in-place thing that read_cache() does).

Yes, you are right.  What is being discussed is to help
read-tree.c while keeping that nice property.

I think Daniel's patch is going in the right direction.  It can
be told to populate the resulting cache from scratch by reading
the current cache and the trees being read, instead of inserting
and removing the current cache as it does right now.  Once that
is done, we would not have to do repeated memmove to insert and
delete an entry one at a time anymore.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 17:19           ` Linus Torvalds
  2005-09-08 17:51             ` Junio C Hamano
@ 2005-09-08 18:16             ` Chuck Lever
  2005-09-08 18:35               ` Linus Torvalds
  1 sibling, 1 reply; 40+ messages in thread
From: Chuck Lever @ 2005-09-08 18:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 1599 bytes --]

Linus Torvalds wrote:
> 
> On Thu, 8 Sep 2005, Junio C Hamano wrote:
> 
>>Yes, the reading of three trees upfront is probably the culprit
>>in your case
> 
> 
> However, note that _most_ tree reading just reads one.
> 
> Merges may take half a second, and yes, when I did it, the fact that we 
> move things around in the array is by far the highest cost. But the thing 
> is, if merges take half a second, that's still not only damn good, it's 
> not even the most common operation.

in my case the merges were taking significantly longer than a half 
second.  making this change is certainly not worth it if merges are 
running fast...

> Yes, the active_cache[] layout as one big array is inconvenient for 
> read_tree(), which tends to want to interleave different trees in the 
> array and thus forces things to be moved around.
> 
> But remember what the most common use for the index is - it's the "single 
> tree" case from read_cache(). That's _so_ much more common than 
> read_tree() that it's not even funny.

read_cache is fast as implemented.  the issue is that the next thing 
that is often done is a cache insert, which requires a memmove, which is 
slow.

> So the data structure is optimized for a different case than reading in 
> trees. Big deal. That optimization is definitely worth it: it allows us to 
> do the read_cache() with the actual index entries being totally read-only 
> (a linked list would have to add a "next" pointer to the cache entries and 
> not allow the in-place thing that read_cache() does).

they are still read-only with my linked list implementation.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 18:16             ` Chuck Lever
@ 2005-09-08 18:35               ` Linus Torvalds
  2005-09-08 18:58                 ` Chuck Lever
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2005-09-08 18:35 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Junio C Hamano, git



On Thu, 8 Sep 2005, Chuck Lever wrote:
> 
> in my case the merges were taking significantly longer than a half 
> second.  making this change is certainly not worth it if merges are 
> running fast...

Note that in cold-cache cases, all the expense of read-tree is in actually
reading the tree objects themselves (a kernel tree has more than a
thousand subdirectories). Also, a full "git pull" will do the diffstat 
etc, and then the expense ends up being in the actual "git diff" part.

So read-tree itself may be half a second, but a merge ends up having other 
parts.

> they are still read-only with my linked list implementation.

Btw, in the sparse project, we have this really smart "pointer list" data
structure, which is extremely space- and time-efficient. It ends up
_looking_ like a linked list, but it batches things up in hunks of 29
entries (29 pointers plus overhead gives you blocks of 32 longwords, which
is the allocation size) and thus gives basically a cache-friendly
doubly-linked list. It knows how to do insertions, traversals etc very 
efficiently.

Any interest?

		Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 18:35               ` Linus Torvalds
@ 2005-09-08 18:58                 ` Chuck Lever
  2005-09-08 20:59                   ` Linus Torvalds
  2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
  0 siblings, 2 replies; 40+ messages in thread
From: Chuck Lever @ 2005-09-08 18:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2301 bytes --]

Linus Torvalds wrote:
> 
> On Thu, 8 Sep 2005, Chuck Lever wrote:
> 
>>in my case the merges were taking significantly longer than a half 
>>second.  making this change is certainly not worth it if merges are 
>>running fast...
> 
> 
> Note that in cold-cache cases, all the expense of read-tree is in actually
> reading the tree objects themselves (a kernel tree has more than a
> thousand subdirectories). Also, a full "git pull" will do the diffstat 
> etc, and then the expense ends up being in the actual "git diff" part.
> 
> So read-tree itself may be half a second, but a merge ends up having other 
> parts.

i measured this using the following test...

i have a linux kernel git repository under control of stgit and it has 
about 70 patches in it.  i did an "stg status" to heat the page cache. 
i popped all the patches, then did an "stg pull origin".

i started oprofile, and did an "stg push -a".  it took about 9 minutes.

i stopped oprofile and looked at the results.  roughly 65% of the total 
CPU time was spent in libc:memmove.  after instrumenting git, i 
determined that in the "stg push" case the critical memmove was the one 
in add_cache_entry.

note that i'm not saying that the 9 minutes of wall clock time was 
entirely due to CPU... catalin has been steadily improving "stg push" so 
this time has shortened by more than half, recently.  but i do notice 
that working in a kernel repository is significantly slower than working 
in my git or stgit source repositories, which are smaller by far.  the 
small repositories behave just as i expect, the tools respond quite 
snappily.

>>they are still read-only with my linked list implementation.
> 
> Btw, in the sparse project, we have this really smart "pointer list" data
> structure, which is extremely space- and time-efficient. It ends up
> _looking_ like a linked list, but it batches things up in hunks of 29
> entries (29 pointers plus overhead gives you blocks of 32 longwords, which
> is the allocation size) and thus gives basically a cache-friendly
> doubly-linked list. It knows how to do insertions, traversals etc very 
> efficiently.
> 
> Any interest?

i'm not married to splay trees.  i think we should explore several 
different data structures before picking one, and this one sounds 
reasonable to try.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 15:27     ` Daniel Barkalow
@ 2005-09-08 20:05       ` Fredrik Kuivinen
  2005-09-08 21:27         ` Daniel Barkalow
  0 siblings, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-08 20:05 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Fredrik Kuivinen, git, junkio

On Thu, Sep 08, 2005 at 11:27:35AM -0400, Daniel Barkalow wrote:

...

> > The two cases my algorithm merges cleanly and git-resolve-script do
> > not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
> > 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
> > ancestors. The second one have, as far as I know, not been tested with
> > your read-tree.
> 
> Okay, I'll have to check whether the result I get seems right. I take it 
> your result agrees with what the users actually produced by hand?
 
The first one agrees with what was actually committed. For the second
one the difference between the tree produced by the algorithm and what
was committed is:

diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
--- a/include/net/ieee80211.h
+++ b/include/net/ieee80211.h
@@ -425,9 +425,7 @@ struct ieee80211_stats {
 
 struct ieee80211_device;
 
-#if 0 /* for later */
 #include "ieee80211_crypt.h"
-#endif
 
 #define SEC_KEY_1         (1<<0)
 #define SEC_KEY_2         (1<<1)


I have looked at the files and common ancestors involved and I think
that this change have been introduced manually. I may have missed
something when I analysed it though...


> > The merge cases reported by Tony Luck and Len Brown are both cleanly
> > merged by my code.
> 
> Do they come out correctly? Both of those have cases which cannot be 
> decided correctly with only the ancestor trees, due to one branch 
> reverting a patch that was only in one ancestor. The correct result is to 
> revert that patch, but figuring out that requires looking at more trees. I 
> think your algorithm should work for this case, but it would be good to 
> have verification. (IIRC, Len got the correct result while Tony got the 
> wrong result and then corrected it later.)

Len's merge case come out identically to the tree he committed. I have
described what I got for Tony's case in
<20050826184731.GA13629@c165.ib.student.liu.se> (my merge algorithm
produces the result Tony expected to get, but he didn't get that from
git-resolve-script).

- Fredrik

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-07 18:36 ` Junio C Hamano
       [not found]   ` <431F34FF.5050301@citi.umich.edu>
@ 2005-09-08 20:54   ` Junio C Hamano
  2005-09-08 21:23     ` A Large Angry SCM
  1 sibling, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2005-09-08 20:54 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> ... I'd like to leave what
> merge strategy to use as an user option, and leave the door open
> for other merge strategies to emerge later.  I know Pasky wants
> to look into pcdv merge and other alternatives.
>
> This is still off-the-top-of-my-head, but the top-level merge
> entry point for the end user would be just:
>
>     git merge <head> <remote> <merge-message>
>
> and by default 'git-merge-script' (sorry, I am taking over the
> name of your script for this 'generic driver for underlying
> merge strategy scripts') would do something like:

And here is "an illustration of concept" patch, requesting for
comments.  It probably is buggy as h*ll, but I am showing
interfaces early, hoping this is something people can agree on
to use to plug various merge strategies into.

The patch uses 'git-show-branches --merge-base' as an improved
'git-merge-base -a' that can do more than two heads, which still
only exists in my development repository, but I hope you get the
idea.

------------
Subject: [PATCH] Multi-backend merge driver.

This is just an illustration of concept patch.

The new command 'git merge' takes the current head and one or more
remote heads, with the commit log message for the automated case.

If the heads being merged are simple fast-forwards, it acts the
same way as the current 'git resolve'.  Otherwise, it tries
different merge strategies and takes the result from the one that
succeeded auto-merging, if there is any.

If no merge strategy succeeds auto-merging, their results are
evaluated for number of paths needed for hand resolving, and the
one with the least number of such paths is left in the working
tree.  The user is asked to resolve them by hand and make a
commit manually.

The calling convention from the 'git merge' driver to merge
strategy programs is very simple:

 - A strategy program is to be called 'git-merge-<strategy>'.

 - They take input of this form:

	<common1> <common2> ... '--' <head> <remote1> <remote2>...

   That is, one or more the common ancestors, double dash, the
   current head, and one or more remote heads being merged into
   the current branch.

 - Before a strategy program is called, the working tree is
   matched to the current <head>.

 - The strategy program exits with status code 0 when it
   successfully auto-merges the given heads.  It should do
   update-cache for all the merged paths when it does so -- the
   index file will be used to record the merge result as a
   commit by the driver.

 - The strategy program exits with status code 1 when it leaves
   conflicts behind.  It should do update-cache for all the
   merged paths that it successfully auto-merged, and leave the
   cache entry in the index file as the same as <head> for paths
   it could not auto-merge, and leave its best-effort result
   with conflict markers in the working tree when it does so.

 - The strategy program exists with status code other than 0 or
   1 if it does not handle the given merge at all.

As examples, this commit comes with merge strategies based on
'git resolve' and 'git octopus'.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 git-commit.sh        |    2 
 git-merge-octopus.sh |   86 +++++++++++++++++++++
 git-merge-resolve.sh |   80 ++++++++++++++++++++
 git-merge.sh         |  205 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 372 insertions(+), 1 deletions(-)
 create mode 100755 git-merge-octopus.sh
 create mode 100755 git-merge-resolve.sh
 create mode 100755 git-merge.sh

a65a614ce7f67c3ed4dee3bed3ab4d86d3f79577
diff --git a/git-commit.sh b/git-commit.sh
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -158,7 +158,7 @@ if [ ! -r "$GIT_DIR/HEAD" ]; then
 	PARENTS=""
 else
 	if [ -f "$GIT_DIR/MERGE_HEAD" ]; then
-		PARENTS="-p HEAD -p MERGE_HEAD"
+		PARENTS="-p HEAD "`sed -e 's/^/-p /' "$GIT_DIR/MERGE_HEAD"`
 	fi
 	if test "$use_commit" != ""
 	then
diff --git a/git-merge-octopus.sh b/git-merge-octopus.sh
new file mode 100755
--- /dev/null
+++ b/git-merge-octopus.sh
@@ -0,0 +1,86 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+# Resolve two or more trees.
+#
+
+# The first parameters up to -- are merge bases; the rest are heads.
+bases= head= remotes= sep_seen=
+for arg
+do
+	case ",$sep_seen,$head,$arg," in
+	*,--,)
+		sep_seen=yes
+		;;
+	,yes,,*)
+		head=$arg
+		;;
+	,yes,*)
+		remotes="$remotes$arg "
+		;;
+	*)
+		bases="$bases$arg "
+		;;
+	esac
+done
+
+# Reject if this is not an Octopus -- resolve should be used instead.
+case "$remotes" in
+?*' '?*)
+	;;
+*)
+	exit 2 ;;
+esac
+
+# MRC is the current "merge reference commit"
+# MRT is the current "merge result tree"
+
+MRC=$head MSG= PARENT="-p $head"
+MRT=$(git-write-tree)
+CNT=1 ;# counting our head
+NON_FF_MERGE=0
+for SHA1 in $remotes
+do
+	common=$(git-merge-base $MRC $SHA1) ||
+		die "Unable to find common commit with $SHA1"
+
+	if test "$common" = $SHA1
+	then
+		echo "Already up-to-date with $SHA1"
+		continue
+	fi
+
+	CNT=`expr $CNT + 1`
+	PARENT="$PARENT -p $SHA1"
+
+	if test "$common,$NON_FF_MERGE" = "$MRC,0"
+	then
+		# The first head being merged was a fast-forward.
+		# Advance MRC to the head being merged, and use that
+		# tree as the intermediate result of the merge.
+		# We still need to count this as part of the parent set.
+
+		echo "Fast forwarding to: $SHA1"
+		git-read-tree -u -m $head $SHA1 || exit
+		MRC=$SHA1 MRT=$(git-write-tree)
+		continue
+	fi
+
+	NON_FF_MERGE=1
+
+	echo "Trying simple merge with $SHA1"
+	git-read-tree -u -m $common $MRT $SHA1 || exit 2
+	next=$(git-write-tree 2>/dev/null)
+	if test $? -ne 0
+	then
+		echo "Simple merge did not work, trying automatic merge."
+		git-merge-index -o git-merge-one-file -a ||
+		exit 2 ; # Automatic merge failed; should not be doing Octopus
+		next=$(git-write-tree 2>/dev/null)
+	fi
+	MRC=$common
+	MRT=$next
+done
+
+exit 0
diff --git a/git-merge-resolve.sh b/git-merge-resolve.sh
new file mode 100755
--- /dev/null
+++ b/git-merge-resolve.sh
@@ -0,0 +1,80 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Linus Torvalds
+#
+# Resolve two trees.
+
+# The first parameters up to -- are merge bases; the rest are heads.
+bases= head= remotes= sep_seen=
+for arg
+do
+	case ",$sep_seen,$head,$arg," in
+	*,--,)
+		sep_seen=yes
+		;;
+	,yes,,*)
+		head=$arg
+		;;
+	,yes,*)
+		remotes="$remotes$arg "
+		;;
+	*)
+		bases="$bases$arg "
+		;;
+	esac
+done
+
+# Give up if we are given more than two remotes -- not handling octopus.
+case "$remotes" in
+?*' '?*)
+	exit 2 ;;
+esac
+
+# Find an optimum merge base if there are more than one candidates.
+case "$bases" in
+?*' '?*)
+	echo "Trying to find the optimum merge base."
+	G=.tmp-index$$
+	best=
+	best_cnt=-1
+	for c in $bases
+	do
+		rm -f $G
+		GIT_INDEX_FILE=$G git-read-tree -m $c $head $remotes \
+			 2>/dev/null ||	continue
+		# Count the paths that are unmerged.
+		cnt=`GIT_INDEX_FILE=$G git-ls-files --unmerged | wc -l`
+		if test $best_cnt -le 0 -o $cnt -le $best_cnt
+		then
+			best=$c
+			best_cnt=$cnt
+			if test "$best_cnt" -eq 0
+			then
+				# Cannot do any better than all trivial merge.
+				break
+			fi
+		fi
+	done
+	rm -f $G
+	common="$best"
+	;;
+*)
+	common="$bases"
+	;;
+esac
+
+git-update-index --refresh 2>/dev/null
+git-read-tree -u -m $common $head $remotes || exit 2
+echo "Trying simple merge."
+if result_tree=$(git-write-tree  2>/dev/null)
+then
+	exit 0
+else
+	echo "Simple merge failed, trying Automatic merge."
+	if git-merge-index -o git-merge-one-file -a
+	then
+		exit 0
+	else
+		exit 1
+	fi
+fi
diff --git a/git-merge.sh b/git-merge.sh
new file mode 100755
--- /dev/null
+++ b/git-merge.sh
@@ -0,0 +1,205 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+. git-sh-setup || die "Not a git archive"
+
+LF='
+'
+
+usage () {
+    die "git-merge [-n] [-s <strategy>]... <merge-message> <head> <remote>+"
+}
+
+# all_strategies='resolve barkarow fredrik octopus'
+
+all_strategies='resolve octopus'
+default_strategies='resolve octopus'
+use_strategies=
+
+dropheads() {
+	rm -f -- "$GIT_DIR/MERGE_HEAD" || exit 1
+}
+
+summary() {
+	case "$no_summary" in
+	'')
+		git-diff-tree -p -M $head "$1" |
+		git-apply --stat --summary
+		;;
+	esac
+}
+
+while case "$#" in 0) break ;; esac
+do
+	case "$1" in
+	-n|--n|--no|--no-|--no-s|--no-su|--no-sum|--no-summ|\
+		--no-summa|--no-summar|--no-summary)
+		no_summary=t ;;
+	-s=*|--s=*|--st=*|--str=*|--stra=*|--strat=*|--strate=*|\
+		--strateg=*|--strategy=*|\
+	-s|--s|--st|--str|--stra|--strat|--strate|--strateg|--strategy)
+		case "$#,$1" in
+		*,*=*)
+			strategy=`expr "$1" : '-[^=]*=\(.*\)'` ;;
+		0,*)
+			usage ;;
+		*)
+			strategy="$2"
+			shift ;;
+		esac
+		case " $all_strategies " in
+		*" $strategy "*)
+			use_strategies="$use_strategies$strategy " ;;
+		*)
+			die "available strategies are: $all_strategies" ;;
+		esac
+		;;
+	-*)	usage ;;
+	*)	break ;;
+	esac
+	shift
+done
+
+case "$use_strategies" in
+'')
+	use_strategies=$default_strategies
+	;;
+esac
+test "$#" -le 2 && usage ;# we need at least two heads.
+
+merge_msg="$1"
+shift
+head=$(git-rev-parse --verify "$1"^0) || usage
+shift
+
+# All the rest are remote heads
+for remote
+do
+	git-rev-parse --verify "$remote'^0 >/dev/null ||
+	    die "$remote - not something we can merge"
+done
+
+common=$(git-show-branch --merge-base $head "$@")
+echo "$head" >"$GIT_DIR/ORIG_HEAD"
+
+case "$#,$common" in
+*,'')
+	die "Unable to find common commit between $head and $*"
+	;;
+1,"$1")
+	# If head can reach all the merge then we are up to date.
+	# but first the most common case of merging one remote
+	echo "Already up-to-date. Yeeah!"
+	dropheads
+	exit 0
+	;;
+1,"$head")
+	# Again the most common case of merging one remote.
+	echo "Updating from $head to $1."
+	git-update-index --refresh 2>/dev/null
+	git-read-tree -u -m $head "$1" || exit 1
+	echo "$1" > "$GIT_DIR/HEAD"
+	summary "$1"
+	dropheads
+	exit 0
+	;;
+1,*)
+	# We are not doing octopus and not fast forward.  Need a
+	# real merge.
+	;;
+*)
+	# An octopus.  If we can reach all the remote we are up to date.
+	up_to_date=t
+	for remote
+	do
+		common_one=$(git-merge-base $head $remote)
+		if test "$common_one" != "$remote"
+		then
+			up_to_date=f
+			break
+		fi
+	done
+	if test "$up_to_date" = t
+	then
+		echo "Already up-to-date. Yeeah!"
+		dropheads
+		exit 0
+	fi
+	;;
+esac
+
+# At this point we need a real merge.  Require that the tree matches
+# exactly our head.
+
+git-update-index --refresh &&
+test '' = `git-diff-index --cache --name-only $head` || {
+	die "Need real merge but the working tree has local changes."
+}
+
+result_tree= best_cnt=-1 best_strategy= wt_strategy=
+for strategy in $use_strategies
+do
+    test "$wt_strategy" = '' || git reset --hard $head
+
+    echo "Trying merge strategy $strategy..."
+    wt_strategy=$strategy
+    git-merge-$strategy $common -- $head "$@" || {
+
+	# The backend exits with 1 when conflicts are left to be resolved,
+	# with 2 when it does not handle the given merge at all.
+
+	case "$?" in
+	1)
+	    cnt=`git-diff-files --name-only | wc -l`
+	    if test $best_cnt -le 0 -o $cnt -le $best_cnt
+	    then
+		best_strategy=$strategy
+		best_cnt=$cnt
+	    fi
+	esac
+	continue
+    }
+
+    # Automerge succeeded.
+    result_tree=$(git-write-tree) && break
+done
+
+# If we have a resulting tree, that means the strategy module
+# auto resolved the merge cleanly.
+if test '' != $result_tree
+then
+    parents="-p $head"
+    for remote
+    do
+        parents="$parents -p $remote"
+    done
+    result_commit=$(echo "$merge_msg" | git-commit-tree $result_tree $parents)
+    echo "Committed merge $result_commit, made by $wt_strategy."
+    echo $result_commit >"$GIT_DIR"/HEAD
+    summary $result_commit
+    dropheads
+    exit 0
+fi
+
+# Pick the result from the best strategy and have the user fix it up.
+case "$best_strategy" in
+'')
+	git reset --hard $head
+	die "No merge strategy handled the merge."
+	;;
+"$wt_strategy")
+	# We already have its result in the working tree.
+	;;
+*)
+	echo "Using the $strategy to prepare resolving by hand."
+	git reset --hard $head
+	git-merge-$best_strategy $common -- $head "$@"
+	;;
+esac
+for remote
+do
+	echo $remote
+done >"$GIT_DIR/MERGE_HEAD"
+die "Automatic merge failed; fix up by hand"

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 18:58                 ` Chuck Lever
@ 2005-09-08 20:59                   ` Linus Torvalds
  2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
  1 sibling, 0 replies; 40+ messages in thread
From: Linus Torvalds @ 2005-09-08 20:59 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Junio C Hamano, git



On Thu, 8 Sep 2005, Chuck Lever wrote:
> > 
> > Btw, in the sparse project, we have this really smart "pointer list" data
> > structure, which is extremely space- and time-efficient. It ends up
> > _looking_ like a linked list, but it batches things up in hunks of 29
> > entries (29 pointers plus overhead gives you blocks of 32 longwords, which
> > is the allocation size) and thus gives basically a cache-friendly
> > doubly-linked list. It knows how to do insertions, traversals etc very 
> > efficiently.
> > 
> > Any interest?
> 
> i'm not married to splay trees.  i think we should explore several 
> different data structures before picking one, and this one sounds 
> reasonable to try.

Actually, I just realized that one of the issues is just plain looking up
a name (ie "pos = cache_name_pos(..)", and the sparse list don't make that
very efficient unless you have a good starting point.

So a splay tree is probably more appropriate.

			Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 20:54   ` [PATCH 0/2] A new merge algorithm, take 3 Junio C Hamano
@ 2005-09-08 21:23     ` A Large Angry SCM
  0 siblings, 0 replies; 40+ messages in thread
From: A Large Angry SCM @ 2005-09-08 21:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, git

Junio C Hamano wrote:
[...]
> Subject: [PATCH] Multi-backend merge driver.
> 
> This is just an illustration of concept patch.
> 
> The new command 'git merge' takes the current head and one or more
> remote heads, with the commit log message for the automated case.
> 
> If the heads being merged are simple fast-forwards, it acts the
> same way as the current 'git resolve'.  Otherwise, it tries
> different merge strategies and takes the result from the one that
> succeeded auto-merging, if there is any.
> 
> If no merge strategy succeeds auto-merging, their results are
> evaluated for number of paths needed for hand resolving, and the
> one with the least number of such paths is left in the working
> tree.  The user is asked to resolve them by hand and make a
> commit manually.
[...]

This last paragraph concerns me because it limits us to a very simple 
metric, "number of paths needed for hand resolving". Some "hand 
resolving" situations can be more complex than others and it would nice 
to use that information if it was available.

Of course I have no idea what that information may be or how it can be 
computed.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/2] A new merge algorithm, take 3
  2005-09-08 20:05       ` Fredrik Kuivinen
@ 2005-09-08 21:27         ` Daniel Barkalow
  0 siblings, 0 replies; 40+ messages in thread
From: Daniel Barkalow @ 2005-09-08 21:27 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git, junkio

On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

> The first one agrees with what was actually committed. For the second
> one the difference between the tree produced by the algorithm and what
> was committed is:
> 
> diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
> --- a/include/net/ieee80211.h
> +++ b/include/net/ieee80211.h
> @@ -425,9 +425,7 @@ struct ieee80211_stats {
>  
>  struct ieee80211_device;
>  
> -#if 0 /* for later */
>  #include "ieee80211_crypt.h"
> -#endif
>  
>  #define SEC_KEY_1         (1<<0)
>  #define SEC_KEY_2         (1<<1)
> 
> 
> I have looked at the files and common ancestors involved and I think
> that this change have been introduced manually. I may have missed
> something when I analysed it though...

Certainly possible that it was done manually.

> > > The merge cases reported by Tony Luck and Len Brown are both cleanly
> > > merged by my code.
> > 
> > Do they come out correctly? Both of those have cases which cannot be 
> > decided correctly with only the ancestor trees, due to one branch 
> > reverting a patch that was only in one ancestor. The correct result is to 
> > revert that patch, but figuring out that requires looking at more trees. I 
> > think your algorithm should work for this case, but it would be good to 
> > have verification. (IIRC, Len got the correct result while Tony got the 
> > wrong result and then corrected it later.)
> 
> Len's merge case come out identically to the tree he committed. I have
> described what I got for Tony's case in
> <20050826184731.GA13629@c165.ib.student.liu.se> (my merge algorithm
> produces the result Tony expected to get, but he didn't get that from
> git-resolve-script).

Good. It looks to me like this is a good algorithm in practice, then.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFH] Merge driver
  2005-09-08 18:58                 ` Chuck Lever
  2005-09-08 20:59                   ` Linus Torvalds
@ 2005-09-09  7:44                   ` Junio C Hamano
  2005-09-09 16:05                     ` Daniel Barkalow
                                       ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-09  7:44 UTC (permalink / raw)
  To: Fredrik Kuivinen, Daniel Barkalow, Linus Torvalds; +Cc: cel, git

I have several requests to people who are interested in merges
and read-tree changes.

I am pretty much set to use the recent read-tree updates Daniel
has been working on.  The only reason it has not hit the
"master" branch yet, except that it still has known leaks that
have not been plugged, is because read-tree is so fundamental to
everything we do, and I am trying to be extremely conservative
here.  I've beaten it myself reasonably well and have not found
any regressions (except removal of --emu23 which I believe
nobody uses anyway), but I'd appreciate people to try it out and
see if it performs well for your dataset.

If you are planning further surgery on read-tree code, please
base your changes on Daniel's rewrite to avoid your effort being
wasted.  This request goes both to Chuck (active_cache
abstraction) and Fredrik (addition of 'ignore index and working
tree matching rules' [*1*]).

A proposed merge driver 'git-merge' is in the proposed updates
branch.  This is intended to be the top-level user interface to
the merge machinery which drives multiple merge strategy
scripts, and I am hoping that I can eventually (1) retire
'git-resolve' and 'git-octopus' (they simply become merge
strategy scripts driven by 'git-merge') and (2) call 'git-merge'
from 'git-pull'.  What I have in the proposed updates branch has
been fixed since my earlier message to the list and has a new
merge strategy script, in addition to 'resolve' and 'octopus',
called 'git-merge-multibase'.  This uses Daniel's read-tree that
can use more than one merge bases.  I request Daniel to give OK
to its name or suggest a better name for this script -- I would
even accept 'git-merge-barkalow' if you want ;-).

If you are planning to implement a new merge strategy, please
use the ones in the proposed updates branch as examples, and
complain and suggest improvements if you find the interface
between the strategy scripts and the driver lacking.  This
request goes primarily to Fredrik.  I'm interested in doing the
renaming merge that would have helped HPA's klibc-kbuild vs
klibc case myself but if somebody else is so inclined please go
wild.

And finally, a request to everybody; please try out 'git-merge'
and see how you like it.

`git-merge` [-n] [-s <strategy>]... <msg> <head> <remote> <remote>...

-n::
	Do not show diffstat at the end of the merge.

-s <strategy>::
	use that merge strategy; can be given more than once to
	specify them in the order they should be tried.  If
	there is no `-s` option, built-in list of strategies is
	used instead.

<head>::
	our branch head commit.

<remote>::
	other branch head merged into our branch.  You need at
	least one <remote>.  Specifying more than one <remote>
	obviously means you are trying an Octopus.

Here is a sample transcript from a test resolving one of the
'more-than-one-merge-base' commits Fredrik found in the kernel
repository (": siamese;" is my $PS1; "  " is my $PS2).

    : siamese; git reset --hard b8112df71cae7d6a86158caeb19d215f56c4f9ab
    : siamese; git merge -n \
      'reproduce 0e396ee43e445cb7c215a98da4e76d0ce354d9d7' \
      HEAD 2089a0d38bc9c2cdd084207ebf7082b18cf4bf58
    Trying merge strategy resolve...
    Trying to find the optimum merge base.
    Trying simple merge.
    Simple merge failed, trying Automatic merge.
    Removing drivers/net/fmv18x.c
    Auto-merging drivers/net/r8169.c.
    merge: warning: conflicts during merge
    ERROR: Merge conflict in drivers/net/r8169.c.
    Removing drivers/net/sk_g16.c
    Removing drivers/net/sk_g16.h
    fatal: merge program failed
    Rewinding the tree to pristine...
    Trying merge strategy multibase...
    Trying simple merge.
    Simple merge failed, trying Automatic merge.
    Removing drivers/net/fmv18x.c
    Auto-merging drivers/net/r8169.c.
    merge: warning: conflicts during merge
    ERROR: Merge conflict in drivers/net/r8169.c.
    Removing drivers/net/sk_g16.c
    Removing drivers/net/sk_g16.h
    fatal: merge program failed
    Rewinding the tree to pristine...
    Trying merge strategy octopus...
    Rewinding the tree to pristine...
    Using the multibase to prepare resolving by hand.
    Trying simple merge.
    Simple merge failed, trying Automatic merge.
    Removing drivers/net/fmv18x.c
    Auto-merging drivers/net/r8169.c.
    merge: warning: conflicts during merge
    ERROR: Merge conflict in drivers/net/r8169.c.
    Removing drivers/net/sk_g16.c
    Removing drivers/net/sk_g16.h
    fatal: merge program failed
    Automatic merge failed; fix up by hand
    : siamese; git-update-cache --refresh
    drivers/net/r8169.c: needs update
    : siamese; echo $?
    1
    : siamese; git diff -r 0e396ee43e445cb7c215a98da4e76d0ce354d9d7
    :100644 100644 ce449fe 0000000 M	drivers/net/r8169.c
    : siamese; exit

In the above example, 'resolve', 'multibase' and 'octopus' are
tried in turn (actually, 'octopus' notices that it is given only
one <remote> and refuses to do anything without wasting time).
Since all of these strategies fail to fully auto-merge the given
heads, and 'multibase' gives the smallest number of conflicts,
its result is left in the working tree for the user to resolve
by hand.  You resolve this and commit the same way as you
currently do with 'git-resolve' that results in conflicts.


[Footnote]

*1* Fredrik, I have been wondering if we can just say that lack
of '-u' flag implies '-i'.  Is there a good reason that
'git-read-tree -m O A B' without '-u' should care if working
tree is up to date for the paths involved?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFH] Merge driver
  2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
@ 2005-09-09 16:05                     ` Daniel Barkalow
  2005-09-09 16:43                       ` Junio C Hamano
  2005-09-11  4:58                     ` Junio C Hamano
  2005-09-12 21:08                     ` Fredrik Kuivinen
  2 siblings, 1 reply; 40+ messages in thread
From: Daniel Barkalow @ 2005-09-09 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, Linus Torvalds, cel, git

On Fri, 9 Sep 2005, Junio C Hamano wrote:

> I have several requests to people who are interested in merges
> and read-tree changes.
> 
> I am pretty much set to use the recent read-tree updates Daniel
> has been working on.  The only reason it has not hit the
> "master" branch yet, except that it still has known leaks that
> have not been plugged, is because read-tree is so fundamental to
> everything we do, and I am trying to be extremely conservative
> here.  I've beaten it myself reasonably well and have not found
> any regressions (except removal of --emu23 which I believe
> nobody uses anyway), but I'd appreciate people to try it out and
> see if it performs well for your dataset.
> 
> If you are planning further surgery on read-tree code, please
> base your changes on Daniel's rewrite to avoid your effort being
> wasted.  This request goes both to Chuck (active_cache
> abstraction) and Fredrik (addition of 'ignore index and working
> tree matching rules' [*1*]).
> 
> A proposed merge driver 'git-merge' is in the proposed updates
> branch.  This is intended to be the top-level user interface to
> the merge machinery which drives multiple merge strategy
> scripts, and I am hoping that I can eventually (1) retire
> 'git-resolve' and 'git-octopus' (they simply become merge
> strategy scripts driven by 'git-merge') and (2) call 'git-merge'
> from 'git-pull'.  What I have in the proposed updates branch has
> been fixed since my earlier message to the list and has a new
> merge strategy script, in addition to 'resolve' and 'octopus',
> called 'git-merge-multibase'.  This uses Daniel's read-tree that
> can use more than one merge bases.  I request Daniel to give OK
> to its name or suggest a better name for this script -- I would
> even accept 'git-merge-barkalow' if you want ;-).

I'd actually been thinking it would just go into the the "resolve" driver, 
with that going back to before it chose among merge-base outputs and just 
sending the whole list to read-tree.

> If you are planning to implement a new merge strategy, please
> use the ones in the proposed updates branch as examples, and
> complain and suggest improvements if you find the interface
> between the strategy scripts and the driver lacking.  This
> request goes primarily to Fredrik.  I'm interested in doing the
> renaming merge that would have helped HPA's klibc-kbuild vs
> klibc case myself but if somebody else is so inclined please go
> wild.
> 
> And finally, a request to everybody; please try out 'git-merge'
> and see how you like it.
> 
> `git-merge` [-n] [-s <strategy>]... <msg> <head> <remote> <remote>...
> 
> -n::
> 	Do not show diffstat at the end of the merge.
> 
> -s <strategy>::
> 	use that merge strategy; can be given more than once to
> 	specify them in the order they should be tried.  If
> 	there is no `-s` option, built-in list of strategies is
> 	used instead.
> 
> <head>::
> 	our branch head commit.
> 
> <remote>::
> 	other branch head merged into our branch.  You need at
> 	least one <remote>.  Specifying more than one <remote>
> 	obviously means you are trying an Octopus.
> 
> Here is a sample transcript from a test resolving one of the
> 'more-than-one-merge-base' commits Fredrik found in the kernel
> repository (": siamese;" is my $PS1; "  " is my $PS2).
> 
>     : siamese; git reset --hard b8112df71cae7d6a86158caeb19d215f56c4f9ab
>     : siamese; git merge -n \
>       'reproduce 0e396ee43e445cb7c215a98da4e76d0ce354d9d7' \
>       HEAD 2089a0d38bc9c2cdd084207ebf7082b18cf4bf58
>     Trying merge strategy resolve...
>     Trying to find the optimum merge base.
>     Trying simple merge.
>     Simple merge failed, trying Automatic merge.
>     Removing drivers/net/fmv18x.c
>     Auto-merging drivers/net/r8169.c.
>     merge: warning: conflicts during merge
>     ERROR: Merge conflict in drivers/net/r8169.c.
>     Removing drivers/net/sk_g16.c
>     Removing drivers/net/sk_g16.h
>     fatal: merge program failed
>     Rewinding the tree to pristine...
>     Trying merge strategy multibase...
>     Trying simple merge.
>     Simple merge failed, trying Automatic merge.
>     Removing drivers/net/fmv18x.c
>     Auto-merging drivers/net/r8169.c.
>     merge: warning: conflicts during merge
>     ERROR: Merge conflict in drivers/net/r8169.c.
>     Removing drivers/net/sk_g16.c
>     Removing drivers/net/sk_g16.h
>     fatal: merge program failed
>     Rewinding the tree to pristine...
>     Trying merge strategy octopus...
>     Rewinding the tree to pristine...
>     Using the multibase to prepare resolving by hand.
>     Trying simple merge.
>     Simple merge failed, trying Automatic merge.
>     Removing drivers/net/fmv18x.c
>     Auto-merging drivers/net/r8169.c.
>     merge: warning: conflicts during merge
>     ERROR: Merge conflict in drivers/net/r8169.c.
>     Removing drivers/net/sk_g16.c
>     Removing drivers/net/sk_g16.h
>     fatal: merge program failed
>     Automatic merge failed; fix up by hand
>     : siamese; git-update-cache --refresh
>     drivers/net/r8169.c: needs update
>     : siamese; echo $?
>     1
>     : siamese; git diff -r 0e396ee43e445cb7c215a98da4e76d0ce354d9d7
>     :100644 100644 ce449fe 0000000 M	drivers/net/r8169.c
>     : siamese; exit
> 
> In the above example, 'resolve', 'multibase' and 'octopus' are
> tried in turn (actually, 'octopus' notices that it is given only
> one <remote> and refuses to do anything without wasting time).
> Since all of these strategies fail to fully auto-merge the given
> heads, and 'multibase' gives the smallest number of conflicts,
> its result is left in the working tree for the user to resolve
> by hand.  You resolve this and commit the same way as you
> currently do with 'git-resolve' that results in conflicts.

This is no good: the current 'resolve' can generate wrong results and 
report that it worked cleanly, while 'multibase' would report a conflict 
because it isn't ignoring a real problem. My primary goal in doing the 
multibase version wasn't to produce more clean merges; it was to produce 
fewer clean-but-wrong merges.

> [Footnote]
> 
> *1* Fredrik, I have been wondering if we can just say that lack
> of '-u' flag implies '-i'.  Is there a good reason that
> 'git-read-tree -m O A B' without '-u' should care if working
> tree is up to date for the paths involved?

It tries to make sure that there is room to put stuff for resolving a 
conflict without messing with modified files in the directory.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFH] Merge driver
  2005-09-09 16:05                     ` Daniel Barkalow
@ 2005-09-09 16:43                       ` Junio C Hamano
  2005-09-09 17:25                         ` Daniel Barkalow
  0 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2005-09-09 16:43 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Fredrik Kuivinen, Linus Torvalds, cel, git

Daniel Barkalow <barkalow@iabervon.org> writes:

> I'd actually been thinking it would just go into the the "resolve" driver, 
> with that going back to before it chose among merge-base outputs and just 
> sending the whole list to read-tree.
>
> This is no good: the current 'resolve' can generate wrong results and 
> report that it worked cleanly, while 'multibase' would report a conflict 
> because it isn't ignoring a real problem. My primary goal in doing the 
> multibase version wasn't to produce more clean merges; it was to produce 
> fewer clean-but-wrong merges.

True.  Before 'git-merge' hits the "master" branch I should
remove 'git-merge-resolve' from the strategies list (or rename
'git-merge-multibase' to it).  I have them separately only
because I wanted to be able to see how differently they perform
by saying:

    git merge -s resolve blah...
    git merge -s multibase blah...

>> *1* Fredrik, I have been wondering if we can just say that lack
>> of '-u' flag implies '-i'.  Is there a good reason that
>> 'git-read-tree -m O A B' without '-u' should care if working
>> tree is up to date for the paths involved?
>
> It tries to make sure that there is room to put stuff for resolving a 
> conflict without messing with modified files in the directory.

I agree it can be used that way, but nobody seems to use it for
that purpose as far as I can tell hence my earlier comment.  But
let's leave the door open by having them as independent
options.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFH] Merge driver
  2005-09-09 16:43                       ` Junio C Hamano
@ 2005-09-09 17:25                         ` Daniel Barkalow
  0 siblings, 0 replies; 40+ messages in thread
From: Daniel Barkalow @ 2005-09-09 17:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, Linus Torvalds, cel, git

On Fri, 9 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > It tries to make sure that there is room to put stuff for resolving a 
> > conflict without messing with modified files in the directory.
> 
> I agree it can be used that way, but nobody seems to use it for
> that purpose as far as I can tell hence my earlier comment.  But
> let's leave the door open by having them as independent
> options.

Ah, okay. I hadn't realized that resolve used -u for that call to 
read-tree. You're entirely right.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Unified merge driver pushed out to "master" branch
  2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
@ 2005-09-11  2:54   ` Junio C Hamano
  2005-09-11 21:05     ` Fredrik Kuivinen
  2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
  0 siblings, 2 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-11  2:54 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git, Daniel Barkalow

I've pushed out the 'git merge' along with Daniel's read-tree
updates and your 'base on merge of bases' merge strategy driver
in the "master" branch.

The last one is what I butchered your code without really
auditing the changes I made, so I'd appreciate you to take a
look at it and supply any necessary updates and fixes.  My
changes are meant to be only to adjust the calling convention to
that of 'git merge' -- that is, it gets merge bases and heads
from the command line, and it is not supposed to commit and
update HEAD itself, but is supposed to leave its best effort in
the working tree and the paths that cannot be automerged should
be left as that of HEAD version in the index file, so that the
user can run 'git diff' to see what is happening while resolving
things by hand.  I do not think my change did the last part
correctlyx.  The change to introduce '-i' option to read-tree is
also present.

I primarily used the commits found in the Linux 2.6 repository
and its derivatives that have multiple potential merge bases for
my testing.  Among them, the attempt to merge parents of this
commit triggered an assertion failure -- it may be my fault but
I have not spent much time to diagnose it.

commit 6358924b06cd1aaa031b8ba0c33e5a15e795bef0
Merge: 1da21e73bdc458f8131e6071072632b4c3b0430f 344a076110f4ecb16ea6d286b63be696604982ed
Author: Tony Luck <tony.luck@intel.com>
Date:   Thu Sep 8 14:29:32 2005 -0700

    Pull release into test branch

----------------------------------------------------------------
Here is a summary of my test results.

"Expect" names the commit being tested.  The parents of each of
these commits have more than one merge-base, and the test is to
start from one parent of it, and merge the other parent into it
using the four merge programs:

 * traditional is 'git-resolve-script' from GIT 0.99.6 (it does
   not even use Daniel's read-tree so that I can catch
   regression in read-tree).

 * dbrt is 'git-merge -s resolve', i.e. Daniel's multi-base merge.

 * stupid is 'git-merge -s stupid', supposed to be the same
   as 'git-resolve' (but internally uses Daniel's read-tree with
   only one merge base).

 * fredrik is 'git-merge -s fredrik', the one I butchered.

------------------------------------------------
Expect 0c168775709faa74c1b87f1e61046e0c51ade7f3
Method 'traditional' failed to automerge.
Method 'dbrt' failed to automerge.
Method 'stupid' failed to automerge.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

The fredrik is the only one that merged successfully, but its
result is different from the actual commit.

:100644 100644 065b702df56353a00d5f460cf385f172decccf2b cd4222ff8b925f6b92414d58729d225fccc3f310 M	include/net/ieee80211.h

------------------------------------------------
Expect 0c3e091838f02c537ccab3b6e8180091080f7df2 (case #16)
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' failed to automerge.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

The fredrik result is different from the actual commit.

:100644 100644 a7bed60b69f9e8de9a49944e22d03fb388ae93c7 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4 M	arch/ia64/hp/sim/boot/bootloader.c

------------------------------------------------
Expect 0e396ee43e445cb7c215a98da4e76d0ce354d9d7
Method 'traditional' failed to automerge.
Method 'dbrt' failed to automerge.
Method 'stupid' failed to automerge.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

The fredrik is the only one that merged successfully, and its
result matches the actual commit.

------------------------------------------------
Expect 1960ff70a6c1db8e5f8fa9ec7020ef585f8ce706
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

All results match the actual commit.

------------------------------------------------
Expect 19925d7e0af7de645d808fd973ef99049fce52f0
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

All results match the actual commit.

------------------------------------------------
Expect 3190186362466658f01b2e354e639378ce07e1a9
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

All results match the actual commit.

------------------------------------------------
Expect 467ca22d3371f132ee225a5591a1ed0cd518cb3d
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

All results match the actual commit.

------------------------------------------------
Expect 539aeb871b52233b189ae976dfded20e14db645a
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

All results match the actual commit.

------------------------------------------------
Expect 6358924b06cd1aaa031b8ba0c33e5a15e795bef0
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' failed to automerge.

Fredrik dies on asserttion failure.

------------------------------------------------
Expect 84ffa747520edd4556b136bdfc9df9eb1673ce12 (case #16)
Method 'traditional' failed to automerge.
Method 'dbrt' failed to automerge.
Method 'stupid' failed to automerge.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

The fredrik result matches the actual commit.

------------------------------------------------
Expect a855f9a4d29f0ec338c337358844389171b0bae0
Method 'traditional' failed to automerge.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' failed to automerge.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

Both fredrik and dbrt results match the actual commit.

------------------------------------------------
Expect cf70be167c085af820c0c2a606cab8c3ee819dc6
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

All results match the actual commit.

------------------------------------------------
Expect da28c12089dfcfb8695b6b555cdb8e03dda2b690
Method 'traditional' automerged
Method 'traditional' resolved cleanly.
Method 'dbrt' automerged
Method 'dbrt' resolved cleanly.
Method 'stupid' automerged
Method 'stupid' resolved cleanly.
Method 'fredrik' automerged
Method 'fredrik' resolved cleanly.

All results match the actual commit.

------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFH] Merge driver
  2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
  2005-09-09 16:05                     ` Daniel Barkalow
@ 2005-09-11  4:58                     ` Junio C Hamano
  2005-09-12 21:08                     ` Fredrik Kuivinen
  2 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-11  4:58 UTC (permalink / raw)
  To: git

Some people may be alarmed and feel concerned that I already
have Daniel's and Fredrik's merge in the "master" branch; I feel
this deserves some clarification.

The change Daniel made is a natural enhancement of an existing
program.  It spent unusually long time in the proposed update
queue exactly because read-tree was so fundamental, and I wanted
to make sure there is no serious regression.  Most importantly,
the original merge mechanism, git-resolve, is not dropped, and
'git-pull' uses that, not 'git-merge', after fetching a remote
head.  So for everyday use, nothing has changed.

I did the 'git-merge' strategy backend mechanism because I
wanted to play it safe.  It makes it easy to give the set of
conservative strategies as default for every day use, while
allowing us to try more experimental strategies which can be
used only by an explicit user request.  Introduction of
additional merge strategies, such as the Fredrik merge or
rename-detect merge which is yet to be written, is in a sense
similar to adding new filesystems to the stable kernel tree --
as long as they stay optional and do not touch the core deeply,
they tend to be safer changes than other kinds, and we can add
them more or less anytime in the stable series without impacting
stability.

I intend to eventually update git-pull to use git-merge, and I
am hoping that I can do so before we hit 1.0, once we are sure
that 'git-merge -s stupid' behaves identically as 'git-resolve',
and 'git-merge -s resolve' fares no worse than 'git-resolve'.  I
personally feel I've done enough tests to ensure that already,
but help from the community is as always appreciated.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Unified merge driver pushed out to "master" branch
  2005-09-11  2:54   ` Unified merge driver pushed out to "master" branch Junio C Hamano
@ 2005-09-11 21:05     ` Fredrik Kuivinen
  2005-09-12  1:23       ` Junio C Hamano
  2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
  1 sibling, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-11 21:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, git, Daniel Barkalow

On Sat, Sep 10, 2005 at 07:54:19PM -0700, Junio C Hamano wrote:
> I've pushed out the 'git merge' along with Daniel's read-tree
> updates and your 'base on merge of bases' merge strategy driver
> in the "master" branch.
> 
> The last one is what I butchered your code without really
> auditing the changes I made, so I'd appreciate you to take a
> look at it and supply any necessary updates and fixes.  My
> changes are meant to be only to adjust the calling convention to
> that of 'git merge' -- that is, it gets merge bases and heads
> from the command line, and it is not supposed to commit and
> update HEAD itself, but is supposed to leave its best effort in
> the working tree and the paths that cannot be automerged should
> be left as that of HEAD version in the index file, so that the
> user can run 'git diff' to see what is happening while resolving
> things by hand.  I do not think my change did the last part
> correctlyx.  The change to introduce '-i' option to read-tree is
> also present.

Your changes seems to be fine.

I am a bit confused about in what state the index should be in after a
non-clean merge.

I interpreted <7vmzn2prck.fsf@assigned-by-dhcp.cox.net> as follows:

The index should preferably contain unmerged entries after a merge if
the merge was non-clean. If, for example, the file 'a' is modified in
branch A and 'a' is also modified in branch 'B' in such a way that
merge(1) cannot merge those changes, then when we have merged A with B
git-ls-files --unmerged should list 'a' in its output.

However, according to my tests 'git merge -s resolve' doesn't do
this. (the old git-resolve-script didn't do this either) They both
update the unmerged index entries to the entries that was in HEAD,
which agrees with your description of how things are supposed to work
above.

The way the 'resolve' strategy do this seems to be more usable than
"leave unmerged entries in the cache". In particular, 'git diff' gives
a usable output in this case.

Should I change git-merge-fredrik.py to have the same behaviour as the
resolve strategy?

> I primarily used the commits found in the Linux 2.6 repository
> and its derivatives that have multiple potential merge bases for
> my testing.  Among them, the attempt to merge parents of this
> commit triggered an assertion failure -- it may be my fault but
> I have not spent much time to diagnose it.
> 
> commit 6358924b06cd1aaa031b8ba0c33e5a15e795bef0
> Merge: 1da21e73bdc458f8131e6071072632b4c3b0430f 344a076110f4ecb16ea6d286b63be696604982ed
> Author: Tony Luck <tony.luck@intel.com>
> Date:   Thu Sep 8 14:29:32 2005 -0700
> 
>     Pull release into test branch
> 

It was not your fault. I have a fix for it in my local tree, it will
be sent to the mailing list soon. The result match the actual commit
when this patch has been applied.


> ----------------------------------------------------------------
> Here is a summary of my test results.
> 
> "Expect" names the commit being tested.  The parents of each of
> these commits have more than one merge-base, and the test is to
> start from one parent of it, and merge the other parent into it
> using the four merge programs:
> 
>  * traditional is 'git-resolve-script' from GIT 0.99.6 (it does
>    not even use Daniel's read-tree so that I can catch
>    regression in read-tree).
> 
>  * dbrt is 'git-merge -s resolve', i.e. Daniel's multi-base merge.
> 
>  * stupid is 'git-merge -s stupid', supposed to be the same
>    as 'git-resolve' (but internally uses Daniel's read-tree with
>    only one merge base).
> 
>  * fredrik is 'git-merge -s fredrik', the one I butchered.
> 
> ------------------------------------------------
> Expect 0c168775709faa74c1b87f1e61046e0c51ade7f3
> Method 'traditional' failed to automerge.
> Method 'dbrt' failed to automerge.
> Method 'stupid' failed to automerge.
> Method 'fredrik' automerged
> Method 'fredrik' resolved cleanly.
> 
> The fredrik is the only one that merged successfully, but its
> result is different from the actual commit.
> 
> :100644 100644 065b702df56353a00d5f460cf385f172decccf2b cd4222ff8b925f6b92414d58729d225fccc3f310 M	include/net/ieee80211.h
> 

I believe my algorithm produces the correct result in this case. I
think the difference comes from manual edits to
include/net/ieee80211.h after the merge but before the commit.

This commit was mentioned in
<20050908200559.GA26088@c165.ib.student.liu.se>.


> ------------------------------------------------
> Expect 0c3e091838f02c537ccab3b6e8180091080f7df2 (case #16)
> Method 'traditional' automerged
> Method 'traditional' resolved cleanly.
> Method 'dbrt' failed to automerge.
> Method 'stupid' automerged
> Method 'stupid' resolved cleanly.
> Method 'fredrik' automerged
> Method 'fredrik' resolved cleanly.
> 
> The fredrik result is different from the actual commit.
> 
> :100644 100644 a7bed60b69f9e8de9a49944e22d03fb388ae93c7 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4 M	arch/ia64/hp/sim/boot/bootloader.c
> 

This is Tony Luck's test case, originally described in
<200508232256.j7NMuR1q027892@agluck-lia64.sc.intel.com>.

I reported the results for my algorithm on this case in
<20050826184731.GA13629@c165.ib.student.liu.se>. I think that the
result produced by my algorithm is the result which Tony expected to
get.


- Fredrik

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Unified merge driver pushed out to "master" branch
  2005-09-11 21:05     ` Fredrik Kuivinen
@ 2005-09-12  1:23       ` Junio C Hamano
  0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-12  1:23 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git, Daniel Barkalow

Fredrik Kuivinen <freku045@student.liu.se> writes:

> I am a bit confused about in what state the index should be in after a
> non-clean merge.

Yeah, I was confused when I wrote it.  Sorry.

> The way the 'resolve' strategy do this seems to be more usable than
> "leave unmerged entries in the cache". In particular, 'git diff' gives
> a usable output in this case.

You are certainly correct about what resolve does, and most
likely right what the most usable output is from an end user's
point of view.  While merging, there _could_ be cases where we
would want to take a look at the current head of our head or the
current head of their head, but my reasoning that leaving those
entries unmerged in the index file _might_ help that (which was
the reason behind that 'leave them unmerged' comment) was quite
faulty.  We can say 'git-diff-cache HEAD' and 'git-diff-cache
MERGE_HEAD' with and without --cached for that.  I think the
current 'resolve' behaviour is the most useful one.

> This is Tony Luck's test case, originally described in
> <200508232256.j7NMuR1q027892@agluck-lia64.sc.intel.com>.
>
> I reported the results for my algorithm on this case in
> <20050826184731.GA13629@c165.ib.student.liu.se>. I think that the
> result produced by my algorithm is the result which Tony expected to
> get.

Yeah, I agree.  I might have sounded as if I was saying that the
automated merge must match the actual result otherwise it is
buggy, but that was not what I wanted to say -- rather we might
have found cases where the traditional 'git-resolve' did a wrong
thing which slipped through unnoticed.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFH] Merge driver
  2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
  2005-09-09 16:05                     ` Daniel Barkalow
  2005-09-11  4:58                     ` Junio C Hamano
@ 2005-09-12 21:08                     ` Fredrik Kuivinen
  2005-09-12 21:16                       ` Junio C Hamano
  2 siblings, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-12 21:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Fredrik Kuivinen, Daniel Barkalow, Linus Torvalds, cel, git

On Fri, Sep 09, 2005 at 12:44:40AM -0700, Junio C Hamano wrote:

...

> *1* Fredrik, I have been wondering if we can just say that lack
> of '-u' flag implies '-i'.  Is there a good reason that
> 'git-read-tree -m O A B' without '-u' should care if working
> tree is up to date for the paths involved?

No, I don't think so. But it would be nice if one could use '-i'
together with '-u'.

The current git-merge-fredrik.py does not use '-i' together with '-u',
but I plan to change that. Currently we assume that the index is
uptodate when we start the merge. If the two branches have more than
one common ancestor those ancestors are merged into a temporary tree,
lets call it T, without touching the working directory.

The final steps of the merge are currently the following steps (lets
say we want to merge A with B):

git-read-tree A
git-update-cache --refresh
git-read-tree -u -m T A B

We need the second step because otherwise git-read-tree will complain
that the cache isn't uptodate. But we _know_ that the contents are
uptodate, so the second step is unnecessary and just eats time. So,
instead of doing the above we should be able to do:

git-read-tree A
git-read-tree -u -i -m T A B

I have a patch which implements this and it _seems_ to work fine. Is
it reasonable to things this way or have I missed something?


- Fredrik

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFH] Merge driver
  2005-09-12 21:08                     ` Fredrik Kuivinen
@ 2005-09-12 21:16                       ` Junio C Hamano
  2005-09-13 20:33                         ` Fredrik Kuivinen
  0 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2005-09-12 21:16 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: Daniel Barkalow, Linus Torvalds, cel, git

Fredrik Kuivinen <freku045@student.liu.se> writes:

> git-read-tree A
> git-update-cache --refresh
> git-read-tree -u -m T A B
>
> We need the second step because otherwise git-read-tree will complain
> that the cache isn't uptodate. But we _know_ that the contents are
> uptodate, so the second step is unnecessary and just eats time.

If we _know_ it should match, isn't there a way to tell the
first "git-read-tree A" not to mess with the dirtiness, like,
ah, one-way merge?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFH] Merge driver
  2005-09-12 21:16                       ` Junio C Hamano
@ 2005-09-13 20:33                         ` Fredrik Kuivinen
  2005-09-13 20:46                           ` Junio C Hamano
  0 siblings, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-13 20:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Fredrik Kuivinen, Daniel Barkalow, Linus Torvalds, cel, git

On Mon, Sep 12, 2005 at 02:16:48PM -0700, Junio C Hamano wrote:
> Fredrik Kuivinen <freku045@student.liu.se> writes:
> 
> > git-read-tree A
> > git-update-cache --refresh
> > git-read-tree -u -m T A B
> >
> > We need the second step because otherwise git-read-tree will complain
> > that the cache isn't uptodate. But we _know_ that the contents are
> > uptodate, so the second step is unnecessary and just eats time.
> 
> If we _know_ it should match, isn't there a way to tell the
> first "git-read-tree A" not to mess with the dirtiness, like,
> ah, one-way merge?
> 

I don't think a one way merge will do what we want in this case.

At the start of the merge we know that the working directory is in
sync with the index. But during the merge we will merge the common
ancestors, and the index file will be overwritten. When we are done
with merging the common ancestors we want to merge the heads that we
wanted to merge from the begining (A and B) using T as the common
ancestor. When 'git-read-tree A' is executed we get a cache which has
zeroed stat data, but SHAs and mode information which match the
working directory. If we do 'git-update-cache --refresh' the stat data
will be updated to match the working directory.

Using 'git-read-tree -m A' will not work because the stat data in the
cache is zeroed before we run this command.

I see two solutions, either allow passing '-i' together with '-u' to
git-read-tree or use a temporary index file when we are merging the
common ancestors.

Thinking a bit more about it, maybe it's better to use a temporary
index file. If go this way we could, at least in principle, allow
merges which starts with a dirty working directory.

- Fredrik

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFH] Merge driver
  2005-09-13 20:33                         ` Fredrik Kuivinen
@ 2005-09-13 20:46                           ` Junio C Hamano
  0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-13 20:46 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: Daniel Barkalow, Linus Torvalds, cel, git

Fredrik Kuivinen <freku045@student.liu.se> writes:

> Thinking a bit more about it, maybe it's better to use a temporary
> index file. If go this way we could, at least in principle, allow
> merges which starts with a dirty working directory.

Yeah, I like the idea of using temporary index for doing merges
that are not what the user wanted to do.  The same idea is used
in the 'stupid' thing that tries to find the least-conflicting
merge base.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Another merge test case from the kernel tree.
  2005-09-11  2:54   ` Unified merge driver pushed out to "master" branch Junio C Hamano
  2005-09-11 21:05     ` Fredrik Kuivinen
@ 2005-09-14  5:56     ` Junio C Hamano
  2005-09-14 16:11       ` Daniel Barkalow
                         ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-14  5:56 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git, Daniel Barkalow, Tony Luck

One more merge test case you would be interested.  What's most
interesting about this commit is that it has three merge bases,
not two.

commit c820884e4f35a40f88b787c3891a23d629ef6bfd
Merge: aa2dca4563b0629ecd9d9994dfdf39f29ff1b43f 357d596bd552ad157a906289ab13ea6ba7e66e3d
Author: Tony Luck <tony.luck@intel.com>
Date:   Sun Sep 11 18:35:10 2005 -0700

    Auto-update from upstream

Attempt to reproduce this commit is somewhat troubling.  The
'stupid' strategy and git-resolve resolves cleanly and Tony's
commit matches what 'git-resolve' does (obviously that is the
algorithm everybody uses).

But the 'resolve' strategy says this is case #16 (and resolves
this case differently but I think it is just by luck). This
indicates that Tony _might_ have committed what he did not
wanted to.  The path involved is arch/mips/kernel/offset.c.

    [Tony, I think you've seen this one once in different
    commit.  The terminology "case #16" means your merge has
    more than one merge base candidates:

        Ancestor#1 -- -- Your tree
                     X
        Ancestor#2 -- -- Other tree

    and when a path in your tree matches Ancestor #i and the
    same path in the other tree matches Ancestor #j (i!=j).

    The default git merge algorithm picks a single ancestor that
    gives the least number of conflicting paths during the
    merge, and compares your tree and other tree against it and
    pick the one that changed since that ancestor if only one
    side changed that path.  But in case #16, depending on which
    ancestor we pick, the result may come from your tree or
    other tree, and obviously we do a wrong thing with 50%
    probability.

    Thie _could_ indicate that you _might_ be losing a reverted
    you made in your line of development being overwritten by
    what the other tree did.]

BTW, Fredrik, the 'recursive' merge dies with this:

Trying merge strategy recursive...
Merging 357d596bd552ad157a906289ab13ea6ba7e66e3d with aa2dca4563b0629ecd9d9994dfdf39f29ff1b43f
Merging: 
357d596bd552ad157a906289ab13ea6ba7e66e3d Merge branch 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6  
aa2dca4563b0629ecd9d9994dfdf39f29ff1b43f Manual merge of kernel/sched.c and arch/ia64/kernel/mca_asm.S 
found 3 common ancestor(s): 
a1cddb88920b915eaba10712ecfd0fc698b00a22 [IA64-SGI] Add new vendor-specific SAL calls for: 
6cf07a8cc86a0b471466c7fe45892f7ef434015b [IA64] Fix nasty VMLPT problem... 
49a28cc8fd26f5317c47a9aeb2bdd1c33e21738e [IA64] MCA/INIT: remove obsolete unwind code 
  Merging: 
  a1cddb88920b915eaba10712ecfd0fc698b00a22 [IA64-SGI] Add new vendor-specific SAL calls for: 
  6cf07a8cc86a0b471466c7fe45892f7ef434015b [IA64] Fix nasty VMLPT problem... 
  found 1 common ancestor(s): 
  d8971fcb702e24d1e22c77fd1772f182ffee87e3 [INET]: compile errors when DEBUG is defined 
Traceback (most recent call last):
  File "/home/junio/bin/Linux/git-merge-recursive", line 429, in ?
    firstBranch, secondBranch, graph)
  File "/home/junio/bin/Linux/git-merge-recursive", line 54, in merge
    graph, callDepth+1)
  File "/home/junio/bin/Linux/git-merge-recursive", line 62, in merge
    runProgram(['git-read-tree', h1.tree()])
  File "/home/junio/share/git-core/python/gitMergeCommon.py", line 93, in runProgram
    raise ProgramError(progStr, out)
ProgramError: git-read-tree 1d20af805193ab9982a48cb4c828c0f6af034c6c: fatal: failed to unpack tree object 1d20af805193ab9982a48cb4c828c0f6af034c6c

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Another merge test case from the kernel tree.
  2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
@ 2005-09-14 16:11       ` Daniel Barkalow
  2005-09-14 16:30         ` Junio C Hamano
  2005-09-14 17:42       ` Fredrik Kuivinen
  2005-09-15  0:47       ` Yet another set of merge test cases " Junio C Hamano
  2 siblings, 1 reply; 40+ messages in thread
From: Daniel Barkalow @ 2005-09-14 16:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, git, Tony Luck

On Tue, 13 Sep 2005, Junio C Hamano wrote:

> One more merge test case you would be interested.  What's most
> interesting about this commit is that it has three merge bases,
> not two.
> 
> commit c820884e4f35a40f88b787c3891a23d629ef6bfd
> Merge: aa2dca4563b0629ecd9d9994dfdf39f29ff1b43f 357d596bd552ad157a906289ab13ea6ba7e66e3d
> Author: Tony Luck <tony.luck@intel.com>
> Date:   Sun Sep 11 18:35:10 2005 -0700
> 
>     Auto-update from upstream
> 
> Attempt to reproduce this commit is somewhat troubling.  The
> 'stupid' strategy and git-resolve resolves cleanly and Tony's
> commit matches what 'git-resolve' does (obviously that is the
> algorithm everybody uses).
> 
> But the 'resolve' strategy says this is case #16 (and resolves
> this case differently but I think it is just by luck). This
> indicates that Tony _might_ have committed what he did not
> wanted to.  The path involved is arch/mips/kernel/offset.c.

It shouldn't resolve it at all if it's case #16; I'll have to check on 
that.

>     [Tony, I think you've seen this one once in different
>     commit.  The terminology "case #16" means your merge has
>     more than one merge base candidates:
> 
>         Ancestor#1 -- -- Your tree
>                      X
>         Ancestor#2 -- -- Other tree
> 
>     and when a path in your tree matches Ancestor #i and the
>     same path in the other tree matches Ancestor #j (i!=j).
> 
>     The default git merge algorithm picks a single ancestor that
>     gives the least number of conflicting paths during the
>     merge, and compares your tree and other tree against it and
>     pick the one that changed since that ancestor if only one
>     side changed that path.  But in case #16, depending on which
>     ancestor we pick, the result may come from your tree or
>     other tree, and obviously we do a wrong thing with 50%
>     probability.
> 
>     Thie _could_ indicate that you _might_ be losing a reverted
>     you made in your line of development being overwritten by
>     what the other tree did.]

More precisely, case #16 only comes up when, in coming from the ancestors, 
one side decided to match one ancestor, and the other side decided to 
match the other; it is rarely the case that this could occur due to the 
two merges making opposite decisions to entirely ignore the other 
ancestor, and the only remaining case is that the merges each chose the 
same ancestor, and one side reverted to the other. So it's pretty certain 
that you've got a 50-50 chance of losing the correction.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Another merge test case from the kernel tree.
  2005-09-14 16:11       ` Daniel Barkalow
@ 2005-09-14 16:30         ` Junio C Hamano
  0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-14 16:30 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Fredrik Kuivinen, git, Tony Luck

Daniel Barkalow <barkalow@iabervon.org> writes:

> It shouldn't resolve it at all if it's case #16; I'll have to check on 
> that.

Sorry my mistake; it did not.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Another merge test case from the kernel tree.
  2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
  2005-09-14 16:11       ` Daniel Barkalow
@ 2005-09-14 17:42       ` Fredrik Kuivinen
  2005-09-14 17:51         ` Junio C Hamano
  2005-09-15  0:47       ` Yet another set of merge test cases " Junio C Hamano
  2 siblings, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-14 17:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, git, Daniel Barkalow, Tony Luck

On Tue, Sep 13, 2005 at 10:56:45PM -0700, Junio C Hamano wrote:
...

> BTW, Fredrik, the 'recursive' merge dies with this:
> 
> Trying merge strategy recursive...
> Merging 357d596bd552ad157a906289ab13ea6ba7e66e3d with aa2dca4563b0629ecd9d9994dfdf39f29ff1b43f
> Merging: 
> 357d596bd552ad157a906289ab13ea6ba7e66e3d Merge branch 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6  
> aa2dca4563b0629ecd9d9994dfdf39f29ff1b43f Manual merge of kernel/sched.c and arch/ia64/kernel/mca_asm.S 
> found 3 common ancestor(s): 
> a1cddb88920b915eaba10712ecfd0fc698b00a22 [IA64-SGI] Add new vendor-specific SAL calls for: 
> 6cf07a8cc86a0b471466c7fe45892f7ef434015b [IA64] Fix nasty VMLPT problem... 
> 49a28cc8fd26f5317c47a9aeb2bdd1c33e21738e [IA64] MCA/INIT: remove obsolete unwind code 
>   Merging: 
>   a1cddb88920b915eaba10712ecfd0fc698b00a22 [IA64-SGI] Add new vendor-specific SAL calls for: 
>   6cf07a8cc86a0b471466c7fe45892f7ef434015b [IA64] Fix nasty VMLPT problem... 
>   found 1 common ancestor(s): 
>   d8971fcb702e24d1e22c77fd1772f182ffee87e3 [INET]: compile errors when DEBUG is defined 
> Traceback (most recent call last):
>   File "/home/junio/bin/Linux/git-merge-recursive", line 429, in ?
>     firstBranch, secondBranch, graph)
>   File "/home/junio/bin/Linux/git-merge-recursive", line 54, in merge
>     graph, callDepth+1)
>   File "/home/junio/bin/Linux/git-merge-recursive", line 62, in merge
>     runProgram(['git-read-tree', h1.tree()])
>   File "/home/junio/share/git-core/python/gitMergeCommon.py", line 93, in runProgram
>     raise ProgramError(progStr, out)
> ProgramError: git-read-tree 1d20af805193ab9982a48cb4c828c0f6af034c6c: fatal: failed to unpack tree object 1d20af805193ab9982a48cb4c828c0f6af034c6c
> 

It works for me. The 'recursive' strategy cleanly merges this and
produces a result identical to the actual commit.

The tree object 1d20af805193ab9982a48cb4c828c0f6af034c6c is the tree
of the commit a1cddb88920b915eaba10712ecfd0fc698b00a22 which is one of
the common ancestors. Are you sure that you got the entire repository? 
It took some time for me to figure out that 'git clone
rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git
aegl.git' will _not_ give you a usable repository. You have to change
aegl.git/.git/objects/objects/info/alternates before it becomes
usable.


- Fredrik

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Another merge test case from the kernel tree.
  2005-09-14 17:42       ` Fredrik Kuivinen
@ 2005-09-14 17:51         ` Junio C Hamano
  0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2005-09-14 17:51 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git, Daniel Barkalow, Tony Luck

Fredrik Kuivinen <freku045@student.liu.se> writes:

> It works for me. The 'recursive' strategy cleanly merges this and
> produces a result identical to the actual commit.
>
> The tree object 1d20af805193ab9982a48cb4c828c0f6af034c6c is the tree
> of the commit a1cddb88920b915eaba10712ecfd0fc698b00a22 which is one of
> the common ancestors. Are you sure that you got the entire repository? 
> It took some time for me to figure out that 'git clone
> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git
> aegl.git' will _not_ give you a usable repository. You have to change
> aegl.git/.git/objects/objects/info/alternates before it becomes
> usable.

Ah, that figures..  I was looking at Tony's repository right
now.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Yet another set of merge test cases from the kernel tree.
  2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
  2005-09-14 16:11       ` Daniel Barkalow
  2005-09-14 17:42       ` Fredrik Kuivinen
@ 2005-09-15  0:47       ` Junio C Hamano
  2005-09-19 16:13         ` Fredrik Kuivinen
  2 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2005-09-15  0:47 UTC (permalink / raw)
  To: Fredrik Kuivinen, Daniel Barkalow, Linus Torvalds; +Cc: git

I've tried to reproduce 400+ merges I picked up from various
kernel trees that have more than one merge base.

number of commits looked at      : 83292
number of merge commits          : 11089
commits with more than one base  :   426

Interestingly enough, some of them have timestamps in 2002 or
2003, and comparing what our merge strategies do with what
actually got recorded is interesting.  I hope comparing what was
recorded by the SCM used back then and what we would produce
today given the same input does not violate any license
agreements somebody might have ;-).

I've taken a look at only handful of results, but I can already
see some patterns:

 - Overall we seem to do reasonably well -- IOW many merges are
   trivial even to 'stupid' strategy;

 - In some cases, all strategies do quite poorly.  For example:

   091cced38d83732ee3212806ef5ea07abbef01fe
   44583d380d189095fa959ec8ba87f0cb6deb15f5

 - There are quite a few case #16 (45 out of 426).  While
   'stupid' strategy fails to merge many of them, there are
   plenty of cases that it trivially resolves to what the SCM
   used back then did.  'resolve' being too cautious _might_
   irritate some users in these cases, but personally I feel
   that being safer is better than letting ambiguities go
   unnoticed.  It _might_ indicate that the SCM used back then
   made the same mistake as our 'stupid' strategy does today
   ;-).  A couple of examples:

   39284791a2ca8cb32b5597bf9ccf8e5c548e7aa7
   662756c753b13231d437e4bc78032b5891c9d4ea

 - 'recursive' strategy seems to most often produce results that
   match what the SCM used back then did, but it still gets some
   cases different.  E.g.

   10f44c023ef22c8c125d5ef15bed560d46c66af9
   2e7bdf93a8c4d7ecd54f1ceb19881faf0f6b14f3
   46f58b7869ffd1f2009065ffa626db3b0498925a
   5f4fbfe6d1986a99cb62f433346c92e57e7b9814
   651a8cf62546f2425df09b1b4451fe317f84431d

I have not really counted to make stats out of those 426 merge
results, and the above is just my first impressions after
looking at about two dozens or so.


------------
commit 091cced38d83732ee3212806ef5ea07abbef01fe
Merge: 80964be0914c821546d3673fcb552d5b87a44867 cb70bcabcfb745c7dee78792bd17fb5fd50b0950
Author: Len Brown <len.brown@intel.com>
Date:   Wed Nov 10 07:57:44 2004 -0500

    Merge intel.com:/home/lenb/src/26-stable-dev
    into intel.com:/home/lenb/src/26-latest-dev

differences in paths: 4057

Try resolving the old way.
Method 'traditional' automerged (1).
Method 'traditional' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 2
Difference from the actual commit: 4
Difference from the actual commit (cached): 2

Strategy recursive.
Method 'recursive' automerged (0).
Method 'recursive' resolved cleanly.
Difference from the actual commit: 1
Difference from the actual commit (cached): 1

Strategy resolve.
Method 'resolve' automerged (1).
Method 'resolve' did not resolve cleanly.
	unmerged paths: 46
	conflict paths: 24
Difference from the actual commit: 28
Difference from the actual commit (cached): 27

Strategy stupid.
Method 'stupid' automerged (1).
Method 'stupid' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 2
Difference from the actual commit: 4
Difference from the actual commit (cached): 2
 ------------------------------------------------ 
commit 10f44c023ef22c8c125d5ef15bed560d46c66af9
Merge: 5ad6b7ada85fa414a7e94c309ef9c73862c08f69 961c380cef0f1af06c254488d1a672324ec2b34a
Author: Greg Kroah-Hartman <greg@kroah.com>
Date:   Tue Mar 2 03:27:45 2004 -0800

    Merge kroah.com:/home/linux/BK/bleed-2.6
    into kroah.com:/home/linux/BK/pci-2.6

differences in paths: 273

Try resolving the old way.
Method 'traditional' automerged (1).
Method 'traditional' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 1
Difference from the actual commit: 1
Difference from the actual commit (cached): 0

Strategy recursive.
Method 'recursive' automerged (0).
Method 'recursive' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy resolve.
Method 'resolve' automerged (1).
Method 'resolve' did not resolve cleanly.
	unmerged paths: 16
	conflict paths: 8
Difference from the actual commit: 8
Difference from the actual commit (cached): 8

Strategy stupid.
Method 'stupid' automerged (1).
Method 'stupid' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 1
Difference from the actual commit: 1
Difference from the actual commit (cached): 0
 ------------------------------------------------ 
commit 2e7bdf93a8c4d7ecd54f1ceb19881faf0f6b14f3
Merge: ad50ff186e3544ce316bc24e8b37e82b840b42e1 b1c237f8c2f529713c11839db9d49ed8cec1c607
Author: Patrick Mochel <mochel@osdl.org>
Date:   Mon Sep 8 21:22:38 2003 -0700

    Merge kernel.bkbits.net:linux-2.5-power
    into osdl.org:/home/mochel/src/kernel/linux-2.5-power

differences in paths: 70

Try resolving the old way.
Method 'traditional' automerged (1).
Method 'traditional' did not resolve cleanly.
	unmerged paths: 4
	conflict paths: 3
Difference from the actual commit: 4
Difference from the actual commit (cached): 3

Strategy recursive.
Method 'recursive' automerged (0).
Method 'recursive' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy resolve.
Method 'resolve' automerged (1).
Method 'resolve' did not resolve cleanly.
	unmerged paths: 2
	conflict paths: 1
Difference from the actual commit: 1
Difference from the actual commit (cached): 1

Strategy stupid.
Method 'stupid' automerged (1).
Method 'stupid' did not resolve cleanly.
	unmerged paths: 4
	conflict paths: 3
Difference from the actual commit: 4
Difference from the actual commit (cached): 3
 ------------------------------------------------ 
commit 39284791a2ca8cb32b5597bf9ccf8e5c548e7aa7
Merge: f96645e41c946431b988e061fbc9dd4c7176e621 b335ff4d92816a6722fcf323c67b86af58eb5b9d
Author: Keith M. Wesolowski <wesolows@foobazco.org>
Date:   Sun Jul 11 10:20:23 2004 -0700

    Merge ssh://kernel.bkbits.net/sparc32-2.6
    into foobazco.org:/sources/2.5-sparc-todave

differences in paths: 2316

Try resolving the old way.
Method 'traditional' automerged (0).
Method 'traditional' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy recursive.
Method 'recursive' automerged (0).
Method 'recursive' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy resolve.
Method 'resolve' automerged (1).
Method 'resolve' did not resolve cleanly.
	unmerged paths: 4
	conflict paths: 2
Difference from the actual commit: 4
Difference from the actual commit (cached): 4

Strategy stupid.
Method 'stupid' automerged (0).
Method 'stupid' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0
 ------------------------------------------------ 
commit 44583d380d189095fa959ec8ba87f0cb6deb15f5
Merge: 645dbacc0c65228151b7cbcb8a977c83328cef76 8bf1412a4bbfc9da0224e73203172f98f987b41a
Author: Jeff Dike <jdike@uml.karaya.com>
Date:   Sun Dec 29 11:40:13 2002 -0500

    Merge uml.karaya.com:/home/jdike/linux/2.5/skas-2.5
    into uml.karaya.com:/home/jdike/linux/2.5/skas-2.5-linus

differences in paths: 406

Try resolving the old way.
Method 'traditional' automerged (0).
Method 'traditional' resolved cleanly.
Difference from the actual commit: 1
Difference from the actual commit (cached): 1

Strategy recursive.
Method 'recursive' automerged (1).
Method 'recursive' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 1
Difference from the actual commit: 1
Difference from the actual commit (cached): 1

Strategy resolve.
Method 'resolve' automerged (1).
Method 'resolve' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 1
Difference from the actual commit: 2
Difference from the actual commit (cached): 2

Strategy stupid.
Method 'stupid' automerged (0).
Method 'stupid' resolved cleanly.
Difference from the actual commit: 1
Difference from the actual commit (cached): 1
 ------------------------------------------------ 
commit 46f58b7869ffd1f2009065ffa626db3b0498925a
Merge: d98612ae0eac1053fe585c0b283a9d385ed80572 9dd6f379b26f49c5819e302156dabb2171721f78
Author: James Simmons <jsimmons@maxwell.earthlink.net>
Date:   Sat Nov 16 01:36:03 2002 -0800

    Merge

differences in paths: 74

Try resolving the old way.
Method 'traditional' automerged (1).
Method 'traditional' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 5
Difference from the actual commit: 6
Difference from the actual commit (cached): 5

Strategy recursive.
Method 'recursive' automerged (1).
Method 'recursive' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 1
Difference from the actual commit: 1
Difference from the actual commit (cached): 0

Strategy resolve.
Method 'resolve' automerged (1).
Method 'resolve' did not resolve cleanly.
	unmerged paths: 4
	conflict paths: 3
Difference from the actual commit: 3
Difference from the actual commit (cached): 2

Strategy stupid.
Method 'stupid' automerged (1).
Method 'stupid' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 5
Difference from the actual commit: 6
Difference from the actual commit (cached): 5
 ------------------------------------------------ 
commit 5f4fbfe6d1986a99cb62f433346c92e57e7b9814
Merge: 03fab1f6c2a2ac316da184d687276655b53d3290 e772cc0eea175593d25669e11433b9647f4051e5
Author: James Simmons <jsimmons@kozmo.(none)>
Date:   Fri Feb 14 17:37:00 2003 -0800

    Merge bk://fbdev.bkbits.net/fbdev-2.5
    into kozmo.(none):/usr/src/fbdev-2.5

differences in paths: 154

Try resolving the old way.
Method 'traditional' automerged (1).
Method 'traditional' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 2
Difference from the actual commit: 4
Difference from the actual commit (cached): 2

Strategy recursive.
Method 'recursive' automerged (0).
Method 'recursive' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy resolve.
Method 'resolve' automerged (1).
Method 'resolve' did not resolve cleanly.
	unmerged paths: 2
	conflict paths: 1
Difference from the actual commit: 2
Difference from the actual commit (cached): 2

Strategy stupid.
Method 'stupid' automerged (1).
Method 'stupid' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 2
Difference from the actual commit: 4
Difference from the actual commit (cached): 2
 ------------------------------------------------ 
commit 651a8cf62546f2425df09b1b4451fe317f84431d
Merge: cbe9bae1575761d8dfa7dd4b07c94878ef95bab8 abb86e39e38666d8b0af1821cb7c036902bbf459
Author: James Simmons <jsimmons@kozmo.(none)>
Date:   Thu Dec 5 17:48:25 2002 -0800

    Merge bk://fbdev.bkbits.net/fbdev-2.5
    into kozmo.(none):/usr/src/fbdev-2.5

differences in paths: 312

Try resolving the old way.
Method 'traditional' automerged (1).
Method 'traditional' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 1
Difference from the actual commit: 1
Difference from the actual commit (cached): 1

Strategy recursive.
Method 'recursive' automerged (0).
Method 'recursive' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy resolve.
Method 'resolve' automerged (0).
Method 'resolve' resolved cleanly.
Difference from the actual commit: 1
Difference from the actual commit (cached): 1

Strategy stupid.
Method 'stupid' automerged (1).
Method 'stupid' did not resolve cleanly.
	unmerged paths: 1
	conflict paths: 1
Difference from the actual commit: 1
Difference from the actual commit (cached): 1
 ------------------------------------------------ 
commit 662756c753b13231d437e4bc78032b5891c9d4ea
Merge: 1a89e511e152998311f49c60e55b21d48d1b8973 8d7c763253b85b5598e5975a5c88b896fefe2110
Author: Tony Luck <tony.luck@intel.com>
Date:   Thu Mar 31 00:29:06 2005 -0800

    Merge intel.com:/data/home/aegl/BK/work/1
    into intel.com:/data/home/aegl/BK/linux-ia64-release-2.6.12

differences in paths: 3303

Try resolving the old way.
Method 'traditional' automerged (0).
Method 'traditional' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy recursive.
Method 'recursive' automerged (0).
Method 'recursive' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0

Strategy resolve.
Method 'resolve' automerged (0).
Method 'resolve' resolved cleanly.
Difference from the actual commit: 3
Difference from the actual commit (cached): 3

Strategy stupid.
Method 'stupid' automerged (0).
Method 'stupid' resolved cleanly.
Difference from the actual commit: 0
Difference from the actual commit (cached): 0
 ------------------------------------------------ 
#!/bin/sh

check_result () {
	# method=$1, expect=$2

	result_exit=$?
	case "$result_exit" in
	0 | 1)
		echo "Method '$1' automerged ($result_exit)."
		git-update-cache --refresh >/dev/null
		unmerged=`git-ls-files --unmerged`
		conflict=`git-diff-files --name-only`
		case "$unmerged$conflict" in
		'')
			echo "Method '$1' resolved cleanly."
			;;
		*)
			echo "Method '$1' did not resolve cleanly."
			echo "	unmerged paths: "`echo "$unmerged" | wc -l`
			echo "	conflict paths: "`echo "$conflict" | wc -l`
			;;
		esac
		echo "Difference from the actual commit: "`
			git-diff-cache --name-only "$expect" | wc -l`
		echo "Difference from the actual commit (cached): "`
			git-diff-cache --name-only --cached "$expect" | wc -l`
		;;
	*)
		echo "Method '$1' failed to automerge."
		;;
	esac
}

strategies=$(git merge -s help 2>&1 | sed -e 's/available strategies are: //')

rm -fr ./++merge-log
mkdir -p ./++merge-log

while read parents
do
	set x $parents; shift; head="$1"; shift
	remote= expect= remotes_seen=
	for x
	do
		case "$x" in
		'|')
			case "$remotes_seen" in
			yes) break ;;
			esac
			remotes_seen=yes
			continue ;;
		?*)
			case "$remotes_seen" in
			yes) expect="$x"; break ;;
			esac
			remote="$remote$x " ;;
		esac
	done

	echo "Merge $remote into $head"
	echo

    (
	
	git log --max-count=1 $expect

	echo "differences in paths: "`
		git-diff-tree -r --name-only $remote $head | wc -l`

	echo
	git reset --hard "$head"
	echo "Try resolving the old way."
	(
		PATH=$HOME/bin/Linux-0.99.6:/usr/bin:/bin
		export PATH
		git-resolve-script $head $remote 'test merge with resolve'
		exit $?
	) >./"++merge-log/$expect.traditional" 2>&1
	check_result traditional $expect

	for s in $strategies
	do
	    case "$s" in octopus) continue ;; esac
	    echo
	    git reset --hard "$head"
	    echo "Strategy $s."
	    git-merge -n -s $s "test merge $s" $head $remote \
		>./"++merge-log/$expect.$s" 2>&1
	    check_result $s $expect
	done

	echo ' ------------------------------------------------ '

    ) >./"++merge-log/$expect".log 2>&1

done <./++commit-check-out

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Yet another set of merge test cases from the kernel tree.
  2005-09-15  0:47       ` Yet another set of merge test cases " Junio C Hamano
@ 2005-09-19 16:13         ` Fredrik Kuivinen
  2005-09-20  1:53           ` Junio C Hamano
  0 siblings, 1 reply; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-19 16:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, Daniel Barkalow, Linus Torvalds, git

On Wed, Sep 14, 2005 at 05:47:53PM -0700, Junio C Hamano wrote:
> I've tried to reproduce 400+ merges I picked up from various
> kernel trees that have more than one merge base.
> 
> number of commits looked at      : 83292
> number of merge commits          : 11089
> commits with more than one base  :   426
> 
> Interestingly enough, some of them have timestamps in 2002 or
> 2003, and comparing what our merge strategies do with what
> actually got recorded is interesting.  I hope comparing what was
> recorded by the SCM used back then and what we would produce
> today given the same input does not violate any license
> agreements somebody might have ;-).
> 
> I've taken a look at only handful of results, but I can already
> see some patterns:
> 
>  - Overall we seem to do reasonably well -- IOW many merges are
>    trivial even to 'stupid' strategy;
> 
>  - In some cases, all strategies do quite poorly.  For example:
> 
>    091cced38d83732ee3212806ef5ea07abbef01fe
>    44583d380d189095fa959ec8ba87f0cb6deb15f5

091c... seems to be a bit complicated. I have tried to understand what
the correct result should be, but I haven't managed to do that yet.

4458... is kind of interesting as it has no less than six common
ancestors.


> 
>  - There are quite a few case #16 (45 out of 426).  While
>    'stupid' strategy fails to merge many of them, there are
>    plenty of cases that it trivially resolves to what the SCM
>    used back then did.  'resolve' being too cautious _might_
>    irritate some users in these cases, but personally I feel
>    that being safer is better than letting ambiguities go
>    unnoticed.  It _might_ indicate that the SCM used back then
>    made the same mistake as our 'stupid' strategy does today
>    ;-).  A couple of examples:
> 
>    39284791a2ca8cb32b5597bf9ccf8e5c548e7aa7
>    662756c753b13231d437e4bc78032b5891c9d4ea
> 
>  - 'recursive' strategy seems to most often produce results that
>    match what the SCM used back then did, but it still gets some
>    cases different.  E.g.
> 
>    10f44c023ef22c8c125d5ef15bed560d46c66af9
>    2e7bdf93a8c4d7ecd54f1ceb19881faf0f6b14f3
>    46f58b7869ffd1f2009065ffa626db3b0498925a
>    5f4fbfe6d1986a99cb62f433346c92e57e7b9814
>    651a8cf62546f2425df09b1b4451fe317f84431d
>

The reason we get the wrong result for those is that they all involve
renames. I have an experimental patch at the end of this mail which
makes git-merge-recursive.py handle some rename cases. The support for
renames is currently interleaved with other unrelated changes and
this patch should not yet be considered for inclusion. With the rename
patch we get clean merges that matches the actual commit for all of
those except for 46f5.... git-merge-recursive.py do not merge
46f5... cleanly with the rename patch applied.

Peter's merge, reported in <4318E754.9000703@zytor.com>, is merged
cleanly if this patch is applied. (and I _think_ the result is
correct)

- Fredrik


---

 git-merge-recursive.py |  466 +++++++++++++++++++++++++++++++++++--------------
 1 files changed, 334 insertions(+), 132 deletions(-)

diff --git a/git-merge-recursive.py b/git-merge-recursive.py
--- a/git-merge-recursive.py
+++ b/git-merge-recursive.py
@@ -7,9 +7,6 @@ from sets import Set
 sys.path.append('@@GIT_PYTHON_PATH@@')
 from gitMergeCommon import *
 
-# The actual merge code
-# ---------------------
-
 originalIndexFile = os.environ.get('GIT_INDEX_FILE',
                                    os.environ.get('GIT_DIR', '.git') + '/index')
 temporaryIndexFile = os.environ.get('GIT_DIR', '.git') + \
@@ -26,6 +23,13 @@ def setupIndex(temporary):
         newIndex = originalIndexFile
     os.environ['GIT_INDEX_FILE'] = newIndex
 
+def fmtP(path):
+    '''Format path for printing'''
+    return '"' + path + '"'
+
+# The entry point to the merge code
+# ---------------------------------
+
 def merge(h1, h2, branch1Name, branch2Name, graph, callDepth=0):
     '''Merge the commits h1 and h2, return the resulting virtual
     commit object and a flag indicating the cleaness of the merge.'''
@@ -74,7 +78,7 @@ def merge(h1, h2, branch1Name, branch2Na
 
     return [res, clean]
 
-getFilesRE = re.compile(r'^([0-7]+) (\S+) ([0-9a-f]{40})\t(.*)$', re.S)
+getFilesRE = re.compile('([0-9]+) ([a-z0-9]+) ([0-9a-f]{40})\t(.*)')
 def getFilesAndDirs(tree):
     files = Set()
     dirs = Set()
@@ -89,6 +93,149 @@ def getFilesAndDirs(tree):
 
     return [files, dirs]
 
+def mergeTrees(head, merge, common, branch1Name, branch2Name,
+               cleanCache):
+    '''Merge the trees 'head' and 'merge' with the common ancestor
+    'common'. The name of the head branch is 'branch1Name' and the name of
+    the merge branch is 'branch2Name'. Return a tuple (tree, cleanMerge)
+    where tree is the resulting tree and cleanMerge is True iff the
+    merge was clean.'''
+    
+    assert(isSha(head) and isSha(merge) and isSha(common))
+
+    if common == merge:
+        print 'Already uptodate!'
+        return [head, True]
+
+    if cleanCache:
+        updateArg = '-i'
+    else:
+        updateArg = '-u'
+
+    runProgram(['git-read-tree', updateArg, '-m', common, head, merge])
+    cleanMerge = True
+
+    [tree, code] = runProgram('git-write-tree', returnCode=True)
+    tree = tree.rstrip()
+    if code != 0:
+        [files, dirs] = getFilesAndDirs(head)
+        [filesM, dirsM] = getFilesAndDirs(merge)
+        files.union_update(filesM)
+        dirs.union_update(dirsM)
+        
+        renamesHead = getRenames(common, head)
+        renamesMerge = getRenames(common, merge)
+
+        entries = unmergedCacheEntries()
+        cleanMerge = processRenames(entries, renamesHead, renamesMerge,
+                                    branch1Name, branch2Name, cleanCache)
+        for entry in entries.itervalues():
+            if entry.processed:
+                continue
+            if not processEntry(entry, branch1Name, branch2Name,
+                                files, dirs, cleanCache):
+                cleanMerge = False
+                
+        if cleanMerge or cleanCache:
+            tree = runProgram('git-write-tree').rstrip()
+        else:
+            tree = None
+    else:
+        cleanMerge = True
+
+    return [tree, cleanMerge]
+
+# Low level file merging, update and removal
+# ------------------------------------------
+
+MERGE_NOCHANGES = 0
+MERGE_SIMPLE = 1
+MERGE_3WAY = 2
+def mergeFile(path, oSha, oMode, aSha, aMode, bSha, bMode,
+              branch1Name, branch2Name):
+
+    merge = MERGE_NOCHANGES
+    clean = True
+
+    if aSha != oSha or bSha != oSha:
+        merge = MERGE_SIMPLE
+
+    if aSha == oSha:
+        sha = bSha
+    elif bSha == oSha:
+        sha = aSha
+    else:
+        merge = MERGE_3WAY
+        orig = runProgram(['git-unpack-file', oSha]).rstrip()
+        src1 = runProgram(['git-unpack-file', aSha]).rstrip()
+        src2 = runProgram(['git-unpack-file', bSha]).rstrip()
+        [out, code] = runProgram(['merge',
+                                  '-L', branch1Name + '/' + path,
+                                  '-L', 'orig/' + path,
+                                  '-L', branch2Name + '/' + path,
+                                  src1, orig, src2], returnCode=True)
+
+        sha = runProgram(['git-hash-object', '-t', 'blob', '-w',
+                          src1]).rstrip()
+
+        os.unlink(orig)
+        os.unlink(src1)
+        os.unlink(src2)
+
+        clean = (code == 0)
+
+    if aMode == oMode:
+        mode = bMode
+    else:
+        mode = aMode
+
+    return [sha, mode, clean, merge]
+
+def updateFile(clean, sha, mode, path, cleanCache, onlyWd=False):
+    updateCache = not onlyWd and (cleanCache or (not cleanCache and clean))
+    updateWd = onlyWd or (not cleanCache and clean)
+
+    if updateWd:
+        prog = ['git-cat-file', 'blob', sha]
+        if stat.S_ISREG(mode):
+            try:
+                os.unlink(path)
+            except OSError:
+                pass
+            if mode & 0100:
+                mode = 0777
+            else:
+                mode = 0666
+            fd = os.open(path, os.O_WRONLY | os.O_TRUNC | os.O_CREAT, mode)
+            proc = subprocess.Popen(prog, stdout=fd)
+            proc.wait()
+            os.close(fd)
+        elif stat.S_ISLNK(mode):
+            linkTarget = runProgram(prog)
+            os.symlink(linkTarget, path)
+        else:
+            assert(False)
+
+    if updateWd and updateCache:
+        runProgram(['git-update-index', '--add', '--', path])
+    elif updateCache:
+        runProgram(['git-update-index', '--add', '--cacheinfo',
+                    '0%o' % mode, sha, path])
+
+def removeFile(clean, path, cleanCache):
+    if cleanCache or (not cleanCache and clean):
+        runProgram(['git-update-index', '--force-remove', '--', path])
+
+    if not cleanCache and clean:
+        try:
+            os.unlink(path)
+        except OSError, e:
+            if e.errno != errno.ENOENT and e.errno != errno.EISDIR:
+                raise
+
+# Cache entry management
+# ----------------------
+
 class CacheEntry:
     def __init__(self, path):
         class Stage:
@@ -98,8 +245,15 @@ class CacheEntry:
         
         self.stages = [Stage(), Stage(), Stage()]
         self.path = path
+        self.processed = False
 
-unmergedRE = re.compile(r'^([0-7]+) ([0-9a-f]{40}) ([1-3])\t(.*)$', re.S)
+def getCacheEntry(entries, path):
+    for e in entries.itervalues():
+        if e.path == path:
+            return e
+    return None
+
+unmergedRE = re.compile('^([0-9]+) ([0-9a-f]{40}) ([1-3])\t(.*)$')
 def unmergedCacheEntries():
     '''Create a dictionary mapping file names to CacheEntry
     objects. The dictionary contains one entry for every path with a
@@ -130,52 +284,160 @@ def unmergedCacheEntries():
                 'git-ls-files:', l)
     return res
 
-def mergeTrees(head, merge, common, branch1Name, branch2Name,
-               cleanCache):
-    '''Merge the trees 'head' and 'merge' with the common ancestor
-    'common'. The name of the head branch is 'branch1Name' and the name of
-    the merge branch is 'branch2Name'. Return a tuple (tree, cleanMerge)
-    where tree is the resulting tree and cleanMerge is True iff the
-    merge was clean.'''
-    
-    assert(isSha(head) and isSha(merge) and isSha(common))
+# Rename detection and handling
+# -----------------------------
 
-    if common == merge:
-        print 'Already uptodate!'
-        return [head, True]
+class RenameEntry:
+    def __init__(self, src, srcSha, srcMode, dst, dstSha, dstMode, score):
+        self.srcName = src
+        self.srcSha = srcSha
+        self.srcMode = srcMode
+        self.dstName = dst
+        self.dstSha = dstSha
+        self.dstMode = dstMode
+        self.score = score
+
+class RenameInfo:
+    def __init__(self):
+        self.entriesSrc = {}
+        self.entriesDst = {}
+
+    def add(self, entry):
+        self.entriesSrc[entry.srcName] = entry
+        self.entriesDst[entry.dstName] = entry
+
+    def getSrc(self, path):
+        return self.entriesSrc.get(path)
+
+    def getDst(self, path):
+        return self.entriesDst.get(path)
+
+    def __iter__(self):
+        return self.entriesSrc.itervalues()
+
+parseDiffRenamesRE = re.compile(':([0-7]+) ([0-7]+) ([0-9a-f]{40}) ([0-9a-f]{40}) R([0-9]*)')
+def getRenames(treeOrig, tree):
+    inp = runProgram(['git-diff-tree', '-M', '--diff-filter=R', '-r', '-z', treeOrig, tree])
 
-    if cleanCache:
-        updateArg = '-i'
-    else:
-        updateArg = '-u'
+    ret = RenameInfo()
+    try:
+        recs = inp.split("\0")
+        recs.pop() # remove last entry (which is '')
+        it = recs.__iter__()
+        while True:
+            rec = it.next()
+            m = parseDiffRenamesRE.match(rec)
+            
+            if not m:
+                die('Unexpected output from git-diff-tree!:', rec)
+
+            srcMode = int(m.group(1), 8)
+            dstMode = int(m.group(2), 8)
+            srcSha = m.group(3)
+            dstSha = m.group(4)
+            score = m.group(5)
+            src = it.next()
+            dst = it.next()
+            ret.add(RenameEntry(src, srcSha, srcMode, dst, dstSha, dstMode, score))
+    except StopIteration:
+        pass
+    return ret
 
-    runProgram(['git-read-tree', updateArg, '-m', common, head, merge])
-    cleanMerge = True
+def processRenames(entries, renamesA, renamesB, branchNameA, branchNameB, cleanCache):
+    srcNames = Set()
+    for x in renamesA:
+        srcNames.add(x.srcName)
+    for x in renamesB:
+        srcNames.add(x.srcName)
 
-    [tree, code] = runProgram('git-write-tree', returnCode=True)
-    tree = tree.rstrip()
-    if code != 0:
-        [files, dirs] = getFilesAndDirs(head)
-        [filesM, dirsM] = getFilesAndDirs(merge)
-        files.union_update(filesM)
-        dirs.union_update(dirsM)
+    cleanMerge = True
+    for path in srcNames:
+        if renamesA.getSrc(path):
+            renames1 = renamesA
+            renames2 = renamesB
+            branchName1 = branchNameA
+            branchName2 = branchNameB
+        else:
+            renames1 = renamesB
+            renames2 = renamesA
+            branchName1 = branchNameB
+            branchName2 = branchNameA
         
-        cleanMerge = True
-        entries = unmergedCacheEntries()
-        for name in entries:
-            if not processEntry(entries[name], branch1Name, branch2Name,
-                                files, dirs, cleanCache):
+        ren1 = renames1.getSrc(path)
+        ren2 = renames2.getSrc(path)
+
+        e = getCacheEntry(entries, ren1.dstName)
+        if e:
+            e.processed = True
+        e = getCacheEntry(entries, ren1.srcName)
+        if e:
+            e.processed = True
+        removeFile(True, ren1.srcName, cleanCache)
+        if ren2:
+            assert(ren1.srcName == ren2.srcName)
+            e = getCacheEntry(entries, ren2.dstName)
+            if e:
+                e.processed = True
+            if ren1.dstName != ren2.dstName:
+                print 'CONFLICT (ren/ren):', fmtP(path), 'renamed to', \
+                      fmtP(ren1.dstName), 'in branch', branchName1, 'and to', fmtP(ren2.dstName), \
+                      'in', branchName2
                 cleanMerge = False
-                
-        if cleanMerge or cleanCache:
-            tree = runProgram('git-write-tree').rstrip()
+                updateFile(False, ren1.dstSha, ren1.dstMode, ren1.dstName, cleanCache)
+                updateFile(False, ren2.dstSha, ren2.dstMode, ren2.dstName, cleanCache)
+            else:
+                print fmtP(path), 'renamed to', fmtP(ren1.dstName), 'in both branches'
+                [resSha, resMode, clean, merge] = \
+                         mergeFile(ren1.srcName,
+                                   ren1.srcSha, ren1.srcMode,
+                                   ren1.dstSha, ren1.dstMode,
+                                   ren2.dstSha, ren2.dstMode,
+                                   branchName1, branchName2)
+
+                if merge:
+                    print 'Auto-merging', fmtP(ren1.dstName)
+
+                if not clean:
+                    print 'CONFLICT (mod/mod): merge conflict in', fmtP(ren1.dstName)
+                updateFile(clean, resSha, resMode, ren1.dstName, cleanCache)
+                    
+        # Renamed in 1, maybe changed in 2
         else:
-            tree = None
-    else:
-        cleanMerge = True
+            # FIXME ar det verkligen sakert att ren1.srcName finns i cachen som unmerged?
+            e = getCacheEntry(entries, ren1.srcName)
+            if renamesA == renames1:
+                sha = e.stages[2].sha1
+                mode = e.stages[2].mode
+            else:
+                sha = e.stages[1].sha1
+                mode = e.stages[1].mode
 
-    return [tree, cleanMerge]
+            [resSha, resMode, clean, merge] = \
+                     mergeFile(ren1.srcName,
+                               ren1.srcSha, ren1.srcMode,
+                               ren1.dstSha, ren1.dstMode,
+                               sha, mode, branchName1, branchName2)
+
+            if merge:
+                print fmtP(ren1.srcName), 'renamed to', fmtP(ren1.dstName), 'in', \
+                      branchName1, 'and modified in', branchName2 + ', auto-merging'
+
+                if not clean:
+                    print 'CONFLICT (ren/mod):', fmtP(ren1.srcName), 'renamed to ', \
+                          fmtP(ren1.dstName), 'in branch', branchName1, 'and modified in', \
+                          'branch', branchName2, 'merge conflict in', fmtP(ren1.dstName)
+                    cleanMerge = False
+            else:
+                print fmtP(ren1.srcName), 'renamed to', fmtP(ren1.dstName), 'in', \
+                      branchName1
 
+            e.processed = True
+            updateFile(clean, resSha, resMode, ren1.dstName, cleanCache)
+    return cleanMerge
+
+# Per entry merge function
+# ------------------------
+ 
 def processEntry(entry, branch1Name, branch2Name, files, dirs, cleanCache):
     '''Merge one cache entry. 'files' is a Set with the files in both of
     the heads that we are going to merge. 'dirs' contains the
@@ -192,48 +454,6 @@ def processEntry(entry, branch1Name, bra
 
 # If cleanCache == False then the cache shouldn't be updated if clean == False
 
-    def updateFile(clean, sha, mode, path, onlyWd=False):
-        updateCache = not onlyWd and (cleanCache or (not cleanCache and clean))
-        updateWd = onlyWd or (not cleanCache and clean)
-
-        if updateWd:
-            prog = ['git-cat-file', 'blob', sha]
-            if stat.S_ISREG(mode):
-                try:
-                    os.unlink(path)
-                except OSError:
-                    pass
-                if mode & 0100:
-                    mode = 0777
-                else:
-                    mode = 0666
-                fd = os.open(path, os.O_WRONLY | os.O_TRUNC | os.O_CREAT, mode)
-                proc = subprocess.Popen(prog, stdout=fd)
-                proc.wait()
-                os.close(fd)
-            elif stat.S_ISLNK(mode):
-                linkTarget = runProgram(prog)
-                os.symlink(linkTarget, path)
-            else:
-                assert(False)
-
-        if updateWd and updateCache:
-            runProgram(['git-update-index', '--add', '--', path])
-        elif updateCache:
-            runProgram(['git-update-index', '--add', '--cacheinfo',
-                        '0%o' % mode, sha, path])
-
-    def removeFile(clean, path):
-        if cleanCache or (not cleanCache and clean):
-            runProgram(['git-update-index', '--force-remove', '--', path])
-
-        if not cleanCache and clean:
-            try:
-                os.unlink(path)
-            except OSError, e:
-                if e.errno != errno.ENOENT and e.errno != errno.EISDIR:
-                    raise
-
     def uniquePath(path, branch):
         newPath = path + '_' + branch
         suffix = 0
@@ -273,26 +493,26 @@ def processEntry(entry, branch1Name, bra
     # Deleted in both or deleted in one and unchanged in the other
             if aSha:
                 print 'Removing ' + path
-            removeFile(True, path)
+            removeFile(True, path, cleanCache)
         else:
     # Deleted in one and changed in the other
             cleanMerge = False
             if not aSha:
-                print 'CONFLICT (del/mod): "' + path + '" deleted in', \
+                print 'CONFLICT (del/mod):', fmtP(path), ' deleted in', \
                       branch1Name, 'and modified in', branch2Name, \
-                      '. Version', branch2Name, ' of "' + path + \
-                      '" left in tree'
+                      '. Version', branch2Name, 'of', fmtP(path), \
+                      'left in tree'
                 mode = bMode
                 sha = bSha
             else:
-                print 'CONFLICT (mod/del): "' + path + '" deleted in', \
+                print 'CONFLICT (mod/del):', fmtP(path), 'deleted in', \
                       branch2Name, 'and modified in', branch1Name + \
-                      '. Version', branch1Name, 'of "' + path + \
-                      '" left in tree'
+                      '. Version', branch1Name, 'of', path, \
+                      'left in tree'
                 mode = aMode
                 sha = aSha
 
-            updateFile(False, sha, mode, path)
+            updateFile(False, sha, mode, path, cleanCache)
     
     elif (not oSha and aSha     and not bSha) or \
          (not oSha and not aSha and bSha):
@@ -316,15 +536,15 @@ def processEntry(entry, branch1Name, bra
             cleanMerge = False
             newPath = uniquePath(path, addBranch)
             print 'CONFLICT (' + conf + \
-                  '): There is a directory with name "' + path + '" in', \
-                  otherBranch + '. Adding "' + path + '" as "' + newPath + '"'
+                  '): There is a directory with name', path, 'in', \
+                  otherBranch + '. Adding', fmtP(path), 'as', fmtP(newPath)
 
-            removeFile(False, path)
+            removeFile(False, path, cleanCache)
             path = newPath
         else:
-            print 'Adding "' + path + '"'
+            print 'Adding', fmtP(path)
 
-        updateFile(True, sha, mode, path)
+        updateFile(True, sha, mode, path, cleanCache)
     
     elif not oSha and aSha and bSha:
     #
@@ -333,13 +553,13 @@ def processEntry(entry, branch1Name, bra
         if aSha == bSha:
             if aMode != bMode:
                 cleanMerge = False
-                print 'CONFLICT: File "' + path + \
-                      '" added identically in both branches,', \
+                print 'CONFLICT: File', fmtP(path), \
+                      'added identically in both branches,', \
                       'but permissions conflict', '0%o' % aMode, '->', \
                       '0%o' % bMode
                 print 'CONFLICT: adding with permission:', '0%o' % aMode
 
-                updateFile(False, aSha, aMode, path)
+                updateFile(False, aSha, aMode, path, cleanCache)
             else:
                 # This case is handled by git-read-tree
                 assert(False)
@@ -347,49 +567,31 @@ def processEntry(entry, branch1Name, bra
             cleanMerge = False
             newPath1 = uniquePath(path, branch1Name)
             newPath2 = uniquePath(path, branch2Name)
-            print 'CONFLICT (add/add): File "' + path + \
-                  '" added non-identically in both branches.'
-            removeFile(False, path)
-            updateFile(False, aSha, aMode, newPath1)
-            updateFile(False, bSha, bMode, newPath2)
+            print 'CONFLICT (add/add): File', fmtP(path), \
+                  'added non-identically in both branches.'
+            removeFile(False, path, cleanCache)
+            updateFile(False, aSha, aMode, newPath1, cleanCache)
+            updateFile(False, bSha, bMode, newPath2, cleanCache)
 
     elif oSha and aSha and bSha:
     #
     # case D: Modified in both, but differently.
     #
         print 'Auto-merging', path 
-        orig = runProgram(['git-unpack-file', oSha]).rstrip()
-        src1 = runProgram(['git-unpack-file', aSha]).rstrip()
-        src2 = runProgram(['git-unpack-file', bSha]).rstrip()
-        [out, ret] = runProgram(['merge',
-                                 '-L', branch1Name + '/' + path,
-                                 '-L', 'orig/' + path,
-                                 '-L', branch2Name + '/' + path,
-                                 src1, orig, src2], returnCode=True)
-
-        if aMode == oMode:
-            mode = bMode
+        [sha, mode, clean, merge] = \
+              mergeFile(path, oSha, oMode, aSha, aMode, bSha, bMode,
+                        branch1Name, branch2Name)
+        if clean:
+            updateFile(True, sha, mode, path, cleanCache)
         else:
-            mode = aMode
-
-        sha = runProgram(['git-hash-object', '-t', 'blob', '-w',
-                          src1]).rstrip()
-
-        if ret != 0:
             cleanMerge = False
-            print 'CONFLICT (content): Merge conflict in "' + path + '".'
+            print 'CONFLICT (content): Merge conflict in', fmtP(path)
 
             if cleanCache:
-                updateFile(False, sha, mode, path)
+                updateFile(False, sha, mode, path, cleanCache)
             else:
-                updateFile(True, aSha, aMode, path)
-                updateFile(False, sha, mode, path, True)
-        else:
-            updateFile(True, sha, mode, path)
-
-        os.unlink(orig)
-        os.unlink(src1)
-        os.unlink(src2)
+                updateFile(True, aSha, aMode, path, cleanCache)
+                updateFile(False, sha, mode, path, cleanCache, True)
     else:
         die("ERROR: Fatal merge failure, shouldn't happen.")
 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Yet another set of merge test cases from the kernel tree.
  2005-09-19 16:13         ` Fredrik Kuivinen
@ 2005-09-20  1:53           ` Junio C Hamano
  2005-09-20  5:50             ` Fredrik Kuivinen
  0 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2005-09-20  1:53 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git

Fredrik Kuivinen <freku045@student.liu.se> writes:

> ... The support for
> renames is currently interleaved with other unrelated changes and
> this patch should not yet be considered for inclusion. With the rename
> patch we get clean merges that matches the actual commit for all of
> those except for 46f5.... git-merge-recursive.py do not merge
> 46f5... cleanly with the rename patch applied.

That is very interesting.

> +getFilesRE = re.compile('([0-9]+) ([a-z0-9]+) ([0-9a-f]{40})\t(.*)')
> +def mergeTrees(head, merge, common, branch1Name, branch2Name,
> +               cleanCache):
> ...
> +    if cleanCache:
> +        updateArg = '-i'
> +    else:
> +        updateArg = '-u'
> +
> +    runProgram(['git-read-tree', updateArg, '-m', common, head, merge])

It _might_ make sense if we make '-i' and '-u' imply instead of
require '-m' in read-tree, but that will be an independent
patch.

> +def mergeFile(path, oSha, oMode, aSha, aMode, bSha, bMode,
> +              branch1Name, branch2Name):
>...
> +    if aMode == oMode:
> +        mode = bMode
> +    else:
> +        mode = aMode

Note: preferring "our" mode if there is a conflict instead of
barfing.  I do not know which is more useful in practice.

> +
> +    return [sha, mode, clean, merge]
> +
> +def updateFile(clean, sha, mode, path, cleanCache, onlyWd=False):
> +    updateCache = not onlyWd and (cleanCache or (not cleanCache and clean))
> +    updateWd = onlyWd or (not cleanCache and clean)
> +
> +    if updateWd:
> +        prog = ['git-cat-file', 'blob', sha]
> +        if stat.S_ISREG(mode):
> +            try:
> +                os.unlink(path)
> +            except OSError:
> +                pass
> +            if mode & 0100:
> +                mode = 0777
> +            else:
> +                mode = 0666
> +            fd = os.open(path, os.O_WRONLY | os.O_TRUNC | os.O_CREAT, mode)
> +            proc = subprocess.Popen(prog, stdout=fd)
> +            proc.wait()
> +            os.close(fd)
> +        elif stat.S_ISLNK(mode):
> +            linkTarget = runProgram(prog)
> +            os.symlink(linkTarget, path)
> +        else:
> +            assert(False)

Could it happen trying to update a file 'foo' when the working
tree has 'foo/bar' file (i.e. D/F conflict)?

> +def getRenames(treeOrig, tree):
>...
> +def processRenames(entries, renamesA,...
>...

These are indeed very interesting.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Yet another set of merge test cases from the kernel tree.
  2005-09-20  1:53           ` Junio C Hamano
@ 2005-09-20  5:50             ` Fredrik Kuivinen
  0 siblings, 0 replies; 40+ messages in thread
From: Fredrik Kuivinen @ 2005-09-20  5:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, git

On Mon, Sep 19, 2005 at 06:53:49PM -0700, Junio C Hamano wrote:
> Fredrik Kuivinen <freku045@student.liu.se> writes:
> 
> > +def mergeFile(path, oSha, oMode, aSha, aMode, bSha, bMode,
> > +              branch1Name, branch2Name):
> >...
> > +    if aMode == oMode:
> > +        mode = bMode
> > +    else:
> > +        mode = aMode
> 
> Note: preferring "our" mode if there is a conflict instead of
> barfing.  I do not know which is more useful in practice.
> 

I just realized that this piece of code is broken with respect to type
changes. Trying to merge a symlink with a file should give us a
conflict.

> > +
> > +    return [sha, mode, clean, merge]
> > +
> > +def updateFile(clean, sha, mode, path, cleanCache, onlyWd=False):
> > +    updateCache = not onlyWd and (cleanCache or (not cleanCache and clean))
> > +    updateWd = onlyWd or (not cleanCache and clean)
> > +
> > +    if updateWd:
> > +        prog = ['git-cat-file', 'blob', sha]
> > +        if stat.S_ISREG(mode):
> > +            try:
> > +                os.unlink(path)
> > +            except OSError:
> > +                pass
> > +            if mode & 0100:
> > +                mode = 0777
> > +            else:
> > +                mode = 0666
> > +            fd = os.open(path, os.O_WRONLY | os.O_TRUNC | os.O_CREAT, mode)
> > +            proc = subprocess.Popen(prog, stdout=fd)
> > +            proc.wait()
> > +            os.close(fd)
> > +        elif stat.S_ISLNK(mode):
> > +            linkTarget = runProgram(prog)
> > +            os.symlink(linkTarget, path)
> > +        else:
> > +            assert(False)
> 
> Could it happen trying to update a file 'foo' when the working
> tree has 'foo/bar' file (i.e. D/F conflict)?
> 

We get a list of all files and directories in getFilesAndDirs. This
list is then used in processEntry to check for d/f conflicts, so it
shouldn't happen in that case.

The rename path do currently not check for d/f conflicts so it could
probably happen there.

- Fredrik

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2005-09-20  5:50 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
2005-09-11  2:54   ` Unified merge driver pushed out to "master" branch Junio C Hamano
2005-09-11 21:05     ` Fredrik Kuivinen
2005-09-12  1:23       ` Junio C Hamano
2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
2005-09-14 16:11       ` Daniel Barkalow
2005-09-14 16:30         ` Junio C Hamano
2005-09-14 17:42       ` Fredrik Kuivinen
2005-09-14 17:51         ` Junio C Hamano
2005-09-15  0:47       ` Yet another set of merge test cases " Junio C Hamano
2005-09-19 16:13         ` Fredrik Kuivinen
2005-09-20  1:53           ` Junio C Hamano
2005-09-20  5:50             ` Fredrik Kuivinen
2005-09-07 16:51 ` [PATCH 2/2] A new merge algorithm Fredrik Kuivinen
2005-09-07 18:33 ` [PATCH 0/2] A new merge algorithm, take 3 Daniel Barkalow
2005-09-08  6:06   ` Fredrik Kuivinen
2005-09-08 15:27     ` Daniel Barkalow
2005-09-08 20:05       ` Fredrik Kuivinen
2005-09-08 21:27         ` Daniel Barkalow
2005-09-07 18:36 ` Junio C Hamano
     [not found]   ` <431F34FF.5050301@citi.umich.edu>
     [not found]     ` <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>
2005-09-08 15:06       ` Chuck Lever
2005-09-08 16:51         ` Junio C Hamano
2005-09-08 17:19           ` Linus Torvalds
2005-09-08 17:51             ` Junio C Hamano
2005-09-08 18:16             ` Chuck Lever
2005-09-08 18:35               ` Linus Torvalds
2005-09-08 18:58                 ` Chuck Lever
2005-09-08 20:59                   ` Linus Torvalds
2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
2005-09-09 16:05                     ` Daniel Barkalow
2005-09-09 16:43                       ` Junio C Hamano
2005-09-09 17:25                         ` Daniel Barkalow
2005-09-11  4:58                     ` Junio C Hamano
2005-09-12 21:08                     ` Fredrik Kuivinen
2005-09-12 21:16                       ` Junio C Hamano
2005-09-13 20:33                         ` Fredrik Kuivinen
2005-09-13 20:46                           ` Junio C Hamano
2005-09-08 20:54   ` [PATCH 0/2] A new merge algorithm, take 3 Junio C Hamano
2005-09-08 21:23     ` A Large Angry SCM

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).