git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git-union-merge proposal
@ 2011-06-21  2:20 Joey Hess
  2011-06-21  5:22 ` Jonathan Nieder
  2011-06-21 17:44 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Joey Hess @ 2011-06-21  2:20 UTC (permalink / raw
  To: GIT Mailing-list


[-- Attachment #1.1: Type: text/plain, Size: 2176 bytes --]

Lately some tools are storing data in git branches or refs, that is not
source code, and that is designed in some way to be automatically
merged. Generally merge=union will work for it, but the problem is that
git-merge can only operate on the checked out branch, not other
branches.

So these things all deal with merging in their own ad-hoc ways:

* pristine-tar commits the info it needs to reconstruct tarballs
  to a pristine-tar branch; files in the branch should not easily conflict
  as each includes the name of the tarball.. but when git pull
  cannot fast-forward the pristine-tar branch, the user is left to
  manually fix it.
* git-annex stores location tracking information to log files in
  .git-annex/; gitattributes is configured to use merge=union,
  and the log files have timestamps or are otherwise structured to be
  safely merged.
* git notes merge -s cat_sort_uniq
  Notes are stored in a tree using the object sha, which can be
  union merged, when the notes' format is a series of independant lines.
* probably other tools do things like this too, or will ...

So I've written a prototype of a git-union-merge that could be used
for all of these. It works like this:

git union-merge foo origin/foo refs/heads/foo 

That looks up foo and origin/foo and union merges them together,
producing the new branch refs/heads/foo. New blobs are injected
as needed for unioned files, and the merge commit is generated,
without affecting the current working tree, and without any
expensive checkouts of the branches. It's pretty fast, it only
needs to write out a temporary index file.

Prototype is attached, I doubt it would be suitable for git as-is,
but it does show how this is accomplished, if you've not already
seen how to do it -- just look for ls-tree, diff-tree,
show, hash-object, and update-index. Note that merging file modes is
not yet dealt with.

I imagine a git that can have union merge or other custom automated
merge strategies configured on a per-branch basis, so that git pull
automatically merges branches. That could be a good basis for adding
Fossil-like features to git.

-- 
see shy jo

[-- Attachment #1.2: git-union-merge.hs --]
[-- Type: text/x-haskell, Size: 3729 bytes --]

{- git-union-merge program
 -
 - Copyright 2011 Joey Hess <joey@kitenet.net>
 -
 - Licensed under the GNU GPL version 3 or higher.
 -}

import System.Environment
import System.FilePath
import System.Directory
import System.Cmd.Utils
import System.Posix.Env (setEnv)
import Control.Monad (when)
import Data.List
import Data.Maybe
import Data.String.Utils

import qualified GitRepo as Git
import Utility

header :: String
header = "Usage: git-union-merge ref ref newref"

usage :: IO a
usage = error $ "bad parameters\n\n" ++ header

main :: IO ()
main = do
	[aref, bref, newref] <- parseArgs
	g <- setup
	stage g aref bref
	commit g aref bref newref
	cleanup g

parseArgs :: IO [String]
parseArgs = do
	args <- getArgs
	if (length args /= 3)
		then usage
		else return args

tmpIndex :: Git.Repo -> FilePath
tmpIndex g = Git.workTree g </> Git.gitDir g </> "index.git-union-merge"

{- Configures git to use a temporary index file. -}
setup :: IO Git.Repo
setup = do
	g <- Git.configRead =<< Git.repoFromCwd
	cleanup g -- idempotency
	setEnv "GIT_INDEX_FILE" (tmpIndex g) True
	return g

cleanup :: Git.Repo -> IO ()
cleanup g = do
	e' <- doesFileExist (tmpIndex g)
	when e' $ removeFile (tmpIndex g)

{- Stages the content of both refs into the index. -}
stage :: Git.Repo -> String -> String -> IO ()
stage g aref bref = do
	-- Get the contents of aref, as a starting point.
	ls <- fromgit
		["ls-tree", "-z", "-r", "--full-tree", aref]
	-- Identify files that are different between aref and bref, and
	-- inject merged versions into git.
	diff <- fromgit
		["diff-tree", "--raw", "-z", "-r", "--no-renames", "-l0", aref, bref]
	ls' <- mapM mergefile (pairs diff)
	-- Populate the index file. Later lines override earlier ones.
	togit ["update-index", "-z", "--index-info"]
		(join "\0" $ ls++catMaybes ls')
	where
		fromgit l = Git.pipeNullSplit g (map Param l)
		togit l content = Git.pipeWrite g (map Param l) content
			>>= forceSuccess
		tofromgit l content = do
			(h, s) <- Git.pipeWriteRead g (map Param l) content
			length s `seq` do
				forceSuccess h
				Git.reap
				return ((), s)

		pairs [] = []
		pairs (_:[]) = error "parse error"
		pairs (a:b:rest) = (a,b):pairs rest
		
		nullsha = take shaSize $ repeat '0'
		ls_tree_line sha file = "100644 blob " ++ sha ++ "\t" ++ file
		unionmerge = unlines . nub . lines
		
		mergefile (info, file) = do
			let [_colonamode, _bmode, asha, bsha, _status] = words info
			if bsha == nullsha
				then return Nothing -- already staged from aref
				else mergefile' file asha bsha
		mergefile' file asha bsha = do
			let shas = filter (/= nullsha) [asha, bsha]
			content <- Git.pipeRead g $ map Param ("show":shas)
			sha <- getSha "hash-object" $
				tofromgit ["hash-object", "-w", "--stdin"] $
					unionmerge content
			return $ Just $ ls_tree_line sha file

{- Commits the index into the specified branch. -}
commit :: Git.Repo -> String -> String -> String -> IO ()
commit g aref bref newref = do
	tree <- getSha "write-tree"  $
		pipeFrom "git" ["write-tree"]
	sha <- getSha "commit-tree" $
		pipeBoth "git" ["commit-tree", tree, "-p", aref, "-p", bref]
			"union merge"
	Git.run g "update-ref" [Param newref, Param sha]

{- Runs an action that causes a git subcommand to emit a sha, and strips
   any trailing newline, returning the sha. -}
getSha :: String -> IO (a, String) -> IO String
getSha subcommand a = do
	(_, t) <- a
	let t' = if last t == '\n'
		then take (length t - 1) t
		else t
	when (length t' /= shaSize) $
		error $ "failed to read sha from git " ++ subcommand ++ " (" ++ t' ++ ")"
	return t'

shaSize :: Int
shaSize = 40

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git-union-merge proposal
  2011-06-21  2:20 git-union-merge proposal Joey Hess
@ 2011-06-21  5:22 ` Jonathan Nieder
  2011-06-21  7:34   ` Johan Herland
  2011-06-21 17:44 ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Jonathan Nieder @ 2011-06-21  5:22 UTC (permalink / raw
  To: Joey Hess; +Cc: GIT Mailing-list, Johan Herland

Hi Joey,

Joey Hess wrote[1]:

> * git-annex stores location tracking information to log files in
>   .git-annex/; gitattributes is configured to use merge=union,
>   and the log files have timestamps or are otherwise structured to be
>   safely merged.
> * git notes merge -s cat_sort_uniq
>   Notes are stored in a tree using the object sha, which can be
>   union merged, when the notes' format is a series of independant lines.
> * probably other tools do things like this too, or will ...
>
> So I've written a prototype of a git-union-merge that could be used
> for all of these. It works like this:
>
> git union-merge foo origin/foo refs/heads/foo 

Hm, this makes a lot of sense.  Often a person needs a worktree anyway
to check the merge result for sanity, but as you say, that needn't
always be the case.

[...]
> Prototype is attached, I doubt it would be suitable for git as-is,

Does the GitRepo module that it uses come from git-annex?

If the prototype were self-contained, I would encourage you to submit
it for inclusion under contrib/ so it can evolve and eventually
graduate out of there.  Cc-ing Johan (who has no doubt thought through
these things in the context of "git notes") in case he has thoughts on
it.

Regards,
Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/176119

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git-union-merge proposal
  2011-06-21  5:22 ` Jonathan Nieder
@ 2011-06-21  7:34   ` Johan Herland
  2011-06-21 16:10     ` Joey Hess
  0 siblings, 1 reply; 8+ messages in thread
From: Johan Herland @ 2011-06-21  7:34 UTC (permalink / raw
  To: Jonathan Nieder; +Cc: git, Joey Hess

On Tuesday 21 June 2011, Jonathan Nieder wrote:
> Joey Hess wrote[1]:
> > * git notes merge -s cat_sort_uniq
> > 
> >   Notes are stored in a tree using the object sha, which can be
> >   union merged, when the notes' format is a series of independant
> >   lines.
> > 
> > [...]
> > 
> > So I've written a prototype of a git-union-merge that could be used
> > for all of these. It works like this:
> > 
> > git union-merge foo origin/foo refs/heads/foo
> 
> [...]
> 
> If the prototype were self-contained, I would encourage you to submit
> it for inclusion under contrib/ so it can evolve and eventually
> graduate out of there.  Cc-ing Johan (who has no doubt thought through
> these things in the context of "git notes") in case he has thoughts on
> it.

Thanks for the CC.

I must confess that my Haskell skills are exactly nil, but AFAICS the script 
depends on the filename as the only criteria to identify files that need a 
line-level merge. How does the script deal with renamed and copied files?

If you depend on the filename only, this script simply will not work for 
notes. The notes tree reorganizes itself dynamically for optimum 
performance, and this affects how notes trees can be merged.

E.g. given a note for object 01234567..., this note may exist as 
"01234567..." in one notes tree, while it may exist as "01/234567..." in a 
bigger notes tree, or even "01/23/4567..." in an even bigger notes tree. 
Even though the filenames differ, they all refer to the same note, and you 
cannot merge notes trees correctly without taking that fact into account.

Furthermore, if you (union-)merge two notes trees that both have 
"01/234567...", the result does not necessarily belong in "01/234567...". It 
could be that the sum/union of the two notes trees have pushed the number of 
notes in the result so high that "01/23/4567..." is now a more optimal name 
for this note.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git-union-merge proposal
  2011-06-21  7:34   ` Johan Herland
@ 2011-06-21 16:10     ` Joey Hess
  0 siblings, 0 replies; 8+ messages in thread
From: Joey Hess @ 2011-06-21 16:10 UTC (permalink / raw
  To: Johan Herland; +Cc: Jonathan Nieder, git

[-- Attachment #1: Type: text/plain, Size: 1688 bytes --]

Jonathan Nieder wrote:
> Does the GitRepo module that it uses come from git-annex?
> 
> If the prototype were self-contained, I would encourage you to submit
> it for inclusion under contrib/ so it can evolve and eventually
> graduate out of there.  Cc-ing Johan (who has no doubt thought through
> these things in the context of "git notes") in case he has thoughts on
> it.

Yes, this was written in the context of git-annex. I would probably not want
to submit the haskell implementation to contrib/, but a shell implementation
could be done that would be perhaps less robust, but also less unusual in
the context of git's code base.

Johan Herland wrote:
> I must confess that my Haskell skills are exactly nil, but AFAICS the script 
> depends on the filename as the only criteria to identify files that need a 
> line-level merge. How does the script deal with renamed and copied files?
> 
> If you depend on the filename only, this script simply will not work for 
> notes.

It simply depends on filenames. I saw there was additional complexity
in notes and I don't see how a general purpose merger can handle that.
(I wish I could just use notes for my application, but I have data that
is not tied to any one object in git.)

Although, this is an obvious extension that would add some flexability
to handle for files that cannot be merged with a naive union:

git union-merge foo origin/foo refs/heads/foo -c "sort * | uniq"

Where the files would be passed in as temp files.

Hmm, that makes it look not unlike git-filter-branch, except
it's generating a new commit at the tip. I *think* that filter-branch
can't do this.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git-union-merge proposal
  2011-06-21  2:20 git-union-merge proposal Joey Hess
  2011-06-21  5:22 ` Jonathan Nieder
@ 2011-06-21 17:44 ` Junio C Hamano
  2011-06-21 18:12   ` Junio C Hamano
  2011-06-21 18:41   ` Joey Hess
  1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2011-06-21 17:44 UTC (permalink / raw
  To: Joey Hess; +Cc: GIT Mailing-list

Joey Hess <joey@kitenet.net> writes:

> Lately some tools are storing data in git branches or refs, that is not
> source code, and that is designed in some way to be automatically
> merged. Generally merge=union will work for it, but the problem is that
> git-merge can only operate on the checked out branch, not other
> branches.

I think linking "union" merge too tightly into this is going in a wrong
direction. We _could_ do certain merges without using the working tree at
all, and the design of "git merge" has always been to perform the merge
entirely in index. We do check out the contents of cleanly resolved paths
that are different from the merged-into branch to the working tree, but it
is perfectly fine if we made it optional. We also do write out half-merged
content to the working tree, but that is merely to ask the user to help
finishing the merge that is happening in the index (iow, the ultimate goal
is to let the user say "git add" to tell the index what the resolution is,
and is not to let the user remove <<< === >>> markers in the file in the
working tree). If there is no conflict, we should not have to touch the
working tree at all, and "union" is a very narrow special case that we
declare there is no conflict (even if there was).

In other words, I would prefer to see something like:

	$ git merge --index-only [-s <strategy>] <other_branch>

which

    (0) does work without any file checked out in the working tree;

    (1) does not update a path in the working tree even if the merge
        result for the path is different from the original index entry for
        the path;

    (2) updates the index only when everything cleanly merges (depending
        on the definition of "cleanly merges", e.g. "union" may be a lot
        more lenient than the usual "text" merge) and aborts without
        touching anything if there is a conflict (because --index-only
        does not allow us to touch working tree to ask the user to resolve
        the conflict).

"git merge" is designed to work without any file checked out in the
working tree, by considering a _missing_ file in the working tree as if
there is _no change_ to the path during a merge. IOW, we do not say "you
have an uncommitted local removal of a path, which other side tried to
modify, hence we stop the merge to protect your local change".

This is so that you can do something like this:

	$ git checkout v1.7.6-rc2^0
        $ git reset --hard
	$ rm -fr .temp-workdir
	$ mkdir .temp-workdir
        $ cd .temp-workdir
        $ export GIT_DIR=$(git rev-parse --git-dir)
	$ export GIT_WORK_TREE=$(pwd) ;# this is optional, I think.
        $ git merge -s resolve origin/jk/maint-1.7.2-status-ignored
        Trying really trivial in-index merge...
        error: Merge requires file-level merging
        Nope.
        Trying simple merge.
        Simple merge failed, trying Automatic merge.
        Auto-merging Documentation/git-status.txt
        ERROR: content conflict in Documentation/git-status.txt
        Auto-merging t/t7508-status.sh
        ERROR: content conflict in t/t7508-status.sh
        Auto-merging wt-status.c
        fatal: merge program failed
        Automatic merge failed; fix conflicts and then commit the result.
        : alter .temp-workdir/master|MERGING; ls
        ./  ../  Documentation/  t/  wt-status.c

Because .temp-workdir is empty when merge is run, we consider that your
working tree exactly matches what is in your index. We do check out the
cleanly merged result to this temporary working tree (wt-status.c is
cleanly merged and result can be seen there), but it is not strictly
necessary (IOW we could make that part optional). We do write conflicted
half-merge result, as that is the easiest way for the user to help the
index resolve it.

    Side Note: the "recursive" strategy is so broken that it may assume
    the working tree has to be populated and the above may not work as
    nicely.

In other words, we are already half-way there, I think.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git-union-merge proposal
  2011-06-21 17:44 ` Junio C Hamano
@ 2011-06-21 18:12   ` Junio C Hamano
  2011-06-21 18:41   ` Joey Hess
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2011-06-21 18:12 UTC (permalink / raw
  To: GIT Mailing-list; +Cc: Joey Hess

Junio C Hamano <gitster@pobox.com> writes:

> In other words, I would prefer to see something like:
>
> 	$ git merge --index-only [-s <strategy>] <other_branch>
>
> which
>
>     (0) does work without any file checked out in the working tree;
>
>     (1) does not update a path in the working tree even if the merge
>         result for the path is different from the original index entry for
>         the path;
>
>     (2) updates the index only when everything cleanly merges (depending
>         on the definition of "cleanly merges", e.g. "union" may be a lot
>         more lenient than the usual "text" merge) and aborts without
>         touching anything if there is a conflict (because --index-only
>         does not allow us to touch working tree to ask the user to resolve
>         the conflict).
>
> "git merge" is designed to work without any file checked out in the
> working tree, by considering a _missing_ file in the working tree as if
> there is _no change_ to the path during a merge. IOW, we do not say "you
> have an uncommitted local removal of a path, which other side tried to
> modify, hence we stop the merge to protect your local change".
>
> This is so that you can do something like this:
> ...
> In other words, we are already half-way there, I think.

Addendum.

People have wished on this list who have two or more branches, e.g. "work"
and "play", based on the same upstream branch to be able to do:

	$ git fetch origin
	$ git checkout work
	$ git merge @{u}
        $ git checkout play
        $ git merge @{u}
	$ git checkout work

i.e. integrate upstream changes without having to check out "play" branch
first, if there is no conflict. The standard answer has been "You cannot,
because you may not know if there will be a conflict until you try".

But the existing "merge in a temporary working tree that is empty to start
with" support is a good way to implement it.  You would do something like
this after you are on your "work" branch and finished merging from its
upstream (package this up in a "git simplemerge" script):

	rm -fr .t && mkdir .t && cd .t &&
        GIT_DIR=$(git rev-parse --git-dir) &&
        GIT_INDEX_FILE=$GIT_DIR/temp-index &&
        GIT_WORK_TREE=$(pwd) &&
        export GIT_DIR GIT_INDEX_FILE GIT_WORK_TREE &&
	save_head &&
        git symbolic-ref refs/heads/play HEAD
        git read-tree play &&
        (
	        git merge --no-commit play@{upstream}
	)
	restore_head

If the merge goes well without conflict, you write the temporary index out
to a tree, create a commit and update the "play" branch (save_head should
save away the current branch and restore_head should restore it).

If the merge conflicts, you _could_ ask the user to resolve it in .t/
directory (with these updated GIT_DIR/GIT_INDEX_FILE/GIT_WORK_TREE), write
the result out to a tree, create a commit and update the "play"
branch. This is entirely optional, as the common request is to do this
only when the merge is trivial and there is no conflict.

And if we are allowed to punt when there is a conflict, we do not need the
temporary working tree at all if we had --index-only option.  We only need
to save and restore the HEAD pointer and the index file so that we can
continue working on the "work" branch after we are done.

This incidentally is the reason why I said "update the index" and not
"create a commit" when I specified the behaviour of "--index-only" in the
previous message.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git-union-merge proposal
  2011-06-21 17:44 ` Junio C Hamano
  2011-06-21 18:12   ` Junio C Hamano
@ 2011-06-21 18:41   ` Joey Hess
  2011-06-21 20:19     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Joey Hess @ 2011-06-21 18:41 UTC (permalink / raw
  To: GIT Mailing-list

[-- Attachment #1: Type: text/plain, Size: 301 bytes --]

Junio C Hamano wrote:
> In other words, I would prefer to see something like:
> 
> 	$ git merge --index-only [-s <strategy>] <other_branch>

Would this allow merging changes into a branch other than HEAD? If so, I
agree, I'd prefer to see it in git-merge itself if possible.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git-union-merge proposal
  2011-06-21 18:41   ` Joey Hess
@ 2011-06-21 20:19     ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2011-06-21 20:19 UTC (permalink / raw
  To: Joey Hess; +Cc: GIT Mailing-list

Joey Hess <joey@kitenet.net> writes:

> Junio C Hamano wrote:
>> In other words, I would prefer to see something like:
>> 
>> 	$ git merge --index-only [-s <strategy>] <other_branch>
>
> Would this allow merging changes into a branch other than HEAD? If so, I
> agree, I'd prefer to see it in git-merge itself if possible.

I just said "I would prefer to see", so allowing or not allowing is up to
you ;-).  If course I would prefer to see it not touch HEAD so that your
working envirnoment (checked out branch and the working tree) are not
disrupted.  See my other message in the thread.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-06-21 20:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-21  2:20 git-union-merge proposal Joey Hess
2011-06-21  5:22 ` Jonathan Nieder
2011-06-21  7:34   ` Johan Herland
2011-06-21 16:10     ` Joey Hess
2011-06-21 17:44 ` Junio C Hamano
2011-06-21 18:12   ` Junio C Hamano
2011-06-21 18:41   ` Joey Hess
2011-06-21 20:19     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).