git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Re: Merge with git-pasky II.
  2005-04-14  0:29 Merge with git-pasky II Petr Baudis
@ 2005-04-13 21:25 ` Christopher Li
  2005-04-14  0:45   ` Petr Baudis
  2005-04-14  0:30 ` Petr Baudis
  2005-04-14 22:11 ` git merge Petr Baudis
  2 siblings, 1 reply; 148+ messages in thread
From: Christopher Li @ 2005-04-13 21:25 UTC (permalink / raw)
  To: Petr Baudis; +Cc: torvalds, git

While you are there, do you mind to move the shell script
to a sub directory? Let's try how rename works.

Chris

On Thu, Apr 14, 2005 at 02:29:02AM +0200, Petr Baudis wrote:
>   Hello Linus,
> 
>   I think my tree should be ready for merging with you. It is the final
> tree and I've already switched my main branch for it, so it's what
> people doing git pull are getting for some time already.
> 
>   Its main contents are all of my shell scripts. Apart of that, some
> tiny fixes scattered all around can be found there, as well as some
> patches which went through the mailing list. My last merge with you
> concerned your commit 39021759c903a943a33a28cfbd5070d36d851581.
> 
>   It's again
> 
> 	rsync://pasky.or.cz/git/
> 
> this time my HEAD is fba83970090ef54c6eb86dcc2c2d5087af5ac637.
> 
>   Note that my rsync tree still contains even my old branch; I thought
> I'd leave it around in the public objects database for some time, shall
> anyone want to have a look at the history of some of the scripts. But if
> you want it gone, tell me and I will prune it (and perhaps offer it in
> /git-old/ or whatever). I'm using the following:
> 
> 	fsck-cache --unreachable $(commit-id) | grep unreachable \
> 		| cut -d ' ' -f 2 | sed 's/^\(..\)/.git\/objects\/\1\//' \
> 		| xargs rm
> 
>   Thanks,
> 
> -- 
> 				Petr "Pasky" Baudis
> Stuff: http://pasky.or.cz/
> C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14  0:45   ` Petr Baudis
@ 2005-04-13 22:00     ` Christopher Li
  2005-04-14  3:51     ` Linus Torvalds
  1 sibling, 0 replies; 148+ messages in thread
From: Christopher Li @ 2005-04-13 22:00 UTC (permalink / raw)
  To: Petr Baudis; +Cc: torvalds, git


On Thu, Apr 14, 2005 at 02:45:04AM +0200, Petr Baudis wrote:
> Dear diary, on Wed, Apr 13, 2005 at 11:25:46PM CEST, I got a letter
> where Christopher Li <git@chrisli.org> told me that...
> Well, unless Linus will want me otherwise, I'd like to postpone this
> until I'm finally done with the damn merge - enough things already got
> into my way today, so I would really like to focus on this tomorrow. So
> I'll be probably merging only (or mostly) bugfixes until I have that
> finished.

Sure, whenever you are ready.

> P.S.: Just staring at
> http://www.theregister.co.uk/2005/04/11/torvalds_attack/ ... I'm nothing
> like a regular reader of (R), but I thought the guys have at least a bit
> of sense. Duh. :/ Or is April 11 now yet another joke day after April 1?

Whatever, is the news site. They never mention git though.

Chris


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Merge with git-pasky II.
@ 2005-04-14  0:29 Petr Baudis
  2005-04-13 21:25 ` Christopher Li
                   ` (2 more replies)
  0 siblings, 3 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14  0:29 UTC (permalink / raw)
  To: torvalds; +Cc: git

  Hello Linus,

  I think my tree should be ready for merging with you. It is the final
tree and I've already switched my main branch for it, so it's what
people doing git pull are getting for some time already.

  Its main contents are all of my shell scripts. Apart of that, some
tiny fixes scattered all around can be found there, as well as some
patches which went through the mailing list. My last merge with you
concerned your commit 39021759c903a943a33a28cfbd5070d36d851581.

  It's again

	rsync://pasky.or.cz/git/

this time my HEAD is fba83970090ef54c6eb86dcc2c2d5087af5ac637.

  Note that my rsync tree still contains even my old branch; I thought
I'd leave it around in the public objects database for some time, shall
anyone want to have a look at the history of some of the scripts. But if
you want it gone, tell me and I will prune it (and perhaps offer it in
/git-old/ or whatever). I'm using the following:

	fsck-cache --unreachable $(commit-id) | grep unreachable \
		| cut -d ' ' -f 2 | sed 's/^\(..\)/.git\/objects\/\1\//' \
		| xargs rm

  Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  0:29 Merge with git-pasky II Petr Baudis
  2005-04-13 21:25 ` Christopher Li
@ 2005-04-14  0:30 ` Petr Baudis
  2005-04-14 22:11 ` git merge Petr Baudis
  2 siblings, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14  0:30 UTC (permalink / raw)
  To: torvalds; +Cc: git

Dear diary, on Thu, Apr 14, 2005 at 02:29:02AM CEST, I got a letter
where Petr Baudis <pasky@ucw.cz> told me that...
>   Its main contents are all of my shell scripts. Apart of that, some
> tiny fixes scattered all around can be found there, as well as some
> patches which went through the mailing list. My last merge with you
> concerned your commit 39021759c903a943a33a28cfbd5070d36d851581.
> 
>   It's again
> 
> 	rsync://pasky.or.cz/git/
> 
> this time my HEAD is fba83970090ef54c6eb86dcc2c2d5087af5ac637.

I forgot to add that after merging, you will probably want to change the
VERSION file (to contain whatever you want).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-13 21:25 ` Christopher Li
@ 2005-04-14  0:45   ` Petr Baudis
  2005-04-13 22:00     ` Christopher Li
  2005-04-14  3:51     ` Linus Torvalds
  0 siblings, 2 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14  0:45 UTC (permalink / raw)
  To: Christopher Li; +Cc: torvalds, git

Dear diary, on Wed, Apr 13, 2005 at 11:25:46PM CEST, I got a letter
where Christopher Li <git@chrisli.org> told me that...
> While you are there, do you mind to move the shell script
> to a sub directory? Let's try how rename works.

Well, unless Linus will want me otherwise, I'd like to postpone this
until I'm finally done with the damn merge - enough things already got
into my way today, so I would really like to focus on this tomorrow. So
I'll be probably merging only (or mostly) bugfixes until I have that
finished.

P.S.: Just staring at
http://www.theregister.co.uk/2005/04/11/torvalds_attack/ ... I'm nothing
like a regular reader of (R), but I thought the guys have at least a bit
of sense. Duh. :/ Or is April 11 now yet another joke day after April 1?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14  3:51     ` Linus Torvalds
@ 2005-04-14  1:23       ` Christopher Li
  2005-04-14  5:03         ` Paul Jackson
  2005-04-14  7:05       ` Junio C Hamano
  1 sibling, 1 reply; 148+ messages in thread
From: Christopher Li @ 2005-04-14  1:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

On Wed, Apr 13, 2005 at 08:51:50PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 14 Apr 2005, Petr Baudis wrote:
> 
> Thick skin is the name of the game. I'd not get any work done otherwise.
> 
> On that note - I've been avoiding doing the merge-tree thing, in the hope 
> that somebody else does what I've described. I really do suck at scripting 
> things, yet this is clearly something where using C to do a lot of the 
> stuff is pointless.
> 
> Almost all the parts do seem to be there, ie Daniel did the "common 
> parent" part, and the rest really does seem to be more about scripting 
> than writing more C plumbing stuff..

Do you have preference about what language of script we used? I actually
hesitated to introduce my Python script to git.

I can build some script extension for git just like the one I did for
sparse, is that some thing you want to see? 

Chris


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  5:03         ` Paul Jackson
@ 2005-04-14  2:16           ` Christopher Li
  2005-04-14  6:16             ` Paul Jackson
  0 siblings, 1 reply; 148+ messages in thread
From: Christopher Li @ 2005-04-14  2:16 UTC (permalink / raw)
  To: Paul Jackson; +Cc: torvalds, pasky, git

On Wed, Apr 13, 2005 at 10:03:41PM -0700, Paul Jackson wrote:
> 
> If you have a thin skin or tend to annoy others with a bit too much
> attitude or can't pass up a good language war (which is my failing, and
> why I am responding to a discussion that I've not been involved in for
> days) then the resulting flamage could be distracting.

Oh, my bad. I am not trying to start a language war here.
That is why I am hesitated about Python.
Just try to find out the acceptability. No pushing.

Chris 


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14  0:45   ` Petr Baudis
  2005-04-13 22:00     ` Christopher Li
@ 2005-04-14  3:51     ` Linus Torvalds
  2005-04-14  1:23       ` Christopher Li
  2005-04-14  7:05       ` Junio C Hamano
  1 sibling, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-14  3:51 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Christopher Li, git



On Thu, 14 Apr 2005, Petr Baudis wrote:
>
> http://www.theregister.co.uk/2005/04/11/torvalds_attack/ ... I'm nothing
> like a regular reader of (R), but I thought the guys have at least a bit
> of sense. Duh. :/ Or is April 11 now yet another joke day after April 1?

I actually _am_ a fairly regular reader, and hey, being opinionated and a 
bit over the top is what makes the site worthwhile. It's obviously what 
motivates the people. 

And then, occasionally, when they bite you, hey, that's the price of
having a high profile. I worry more about sometimes not listening to
critics than I do about the critics themselves.

Thick skin is the name of the game. I'd not get any work done otherwise.

On that note - I've been avoiding doing the merge-tree thing, in the hope 
that somebody else does what I've described. I really do suck at scripting 
things, yet this is clearly something where using C to do a lot of the 
stuff is pointless.

Almost all the parts do seem to be there, ie Daniel did the "common 
parent" part, and the rest really does seem to be more about scripting 
than writing more C plumbing stuff..

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  1:23       ` Christopher Li
@ 2005-04-14  5:03         ` Paul Jackson
  2005-04-14  2:16           ` Christopher Li
  0 siblings, 1 reply; 148+ messages in thread
From: Paul Jackson @ 2005-04-14  5:03 UTC (permalink / raw)
  To: Christopher Li; +Cc: torvalds, pasky, git

> Do you have preference about what language of script we used? 

Do you have a thick skin? <grin>

Can you easily ignore language wars with an amused wave of the hand
and a happy chuckle at the oh so predictable weaknesses of humans?

Then I'd wager it will be fine.

If you have a thin skin or tend to annoy others with a bit too much
attitude or can't pass up a good language war (which is my failing, and
why I am responding to a discussion that I've not been involved in for
days) then the resulting flamage could be distracting.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  2:16           ` Christopher Li
@ 2005-04-14  6:16             ` Paul Jackson
  0 siblings, 0 replies; 148+ messages in thread
From: Paul Jackson @ 2005-04-14  6:16 UTC (permalink / raw)
  To: Christopher Li; +Cc: torvalds, pasky, git

> Oh, my bad. I am not trying to start a language war here.

Neither am I - no problem what so ever. <chuckle ...>

Besides, I think we'd be on the same side.

My point was only a gentle one -- as is often the case when dealing with
the strange species called human, whether or not you can get away with
something is often a simple matter of ones attitude.

There is one Python script already in the kernel: scripts/show_delta.
But it's too small a sample to mean much.

I think first thing is "get it right."  Python is good for that
in the hands of someone who enjoys coding in it.

I wish you well.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  3:51     ` Linus Torvalds
  2005-04-14  1:23       ` Christopher Li
@ 2005-04-14  7:05       ` Junio C Hamano
  2005-04-14  8:06         ` Linus Torvalds
  1 sibling, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-14  7:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Christopher Li, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On that note - I've been avoiding doing the merge-tree thing, in the hope 
LT> that somebody else does what I've described.

I now have a Perl script that uses rev-tree, cat-file,
diff-tree, show-files (with one modification so that it can deal
with pathnames with embedded newlines), update-cache (with one
modification so that I can add an entry for a file that does not
exist to the dircache) and merge (from RCS).  Quick and dirty.

The changes to show-files is to give it an optional '-z' flag,
which chanegs record terminator to NUL character instead of LF.

The script git-merge.perl takes two head commits.  It basically
follows what you described as I remember ;-):

 1. runs rev-tree with --edges to find the common anscestor.

 2. creates a temporary directory "./,,merge-temp"; create a
    symlink ./,,merge-temp/.git/objects that points at
    .git/objects.

 3. sets up dircache there, initially populated with this common
    ancestor tree.  No files are checked out.  Just set up
    .git/index and that's it.

 4. runs diff-tree to find what has been changed in each head.

 5. for each path involved:

  5.0 if neither heads change it, leave it as is;
  5.1 if only one head changes a path and the other does not, just
      get the changed version;
  5.2 if both heads change it, check all three out and run merge.

It does not currently commit.  You can go to ./,,merge-temp/ and
see show-diff to see the result of the merge.  Files added in
one head has already been run "update-cache" when the script
ends, but changed and merged files are not---dircache still has
the common ancestor view.  So show-diff you will be seeing may
be enormous and not very useful if two forks were done in the
distant past.  After reviewing the merge result, you can
update-cache, write-tree and commit-tree as usual, but with one
caveat:  do not run "show-files | xargs update-cache" if you are
running git-merge.perl without -f flag!

By default, git-merge.perl creates absolute minimum number of
files in ./,,merge-temp---only the merged files are left there
so that you can inspect them.  You will not see unmodified
files nor files changed only by one side of the merge.

If you give '-o' (oneside checkout) flag to git-merge.perl, then
the files only one side of the merge changed are also checked
out in ./,,merge-temp.  If you give '-f' (full checkout) flag to
git-merge.perl, then in addition to what '-o' checks out,
unchanged files are checked out in ./,,merge-temp.  This default
is geared towards a huge tree with small merges (favorite case
of Linus, if I understand correctly).

Running 'show-diff' in such a sparsely populated merge result
tree gives you huge results because recent show-diff shows diffs
with empty files.  I added a '-r' flag to show-diff, which
squelches diffs with empty files.

Also to implement 'changed only by one-side' without actually
checking the file out, I needed to add one option to
'update-cache'.  --cacheinfo flag is used this way:

    $ update-cache --cacheinfo mode sha1 path

and adds the pathname with mode and sha1 to the .git/index
without actually requiring you to have such a file there.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 show-diff.c    |   11 ++-
 show-files.c   |   12 ++-
 update-cache.c |   25 +++++++
 git-merge.perl |  193 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 234 insertions(+), 7 deletions(-)


show-diff.c:  a531ca4078525d1c8dcf84aae0bfa89fed6e5d96
--- show-diff.c
+++ show-diff.c	2005-04-13 22:47:33.000000000 -0700
@@ -58,15 +58,20 @@
 int main(int argc, char **argv)
 {
 	int silent = 0;
+	int silent_on_nonexisting_files = 0;
 	int entries = read_cache();
 	int i;
 
 	while (argc-- > 1) {
 		if (!strcmp(argv[1], "-s")) {
-			silent = 1;
+			silent_on_nonexisting_files = silent = 1;
 			continue;
 		}
-		usage("show-diff [-s]");
+		if (!strcmp(argv[1], "-r")) {
+			silent_on_nonexisting_files = 1;
+			continue;
+		}
+		usage("show-diff [-s] [-r]");
 	}
 
 	if (entries < 0) {
@@ -83,7 +88,7 @@
 
 		if (stat(ce->name, &st) < 0) {
 			printf("%s: %s\n", ce->name, strerror(errno));
-			if (errno == ENOENT && !silent)
+			if (errno == ENOENT && !silent_on_nonexisting_files)
 				show_diff_empty(ce);
 			continue;
 		}
show-files.c:  a9fa6767a418f870a34b39379f417bf37b17ee18
--- show-files.c
+++ show-files.c	2005-04-13 21:18:40.000000000 -0700
@@ -14,6 +14,7 @@
 static int show_cached = 0;
 static int show_others = 0;
 static int show_ignored = 0;
+static int line_terminator = '\n';
 
 static const char **dir;
 static int nr_dir;
@@ -105,12 +106,12 @@
 	}
 	if (show_others) {
 		for (i = 0; i < nr_dir; i++)
-			printf("%s\n", dir[i]);
+			printf("%s%c", dir[i], line_terminator);
 	}
 	if (show_cached) {
 		for (i = 0; i < active_nr; i++) {
 			struct cache_entry *ce = active_cache[i];
-			printf("%s\n", ce->name);
+			printf("%s%c", ce->name, line_terminator);
 		}
 	}
 	if (show_deleted) {
@@ -119,7 +120,7 @@
 			struct stat st;
 			if (!stat(ce->name, &st))
 				continue;
-			printf("%s\n", ce->name);
+			printf("%s%c", ce->name, line_terminator);
 		}
 	}
 	if (show_ignored) {
@@ -134,6 +135,11 @@
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
 
+		if (!strcmp(arg, "-z")) {
+			line_terminator = 0;
+			continue;
+		}
+
 		if (!strcmp(arg, "--cached")) {
 			show_cached = 1;
 			continue;
update-cache.c:  8f149d5a4ab60e030a0ab19fdb59b8ee2576ee71
--- update-cache.c
+++ update-cache.c	2005-04-13 23:27:54.000000000 -0700
@@ -203,6 +203,8 @@
 {
 	int i, newfd, entries;
 	int allow_options = 1;
+	const char *sha1_force = NULL;
+	const char *mode_force = NULL;
 
 	newfd = open(".git/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
 	if (newfd < 0)
@@ -235,14 +237,35 @@
 				refresh_cache();
 				continue;
 			}
+			if (!strcmp(path, "--cacheinfo")) {
+				mode_force = argv[++i];
+				sha1_force = argv[++i];
+				continue;
+			}
 			die("unknown option %s", path);
 		}
 		if (!verify_path(path)) {
 			fprintf(stderr, "Ignoring path %s\n", argv[i]);
 			continue;
 		}
-		if (add_file_to_cache(path))
+		if (sha1_force && mode_force) {
+			struct cache_entry *ce;
+			int namelen = strlen(path);
+			int mode;
+			int size = cache_entry_size(namelen);
+			sscanf(mode_force, "%o", &mode);
+			ce = malloc(size);
+			memset(ce, 0, size);
+			memcpy(ce->name, path, namelen);
+			ce->namelen = namelen;
+			ce->st_mode = mode;
+			get_sha1_hex(sha1_force, ce->sha1);
+
+			add_cache_entry(ce, 1);
+		}
+		else if (add_file_to_cache(path))
 			die("Unable to add %s to database", path);
+		mode_force = sha1_force = NULL;
 	}
 	if (write_cache(newfd, active_cache, active_nr) ||
 	    rename(".git/index.lock", ".git/index"))

--- /dev/null	2005-03-19 15:28:25.000000000 -0800
+++ git-merge.perl	2005-04-13 23:45:23.000000000 -0700
@@ -0,0 +1,193 @@
+#!/usr/bin/perl -w
+
+use Getopt::Long;
+
+my $full_checkout = 0;
+my $oneside_checkout = 0;
+GetOptions("full" => \$full_checkout,
+	   "oneside" => \$oneside_checkout)
+    or die;
+
+if ($full_checkout) {
+    $oneside_checkout = 1;
+}
+
+sub read_rev_tree {
+    my (@head) = @_;
+    my ($fhi);
+    open $fhi, '-|', 'rev-tree', '--edges', @head
+	or die "$!: rev-tree --edges @head";
+    my $common;
+    while (<$fhi>) {
+	chomp;
+	(undef, undef, $common) = split(/ /, $_);
+	if ($common =~ s/^([a-f0-f]{40}):\d+$/$1/) {
+	    last;
+	}
+    }
+    close $fhi;
+    return $common;
+}
+
+sub read_commit_tree {
+    my ($commit) = @_;
+    my ($fhi);
+    open $fhi, '-|', 'cat-file', 'commit', $commit
+	or die "$!: cat-file commit $commit";
+    my $tree = <$fhi>;
+    close $fhi;
+    $tree =~ s/^tree //;
+    return $tree;
+}
+
+sub read_diff_tree {
+    my (@tree) = @_;
+    my ($fhi);
+    local ($_, $/);
+    $/ = "\0"; 
+    my %path;
+    open $fhi, '-|', 'diff-tree', '-r', @tree
+	or die "$!: diff-tree -r @tree";
+    while (<$fhi>) {
+	chomp;
+	if (/^\*[0-7]+->([0-7]+)\tblob\t[0-9a-f]+->([0-9a-f]{40})\t(.*)$/s) {
+	    # mode newsha path
+	    $path{$3} = [$1, $2];
+	}
+	elsif (/^\+([0-7]+)\tblob\t([0-9a-f]{40})\t(.*)$/s) {
+	    # mode newsha path
+	    $path{$3} = [$1, $2];
+	}
+	else {
+	    print STDERR "$_??";
+	}
+    }
+    close $fhi;
+    return %path;
+}
+
+sub read_show_files {
+    my ($fhi);
+    local ($_, $/);
+    $/ = "\0"; 
+    open $fhi, '-|', 'show-files', '-z'
+	or die "$!: show-files -z";
+    my (@path) = map { chomp; $_ } <$fhi>;
+    close $fhi;
+    return @path;
+}
+
+sub checkout_file {
+    my ($path, $info) = @_;
+    my (@elt) = split(/\//, $path);
+    my $j = '';
+    my $tail = pop @elt;
+    my ($fhi, $fho);
+    for (@elt) {
+	mkdir "$j$_";
+	$j = "$j$_/";
+    }
+    open $fho, '>', "$path";
+    open $fhi, '-|', 'cat-file', 'blob', $info->[1]
+	or die "$!: cat-file blob $info->[1]";
+    while (<$fhi>) {
+	print $fho $_;
+    }
+    close $fhi;
+    close $fho;
+    chmod oct("0$info->[0]"), "$path";
+}
+
+sub record_file {
+    my ($path, $info) = @_;
+    system 'update-cache', '--cacheinfo', @$info, $path;
+}
+
+sub merge_tree {
+    my ($path, $info0, $info1) = @_;
+    print STDERR "M - $path\n";
+    checkout_file(',,merge-0', $info0);
+    checkout_file(',,merge-1', $info1);
+    system 'checkout-cache', $path;
+    my ($fhi, $fho);
+    open $fhi, '-|', 'merge', '-p', ',,merge-0', $path, ',,merge-1';
+    open $fho, '>', "$path+";
+    local ($/);
+    while (<$fhi>) { print $fho $_; }
+    close $fhi;
+    close $fho;
+    unlink ',,merge-0', ',,merge-1';
+    rename "$path+", $path;
+    # There is no reason to prefer info0 over info1 but
+    # we need to pick one.
+    chmod oct("0$info0->[0]"), "$path";
+}
+
+# Find common ancestor of two trees.
+my $common = read_rev_tree(@ARGV);
+print "Common ancestor: $common\n";
+
+# Create a temporary directory and go there.
+system 'rm', '-rf', ',,merge-temp';
+for ((',,merge-temp', '.git')) { mkdir $_; chdir $_; }
+symlink "../../.git/objects", "objects";
+chdir '..';
+
+my $ancestor_tree = read_commit_tree($common);
+system 'read-tree', $ancestor_tree;
+
+my %tree0 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[0]));
+my %tree1 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[1]));
+
+my @ancestor_file = read_show_files();
+my %ancestor_file = map { $_ => 1 } @ancestor_file;
+
+for (@ancestor_file) {
+    if (! exists $tree0{$_} && ! exists $tree1{$_}) {
+	if ($full_checkout) {
+	    system 'checkout-cache', $_;
+	}
+	print STDERR "O - $_\n";
+    }
+}
+
+my %need_merge = ();
+
+for $path (keys %tree0) {
+    if (! exists $tree1{$path}) {
+	# Only changed in tree 0 --- take his version
+	print STDERR "0 - $path\n";
+	if (! exists $ancestor_file{$path}) {
+	    checkout_file($path, $tree0{$path});
+	    system 'update-cache', '--add', "$path";
+	}
+	elsif ($oneside_checkout) {
+	    checkout_file($path, $tree0{$path});
+	}
+	else {
+	    record_file($path, $tree0{$path});
+	}
+    }
+    else {
+	merge_tree($path, $tree0{$path}, $tree1{$path});
+    }
+}
+
+for $path (keys %tree1) {
+    if (! exists $tree0{$path}) {
+	# Only changed in tree 1 --- take his version
+	print STDERR "1 - $path\n";
+	if (! exists $ancestor_file{$path}) {
+	    checkout_file($path, $tree1{$path});
+	    system 'update-cache', '--add', "$path";
+	}
+	elsif ($oneside_checkout) {
+	    checkout_file($path, $tree1{$path});
+	}
+	else {
+	    record_file($path, $tree1{$path});
+	}
+    }
+}
+
+# system 'show-diff';



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  7:05       ` Junio C Hamano
@ 2005-04-14  8:06         ` Linus Torvalds
  2005-04-14  8:39           ` Junio C Hamano
  2005-04-15 19:57           ` Junio C Hamano
  0 siblings, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-14  8:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Christopher Li, git



On Thu, 14 Apr 2005, Junio C Hamano wrote:
> 
> I now have a Perl script that uses rev-tree, cat-file,
> diff-tree, show-files (with one modification so that it can deal
> with pathnames with embedded newlines), update-cache (with one
> modification so that I can add an entry for a file that does not
> exist to the dircache) and merge (from RCS).  Quick and dirty.

That's exactly what I wanted. Q'n'D is how the ball gets rolling.

In the meantime I wrote a very stupid "merge-tree" which does things
slightly differently, but I really think your approach (aka my original
approach) is actually a lot faster. I was just starting to worry that the 
ball didn't start, so I wrote an even hackier one.

My really hacky one is called "merge-tree", and it really only merges one 
directory. For each entry in the directory it says either

	select <mode> <sha1> path

or

	merge <mode>-><mode>,<mode> <sha1>-><sha1>,<sha1> path

depending on whether it could directly select the right object or not.

It's actually exactly the same algorithm as the first one, but I was 
afraid the first one would be so abstract that it (a) might not work and 
(b) wouldn't get people to work it out. This "one directory at a time with 
very explicit output" thing is much more down-to-earth, but it's also
likely slower because it will need script help more often.

That said, I don't know. MOST of the time there will be just a single 
"directory" entry that needs merging, and then the script would just need 
to recurse into that directory with the new "tree" objects. So it might 
not be too horrible.

But I'm really happy that you seem to have implemented my first 
suggestion and I seem to have been wasting my time. 

>  5. for each path involved:
> 
>   5.0 if neither heads change it, leave it as is;
>   5.1 if only one head changes a path and the other does not, just
>       get the changed version;
>   5.2 if both heads change it, check all three out and run merge.

You missed one case: 

    5.0.1 if both heads change it to the same thing, take the new thing

but maybe you counted that as 5.0 (it _should_ fall out automatically from
the fact that "diff-tree" between the two destination trees shows no
difference for such a file).

Now, arguably, your 5.2 will do things right, but the thing is, it's 
actually fairly _common_ that both heads have changed something to the 
same thing. Namely if there was a previous merge that already handled that 
case, but that previous merge may not be a proper parent of the new 
commits.  So from a performance standpoint you really don't want to 
consider that to be a merge - you just pick up the new contents directly.

See?

(My stupid "merge-tree" should show the algorithm in painful obviousity. 
Of course, my stipid merge-tree may also be painfully buggy. You be the 
judge).

> It does not currently commit.  You can go to ./,,merge-temp/ and
> see show-diff to see the result of the merge.  Files added in
> one head has already been run "update-cache" when the script
> ends, but changed and merged files are not---dircache still has
> the common ancestor view.

That sounds good.

> Also to implement 'changed only by one-side' without actually
> checking the file out, I needed to add one option to
> 'update-cache'.  --cacheinfo flag is used this way:
> 
>     $ update-cache --cacheinfo mode sha1 path

Yes. My "merge-tree" needs the exact same thing.

Looks good from your explanation, but I'm too tired to look at the code. 
It's 1AM, and the kids get up at 7.

I'm not much of a hacker, I usually crash by 10PM these days ;^)

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  8:06         ` Linus Torvalds
@ 2005-04-14  8:39           ` Junio C Hamano
  2005-04-14  9:10             ` Linus Torvalds
  2005-04-15 19:57           ` Junio C Hamano
  1 sibling, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-14  8:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Christopher Li, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> But I'm really happy that you seem to have implemented my first 
LT> suggestion and I seem to have been wasting my time. 

Thanks for the kind words.

>> 5. for each path involved:
>> 
>> 5.0 if neither heads change it, leave it as is;
>> 5.1 if only one head changes a path and the other does not, just
>> get the changed version;
>> 5.2 if both heads change it, check all three out and run merge.

LT> You missed one case: 

LT>     5.0.1 if both heads change it to the same thing, take the new thing

LT> but maybe you counted that as 5.0 (it _should_ fall out automatically from
LT> the fact that "diff-tree" between the two destination trees shows no
LT> difference for such a file).

Actually I am not handling that.  It really is 5.1a---the exact
same code path as 5.1 can be used for this case, and as you
point out it is really a quite important optimization.

I have to handle the following cases.  I think I currently do
wrong things to them:

  5.1a both head modify to the same thing.
  5.1b one head removes, the other does not do anything.
  5.1c both head remove.
  5.3 one head removes, the other head modifies.

Handling of 5.1a, 5.1b and 5.1c are obvious.

  5.1a Update dircache to the same new thing.  Without -f or -o
       flag do not touch ,,merge-temp/. directory; with -f or
       -o, leave the new file in ,,merge-temp/.

  5.1b Remove the path from dircache and do not have the file in
       ,,merge-temp/. directory regardless of -f or -o flags.

  5.1c Same as 5.1b

I am not sure what to do with 5.3.  My knee-jerk reaction is to
leave the modified result in ,,merge-temp/$path~ without
touching dircache.  If the merger wants to pick it up, he can
rename $path~ to $path temporarily, run show-diff on it (I think
giving an option to show-diff to specify paths would be helpful
for this workflow), to decide if he wants to keep the file or
not.  Suggestions?


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  8:39           ` Junio C Hamano
@ 2005-04-14  9:10             ` Linus Torvalds
  2005-04-14 11:14               ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-14  9:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Christopher Li, git



On Thu, 14 Apr 2005, Junio C Hamano wrote:
> 
> I have to handle the following cases.  I think I currently do
> wrong things to them:
> 
>   5.1a both head modify to the same thing.
>   5.1b one head removes, the other does not do anything.
>   5.1c both head remove.
>   5.3 one head removes, the other head modifies.

There's another interesting set of cases: one side creates a file, and the
other one creates a directory. 

> I am not sure what to do with 5.3.

My very _strong_ preference is to just inform the user about a merge that
cannot be performed, and not let it be automated. BIG warning, with some 
way for the user to specify the end result.

The thing is, these are pretty rare cases. But in order to make people 
feel good about the _common_ case, it's important that they feel safe 
about the rare one.

Put another way: if git tells me when it can't do something (with some
specificity), I can then fix the situation up and try again. I might curse
a while, and maybe it ends up being so common that I might even automate
it, but at least I'll be able to trust the end result.

In contrast, if git does something that _may_ be nonsensical, then I'll
worry all the time, and not trust git. That's much worse than an 
occasional curse.

So the rule should be: only merge when it's "obviously the right thing".  
If it's not obvious, the merge should _not_ try to guess what the right
thing is. It's much better to fail loudly.

(That's especially true early on. There may be cases that end up being
obvious after some usage. But I'd rather find them by having git be too
stupid, than find out the hard way that git lost some data because it
thought it was ok to remove a file that had been modified)

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  9:10             ` Linus Torvalds
@ 2005-04-14 11:14               ` Junio C Hamano
  2005-04-14 12:16                 ` Petr Baudis
  0 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-14 11:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Here is a diff to update the git-merge.perl script I showed you
earlier today ;-).  It contains the following updates against
your HEAD (bb95843a5a0f397270819462812735ee29796fb4).

 * git-merge.perl command we talked about on the git list.  I've
   covered the changed-to-the-same case etc.  I still haven't done
   anything about file-vs-directory case yet.

   It does warn when it needed to run merge to automerge and let
   merge give a warning message about conflicts if any.  In
   modify/remove cases, modified in one but removed in the other
   files are left in either $path~A~ or $path~B~ in the merge
   temporary directory, and the script issues a warning at the
   end.

 * show-files and ls-tree updates to add -z flag to NUL terminate records;
   this is needed for git-merge.perl to work.

 * show-diff updates to add -r flag to squelch diffs for files not in
   the working directory.  This is mainly useful when verifying the
   result of an automated merge.

 * update-cache updates to add "--cacheinfo mode sha1" flag to register
   a file that is not in the current working directory.  Needed for
   minimum-checkout merging by git-merge.perl.


Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 git-merge.perl |  247 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ls-tree.c      |    9 +-
 show-diff.c    |   11 +-
 show-files.c   |   12 ++
 update-cache.c |   25 +++++
 5 files changed, 296 insertions(+), 8 deletions(-)

diff -x .git -Nru ,,1/git-merge.perl ,,2/git-merge.perl
--- ,,1/git-merge.perl	1969-12-31 16:00:00.000000000 -0800
+++ ,,2/git-merge.perl	2005-04-14 04:00:14.000000000 -0700
@@ -0,0 +1,247 @@
+#!/usr/bin/perl -w
+
+use Getopt::Long;
+
+my $full_checkout = 0;
+my $oneside_checkout = 0;
+GetOptions("full" => \$full_checkout,
+	   "oneside" => \$oneside_checkout)
+    or die;
+
+if ($full_checkout) {
+    $oneside_checkout = 1;
+}
+
+sub read_rev_tree {
+    my (@head) = @_;
+    my ($fhi);
+    open $fhi, '-|', 'rev-tree', '--edges', @head
+	or die "$!: rev-tree --edges @head";
+    my %common;
+    while (<$fhi>) {
+	chomp;
+	(undef, undef, my @common) = split(/ /, $_);
+	for (@common) {
+	    if (s/^([a-f0-f]{40}):3$/$1/) {
+		$common{$_}++;
+	    }
+	}
+    }
+    close $fhi;
+
+    my @common = (map { $_->[1] }
+		  sort { $b->[0] <=> $a->[0] }
+		  map { [ $common{$_} => $_ ] }
+		  keys %common);
+
+    return $common[0];
+}
+
+sub read_commit_tree {
+    my ($commit) = @_;
+    my ($fhi);
+    open $fhi, '-|', 'cat-file', 'commit', $commit
+	or die "$!: cat-file commit $commit";
+    my $tree = <$fhi>;
+    close $fhi;
+    $tree =~ s/^tree //;
+    return $tree;
+}
+
+# Reads diff-tree -r output and gives a hash that maps a path
+# to 3-tuple (old-mode new-mode new-sha).
+# When creating, old-mode is undef.  When removing, new-* are undef.
+sub read_diff_tree {
+    my (@tree) = @_;
+    my ($fhi);
+    local ($_, $/);
+    $/ = "\0"; 
+    my %path;
+    open $fhi, '-|', 'diff-tree', '-r', @tree
+	or die "$!: diff-tree -r @tree";
+    while (<$fhi>) {
+	chomp;
+	if (/^\*([0-7]+)->([0-7]+)\tblob\t[0-9a-f]+->([0-9a-f]{40})\t(.*)$/s) {
+	    $path{$4} = [$1, $2, $3];
+	}
+	elsif (/^\+([0-7]+)\tblob\t([0-9a-f]{40})\t(.*)$/s) {
+	    $path{$3} = [undef, $1, $2];
+	}
+	elsif (/^\-([0-7]+)\tblob\t[0-9a-f]{40}\t(.*)$/s) {
+	    $path{$2} = [$1, undef, undef];
+	}
+	else {
+	    die "cannot parse diff-tree output: $_";
+	}
+    }
+    close $fhi;
+    return %path;
+}
+
+sub read_show_files {
+    my ($fhi);
+    local ($_, $/);
+    $/ = "\0"; 
+    open $fhi, '-|', 'show-files', '-z'
+	or die "$!: show-files -z";
+    my (@path) = map { chomp; $_ } <$fhi>;
+    close $fhi;
+    return @path;
+}
+
+sub checkout_file {
+    my ($path, $info) = @_;
+    my (@elt) = split(/\//, $path);
+    my $j = '';
+    my $tail = pop @elt;
+    my ($fhi, $fho);
+    for (@elt) {
+	mkdir "$j$_";
+	$j = "$j$_/";
+    }
+    open $fho, '>', "$path";
+    open $fhi, '-|', 'cat-file', 'blob', $info->[2]
+	or die "$!: cat-file blob $info->[2]";
+    while (<$fhi>) {
+	print $fho $_;
+    }
+    close $fhi;
+    close $fho;
+    chmod oct("0$info->[1]"), "$path";
+}
+
+sub record_file {
+    my ($path, $info) = @_;
+    system ('update-cache', '--add', '--cacheinfo',
+	    $info->[1], $info->[2], $path);
+}
+
+sub merge_tree {
+    my ($path, $info0, $info1) = @_;
+    checkout_file(',,merge-0', $info0);
+    checkout_file(',,merge-1', $info1);
+    system 'checkout-cache', $path;
+    my ($fhi, $fho);
+    open $fhi, '-|', 'merge', '-p', ',,merge-0', $path, ',,merge-1';
+    open $fho, '>', "$path+";
+    local ($/);
+    while (<$fhi>) { print $fho $_; }
+    close $fhi;
+    close $fho;
+    unlink ',,merge-0', ',,merge-1';
+    rename "$path+", $path;
+    # There is no reason to prefer info0 over info1 but
+    # we need to pick one.
+    chmod oct("0$info0->[1]"), "$path";
+}
+
+# Find common ancestor of two trees.
+my $common = read_rev_tree(@ARGV);
+print "Common ancestor: $common\n";
+
+# Create a temporary directory and go there.
+system 'rm', '-rf', ',,merge-temp';
+for ((',,merge-temp', '.git')) { mkdir $_; chdir $_; }
+symlink "../../.git/objects", "objects";
+chdir '..';
+
+my $ancestor_tree = read_commit_tree($common);
+system 'read-tree', $ancestor_tree;
+
+my %tree0 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[0]));
+my %tree1 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[1]));
+
+my @ancestor_file = read_show_files();
+my %ancestor_file = map { $_ => 1 } @ancestor_file;
+
+for (@ancestor_file) {
+    if (! exists $tree0{$_} && ! exists $tree1{$_}) {
+	if ($full_checkout) {
+	    system 'checkout-cache', $_;
+	}
+	print STDERR "O - $_\n";
+    }
+}
+
+for my $set ([\%tree0, \%tree1, 'A'], [\%tree1, \%tree0, 'B']) {
+    my ($treeA, $treeB, $side) = @$set;
+    while (my ($path, $info) = each %$treeA) {
+	# In this loop we do not deal with overlaps.
+	next if (exists $treeB->{$path});
+
+	if (! defined $info->[1]) {
+	    # deleted in this tree only.
+	    unlink $path;
+	    system 'update-cache', '--remove', $path;
+	    print STDERR "$side D $path\n";
+	}
+	else {
+	    # modified or created in this tree only.
+	    print STDERR "$side M $path\n";
+	    if ($oneside_checkout) {
+		checkout_file($path, $info);
+		system 'update-cache', '--add', "$path";
+	    } else {
+		record_file($path, $info);
+	    }
+	}
+    }
+}
+
+my @warning = ();
+
+while (my ($path, $info0) = each %tree0) {
+    # We need to deal only with overlaps.
+    next if (!exists $tree1{$path});
+
+    my $info1 = $tree1{$path};
+    if (! defined $info0->[1]) {
+	# deleted in this tree.
+	if (! defined $info1->[1]) {
+	    # deleted in both trees.  Obvious.
+	    print STDERR "*DD $path\n";
+	    unlink $path;
+	    system 'update-cache', '--remove', $path;
+	}
+	else {
+	    # oops.  tree0 wants to remove but tree1 wants to modify it.
+	    print STDERR "*DM $path\n";
+	    checkout_file("$path~B~", $info1);
+	    push @warning, $path;
+	}
+    }
+    else {
+	# modified or created in tree0
+	if (! defined $info1->[1]) {
+	    # oops.  tree0 wants to modify but tree1 wants to remove it.
+	    print STDERR "*MD $path\n";
+	    checkout_file("$path~A~", $info0);
+	    push @warning, $path;
+	}
+	else {
+	    # modified both in tree0 and tree1
+	    # are they modifying to the same contents?
+	    if ($info0->[2] eq $info1->[2]) {
+		# just mode changes (or no changes)
+		# we prefer tree0 over tree1 for no particular reason.
+		print STDERR "*MM $path\n";
+		record_file($path, $info0);
+	    }
+	    else {
+		# modified in both.  Needs merge.
+		print STDERR "MRG $path\n";
+		merge_tree($path, $info0, $info1);
+	    }
+	}
+    }
+}
+
+if (@warning) {
+    print "\nThere are some files that were deleted in one branch and\n"
+	. "modified in another.  Please examine them carefully:\n";
+    for (@warning) {
+	print "$_\n";
+    }
+}
+
+# system 'show-diff';


diff -x .git -Nru ,,1/ls-tree.c ,,2/ls-tree.c
--- ,,1/ls-tree.c	2005-04-14 03:47:18.000000000 -0700
+++ ,,2/ls-tree.c	2005-04-14 04:00:14.000000000 -0700
@@ -5,6 +5,8 @@
  */
 #include "cache.h"
 
+int line_termination = '\n';
+
 static int list(unsigned char *sha1)
 {
 	void *buffer;
@@ -31,7 +33,8 @@
 		 * It seems not worth it to read each file just to get this
 		 * and the file size. -- pasky@ucw.cz */
 		type = S_ISDIR(mode) ? "tree" : "blob";
-		printf("%03o\t%s\t%s\t%s\n", mode, type, sha1_to_hex(sha1), path);
+		printf("%03o\t%s\t%s\t%s%c", mode, type, sha1_to_hex(sha1),
+		       path, line_termination);
 	}
 	return 0;
 }
@@ -40,6 +43,10 @@
 {
 	unsigned char sha1[20];
 
+	if (argc == 3 && !strcmp(argv[1], "-z")) {
+	  line_termination = 0;
+	  argc--; argv++;
+	}
 	if (argc != 2)
 		usage("ls-tree <key>");
 	if (get_sha1_hex(argv[1], sha1) < 0)


diff -x .git -Nru ,,1/show-diff.c ,,2/show-diff.c
--- ,,1/show-diff.c	2005-04-14 03:47:18.000000000 -0700
+++ ,,2/show-diff.c	2005-04-14 04:00:14.000000000 -0700
@@ -58,15 +58,20 @@
 int main(int argc, char **argv)
 {
 	int silent = 0;
+	int silent_on_nonexisting_files = 0;
 	int entries = read_cache();
 	int i;
 
 	while (argc-- > 1) {
 		if (!strcmp(argv[1], "-s")) {
-			silent = 1;
+			silent_on_nonexisting_files = silent = 1;
 			continue;
 		}
-		usage("show-diff [-s]");
+		if (!strcmp(argv[1], "-r")) {
+			silent_on_nonexisting_files = 1;
+			continue;
+		}
+		usage("show-diff [-s] [-r]");
 	}
 
 	if (entries < 0) {
@@ -83,7 +88,7 @@
 
 		if (stat(ce->name, &st) < 0) {
 			printf("%s: %s\n", ce->name, strerror(errno));
-			if (errno == ENOENT && !silent)
+			if (errno == ENOENT && !silent_on_nonexisting_files)
 				show_diff_empty(ce);
 			continue;
 		}


diff -x .git -Nru ,,1/show-files.c ,,2/show-files.c
--- ,,1/show-files.c	2005-04-14 03:47:18.000000000 -0700
+++ ,,2/show-files.c	2005-04-14 04:00:14.000000000 -0700
@@ -14,6 +14,7 @@
 static int show_cached = 0;
 static int show_others = 0;
 static int show_ignored = 0;
+static int line_terminator = '\n';
 
 static const char **dir;
 static int nr_dir;
@@ -105,12 +106,12 @@
 	}
 	if (show_others) {
 		for (i = 0; i < nr_dir; i++)
-			printf("%s\n", dir[i]);
+			printf("%s%c", dir[i], line_terminator);
 	}
 	if (show_cached) {
 		for (i = 0; i < active_nr; i++) {
 			struct cache_entry *ce = active_cache[i];
-			printf("%s\n", ce->name);
+			printf("%s%c", ce->name, line_terminator);
 		}
 	}
 	if (show_deleted) {
@@ -119,7 +120,7 @@
 			struct stat st;
 			if (!stat(ce->name, &st))
 				continue;
-			printf("%s\n", ce->name);
+			printf("%s%c", ce->name, line_terminator);
 		}
 	}
 	if (show_ignored) {
@@ -134,6 +135,11 @@
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
 
+		if (!strcmp(arg, "-z")) {
+			line_terminator = 0;
+			continue;
+		}
+
 		if (!strcmp(arg, "--cached")) {
 			show_cached = 1;
 			continue;


diff -x .git -Nru ,,1/update-cache.c ,,2/update-cache.c
--- ,,1/update-cache.c	2005-04-14 03:47:18.000000000 -0700
+++ ,,2/update-cache.c	2005-04-14 04:00:14.000000000 -0700
@@ -250,6 +250,8 @@
 {
 	int i, newfd, entries;
 	int allow_options = 1;
+	const char *sha1_force = NULL;
+	const char *mode_force = NULL;
 
 	newfd = open(".git/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
 	if (newfd < 0)
@@ -282,14 +284,35 @@
 				refresh_cache();
 				continue;
 			}
+			if (!strcmp(path, "--cacheinfo")) {
+				mode_force = argv[++i];
+				sha1_force = argv[++i];
+				continue;
+			}
 			die("unknown option %s", path);
 		}
 		if (!verify_path(path)) {
 			fprintf(stderr, "Ignoring path %s\n", argv[i]);
 			continue;
 		}
-		if (add_file_to_cache(path))
+		if (sha1_force && mode_force) {
+			struct cache_entry *ce;
+			int namelen = strlen(path);
+			int mode;
+			int size = cache_entry_size(namelen);
+			sscanf(mode_force, "%o", &mode);
+			ce = malloc(size);
+			memset(ce, 0, size);
+			memcpy(ce->name, path, namelen);
+			ce->namelen = namelen;
+			ce->st_mode = mode;
+			get_sha1_hex(sha1_force, ce->sha1);
+
+			add_cache_entry(ce, 1);
+		}
+		else if (add_file_to_cache(path))
 			die("Unable to add %s to database", path);
+		mode_force = sha1_force = NULL;
 	}
 	if (write_cache(newfd, active_cache, active_nr) ||
 	    rename(".git/index.lock", ".git/index"))


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 11:14               ` Junio C Hamano
@ 2005-04-14 12:16                 ` Petr Baudis
  2005-04-14 18:12                   ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Petr Baudis @ 2005-04-14 12:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Dear diary, on Thu, Apr 14, 2005 at 01:14:13PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> Here is a diff to update the git-merge.perl script I showed you
> earlier today ;-).  It contains the following updates against
> your HEAD (bb95843a5a0f397270819462812735ee29796fb4).

Bah, you outran me. ;-)

>  * git-merge.perl command we talked about on the git list.  I've
>    covered the changed-to-the-same case etc.  I still haven't done
>    anything about file-vs-directory case yet.
> 
>    It does warn when it needed to run merge to automerge and let
>    merge give a warning message about conflicts if any.  In
>    modify/remove cases, modified in one but removed in the other
>    files are left in either $path~A~ or $path~B~ in the merge
>    temporary directory, and the script issues a warning at the
>    end.

I think I will take it rather my working git merge implementation - it's
getting insane in bash. ;-)

I'll change it to use the cool git-pasky stuff (commit-id etc) and its
style of committing - that is, it will merely record the update-caches
to be done upon commit, and it will read-tree the branch we are merging
to instead of the ancestor. (So that git diff gives useful output.)

>  * show-files and ls-tree updates to add -z flag to NUL terminate records;
>    this is needed for git-merge.perl to work.
> 
>  * show-diff updates to add -r flag to squelch diffs for files not in
>    the working directory.  This is mainly useful when verifying the
>    result of an automated merge.

-r traditionally means recursive - what's the reasoning behind the
choice of this letter?

>  * update-cache updates to add "--cacheinfo mode sha1" flag to register
>    a file that is not in the current working directory.  Needed for
>    minimum-checkout merging by git-merge.perl.
> 
> 
> Signed-off-by: Junio C Hamano <junkio@cox.net>
> 
> ---
> 
>  git-merge.perl |  247 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  ls-tree.c      |    9 +-
>  show-diff.c    |   11 +-
>  show-files.c   |   12 ++
>  update-cache.c |   25 +++++
>  5 files changed, 296 insertions(+), 8 deletions(-)
> 
> diff -x .git -Nru ,,1/git-merge.perl ,,2/git-merge.perl
> --- ,,1/git-merge.perl	1969-12-31 16:00:00.000000000 -0800
> +++ ,,2/git-merge.perl	2005-04-14 04:00:14.000000000 -0700
> @@ -0,0 +1,247 @@
> +#!/usr/bin/perl -w
> +
> +use Getopt::Long;

use strict?

> +
> +my $full_checkout = 0;
> +my $oneside_checkout = 0;
> +GetOptions("full" => \$full_checkout,
> +	   "oneside" => \$oneside_checkout)
> +    or die;
> +
> +if ($full_checkout) {
> +    $oneside_checkout = 1;
> +}
> +
> +sub read_rev_tree {
> +    my (@head) = @_;
> +    my ($fhi);
> +    open $fhi, '-|', 'rev-tree', '--edges', @head
> +	or die "$!: rev-tree --edges @head";
> +    my %common;
> +    while (<$fhi>) {
> +	chomp;
> +	(undef, undef, my @common) = split(/ /, $_);
> +	for (@common) {
> +	    if (s/^([a-f0-f]{40}):3$/$1/) {
> +		$common{$_}++;
> +	    }
> +	}
> +    }
> +    close $fhi;
> +
> +    my @common = (map { $_->[1] }
> +		  sort { $b->[0] <=> $a->[0] }
> +		  map { [ $common{$_} => $_ ] }
> +		  keys %common);
> +
> +    return $common[0];
> +}

It'd be simpler to do just

	my @common = (map { $common{$_} }
	              sort { $b <=> $a }
	              keys %common)

But I really think this is a horrible heuristic. I believe you should
take the latest commit in the --edges output, and from that choose the
base whose rev-tree --edges the_base merged_branch has the least lines
on output. (That is, the path to it is shortest - ideally it's already
part of the merged_branch.)

> +
> +sub read_commit_tree {
> +    my ($commit) = @_;
> +    my ($fhi);
> +    open $fhi, '-|', 'cat-file', 'commit', $commit
> +	or die "$!: cat-file commit $commit";
> +    my $tree = <$fhi>;
> +    close $fhi;
> +    $tree =~ s/^tree //;
> +    return $tree;
> +}
> +
> +# Reads diff-tree -r output and gives a hash that maps a path
> +# to 3-tuple (old-mode new-mode new-sha).
> +# When creating, old-mode is undef.  When removing, new-* are undef.

What about

sub OLDMODE { 0 }
sub NEWMODE { 1 }
sub NEWSHA { 2 }

and then using that when accessing the tuple? Would make the code
much more readable.

> +sub read_diff_tree {
> +    my (@tree) = @_;
> +    my ($fhi);
> +    local ($_, $/);
> +    $/ = "\0"; 
> +    my %path;
> +    open $fhi, '-|', 'diff-tree', '-r', @tree
> +	or die "$!: diff-tree -r @tree";
> +    while (<$fhi>) {
> +	chomp;
> +	if (/^\*([0-7]+)->([0-7]+)\tblob\t[0-9a-f]+->([0-9a-f]{40})\t(.*)$/s) {
> +	    $path{$4} = [$1, $2, $3];
> +	}
> +	elsif (/^\+([0-7]+)\tblob\t([0-9a-f]{40})\t(.*)$/s) {
> +	    $path{$3} = [undef, $1, $2];
> +	}
> +	elsif (/^\-([0-7]+)\tblob\t[0-9a-f]{40}\t(.*)$/s) {
> +	    $path{$2} = [$1, undef, undef];
> +	}
> +	else {
> +	    die "cannot parse diff-tree output: $_";
> +	}
> +    }
> +    close $fhi;
> +    return %path;
> +}
> +
> +sub read_show_files {
> +    my ($fhi);
> +    local ($_, $/);
> +    $/ = "\0"; 
> +    open $fhi, '-|', 'show-files', '-z'
> +	or die "$!: show-files -z";
> +    my (@path) = map { chomp; $_ } <$fhi>;
> +    close $fhi;
> +    return @path;
> +}
> +
> +sub checkout_file {
> +    my ($path, $info) = @_;
> +    my (@elt) = split(/\//, $path);
> +    my $j = '';
> +    my $tail = pop @elt;
> +    my ($fhi, $fho);
> +    for (@elt) {
> +	mkdir "$j$_";
> +	$j = "$j$_/";
> +    }
> +    open $fho, '>', "$path";
> +    open $fhi, '-|', 'cat-file', 'blob', $info->[2]
> +	or die "$!: cat-file blob $info->[2]";
> +    while (<$fhi>) {
> +	print $fho $_;
> +    }
> +    close $fhi;
> +    close $fho;
> +    chmod oct("0$info->[1]"), "$path";
> +}
> +
> +sub record_file {
> +    my ($path, $info) = @_;
> +    system ('update-cache', '--add', '--cacheinfo',
> +	    $info->[1], $info->[2], $path);
> +}
> +
> +sub merge_tree {
> +    my ($path, $info0, $info1) = @_;
> +    checkout_file(',,merge-0', $info0);
> +    checkout_file(',,merge-1', $info1);
> +    system 'checkout-cache', $path;
> +    my ($fhi, $fho);
> +    open $fhi, '-|', 'merge', '-p', ',,merge-0', $path, ',,merge-1';
> +    open $fho, '>', "$path+";
> +    local ($/);
> +    while (<$fhi>) { print $fho $_; }
> +    close $fhi;
> +    close $fho;
> +    unlink ',,merge-0', ',,merge-1';
> +    rename "$path+", $path;
> +    # There is no reason to prefer info0 over info1 but
> +    # we need to pick one.
> +    chmod oct("0$info0->[1]"), "$path";
> +}

It is a good idea to check merge's exit code and give a notice at the
end if there were any conflicts.

> +
> +# Find common ancestor of two trees.
> +my $common = read_rev_tree(@ARGV);
> +print "Common ancestor: $common\n";
> +
> +# Create a temporary directory and go there.
> +system 'rm', '-rf', ',,merge-temp';

Can't we call it just ,,merge?

> +for ((',,merge-temp', '.git')) { mkdir $_; chdir $_; }
> +symlink "../../.git/objects", "objects";
> +chdir '..';
> +
> +my $ancestor_tree = read_commit_tree($common);
> +system 'read-tree', $ancestor_tree;
> +
> +my %tree0 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[0]));
> +my %tree1 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[1]));
> +
> +my @ancestor_file = read_show_files();
> +my %ancestor_file = map { $_ => 1 } @ancestor_file;
> +
> +for (@ancestor_file) {
> +    if (! exists $tree0{$_} && ! exists $tree1{$_}) {
> +	if ($full_checkout) {
> +	    system 'checkout-cache', $_;
> +	}
> +	print STDERR "O - $_\n";

Huh, what are you trying to do here? I think you should just record
remove, no? (And I wouldn't do anything with my read-tree. ;-)

> +    }
> +}
> +
> +for my $set ([\%tree0, \%tree1, 'A'], [\%tree1, \%tree0, 'B']) {
> +    my ($treeA, $treeB, $side) = @$set;
> +    while (my ($path, $info) = each %$treeA) {
> +	# In this loop we do not deal with overlaps.
> +	next if (exists $treeB->{$path});
> +
> +	if (! defined $info->[1]) {
> +	    # deleted in this tree only.
> +	    unlink $path;
> +	    system 'update-cache', '--remove', $path;
> +	    print STDERR "$side D $path\n";
> +	}
> +	else {
> +	    # modified or created in this tree only.
> +	    print STDERR "$side M $path\n";
> +	    if ($oneside_checkout) {
> +		checkout_file($path, $info);
> +		system 'update-cache', '--add', "$path";
> +	    } else {
> +		record_file($path, $info);
> +	    }
> +	}
> +    }
> +}
..snip..

Hmm, I think I will just need to play with the script a lot. ;-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 12:16                 ` Petr Baudis
@ 2005-04-14 18:12                   ` Junio C Hamano
  2005-04-14 18:36                     ` Linus Torvalds
                                       ` (2 more replies)
  0 siblings, 3 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-14 18:12 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> Bah, you outran me. ;-)

Just being in a different timezone, I guess.

PB> I'll change it to use the cool git-pasky stuff (commit-id etc) and its
PB> style of committing - that is, it will merely record the update-caches
PB> to be done upon commit, and it will read-tree the branch we are merging
PB> to instead of the ancestor. (So that git diff gives useful output.)

Sorry, I have not seen what you have been doing since pasky 0.3,
and I have not even started to understand the mental model of
the world your tool is building.  That said, my gut feeling is
that telling this script about git-pasky's world model might be
a mistake.  I'd rather see you consider the script as mere "part
of the plumbing".  Maybe adding an extra parameter to the script
to let the user explicitly specify the common ancestor to use
would be needed, but I would prefer git-pasky-merge to do its
own magic (converting symbolic commit names into raw commit
names and such) before calling this low level script.

That way people like me who have not migrated to your framework
can still keep using it.  All the script currently needs is a
bare git object database; i.e., nothing other than what is in
.git/objects and a couple of commit record SHA1s as its
parameters.  No .git/heads/, no .git/HEAD.local, no .git/tags,
are involved for it to work, and I would prefer to keep things
that way if possible.

>> * show-diff updates to add -r flag to squelch diffs for files not in
>> the working directory.  This is mainly useful when verifying the
>> result of an automated merge.

PB> -r traditionally means recursive - what's the reasoning behind the
PB> choice of this letter?

Well, '-r' is not necessarily recursive. "ls -r" is reverse, "sort
-r" is reverse.  "less -r" is raw.  "cat -r" is reversible.
"nethack -r" is race ;-).  You are thinking as an SCM person so
it may look that way.  "diff -r" is recursive.  "darcs add -r"
is recursive.  But even in the SCM world, "cvs add -r" is not
(it means read-only) neither "co -r" (explicit revision) ;-).

I would rather pick '-q' if I were doing the patch today, but I
was too tired and did not think of a letter when I wrote it.  I
guess '-r' stood for removed, but I agree it is a bad choice.
Any objections to '-q'?

PB> use strict?

Not in this iteration but eventually yes.

PB> It'd be simpler to do just

PB> 	my @common = (map { $common{$_} }
PB> 	              sort { $b <=> $a }
PB> 	              keys %common)

Well, actually you spotted a bug between the implementation and
what I wanted to do.  It should have been:

map { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
        map { [ $common{$_} => $_ ] } keys %common

That is, sort [anscestor => number of times it appears] tuple by
the "number of times it appears" in decreasing order, and
project the resulting list to a list of ancestors.  It is trying
to deal with the following pattern in rev-tree output:

TIMESTAMP1 EDGE1:1 ANCESTOR1:3 ANCESTOR2:3
TIMESTAMP2 EDGE2:2 ANCESTOR1:3

and when the above happens I wanted to pick up ANCESTOR1, but
that was without no sound reason.

PB> But I really think this is a horrible heuristic. I believe you should
PB> take the latest commit in the --edges output, and from that choose the
PB> base whose rev-tree --edges the_base merged_branch has the least lines
PB> on output. (That is, the path to it is shortest - ideally it's already
PB> part of the merged_branch.)

I'll try something along that line.  Honestly the ancestor
selection part was what I had most trouble with.  Thanks.

PB> What about

PB> sub OLDMODE { 0 }
PB> sub NEWMODE { 1 }
PB> sub NEWSHA { 2 }

PB> and then using that when accessing the tuple? Would make the code
PB> much more readable.

Totally agreed; readability cleanup is needed, just as "use
strict" you mentioned, before it is ready for public
consumption.  Remember, however, the primary purpose of the
message was to share it with Linus so that I can ask his opinion
while the script was still slushy; the contents that array
contained was still changing then and was too early for symbolic
constants.  I'll do that in the next round.

PB> It is a good idea to check merge's exit code and give a notice at the
PB> end if there were any conflicts.

In principle yes, but I noticed that merge already gave me a
nice warning message when it found conflicts, so there was no
need to do so myself in this case.  See sample output:

    $ perl ./git-merge.perl \
        71796686221a0a56ccc25b02386ed8ea648da14d \
        bb95843a5a0f397270819462812735ee29796fb4 
    Common ancestor: 9f02d4d233223462d3f6217b5837b786e6286ba4
    O - COPYING
    O - README
    ...
    O - write-tree.c
    A M write-blob.c
    A M show-diff.c
    ...
    A M update-cache.c
    A M git-merge.perl
    B M merge-tree.c
    MRG Makefile
    merge: warning: conflicts during merge
    $ 

>> +# Create a temporary directory and go there.
>> +system 'rm', '-rf', ',,merge-temp';

PB> Can't we call it just ,,merge?

I'd rather have a command line option '-o' (scrapping the
current '-o' and renaming it to something else; as you can see I
am terrible at picking option names ;-)) to mean "output to this
directory".  I am not really an Arch person so I do not
particulary care about /^,,/.  How about "git~merge~$$"?

>> +for ((',,merge-temp', '.git')) { mkdir $_; chdir $_; }
>> +symlink "../../.git/objects", "objects";
>> +chdir '..';
>> +
>> +my $ancestor_tree = read_commit_tree($common);
>> +system 'read-tree', $ancestor_tree;
>> +
>> +my %tree0 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[0]));
>> +my %tree1 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[1]));
>> +
>> +my @ancestor_file = read_show_files();
>> +my %ancestor_file = map { $_ => 1 } @ancestor_file;
>> +
>> +for (@ancestor_file) {
>> +    if (! exists $tree0{$_} && ! exists $tree1{$_}) {
>> +	if ($full_checkout) {
>> +	    system 'checkout-cache', $_;
>> +	}
>> +	print STDERR "O - $_\n";

PB> Huh, what are you trying to do here? I think you should just record
PB> remove, no? (And I wouldn't do anything with my read-tree. ;-)

At this moment in the script, we have run "read-tree" the
ancestor so the dircache has the original.  %tree0 and %tree1
both did not touch the path ($_ here) so it is the same as
ancestor.  When '-f' is specified we are populating the output
working tree with the merge result so that is what that
'checkout-cache' is about.  "O - $path" means "we took the
original".

The idea is to populate the dircache of merge-temp with the
merge result and leave uncertain stuff as in the common ancestor
state, so that the user can fix them starting from there.

Maybe it is a good time for me to summarize the output somewhere
in a document.

    O - $path	Tree-A and tree-B did not touch this; the result
                is taken from the ancestor (O for original).

    A D $path	Only tree-A (or tree-B) deleted this and the other
    B D $path   branch did not touch this; the result is to delete.

    A M $path	Only tree-A (or tree-B) modified this and the other
    B M $path   branch did not touch this; the result is to use one
                from tree-A (or tree-B).  This includes file
                creation case.

    *DD $path	Both tree-A and tree-B deleted this; the result
                is to delete.

    *DM $path   Tree-A deleted while tree-B modified this (or
    *MD $path   vice versa), and manual conflict resolution is
                needed; dircache is left as in the ancestor, and
                the modified file is saved as $path~A~ in the
                working directory.  The user can rename it to $path
                and run show-diff to see what Tree-A wanted to do
                and decide before running update-cache.

    *MM $path   Tree-A and tree-B did the exact same
                modification; the result is to use that.

    MRG $path   Tree-A and tree-B have different modifications;
                run "merge" and the merge result is left as
                $path in the working directory.

In cases other than *DM, *MD, and MRG, the result is trivial and
is recorded in the dircache.  Without '-o' (to be renamed ;-)
nor '-f' there will not be a file checked out in the working
directory for them.  The three merge cases need human attention.
The dircache is not touched in these cases and left as the
ancestor version, and the working directory gets some file as
described above.

NOTE NOTE NOTE: I am not dealing with a case where both branches
create the same file but with different contents.  In such a
case the current code falls into MRG path without having a
common ancestor, which is nonsense---I can use /dev/null as the
common ancestor, I guess.  Also NOTE NOTE NOTE I need to detect
the case where one branch creates a directory while the other
creates a file.  There is nothing an automated tool can do in
that case but it needs to be detected and be told the user
loudly.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 18:12                   ` Junio C Hamano
@ 2005-04-14 18:36                     ` Linus Torvalds
  2005-04-14 19:59                       ` Junio C Hamano
                                         ` (2 more replies)
  2005-04-14 18:51                     ` Christopher Li
  2005-04-14 19:35                     ` Petr Baudis
  2 siblings, 3 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-14 18:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git



On Thu, 14 Apr 2005, Junio C Hamano wrote:
> 
> Sorry, I have not seen what you have been doing since pasky 0.3,
> and I have not even started to understand the mental model of
> the world your tool is building.  That said, my gut feeling is
> that telling this script about git-pasky's world model might be
> a mistake.  I'd rather see you consider the script as mere "part
> of the plumbing". 

I agree. Having separate abstraction layers is good.  I'm actually very 
happy with Pasky's cleaned-up-tree, exactly because unlike the first one, 
Pasky did a great job of maintaining the abstraction between "plumbing" 
and user interfaces.

The plumbing should take user interface needs into account, but the more
conceptually separate it is ("does it makes sense on its own?") the better
off we'll be. And "merge these two trees" (which works on a _tree_ level)
or "find the common commit" (which works on a _commit_ level) look like 
plumbing to me - the kind of things I should have written, if I weren't 
such a lazy slob.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 18:12                   ` Junio C Hamano
  2005-04-14 18:36                     ` Linus Torvalds
@ 2005-04-14 18:51                     ` Christopher Li
  2005-04-14 19:35                     ` Petr Baudis
  2 siblings, 0 replies; 148+ messages in thread
From: Christopher Li @ 2005-04-14 18:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Linus Torvalds, git

On Thu, Apr 14, 2005 at 11:12:35AM -0700, Junio C Hamano wrote:
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> At this moment in the script, we have run "read-tree" the
> ancestor so the dircache has the original.  %tree0 and %tree1
> both did not touch the path ($_ here) so it is the same as
> ancestor.  When '-f' is specified we are populating the output
> working tree with the merge result so that is what that
> 'checkout-cache' is about.  "O - $path" means "we took the
> original".
> 
> The idea is to populate the dircache of merge-temp with the
> merge result and leave uncertain stuff as in the common ancestor
> state, so that the user can fix them starting from there.
> 
> Maybe it is a good time for me to summarize the output somewhere
> in a document.
> 
>     O - $path	Tree-A and tree-B did not touch this; the result
>                 is taken from the ancestor (O for original).
> 
>     A D $path	Only tree-A (or tree-B) deleted this and the other
>     B D $path   branch did not touch this; the result is to delete.
> 
>     A M $path	Only tree-A (or tree-B) modified this and the other
>     B M $path   branch did not touch this; the result is to use one
>                 from tree-A (or tree-B).  This includes file
>                 creation case.
> 
>     *DD $path	Both tree-A and tree-B deleted this; the result
>                 is to delete.
> 
>     *DM $path   Tree-A deleted while tree-B modified this (or
>     *MD $path   vice versa), and manual conflict resolution is
>                 needed; dircache is left as in the ancestor, and
>                 the modified file is saved as $path~A~ in the
>                 working directory.  The user can rename it to $path
>                 and run show-diff to see what Tree-A wanted to do
>                 and decide before running update-cache.
> 
>     *MM $path   Tree-A and tree-B did the exact same
>                 modification; the result is to use that.
> 
>     MRG $path   Tree-A and tree-B have different modifications;
>                 run "merge" and the merge result is left as
>                 $path in the working directory.
> 
> In cases other than *DM, *MD, and MRG, the result is trivial and

I believe there is simpler way to do it as in my demo python script.
I start it easier but you bits me in time. It is a demo script, it
only print the action instead of actually going out to do it.
change that to corresponding os.system("") call leaves to the reader.

Again, this is a demo how it can be done. Not python vs perl thing
I did not chose perl only because I am not good at it.

#!/usr/bin/env python

import re
import sys
import os
from pprint import pprint

def get_tree(commit):
    data = os.popen("cat-file commit %s"%commit).read()
    return re.findall(r"(?m)^tree (\w+)", data)[0]

PREFIX = 0
PATH = -1
SHA = -2
ORIGSHA = -3

def get_difftree(old, new):
    lines = os.popen("diff-tree %s %s"%(old, new)).read().split("\x00")
    patterns = (r"(\*)(\d+)->(\d+)\s(\w+)\s(\w+)->(\w+)\s(.*)",
		r"([+-])(\d+)\s(\w+)\s(\w+)\s(.*)")
    res = {}
    for l in lines:
	if not l: continue
	for p in patterns:
	    m = re.findall(p, l)
	    if m:
		m = m[0]
		res[m[-1]] = m
		break
	else:
	    raise "difftree: unknow line", l
    return res

def analyze(diff1, diff2):
    diff1only = [ diff1[k] for k in diff1 if k not in diff2 ]
    diff2only = [ diff2[k] for k in diff2 if k not in diff1 ]
    both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ]

    action(diff1only)
    action(diff2only)
    action_two(both)

def action(diffs):
    for act in diffs:
	if act[PREFIX] == "*":
	    print "modify", act[PATH], act[SHA]
	elif act[PREFIX] == '-':
	    print "remove", act[PATH], act[SHA]
	elif act[PREFIX] == '+':
	    print "add", "remove", act[PATH], act[SHA]
	else:
	    raise "unknow action"

def action_two(diffs):
    for act1, act2 in diffs:
	if len(act1) == len(act2):	# same kind type
	    if act1[PREFIX] == act2[PREFIX]:
		if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': 
		    return action(act1)
	    	if act1[PREFIX]=='*':
		    print "3way-merge", act1[PATH], act1[ORIGSHA], act1[SHA], act2[SHA]
		    return
	print "unable to handle", act[PATH]
	print "one side wants", act1[PREFIX]
	print "the other side wants", act2[PREFIX]
	
    
args = sys.argv[1:]
trees = map(get_tree, args)
print "check out tree", trees[0]
diff1 = get_difftree(trees[0], trees[1])
diff2 = get_difftree(trees[0], trees[2])
analyze(diff1, diff2)


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 18:12                   ` Junio C Hamano
  2005-04-14 18:36                     ` Linus Torvalds
  2005-04-14 18:51                     ` Christopher Li
@ 2005-04-14 19:35                     ` Petr Baudis
  2005-04-14 20:01                       ` Live Merging from remote repositories Barry Silverman
                                         ` (2 more replies)
  2 siblings, 3 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14 19:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Dear diary, on Thu, Apr 14, 2005 at 08:12:35PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> PB> Bah, you outran me. ;-)
> 
> Just being in a different timezone, I guess.
> 
> PB> I'll change it to use the cool git-pasky stuff (commit-id etc) and its
> PB> style of committing - that is, it will merely record the update-caches
> PB> to be done upon commit, and it will read-tree the branch we are merging
> PB> to instead of the ancestor. (So that git diff gives useful output.)
> 
> Sorry, I have not seen what you have been doing since pasky 0.3,
> and I have not even started to understand the mental model of
> the world your tool is building.  That said, my gut feeling is
> that telling this script about git-pasky's world model might be
> a mistake.  I'd rather see you consider the script as mere "part
> of the plumbing".  Maybe adding an extra parameter to the script
> to let the user explicitly specify the common ancestor to use
> would be needed, but I would prefer git-pasky-merge to do its
> own magic (converting symbolic commit names into raw commit
> names and such) before calling this low level script.
> 
> That way people like me who have not migrated to your framework
> can still keep using it.  All the script currently needs is a
> bare git object database; i.e., nothing other than what is in
> .git/objects and a couple of commit record SHA1s as its
> parameters.  No .git/heads/, no .git/HEAD.local, no .git/tags,
> are involved for it to work, and I would prefer to keep things
> that way if possible.

I see, and I actually agree with it. However, I'll want merge-tree.pl to
do a little less than it does now for that, though. The mechanics in
"kernel" is fine as long as I can control policy in my "userspace". ;-)

BTW, the git* name sorta imply my toilet instead of the core plumbing, and
it'd be more consistent with the current plumbnaming; and could we have
it with the .pl extension, please? :-)

What I would like your script to do is therefore just do the merge in a
given already prepared (including built index) directory, with a passed
base. The base should be determined by a separate tool (I already saw
some patches); most future "science" will probably go to a clever
selection of this base, anyway.

This will give the tool maximal flexibility. E.g., then someone who
wants to can just merge with his working copy (if you don't give
checkout-cache -f - but why would you anyway), or do whatever other
cleverness he wants.

> >> * show-diff updates to add -r flag to squelch diffs for files not in
> >> the working directory.  This is mainly useful when verifying the
> >> result of an automated merge.
..snip..
> was too tired and did not think of a letter when I wrote it.  I
> guess '-r' stood for removed, but I agree it is a bad choice.
> Any objections to '-q'?

None here.

> >> +# Create a temporary directory and go there.
> >> +system 'rm', '-rf', ',,merge-temp';
> 
> PB> Can't we call it just ,,merge?
> 
> I'd rather have a command line option '-o' (scrapping the
> current '-o' and renaming it to something else; as you can see I
> am terrible at picking option names ;-)) to mean "output to this
> directory".  I am not really an Arch person so I do not
> particulary care about /^,,/.  How about "git~merge~$$"?

I'm all for an -o, and I don't mind ,, - I just don't want it uselessly
long. I hope "git~merge~$$" was a joke... :-)

> >> +for ((',,merge-temp', '.git')) { mkdir $_; chdir $_; }
> >> +symlink "../../.git/objects", "objects";
> >> +chdir '..';
> >> +
> >> +my $ancestor_tree = read_commit_tree($common);
> >> +system 'read-tree', $ancestor_tree;
> >> +
> >> +my %tree0 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[0]));
> >> +my %tree1 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[1]));
> >> +
> >> +my @ancestor_file = read_show_files();
> >> +my %ancestor_file = map { $_ => 1 } @ancestor_file;
> >> +
> >> +for (@ancestor_file) {
> >> +    if (! exists $tree0{$_} && ! exists $tree1{$_}) {

By the way, what about indentation with tabs? If you have a strong
opinion about this, I don't insist - but if you really don't mind/care
either way, it'd be great to use tabs as in the rest of the git code.

> >> +	if ($full_checkout) {
> >> +	    system 'checkout-cache', $_;
> >> +	}
> >> +	print STDERR "O - $_\n";
> 
> PB> Huh, what are you trying to do here? I think you should just record
> PB> remove, no? (And I wouldn't do anything with my read-tree. ;-)
> 
> At this moment in the script, we have run "read-tree" the
> ancestor so the dircache has the original.  %tree0 and %tree1
> both did not touch the path ($_ here) so it is the same as
> ancestor.  When '-f' is specified we are populating the output
> working tree with the merge result so that is what that
> 'checkout-cache' is about.  "O - $path" means "we took the
> original".

Aha! Thanks.

Is there a fundamental reason why the directory cache contains the
ancestor instead of the destination branch? It makes no sense to me and
I think the script actually does not fundamentally depend on it. My main
motivation is that the user can then trivially see what is he actually
going to commit to his destination branch, which would be bought for
free by that.

> The idea is to populate the dircache of merge-temp with the
> merge result and leave uncertain stuff as in the common ancestor
> state, so that the user can fix them starting from there.

And this is another thing I dislike a lot. I'd like merge-tree.pl to
leave my directory cache alone, thank you very much. You know, I see
what goes to the directory cache as actually part of the policy part.

What you actually do is interfering with my different policy choice,
which is to record stuff to index only at the time of commit (I've asked
about this and noone replied, so I assume it's an ok choice). show-diff
does the right thing for me then, and I don't need to care about losing
*any* information when replacing/rebuilding the index for any reason. I
have full control, and I like that. :-)

I'd be happy with parsing merge-tree.pl output and doing the right thing
on my side. Of course I could then blast away the tediously modified
index with my one, but I didn't need to do any such hacking before and
I'd prefer not to now either.

Actually, the only time I need to do explicit update-cache (with my
policy) when doing git merge is when deleting stuff or adding new stuff;
both of this is not so common as modifying, when I need not to do
anything.

> Maybe it is a good time for me to summarize the output somewhere
> in a document.
> 
>     O - $path	Tree-A and tree-B did not touch this; the result
>                 is taken from the ancestor (O for original).
> 
>     A D $path	Only tree-A (or tree-B) deleted this and the other
>     B D $path   branch did not touch this; the result is to delete.
> 
>     A M $path	Only tree-A (or tree-B) modified this and the other
>     B M $path   branch did not touch this; the result is to use one
>                 from tree-A (or tree-B).  This includes file
>                 creation case.

Could we please have the file creation case separately? Modification
is much more common and creation has pretty different consequences
(especially that it can't combine with anything else :-).

>     *DD $path	Both tree-A and tree-B deleted this; the result
>                 is to delete.
> 
>     *DM $path   Tree-A deleted while tree-B modified this (or
>     *MD $path   vice versa), and manual conflict resolution is
>                 needed; dircache is left as in the ancestor, and
>                 the modified file is saved as $path~A~ in the
>                 working directory.  The user can rename it to $path
>                 and run show-diff to see what Tree-A wanted to do
>                 and decide before running update-cache.
> 
>     *MM $path   Tree-A and tree-B did the exact same
>                 modification; the result is to use that.
> 
>     MRG $path   Tree-A and tree-B have different modifications;
>                 run "merge" and the merge result is left as
>                 $path in the working directory.

Hmm. I actually don't like this naming. I think it's not too consistent,
is irregular, therefore parsing it would be ugly. What I propose:

12c\tname <- legend
          <- original file
D         <- tree #1 removed file
 D        <- tree #2 removed file
DD        <- both trees removed file
M         <- tree #1 modified file
 M
DM*       <- conflict, tree #1 removed file, tree #2 modified file
MD*
MM        <- exact same modification
MM*       <- different modifications, merging

This is generic, theoretically scales well even to more trees, is easy
to parse trivially, still is human readable (actually the asterisk in
the 'conflict' column is there basically only for the humans), is
completely regular and consistent.

Now that we have the notion of tree A and tree B gone, I'd prefer to use
numbers instead of letters for the ~1~ and ~2~ suffixes. Not insisting,
though.

What do you think?

> In cases other than *DM, *MD, and MRG, the result is trivial and
> is recorded in the dircache.  Without '-o' (to be renamed ;-)
> nor '-f' there will not be a file checked out in the working
> directory for them.  The three merge cases need human attention.
> The dircache is not touched in these cases and left as the
> ancestor version, and the working directory gets some file as
> described above.
> 
> NOTE NOTE NOTE: I am not dealing with a case where both branches
> create the same file but with different contents.  In such a
> case the current code falls into MRG path without having a
> common ancestor, which is nonsense---I can use /dev/null as the
> common ancestor, I guess.  Also NOTE NOTE NOTE I need to detect

That might be the best way at least for the start, although I suspect
that merge will fail horribly this way even in the case of slightest
differences; still better than nothing.

> the case where one branch creates a directory while the other
> creates a file.  There is nothing an automated tool can do in
> that case but it needs to be detected and be told the user
> loudly.

Or when both branches create directories... ;-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 18:36                     ` Linus Torvalds
@ 2005-04-14 19:59                       ` Junio C Hamano
  2005-04-14 20:20                         ` Petr Baudis
  2005-04-15  0:42                         ` Linus Torvalds
  2005-04-15  2:21                       ` [Patch] ls-tree enhancements Junio C Hamano
  2005-04-15  9:14                       ` Merge with git-pasky II David Woodhouse
  2 siblings, 2 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-14 19:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Thu, 14 Apr 2005, Junio C Hamano wrote:

>> Sorry, I have not seen what you have been doing since pasky 0.3,
>> and I have not even started to understand the mental model of
>> the world your tool is building.  That said, my gut feeling is
>> that telling this script about git-pasky's world model might be
>> a mistake.  I'd rather see you consider the script as mere "part
>> of the plumbing". 

LT> I agree. Having separate abstraction layers is good.  I'm actually very 
LT> happy with Pasky's cleaned-up-tree, exactly because unlike the first one, 
LT> Pasky did a great job of maintaining the abstraction between "plumbing" 
LT> and user interfaces.

Agreed, not just with your agreeing with me, but with the
statement that Pasky did a good job (although I am ashamed to
say I have not caught up with the "userland" tools).

LT> The plumbing should take user interface needs into account, but the more
LT> conceptually separate it is ("does it makes sense on its own?") the better
LT> off we'll be. And "merge these two trees" (which works on a _tree_ level)
LT> or "find the common commit" (which works on a _commit_ level) look like 
LT> plumbing to me - the kind of things I should have written, if I weren't 
LT> such a lazy slob.

I am planning drop the ancestor computation from the script, and
make it another command line parameter to the script.  Dan
Barkalow's merge-base program should be used to compute it and
his result should drive the merge.  That sounds more UNIXy to
me.  I even may want to make the script take three trees not
commits, since the merge script does not need commits (it only
needs trees).  As plumbing it would be cleaner interface to it
to do so.  The wrapper SCM scripts can and should make sure it
is fed trees when the user gives it commits (or symbolic
representation of it like .git/tags/blah, or `cat .git/HEAD`).

But one different thing to note here.

You say "merge these two trees" above (I take it that you mean
"merge these two trees, taking account of this tree as their
common ancestor", so actually you are dealing with three trees),
and I am tending to agree with the notion of merging trees not
commits.  However you might get richer context and more sensible
resulting merge if you say "merge these two commits".  Since
commit chaining is part of the fundamental git object model you
may as well use it.

This however opens up another set of can of worms---it would
involve not just three trees but all the trees in the commit
chain in between.  That's when you start wondering if it would
be better to add renames in the git object model, which is the
topic of another thread.  I have not formed an opinion on that
one myself yet.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Live Merging from remote repositories
  2005-04-14 19:35                     ` Petr Baudis
@ 2005-04-14 20:01                       ` Barry Silverman
  2005-04-14 23:22                         ` Junio C Hamano
  2005-04-14 20:23                       ` Re: Merge with git-pasky II Erik van Konijnenburg
  2005-04-14 23:12                       ` Junio C Hamano
  2 siblings, 1 reply; 148+ messages in thread
From: Barry Silverman @ 2005-04-14 20:01 UTC (permalink / raw)
  To: Petr Baudis, Junio C Hamano; +Cc: Linus Torvalds, git

If you are merging from many distributed developers, than you would need to
replicate every one of their repositories into your own. Is this necessary?

I have been looking at Junio's code for merging, and it looks like it would
be (relatively) easy change to make it run live across two remote
repositories - assuming "future" science to develop remote common ancestor
lookup...

IE, merge.pl $COMMON-BASE $LOCAL-CHANGESET remote::$REMOTE-CHANGESET

To make this work, only a couple of things need to happen:
1) be able to remotely run "remote::diff-tree $BASE $REMOTE-CHANGESET", and
copy the results over the net to the place in the script where it is done
locally. This is not a LOT of data, and is bounded by the number of total
number of blobs in the resulting tree.

2) When a remote blob is required (for merging, or copying), then copy it
from the remote .git/objects to the local one. You only copy the blobs that
will end up in the merged result (or be used for file merge).

The way Junio has done it, no intermediate trees or commits are used...

You don't copy remote's tree of commits between $BASE and $REMOTE-CHANGESET,
or any of their associated trees and blobs (unless used to merge).

Is this a bug or a feature?


Barry Silverman


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 19:59                       ` Junio C Hamano
@ 2005-04-14 20:20                         ` Petr Baudis
  2005-04-15  0:42                         ` Linus Torvalds
  1 sibling, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14 20:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Dear diary, on Thu, Apr 14, 2005 at 09:59:04PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> On Thu, 14 Apr 2005, Junio C Hamano wrote:
> 
> >> Sorry, I have not seen what you have been doing since pasky 0.3,
> >> and I have not even started to understand the mental model of
> >> the world your tool is building.  That said, my gut feeling is
> >> that telling this script about git-pasky's world model might be
> >> a mistake.  I'd rather see you consider the script as mere "part
> >> of the plumbing". 
> 
> LT> I agree. Having separate abstraction layers is good.  I'm actually very 
> LT> happy with Pasky's cleaned-up-tree, exactly because unlike the first one, 

(Just a side-note - functionally and even organizationally, the cleaned
up tree does not differ significantly from the original one.)

> LT> Pasky did a great job of maintaining the abstraction between "plumbing" 
> LT> and user interfaces.
> 
> Agreed, not just with your agreeing with me, but with the
> statement that Pasky did a good job (although I am ashamed to
> say I have not caught up with the "userland" tools).

Thanks. :-)

> LT> The plumbing should take user interface needs into account, but the more
> LT> conceptually separate it is ("does it makes sense on its own?") the better
> LT> off we'll be. And "merge these two trees" (which works on a _tree_ level)
> LT> or "find the common commit" (which works on a _commit_ level) look like 
> LT> plumbing to me - the kind of things I should have written, if I weren't 
> LT> such a lazy slob.
> 
> I am planning drop the ancestor computation from the script, and
> make it another command line parameter to the script.  Dan
> Barkalow's merge-base program should be used to compute it and
> his result should drive the merge.  That sounds more UNIXy to
> me.

Good move, I say!

> I even may want to make the script take three trees not
> commits, since the merge script does not need commits (it only
> needs trees).  As plumbing it would be cleaner interface to it
> to do so.  The wrapper SCM scripts can and should make sure it
> is fed trees when the user gives it commits (or symbolic
> representation of it like .git/tags/blah, or `cat .git/HEAD`).

Agreed.

> But one different thing to note here.
> 
> You say "merge these two trees" above (I take it that you mean
> "merge these two trees, taking account of this tree as their
> common ancestor", so actually you are dealing with three trees),
> and I am tending to agree with the notion of merging trees not
> commits.  However you might get richer context and more sensible
> resulting merge if you say "merge these two commits".  Since
> commit chaining is part of the fundamental git object model you
> may as well use it.

Could you be more particular on the richer context etc?

I think this script should stay strictly on the level of trees. When
someone invents it, there could be a merge-commits script which does
something very smart about two commits, traversing the graph between
them etc, and doing a set of merge-tree invocations, possibly preparing
the staging area for them etc.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 19:35                     ` Petr Baudis
  2005-04-14 20:01                       ` Live Merging from remote repositories Barry Silverman
@ 2005-04-14 20:23                       ` Erik van Konijnenburg
  2005-04-14 20:24                         ` Petr Baudis
  2005-04-14 23:12                       ` Junio C Hamano
  2 siblings, 1 reply; 148+ messages in thread
From: Erik van Konijnenburg @ 2005-04-14 20:23 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Linus Torvalds, git

On Thu, Apr 14, 2005 at 09:35:07PM +0200, Petr Baudis wrote:
> Hmm. I actually don't like this naming. I think it's not too consistent,
> is irregular, therefore parsing it would be ugly. What I propose:
> 
> 12c\tname <- legend
>           <- original file
> D         <- tree #1 removed file
>  D        <- tree #2 removed file
> DD        <- both trees removed file
> M         <- tree #1 modified file
>  M
> DM*       <- conflict, tree #1 removed file, tree #2 modified file
> MD*
> MM        <- exact same modification
> MM*       <- different modifications, merging
> 
> This is generic, theoretically scales well even to more trees, is easy
> to parse trivially, still is human readable (actually the asterisk in
> the 'conflict' column is there basically only for the humans), is
> completely regular and consistent.

Detail: perhaps use underscore instead of space, to avoid space/tab typos
that are invisible on paper and user friendly mail clients?

Regards,
Erik

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 23:12                       ` Junio C Hamano
@ 2005-04-14 20:24                         ` Christopher Li
  2005-04-14 23:31                         ` Petr Baudis
  1 sibling, 0 replies; 148+ messages in thread
From: Christopher Li @ 2005-04-14 20:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Linus Torvalds, git

Hi Junio,

I think if the merge tree belong to plumbing, you can do
even less in the merge.perl. You can just print out the
instruction for the upper level SCM what to to without
actually doing it yourself.

So you don't have to do touch anything in the tree.
That is the way I use in my previous python script.
You just print out some easy to modify  

e.g. in my python script it prints: (BTW, poor choice of print out name)

check out tree 253290af8b9ebc8565dd8de4cda24d0432a92b57
modify pre-process.c 7684c115a87e41a9226ce79478101c746cf22c34
3way-merge check.c dcb970cc1c5a83284dc5986abf07b6da76a8758c f77bfe119c19d928879091e0e3ee6debe3f1e1bf d315b43b025350d0107568a4d42cc2494d38621d

Your merge tree can do the smae.

Then the supper level SCM can easily follow instruction.
Save your effort and make no assumption what SCM module is.

Chris

On Thu, Apr 14, 2005 at 04:12:34PM -0700, Junio C Hamano wrote:
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> I think you are contradicting yourself for saying the above
> after agreeing with me that the script should just work on trees
> not commits.  My understanding is that the tools is just to
> merge two related trees relative to another ancestor tree,
> nothing more.  Especially, it should not care what is in the
> working directory---that is SCM person's business.
> 
> I am just trying to follow my understanding of what Linus
> wanted.  One of the guiding principle is to do as much things as
> in dircache without ever checking things out or touching working
> files unnecessarily.
> 
> PB> This will give the tool maximal flexibility.
> 
> I suspect it would force me to have a working directory
> populated with files, just to do a merge.
> 
> PB> I'm all for an -o, and I don't mind ,, - I just don't want it uselessly
> PB> long. I hope "git~merge~$$" was a joke... :-)
> 
> Which part do you object to?  PID part?  or tilde?  Would
> git~merge do, perhaps?  It probably would not matter to you
> because as an SCM you would always give an explicit --output
> parameter to the script anyway.
> 
> PB> By the way, what about indentation with tabs? If you have a
> PB> strong opinion about this, I don't insist - but if you
> PB> really don't mind/care either way, it'd be great to use tabs
> PB> as in the rest of the git code.
> 
> I do not have a strong opinion, but it is more trouble for me
> only because I am lazy and am used to the indentation my Emacs
> gives me.  I write code other than git, so changing Perl-mode
> indentation setting globally for all .pl files is not an option
> for me.  I'll see what I can do when I have time.
> 
> PB> Is there a fundamental reason why the directory cache
> PB> contains the ancestor instead of the destination branch?
> 
> Because you are thinking as an SCM person where there are
> distinction between tree-A and tree-B, two heads being merged.
> There is no "destination branch" nor "source branch" in what I
> am doing.  It is a merge of two equals derived from the same
> ancestor.
> 
> PB> I think the script actually does not fundamentally depend on it. My main
> PB> motivation is that the user can then trivially see what is he actually
> PB> going to commit to his destination branch, which would be bought for
> PB> free by that.
> 
> And again the user is *not* commiting to his "destination
> branch".  At the level I am working at, the merge result should
> be commited with two -p parameters to commit-tree --- tree-A and
> tree-B, both being equal parents from the POV of git object
> storage.
> 
> PB> And this is another thing I dislike a lot. I'd like merge-tree.pl to
> PB> leave my directory cache alone, thank you very much. You know, I see
> PB> what goes to the directory cache as actually part of the policy part.
> 
> Remember I am not touching *your* dircache.  It is a dircache in
> the temporary merge area, specifically set up to help you review
> the merge.  
> 
> Can't the SCM driver do things along this line, perhaps?
> 
>  - You have your working files and your dircache.  They may not
>    match because you have uncommitted changes to your
>    environment.  You want to merge with Linus head.  You know
>    its SHA1 (call it COMMIT-Linus).  Your SCM knows which commit
>    you started with (call it COMMIT-Current).
> 
>  - First you merge the tree associated with COMMIT-Current.  Use
>    it and COMMIT-Linus to find the common ancestor to use.
> 
>  - Now use the tree SHA of COMMIT-Current, tree SHA1 of
>    COMMIT-Linus, and tree SHA1 of the common ancestor commit to
>    drive git-merge.perl (to be renamed ;-).  You will get a
>    temporary directory.  Have your user examine what is in
>    there, and fix the merge and have them tell you they are
>    happy.
> 
>  - You go to that temporary directory, do write-tree and
>    commit-tree with -p parameter of COMMIT-Linus and
>    COMMIT-Current.  This will result in a new commit.  Call that
>    COMMIT-Merge.
> 
>  - You, as an SCM, should know what your user have done in the
>    working directory relative to COMMIT-Current.  Especially you
>    should know the set of paths involved in that change.  Go in
>    to the temporary area, checkout-cache those files if you have
>    not done so.  Apply the changes you have there.  Optionally
>    have the user examine the changes and have him confirm.  Lift
>    those files into the user's working directory.
> 
>  - Do your bookkeeping like "echo COMMIT-Merge >.git/Head", to
>    make the user's working files based on COMMIT-Merge, and run
>    read-tree using the COMMIT-Merge in the user's working
>    directory.  At this point, show-diff output should show what
>    the changes your user have had made if he had started working
>    based on COMMIT-Merge instead of starting from
>    COMMIT-Current.
> 
> I think the above would result in what SCM person would call
> "merge upstream/sidestream changes into my working directory".
> 

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: Merge with git-pasky II.
  2005-04-14 20:23                       ` Re: Merge with git-pasky II Erik van Konijnenburg
@ 2005-04-14 20:24                         ` Petr Baudis
  0 siblings, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14 20:24 UTC (permalink / raw)
  To: Erik van Konijnenburg; +Cc: Junio C Hamano, Linus Torvalds, git

Dear diary, on Thu, Apr 14, 2005 at 10:23:26PM CEST, I got a letter
where Erik van Konijnenburg <ekonijn@xs4all.nl> told me that...
> On Thu, Apr 14, 2005 at 09:35:07PM +0200, Petr Baudis wrote:
> > Hmm. I actually don't like this naming. I think it's not too consistent,
> > is irregular, therefore parsing it would be ugly. What I propose:
> > 
> > 12c\tname <- legend
> >           <- original file
> > D         <- tree #1 removed file
> >  D        <- tree #2 removed file
> > DD        <- both trees removed file
> > M         <- tree #1 modified file
> >  M
> > DM*       <- conflict, tree #1 removed file, tree #2 modified file
> > MD*
> > MM        <- exact same modification
> > MM*       <- different modifications, merging
> > 
> > This is generic, theoretically scales well even to more trees, is easy
> > to parse trivially, still is human readable (actually the asterisk in
> > the 'conflict' column is there basically only for the humans), is
> > completely regular and consistent.
> 
> Detail: perhaps use underscore instead of space, to avoid space/tab typos
> that are invisible on paper and user friendly mail clients?

I'd go for dots in that case. Looks less intrusive. :^)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 23:31                         ` Petr Baudis
@ 2005-04-14 20:30                           ` Christopher Li
  2005-04-14 20:37                             ` Christopher Li
  2005-04-15  0:58                           ` Junio C Hamano
  2005-04-15 10:22                           ` Junio C Hamano
  2 siblings, 1 reply; 148+ messages in thread
From: Christopher Li @ 2005-04-14 20:30 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Linus Torvalds, git

On Fri, Apr 15, 2005 at 01:31:59AM +0200, Petr Baudis wrote:
> > I am just trying to follow my understanding of what Linus
> > wanted.  One of the guiding principle is to do as much things as
> > in dircache without ever checking things out or touching working
> > files unnecessarily.
> 
> I'm just arguing that instead of directly touching the directory cache,
> you should just list what would you do there - and you already do this,

That is exactly what I suggest in the previous email. And my python script
does exactly that ;-)

Chris


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 20:30                           ` Christopher Li
@ 2005-04-14 20:37                             ` Christopher Li
  2005-04-14 20:50                               ` Christopher Li
  0 siblings, 1 reply; 148+ messages in thread
From: Christopher Li @ 2005-04-14 20:37 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Linus Torvalds, git

Is that some thing you want to see? Maybe clean up the error printing.


Chris

--- /dev/null	2003-01-30 05:24:37.000000000 -0500
+++ merge.py	2005-04-14 16:34:39.000000000 -0400
@@ -0,0 +1,76 @@
+#!/usr/bin/env python
+
+import re
+import sys
+import os
+from pprint import pprint
+
+def get_tree(commit):
+    data = os.popen("cat-file commit %s"%commit).read()
+    return re.findall(r"(?m)^tree (\w+)", data)[0]
+
+PREFIX = 0
+PATH = -1
+SHA = -2
+ORIGSHA = -3
+
+def get_difftree(old, new):
+    lines = os.popen("diff-tree %s %s"%(old, new)).read().split("\x00")
+    patterns = (r"(\*)(\d+)->(\d+)\s(\w+)\s(\w+)->(\w+)\s(.*)",
+		r"([+-])(\d+)\s(\w+)\s(\w+)\s(.*)")
+    res = {}
+    for l in lines:
+	if not l: continue
+	for p in patterns:
+	    m = re.findall(p, l)
+	    if m:
+		m = m[0]
+		res[m[-1]] = m
+		break
+	else:
+	    raise "difftree: unknow line", l
+    return res
+
+def analyze(diff1, diff2):
+    diff1only = [ diff1[k] for k in diff1 if k not in diff2 ]
+    diff2only = [ diff2[k] for k in diff2 if k not in diff1 ]
+    both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ]
+
+    action(diff1only)
+    action(diff2only)
+    action_two(both)
+
+def action(diffs):
+    for act in diffs:
+	if act[PREFIX] == "*":
+	    print "modify", act[PATH], act[SHA]
+	elif act[PREFIX] == '-':
+	    print "remove", act[PATH], act[SHA]
+	elif act[PREFIX] == '+':
+	    print "add", act[PATH], act[SHA]
+	else:
+	    raise "unknow action"
+
+def action_two(diffs):
+    for act1, act2 in diffs:
+	if len(act1) == len(act2):	# same kind type
+	    if act1[PREFIX] == act2[PREFIX]:
+		if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': 
+		    return action(act1)
+	    	if act1[PREFIX]=='*':
+		    print "do_merge", act1[PATH], act1[ORIGSHA], act1[SHA], act2[SHA]
+		    return
+	print "unable to handle", act[PATH]
+	print "one side wants", act1[PREFIX]
+	print "the other side wants", act2[PREFIX]
+	
+
+args = sys.argv[1:]
+if len(args)!=3:
+    print "Usage merge.py <common> <rev1> <rev2>"
+trees = map(get_tree, args)
+print "checkout-tree", trees[0]
+diff1 = get_difftree(trees[0], trees[1])
+diff2 = get_difftree(trees[0], trees[2])
+analyze(diff1, diff2)
+

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 20:37                             ` Christopher Li
@ 2005-04-14 20:50                               ` Christopher Li
  0 siblings, 0 replies; 148+ messages in thread
From: Christopher Li @ 2005-04-14 20:50 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Linus Torvalds, git

BTW, I am not competing with Junio script. If that is the way
we all agree on. It is should be very easy for Junio to fix his
perl script. right?

Chris

On Thu, Apr 14, 2005 at 04:37:17PM -0400, Christopher Li wrote:
> Is that some thing you want to see? Maybe clean up the error printing.
> 
> 
> Chris
> 
> --- /dev/null	2003-01-30 05:24:37.000000000 -0500
> +++ merge.py	2005-04-14 16:34:39.000000000 -0400
> @@ -0,0 +1,76 @@
> +#!/usr/bin/env python
> +
> +import re
> +import sys
> +import os
> +from pprint import pprint
> +
> +def get_tree(commit):
> +    data = os.popen("cat-file commit %s"%commit).read()
> +    return re.findall(r"(?m)^tree (\w+)", data)[0]
> +
> +PREFIX = 0
> +PATH = -1
> +SHA = -2
> +ORIGSHA = -3
> +
> +def get_difftree(old, new):
> +    lines = os.popen("diff-tree %s %s"%(old, new)).read().split("\x00")
> +    patterns = (r"(\*)(\d+)->(\d+)\s(\w+)\s(\w+)->(\w+)\s(.*)",
> +		r"([+-])(\d+)\s(\w+)\s(\w+)\s(.*)")
> +    res = {}
> +    for l in lines:
> +	if not l: continue
> +	for p in patterns:
> +	    m = re.findall(p, l)
> +	    if m:
> +		m = m[0]
> +		res[m[-1]] = m
> +		break
> +	else:
> +	    raise "difftree: unknow line", l
> +    return res
> +
> +def analyze(diff1, diff2):
> +    diff1only = [ diff1[k] for k in diff1 if k not in diff2 ]
> +    diff2only = [ diff2[k] for k in diff2 if k not in diff1 ]
> +    both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ]
> +
> +    action(diff1only)
> +    action(diff2only)
> +    action_two(both)
> +
> +def action(diffs):
> +    for act in diffs:
> +	if act[PREFIX] == "*":
> +	    print "modify", act[PATH], act[SHA]
> +	elif act[PREFIX] == '-':
> +	    print "remove", act[PATH], act[SHA]
> +	elif act[PREFIX] == '+':
> +	    print "add", act[PATH], act[SHA]
> +	else:
> +	    raise "unknow action"
> +
> +def action_two(diffs):
> +    for act1, act2 in diffs:
> +	if len(act1) == len(act2):	# same kind type
> +	    if act1[PREFIX] == act2[PREFIX]:
> +		if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': 
> +		    return action(act1)
> +	    	if act1[PREFIX]=='*':
> +		    print "do_merge", act1[PATH], act1[ORIGSHA], act1[SHA], act2[SHA]
> +		    return
> +	print "unable to handle", act[PATH]
> +	print "one side wants", act1[PREFIX]
> +	print "the other side wants", act2[PREFIX]
> +	
> +
> +args = sys.argv[1:]
> +if len(args)!=3:
> +    print "Usage merge.py <common> <rev1> <rev2>"
> +trees = map(get_tree, args)
> +print "checkout-tree", trees[0]
> +diff1 = get_difftree(trees[0], trees[1])
> +diff2 = get_difftree(trees[0], trees[2])
> +analyze(diff1, diff2)
> +
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* git merge
  2005-04-14  0:29 Merge with git-pasky II Petr Baudis
  2005-04-13 21:25 ` Christopher Li
  2005-04-14  0:30 ` Petr Baudis
@ 2005-04-14 22:11 ` Petr Baudis
  2 siblings, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14 22:11 UTC (permalink / raw)
  To: git; +Cc: torvalds

  Hi,

  note that in my git tree there is a git merge implementation which
does out-of-tree merges now. It is still very trivial, and basically
just does something along the lines of (symbolically written)

	checkout-cache $(diff-tree)
	git diff $base $mergedbranch | git apply
	.. fix rejects etc ..
	git commit

  It seems to work, but it is only very lightly tested - it is likely
there are various tiny mistakes and typos in various unusual code paths
and other weird corners of the scripts. Testing is encouraged, and
especially patches fixing bugs you come over.

  It is designed in a way to make it possible to just replace the
checkout-cache and git diff | git apply steps with the merge-tree.pl
tool when it is finished.

  Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  0:58                           ` Junio C Hamano
@ 2005-04-14 22:30                             ` Christopher Li
  2005-04-15  7:43                               ` Junio C Hamano
  2005-04-15 19:54                             ` Re: Merge with git-pasky II Petr Baudis
  1 sibling, 1 reply; 148+ messages in thread
From: Christopher Li @ 2005-04-14 22:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Linus Torvalds, git

On Thu, Apr 14, 2005 at 05:58:25PM -0700, Junio C Hamano wrote:
> 
> I do like, however, the idea of separating the step of doing any
> checkout/merge etc. and actually doing them.  So the command set
> of parse-your-output needs to be defined.  Based on what I have
> done so far, it would consist of the following:
> 
>  - Result is this object $SHA1 with mode $mode at $path (takes
>    one of the trees); you can do update-cache --cacheinfo (if
>    you want to muck with dircache) or cat-file blob (if you want
>    to get the file) or both.

Is that SHA1 for tree or the file object? If it is tree it don't
need the $mode any more.  If it is file you might need to emit
entry for it's parent directory, including the modes of directory.

> 
>  - Result is to delete $path.
> 
>  - Result is a merge between object $SHA1-1 and $SHA1-2 with
>    mode $mode-1 or $mode-2 at $path.
>
> Would this be a good enough command set?

And of course error/command for the files that unable to perform
auto merge. including information of both revisions. That needs
to be defined as well.

Chris


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 19:35                     ` Petr Baudis
  2005-04-14 20:01                       ` Live Merging from remote repositories Barry Silverman
  2005-04-14 20:23                       ` Re: Merge with git-pasky II Erik van Konijnenburg
@ 2005-04-14 23:12                       ` Junio C Hamano
  2005-04-14 20:24                         ` Christopher Li
  2005-04-14 23:31                         ` Petr Baudis
  2 siblings, 2 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-14 23:12 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> What I would like your script to do is therefore just do the
PB> merge in a given already prepared (including built index)
PB> directory, with a passed base. The base should be determined
PB> by a separate tool (I already saw some patches); most future
PB> "science" will probably go to a clever selection of this
PB> base, anyway.

I think you are contradicting yourself for saying the above
after agreeing with me that the script should just work on trees
not commits.  My understanding is that the tools is just to
merge two related trees relative to another ancestor tree,
nothing more.  Especially, it should not care what is in the
working directory---that is SCM person's business.

I am just trying to follow my understanding of what Linus
wanted.  One of the guiding principle is to do as much things as
in dircache without ever checking things out or touching working
files unnecessarily.

PB> This will give the tool maximal flexibility.

I suspect it would force me to have a working directory
populated with files, just to do a merge.

PB> I'm all for an -o, and I don't mind ,, - I just don't want it uselessly
PB> long. I hope "git~merge~$$" was a joke... :-)

Which part do you object to?  PID part?  or tilde?  Would
git~merge do, perhaps?  It probably would not matter to you
because as an SCM you would always give an explicit --output
parameter to the script anyway.

PB> By the way, what about indentation with tabs? If you have a
PB> strong opinion about this, I don't insist - but if you
PB> really don't mind/care either way, it'd be great to use tabs
PB> as in the rest of the git code.

I do not have a strong opinion, but it is more trouble for me
only because I am lazy and am used to the indentation my Emacs
gives me.  I write code other than git, so changing Perl-mode
indentation setting globally for all .pl files is not an option
for me.  I'll see what I can do when I have time.

PB> Is there a fundamental reason why the directory cache
PB> contains the ancestor instead of the destination branch?

Because you are thinking as an SCM person where there are
distinction between tree-A and tree-B, two heads being merged.
There is no "destination branch" nor "source branch" in what I
am doing.  It is a merge of two equals derived from the same
ancestor.

PB> I think the script actually does not fundamentally depend on it. My main
PB> motivation is that the user can then trivially see what is he actually
PB> going to commit to his destination branch, which would be bought for
PB> free by that.

And again the user is *not* commiting to his "destination
branch".  At the level I am working at, the merge result should
be commited with two -p parameters to commit-tree --- tree-A and
tree-B, both being equal parents from the POV of git object
storage.

PB> And this is another thing I dislike a lot. I'd like merge-tree.pl to
PB> leave my directory cache alone, thank you very much. You know, I see
PB> what goes to the directory cache as actually part of the policy part.

Remember I am not touching *your* dircache.  It is a dircache in
the temporary merge area, specifically set up to help you review
the merge.  

Can't the SCM driver do things along this line, perhaps?

 - You have your working files and your dircache.  They may not
   match because you have uncommitted changes to your
   environment.  You want to merge with Linus head.  You know
   its SHA1 (call it COMMIT-Linus).  Your SCM knows which commit
   you started with (call it COMMIT-Current).

 - First you merge the tree associated with COMMIT-Current.  Use
   it and COMMIT-Linus to find the common ancestor to use.

 - Now use the tree SHA of COMMIT-Current, tree SHA1 of
   COMMIT-Linus, and tree SHA1 of the common ancestor commit to
   drive git-merge.perl (to be renamed ;-).  You will get a
   temporary directory.  Have your user examine what is in
   there, and fix the merge and have them tell you they are
   happy.

 - You go to that temporary directory, do write-tree and
   commit-tree with -p parameter of COMMIT-Linus and
   COMMIT-Current.  This will result in a new commit.  Call that
   COMMIT-Merge.

 - You, as an SCM, should know what your user have done in the
   working directory relative to COMMIT-Current.  Especially you
   should know the set of paths involved in that change.  Go in
   to the temporary area, checkout-cache those files if you have
   not done so.  Apply the changes you have there.  Optionally
   have the user examine the changes and have him confirm.  Lift
   those files into the user's working directory.

 - Do your bookkeeping like "echo COMMIT-Merge >.git/Head", to
   make the user's working files based on COMMIT-Merge, and run
   read-tree using the COMMIT-Merge in the user's working
   directory.  At this point, show-diff output should show what
   the changes your user have had made if he had started working
   based on COMMIT-Merge instead of starting from
   COMMIT-Current.

I think the above would result in what SCM person would call
"merge upstream/sidestream changes into my working directory".


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Live Merging from remote repositories
  2005-04-14 20:01                       ` Live Merging from remote repositories Barry Silverman
@ 2005-04-14 23:22                         ` Junio C Hamano
  2005-04-15  1:07                           ` Question about git process model Barry Silverman
  0 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-14 23:22 UTC (permalink / raw)
  To: Barry Silverman; +Cc: Petr Baudis, Junio C Hamano, Linus Torvalds, git

>>>>> "BS" == Barry Silverman <barry@disus.com> writes:

I have not thought about remote issues at all, other than the
distribution mechanism vaguely outlined in my previous mail (not
cc'ed to git list but I would not mind if you reproduced it here
if somebody asked), so I am not qualified to comment on that
part of your message.

BS> The way Junio has done it, no intermediate trees or commits
BS> are used...

BS> Is this a bug or a feature?

I would call that a feature in that there is no need to look at
intermediate state.  I also might call that a misfeature in that
it may have resulted in a better merge if it looked at
intermediate state.

I just have this fuzzy feeling that, when doing this merge:

                     A-1 --- A-2 --- A-3
                    /                   \ 
    Common Ancestor                      Merge Result
                    \                   /
                     B-1 --- B-2 --- B-3

looking at diff(Common Ancestor, A-1), diff(Common Ancestor,
B-1), diff(A-1, A-2), ... might give you richer context than
just merging 3-way using Common Ancestor, A-3, and B-3 to derive
the Merge Result.  It might not.  I honestly do not know.

BTW, Pasky, the above paragraph is my answer to your question in
the other message <20050414202016.GC22699@pasky.ji.cz>:

> But one different thing to note here.
> 
> You say "merge these two trees" above (I take it that you mean
> "merge these two trees, taking account of this tree as their
> common ancestor", so actually you are dealing with three trees),
> and I am tending to agree with the notion of merging trees not
> commits.  However you might get richer context and more sensible
> resulting merge if you say "merge these two commits".  Since
> commit chaining is part of the fundamental git object model you
> may as well use it.

Pasky> Could you be more particular on the richer context etc?


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-14 23:12                       ` Junio C Hamano
  2005-04-14 20:24                         ` Christopher Li
@ 2005-04-14 23:31                         ` Petr Baudis
  2005-04-14 20:30                           ` Christopher Li
                                             ` (2 more replies)
  1 sibling, 3 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-14 23:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Dear diary, on Fri, Apr 15, 2005 at 01:12:34AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> PB> What I would like your script to do is therefore just do the
> PB> merge in a given already prepared (including built index)
> PB> directory, with a passed base. The base should be determined
> PB> by a separate tool (I already saw some patches); most future
> PB> "science" will probably go to a clever selection of this
> PB> base, anyway.
> 
> I think you are contradicting yourself for saying the above
> after agreeing with me that the script should just work on trees
> not commits.  My understanding is that the tools is just to
> merge two related trees relative to another ancestor tree,
> nothing more.  Especially, it should not care what is in the
> working directory---that is SCM person's business.

Yes. Isn't this exactly what I'm saying?

I'm arguing for doing less in my paragraph, you are arguing for doing
less in your paragraph, and we even seem to agree on the direction in
which we should do less.

> I am just trying to follow my understanding of what Linus
> wanted.  One of the guiding principle is to do as much things as
> in dircache without ever checking things out or touching working
> files unnecessarily.

I'm just arguing that instead of directly touching the directory cache,
you should just list what would you do there - and you already do this,
I think. So I'd be happy with a switch which would just do that and not
touch the directory cache. I'll parse your output and do the right thing
for me.

> PB> This will give the tool maximal flexibility.
> 
> I suspect it would force me to have a working directory
> populated with files, just to do a merge.

Why would that be so?

> PB> I'm all for an -o, and I don't mind ,, - I just don't want it uselessly
> PB> long. I hope "git~merge~$$" was a joke... :-)
> 
> Which part do you object to?  PID part?  or tilde?  Would
> git~merge do, perhaps?  It probably would not matter to you
> because as an SCM you would always give an explicit --output
> parameter to the script anyway.

Yes. I'll just override it with ,,merge, I think. So, do whatever you
want. ;-))

> PB> By the way, what about indentation with tabs? If you have a
> PB> strong opinion about this, I don't insist - but if you
> PB> really don't mind/care either way, it'd be great to use tabs
> PB> as in the rest of the git code.
> 
> I do not have a strong opinion, but it is more trouble for me
> only because I am lazy and am used to the indentation my Emacs
> gives me.  I write code other than git, so changing Perl-mode
> indentation setting globally for all .pl files is not an option
> for me.  I'll see what I can do when I have time.

Doesn't Emacs have something equivalent to ./.vimrc? I've also seen
those funny -*- strings.

Well, if it would mean a lot of trouble for you, just forget about it.

> PB> Is there a fundamental reason why the directory cache
> PB> contains the ancestor instead of the destination branch?
> 
> Because you are thinking as an SCM person where there are
> distinction between tree-A and tree-B, two heads being merged.
> There is no "destination branch" nor "source branch" in what I
> am doing.  It is a merge of two equals derived from the same
> ancestor.

That's a valid point of view too.

Actually, when you would have a mode in which you would not write to the
directory cache, do you need to read from it? You could do just direct
cat-files like for the other trees, and it would be even faster. Then,
you could do without a directory cache altogether in this mode.

> PB> And this is another thing I dislike a lot. I'd like merge-tree.pl to
> PB> leave my directory cache alone, thank you very much. You know, I see
> PB> what goes to the directory cache as actually part of the policy part.
> 
> Remember I am not touching *your* dircache.  It is a dircache in
> the temporary merge area, specifically set up to help you review
> the merge.  

Yes, but I want to have a control over its dircache too. :-) That is
because I want the user to be able to use the regular git commands like
"git diff" there.

> Can't the SCM driver do things along this line, perhaps?
> 
>  - You have your working files and your dircache.  They may not
>    match because you have uncommitted changes to your
>    environment.  You want to merge with Linus head.  You know
>    its SHA1 (call it COMMIT-Linus).  Your SCM knows which commit
>    you started with (call it COMMIT-Current).
> 
>  - First you merge the tree associated with COMMIT-Current.  Use
>    it and COMMIT-Linus to find the common ancestor to use.
> 
>  - Now use the tree SHA of COMMIT-Current, tree SHA1 of
>    COMMIT-Linus, and tree SHA1 of the common ancestor commit to
>    drive git-merge.perl (to be renamed ;-).  You will get a
>    temporary directory.  Have your user examine what is in
>    there, and fix the merge and have them tell you they are
>    happy.
> 
>  - You go to that temporary directory, do write-tree and
>    commit-tree with -p parameter of COMMIT-Linus and
>    COMMIT-Current.  This will result in a new commit.  Call that
>    COMMIT-Merge.
> 
>  - You, as an SCM, should know what your user have done in the
>    working directory relative to COMMIT-Current.  Especially you
>    should know the set of paths involved in that change.  Go in
>    to the temporary area, checkout-cache those files if you have
>    not done so.  Apply the changes you have there.  Optionally
>    have the user examine the changes and have him confirm.  Lift
>    those files into the user's working directory.
> 
>  - Do your bookkeeping like "echo COMMIT-Merge >.git/Head", to
>    make the user's working files based on COMMIT-Merge, and run
>    read-tree using the COMMIT-Merge in the user's working
>    directory.  At this point, show-diff output should show what
>    the changes your user have had made if he had started working
>    based on COMMIT-Merge instead of starting from
>    COMMIT-Current.
> 
> I think the above would result in what SCM person would call
> "merge upstream/sidestream changes into my working directory".

And that's exactly what I'm doing now with git merge. ;-) In fact,
ideally the whole change in my scripts when your script is finished
would be replacing

	checkout-cache `diff-tree` # symbolic
	git diff $base $merged | git apply

with

	merge-tree.pl -b $base $(tree-id) $merged | parse-your-output

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 19:59                       ` Junio C Hamano
  2005-04-14 20:20                         ` Petr Baudis
@ 2005-04-15  0:42                         ` Linus Torvalds
  2005-04-15  2:33                           ` Barry Silverman
  2005-04-15 10:02                           ` David Woodhouse
  1 sibling, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15  0:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git



On Thu, 14 Apr 2005, Junio C Hamano wrote:
>
> You say "merge these two trees" above (I take it that you mean
> "merge these two trees, taking account of this tree as their
> common ancestor", so actually you are dealing with three trees),

Yes. We're definitely talking three trees.

> and I am tending to agree with the notion of merging trees not
> commits.  However you might get richer context and more sensible
> resulting merge if you say "merge these two commits".  Since
> commit chaining is part of the fundamental git object model you
> may as well use it.

Yes and no. There are real advantages to using the commit state to just 
figure out the trees, and then at least have the _option_ to do the merge 
at a pure tree object.

In particular, if you ever find yourself wanting to graft together two
different commit histories, that almost certainly is what you'd want to
do. Somebody might have arrived at the exact same tree some other way,
starting with a 2.6.12 tar.ball or something, and I think we should at
least support the notion of saying "these two totally unrelated commits
actually have the same base tree, so let's merge them in "space" (ie data)
even if we can't really sanely join them in "time" (ie "commits").

I dunno.

And it's also a question of sanity. The fact is, we know how to make tree 
merges unambiguous, by just totally ignoring the history between them. Ie 
we know how to merge data. I am pretty damn sure that _nobody_ knows how 
to merge "data over time". Maybe BK does. I'm pretty sure it actually 
takes the "over time" into account. But My goal is to get something that 
works, and something that is reliable because it is simple and it has 
simple rules.

As you say:

> This however opens up another set of can of worms---it would
> involve not just three trees but all the trees in the commit
> chain in between.

Exactly.  I seriously believe that the model is _broken_, simply because 
it gets too complicated. At some point it boils down to "keep it simple, 
stupid".

>  That's when you start wondering if it would
> be better to add renames in the git object model, which is the
> topic of another thread.  I have not formed an opinion on that
> one myself yet.

I've not even been convinved that renames are worth it. Nobody has really 
given a good reason why.

There are two reasons for renames I can think of:

 - space efficiency in delta-based trees. This is a total non-issue for 
   git, and trying to explicitly track renames is going to cause _more_
   space to be wasted rather than less.

 - "annotate". Something git doesn't really handle anyway, and it has 
   little to do with renames. You can fake an annotate, but let's face it, 
   it's _always_ going to be depending on interpreting a diff. In fact, 
   that ends up how traditional SCM's do it too - they don't really 
   annotate lines, they just interpret the diff.

   I think you might as well interpret the whole object thing. Git _does_ 
   tell you how the objects changed, and I actually believe that a diff 
   that works in between objects (ie can show "these lines moved from this
   file X to tjhat file Y") is a _hell_ of a lot more powerful than
   "rename"  is.

   So I'd seriously suggest that instead of worryign about renames, people 
   think about global diffs that aren't per-file. Git is good at limiting 
   the changes to a set of objects, and it should be entirely possible to 
   think of diffs as ways of moving lines _between_ objects and not just
   within objects. It's quite common to move a function from one file to 
   another - certainly more so than renaming the whole file.

   In other words, I really believe renames are just a meaningless special 
   case of a much more interesting problem. Which is just one reason why 
   I'm not at all interested in bothering with them other than as a "data 
   moved" thing, which git already handles very well indeed.

So there,

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 23:31                         ` Petr Baudis
  2005-04-14 20:30                           ` Christopher Li
@ 2005-04-15  0:58                           ` Junio C Hamano
  2005-04-14 22:30                             ` Christopher Li
  2005-04-15 19:54                             ` Re: Merge with git-pasky II Petr Baudis
  2005-04-15 10:22                           ` Junio C Hamano
  2 siblings, 2 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15  0:58 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
>> I think the above would result in what SCM person would call
>> "merge upstream/sidestream changes into my working directory".

PB> And that's exactly what I'm doing now with git merge. ;-) In fact,
PB> ideally the whole change in my scripts when your script is finished
PB> would be replacing

PB> 	checkout-cache `diff-tree` # symbolic
PB> 	git diff $base $merged | git apply

PB> with

PB> 	merge-tree.pl -b $base $(tree-id) $merged | parse-your-output

In the above I presume by $merged you mean the tree ID (or
commit ID) the user's working directory is based upon?  Well,
merge-trees (Linus has a single directory merge-tree already)
looks at tree IDs (or commit IDs); it would never involve
working files in random state that is not recorded as part of a
tree (committed or not).  Given that constraints I am not sure
how well that would pan out.  I have to think about this a bit.

I do like, however, the idea of separating the step of doing any
checkout/merge etc. and actually doing them.  So the command set
of parse-your-output needs to be defined.  Based on what I have
done so far, it would consist of the following:

 - Result is this object $SHA1 with mode $mode at $path (takes
   one of the trees); you can do update-cache --cacheinfo (if
   you want to muck with dircache) or cat-file blob (if you want
   to get the file) or both.

 - Result is to delete $path.

 - Result is a merge between object $SHA1-1 and $SHA1-2 with
   mode $mode-1 or $mode-2 at $path.

Would this be a good enough command set?

PB> Doesn't Emacs have something equivalent to ./.vimrc? I've also seen
PB> those funny -*- strings.

The former is global per user (that is me including other Perl
files I work outside of git context), which is exactly what I
said is unacceptable to me.  The latter is per file (applying to
everybody else who touch the file), so if it is short and sweet
I should use one.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Question about git process model
  2005-04-14 23:22                         ` Junio C Hamano
@ 2005-04-15  1:07                           ` Barry Silverman
  0 siblings, 0 replies; 148+ messages in thread
From: Barry Silverman @ 2005-04-15  1:07 UTC (permalink / raw)
  To: 'Junio C Hamano'
  Cc: 'Petr Baudis', 'Linus Torvalds', git

JH->Junio Hamano, LT->Linus Torvalds

JH>>I just have this fuzzy feeling that, when doing this merge:

                     A-1 --- A-2 --- A-3
                    /                   \ 
    Common Ancestor                      Merge Result
                    \                   /
                     B-1 --- B-2 --- B-3

JH>>looking at diff(Common Ancestor, A-1), diff(Common Ancestor,
JH>>B-1), diff(A-1, A-2), ... might give you richer context than
JH>>just merging 3-way using Common Ancestor, A-3, and B-3 to derive
JH>>the Merge Result.  It might not.  I honestly do not know.

In the distributed git model, with lots of parallel development, and
lots of merging - Is it important at the "business process" level that
intermediate history (and content) be present for any forks off the
mainline (in particular, forks maintained by someone else)?

Git has the property that it is NOT delta based. In Junio's example
above, if branch A were your mainline, and branch B were imported
changes from elsewhere, Is it necessary to have B1, B2 available to you,
when all that was required for you to merge successfully was B3?

Which leads me to....

LT>>In particular, if you ever find yourself wanting to graft together
two LT>>different commit histories, that almost certainly is what you'd
want to LT>>do. Somebody might have arrived at the exact same tree some
other way, LT>>starting with a 2.6.12 tar.ball or something, and I think
we should at LT>>least support the notion of saying "these two totally
unrelated commits LT>>actually have the same base tree, so let's merge
them in "space" (ie LT>>data) even if we can't really sanely join them
in "time" (ie "commits").

So in the case of only merging B3 - we would write into the commit
record of
Merge-Result, that it was parented by A1, and another new commit record
-> B3-Prime.

B3-Prime is space-wise identical to B3, but "time-wise" different.

B3-Prime would be different than B3 because it would necessarily not
have the same SHA1 as B3. Why? B3 has B2 as a parent, B3-Prime has the
Common Ancestor as a parent - thus the Commit record is different, and
so is the SHA1. 

We would need a facility to recognize that B3 and B3-Prime were
space-wise the same, (and maybe have the SHA for B3 kept in somewhere in
the SCM portion of B3-Prime's commit record???)

Linus, is that what you were saying?




^ permalink raw reply	[flat|nested] 148+ messages in thread

* [Patch] ls-tree enhancements
  2005-04-14 18:36                     ` Linus Torvalds
  2005-04-14 19:59                       ` Junio C Hamano
@ 2005-04-15  2:21                       ` Junio C Hamano
  2005-04-15 16:13                         ` Petr Baudis
  2005-04-15  9:14                       ` Merge with git-pasky II David Woodhouse
  2 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15  2:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

This adds '-r' (recursive) option and '-z' (NUL terminated)
option to ls-tree.  I need it so that the merge-trees (formerly
known as git-merge.perl) script does not need to create any
temporary dircache while merging.  It used to use show-files on
a temporary dircache to get the list of files in the ancestor
tree, and also used the dircache to store the result of its
automerge.  I probably still need it for the latter reason, but
with this patch not for the former reason anymore.

It is relative to bb95843a5a0f397270819462812735ee29796fb4

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 ls-tree.c |  108 +++++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 90 insertions(+), 18 deletions(-)


--- ,,Linus/ls-tree.c	2005-04-14 19:08:17.000000000 -0700
+++ ,,Siam/ls-tree.c	2005-04-14 19:11:23.000000000 -0700
@@ -5,45 +5,117 @@
  */
 #include "cache.h"
 
-static int list(unsigned char *sha1)
+int line_termination = '\n';
+int recursive = 0;
+
+struct path_prefix {
+	struct path_prefix *prev;
+	const char *name;
+};
+
+static void print_path_prefix(struct path_prefix *prefix)
 {
-	void *buffer;
-	unsigned long size;
-	char type[20];
+	if (prefix) {
+		if (prefix->prev)
+			print_path_prefix(prefix->prev);
+		fputs(prefix->name, stdout);
+		putchar('/');
+	}
+}
+
+static void list_recursive(void *buffer,
+			  unsigned char *type,
+			  unsigned long size,
+			  struct path_prefix *prefix)
+{
+	struct path_prefix this_prefix;
+	this_prefix.prev = prefix;
 
-	buffer = read_sha1_file(sha1, type, &size);
-	if (!buffer)
-		die("unable to read sha1 file");
 	if (strcmp(type, "tree"))
 		die("expected a 'tree' node");
+
 	while (size) {
-		int len = strlen(buffer)+1;
-		unsigned char *sha1 = buffer + len;
-		char *path = strchr(buffer, ' ')+1;
+		int namelen = strlen(buffer)+1;
+		void *eltbuf;
+		char elttype[20];
+		unsigned long eltsize;
+		unsigned char *sha1 = buffer + namelen;
+		char *path = strchr(buffer, ' ') + 1;
 		unsigned int mode;
-		unsigned char *type;
 
-		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+		if (size < namelen + 20 || sscanf(buffer, "%o", &mode) != 1)
 			die("corrupt 'tree' file");
 		buffer = sha1 + 20;
-		size -= len + 20;
+		size -= namelen + 20;
+
 		/* XXX: We do some ugly mode heuristics here.
 		 * It seems not worth it to read each file just to get this
-		 * and the file size. -- pasky@ucw.cz */
-		type = S_ISDIR(mode) ? "tree" : "blob";
-		printf("%03o\t%s\t%s\t%s\n", mode, type, sha1_to_hex(sha1), path);
+		 * and the file size. -- pasky@ucw.cz
+		 * ... that is, when we are not recursive -- junkio@cox.net
+		 */
+		eltbuf = (recursive ? read_sha1_file(sha1, elttype, &eltsize) :
+			  NULL);
+		if (! eltbuf) {
+			if (recursive)
+				error("cannot read %s", sha1_to_hex(sha1));
+			type = S_ISDIR(mode) ? "tree" : "blob";
+		}
+		else
+			type = elttype;
+
+		printf("%03o\t%s\t%s\t", mode, type, sha1_to_hex(sha1));
+		print_path_prefix(prefix);
+		fputs(path, stdout);
+		putchar(line_termination);
+
+		if (eltbuf && !strcmp(type, "tree")) {
+			this_prefix.name = path;
+			list_recursive(eltbuf, elttype, eltsize, &this_prefix);
+		}
+		free(eltbuf);
 	}
+}
+
+static int list(unsigned char *sha1)
+{
+	void *buffer;
+	unsigned long size;
+	char type[20];
+
+	buffer = read_sha1_file(sha1, type, &size);
+	if (!buffer)
+		die("unable to read sha1 file");
+	list_recursive(buffer, type, size, NULL);
 	return 0;
 }
 
+static void _usage(void)
+{
+	usage("ls-tree [-r] [-z] <key>");
+}
+
 int main(int argc, char **argv)
 {
 	unsigned char sha1[20];
 
+	while (1 < argc && argv[1][0] == '-') {
+		switch (argv[1][1]) {
+		case 'z':
+			line_termination = 0;
+			break;
+		case 'r':
+			recursive = 1;
+			break;
+		default:
+			_usage();
+		}
+		argc--; argv++;
+	}
+
 	if (argc != 2)
-		usage("ls-tree <key>");
+		_usage();
 	if (get_sha1_hex(argv[1], sha1) < 0)
-		usage("ls-tree <key>");
+		_usage();
 	sha1_file_directory = getenv(DB_ENVIRONMENT);
 	if (!sha1_file_directory)
 		sha1_file_directory = DEFAULT_DB_ENVIRONMENT;


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: Merge with git-pasky II.
  2005-04-15  0:42                         ` Linus Torvalds
@ 2005-04-15  2:33                           ` Barry Silverman
  2005-04-15 10:02                           ` David Woodhouse
  1 sibling, 0 replies; 148+ messages in thread
From: Barry Silverman @ 2005-04-15  2:33 UTC (permalink / raw)
  To: 'Linus Torvalds', 'Junio C Hamano'
  Cc: 'Petr Baudis', git

>>In particular, if you ever find yourself wanting to graft together two
>>different commit histories, that almost certainly is what you'd want
to >>do. Somebody might have arrived at the exact same tree some other
way, >>starting with a 2.6.12 tar.ball or something, and I think we
should at >>least support the notion of saying "these two totally
unrelated commits >>actually have the same base tree, so let's merge
them in "space" (ie data) >>even if we can't really sanely join them in
"time" (ie "commits").

If this is true - then the tree-id's of the two commits would be
identical, but the commit-id's wouldn't.

Does this imply that common ancestor lookup should work by comparing the
tree-id's (space-wise the same) rather than the commit-ids (time-wise
the same)?

-----Original Message-----
From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On
Behalf Of Linus Torvalds
Sent: Thursday, April 14, 2005 8:43 PM
To: Junio C Hamano
Cc: Petr Baudis; git@vger.kernel.org
Subject: Re: Merge with git-pasky II.



On Thu, 14 Apr 2005, Junio C Hamano wrote:
>
> You say "merge these two trees" above (I take it that you mean
> "merge these two trees, taking account of this tree as their
> common ancestor", so actually you are dealing with three trees),

Yes. We're definitely talking three trees.

> and I am tending to agree with the notion of merging trees not
> commits.  However you might get richer context and more sensible
> resulting merge if you say "merge these two commits".  Since
> commit chaining is part of the fundamental git object model you
> may as well use it.

Yes and no. There are real advantages to using the commit state to just 
figure out the trees, and then at least have the _option_ to do the
merge 
at a pure tree object.

In particular, if you ever find yourself wanting to graft together two
different commit histories, that almost certainly is what you'd want to
do. Somebody might have arrived at the exact same tree some other way,
starting with a 2.6.12 tar.ball or something, and I think we should at
least support the notion of saying "these two totally unrelated commits
actually have the same base tree, so let's merge them in "space" (ie
data)
even if we can't really sanely join them in "time" (ie "commits").

I dunno.

And it's also a question of sanity. The fact is, we know how to make
tree 
merges unambiguous, by just totally ignoring the history between them.
Ie 
we know how to merge data. I am pretty damn sure that _nobody_ knows how

to merge "data over time". Maybe BK does. I'm pretty sure it actually 
takes the "over time" into account. But My goal is to get something that

works, and something that is reliable because it is simple and it has 
simple rules.

As you say:

> This however opens up another set of can of worms---it would
> involve not just three trees but all the trees in the commit
> chain in between.

Exactly.  I seriously believe that the model is _broken_, simply because

it gets too complicated. At some point it boils down to "keep it simple,

stupid".

>  That's when you start wondering if it would
> be better to add renames in the git object model, which is the
> topic of another thread.  I have not formed an opinion on that
> one myself yet.

I've not even been convinved that renames are worth it. Nobody has
really 
given a good reason why.

There are two reasons for renames I can think of:

 - space efficiency in delta-based trees. This is a total non-issue for 
   git, and trying to explicitly track renames is going to cause _more_
   space to be wasted rather than less.

 - "annotate". Something git doesn't really handle anyway, and it has 
   little to do with renames. You can fake an annotate, but let's face
it, 
   it's _always_ going to be depending on interpreting a diff. In fact, 
   that ends up how traditional SCM's do it too - they don't really 
   annotate lines, they just interpret the diff.

   I think you might as well interpret the whole object thing. Git
_does_ 
   tell you how the objects changed, and I actually believe that a diff 
   that works in between objects (ie can show "these lines moved from
this
   file X to tjhat file Y") is a _hell_ of a lot more powerful than
   "rename"  is.

   So I'd seriously suggest that instead of worryign about renames,
people 
   think about global diffs that aren't per-file. Git is good at
limiting 
   the changes to a set of objects, and it should be entirely possible
to 
   think of diffs as ways of moving lines _between_ objects and not just
   within objects. It's quite common to move a function from one file to

   another - certainly more so than renaming the whole file.

   In other words, I really believe renames are just a meaningless
special 
   case of a much more interesting problem. Which is just one reason why

   I'm not at all interested in bothering with them other than as a
"data 
   moved" thing, which git already handles very well indeed.

So there,

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  7:43                               ` Junio C Hamano
@ 2005-04-15  6:28                                 ` Christopher Li
  2005-04-15 11:11                                   ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Christopher Li @ 2005-04-15  6:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Linus Torvalds, git

On Fri, Apr 15, 2005 at 12:43:47AM -0700, Junio C Hamano wrote:
> >>>>> "CL" == Christopher Li <git@chrisli.org> writes:
> 
> CL> Is that SHA1 for tree or the file object?
> 
> I am talking about a single file here.
>
Then do you emit the entry for it's parents directory?

e.g. /foo/bar get created. foo doesn't exists. You have
to create foo first. You don't have mode information for
foo yet. If it give the top level tree, the SCM can check it
out by tree. hopefully have the mode on directory correctly.
Well, if they care about those little details.

Chris
 

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 22:30                             ` Christopher Li
@ 2005-04-15  7:43                               ` Junio C Hamano
  2005-04-15  6:28                                 ` Christopher Li
  0 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15  7:43 UTC (permalink / raw)
  To: Christopher Li; +Cc: Petr Baudis, Linus Torvalds, git

>>>>> "CL" == Christopher Li <git@chrisli.org> writes:

>> - Result is this object $SHA1 with mode $mode at $path (takes
>> one of the trees); you can do update-cache --cacheinfo (if
>> you want to muck with dircache) or cat-file blob (if you want
>> to get the file) or both.

CL> Is that SHA1 for tree or the file object?

I am talking about a single file here.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 18:36                     ` Linus Torvalds
  2005-04-14 19:59                       ` Junio C Hamano
  2005-04-15  2:21                       ` [Patch] ls-tree enhancements Junio C Hamano
@ 2005-04-15  9:14                       ` David Woodhouse
  2005-04-15  9:36                         ` Ingo Molnar
                                           ` (2 more replies)
  2 siblings, 3 replies; 148+ messages in thread
From: David Woodhouse @ 2005-04-15  9:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Thu, 2005-04-14 at 11:36 -0700, Linus Torvalds wrote:
> And "merge these two trees" (which works on a _tree_ level)
> or "find the common commit" (which works on a _commit_ level)

I suspect that finding the common commit is actually a per-file thing;
it's not just something you do for the _commit_ graph, then use for
merging each file in the two branches you're trying to merge.

Consider a simple repository which contains two files A and B. We start
off with the first version of each ('A1B1'), and the owner of each file
takes a branch and modifies their own file. There is cross-pulling
between the two, and then each modifies the _other's_ file as well as
their own...

   (A1B2)--(A2B2)--(A2'B3)
    /  \   /            \
   /    \ /              \
 (A1B1)  X               (...)
   \    / \              /
    \  /   \            /
   (A2B1)--(A2B2)--(A3B2')

Now, we're trying to merge the two branches. It appears that the most
useful common ancestor to use for a three-way merge of file A is the
version from tree 'A2B1', while the most useful common ancestor for
merging file B is that in 'A1B2'.

(I think it's a coincidence that in my example the useful files 'A2' and
'B2' actually do end up in a single tree together at some point.)

-- 
dwmw2



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  9:14                       ` Merge with git-pasky II David Woodhouse
@ 2005-04-15  9:36                         ` Ingo Molnar
  2005-04-15 10:05                           ` David Woodhouse
  2005-04-15 12:03                         ` Johannes Schindelin
  2005-04-15 14:53                         ` Linus Torvalds
  2 siblings, 1 reply; 148+ messages in thread
From: Ingo Molnar @ 2005-04-15  9:36 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git


* David Woodhouse <dwmw2@infradead.org> wrote:

> Consider a simple repository which contains two files A and B. We 
> start off with the first version of each ('A1B1'), and the owner of 
> each file takes a branch and modifies their own file. There is 
> cross-pulling between the two, and then each modifies the _other's_ 
> file as well as their own...
> 
>    (A1B2)--(A2B2)--(A2'B3)
>     /  \   /            \
>    /    \ /              \
>  (A1B1)  X               (...)
>    \    / \              /
>     \  /   \            /
>    (A2B1)--(A2B2)--(A3B2')
> 
> Now, we're trying to merge the two branches. It appears that the most 
> useful common ancestor to use for a three-way merge of file A is the 
> version from tree 'A2B1', while the most useful common ancestor for 
> merging file B is that in 'A1B2'.

do such cases occur frequently? In the kernel at least it's not too 
typical. Would it be a problem to go for the simple solution of using 
(A1B1) as the common ancestor (based on the tree graph), and then to do 
a 3-way merge of all changes from that point on?

	Ingo

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  0:42                         ` Linus Torvalds
  2005-04-15  2:33                           ` Barry Silverman
@ 2005-04-15 10:02                           ` David Woodhouse
  2005-04-15 15:32                             ` Linus Torvalds
  1 sibling, 1 reply; 148+ messages in thread
From: David Woodhouse @ 2005-04-15 10:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Thu, 2005-04-14 at 17:42 -0700, Linus Torvalds wrote:
> I've not even been convinved that renames are worth it. Nobody has
> really given a good reason why.
> 
> There are two reasons for renames I can think of:
> 
>  - space efficiency in delta-based trees.
>  - "annotate".

Neither of those were my motivation for looking at renames. The reasons
I wanted to track renames were:
   - Per-file revision history which doesn't stop dead at a rename.
   - Merging where files have been renamed in one branch and modified in
     another. Which is basically a special case of the above; we need to
     see the per-file revision history.

>    So I'd seriously suggest that instead of worryign about renames, people 
>    think about global diffs that aren't per-file. Git is good at limiting 
>    the changes to a set of objects, and it should be entirely possible to 
>    think of diffs as ways of moving lines _between_ objects and not just
>    within objects. It's quite common to move a function from one file to 
>    another - certainly more so than renaming the whole file.
>
>    In other words, I really believe renames are just a meaningless special 
>    case of a much more interesting problem. Which is just one reason why 
>    I'm not at all interested in bothering with them other than as a "data 
>    moved" thing, which git already handles very well indeed.

Git doesn't handle 'data moved' except at a whole-tree level. For each
commit, it says "these are the old trees; this is the new tree".

Git doesn't actually look hard into the contents of tree; certainly it
has no business looking at the contents of individual files; that is
something that the SCM or possibly only the user should do. The storage
of 'rename' information in the commit object is another kind of 'xattr'
storage which git would provides but not directly interpret.

And you're right; it shouldn't have to be for renames only. There's no
need for us to limit it to one "source" and one "destination"; the SCM
can use it to track content as it sees fit.

As I said, the main aim of this is to track revision history of given
content, for displaying to the user and for performing merges. So when a
file is split up, or a function is moved from it to another file, a
'rename' xattr can be included to mark that files 'foo' and 'bar' in the
new tree are both associated with file 'wibble' in the parent.

That's as much as we need to provide for content tracking, and it _does_
handle the general case as well as we should be attempting to. We don't
want to get into dealing with file contents ourselves; we just want to
store the hint for the SCM or the user that "your data went thataway".

-- 
dwmw2



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  9:36                         ` Ingo Molnar
@ 2005-04-15 10:05                           ` David Woodhouse
  2005-04-15 14:53                             ` Ingo Molnar
  0 siblings, 1 reply; 148+ messages in thread
From: David Woodhouse @ 2005-04-15 10:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git

On Fri, 2005-04-15 at 11:36 +0200, Ingo Molnar wrote:
> do such cases occur frequently? In the kernel at least it's not too 
> typical. 

Isn't it? I thought it was a fairly accurate representation of the
process "I make a whole bunch of changes to files I maintain, pulling
from Linus while occasionally asking him to pull from my tree. Sometimes
my files are changed by someone else in Linus' tree, and sometimes I
change files that I don't actually own.".

-- 
dwmw2



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 12:03                         ` Johannes Schindelin
@ 2005-04-15 10:22                           ` Theodore Ts'o
  0 siblings, 0 replies; 148+ messages in thread
From: Theodore Ts'o @ 2005-04-15 10:22 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: David Woodhouse, Linus Torvalds, Junio C Hamano, Petr Baudis, git

On Fri, Apr 15, 2005 at 02:03:08PM +0200, Johannes Schindelin wrote:
> I disagree. In order to be trusted, this thing has to catch the following
> scenario:
> 
> Skywalker and Solo start from the same base. They commit quite a lot to
> their trees. In between, Skywalker commits a tree, where the function
> "kazoom()" has been added to the file "deathstar.c", but Solo also added
> this function, but to the file "moon.c". A file-based merge would have no
> problem merging each file, such that in the end, "kazoom()" is defined
> twice.
> 
> The same problems arise when one tries to merge line-wise, i.e. when for
> each line a (possibly different) merge-parent is sought.

Be careful.  There is a very big tradeoff between 100% perfections in
catching these sorts of errors, and usability.  There exists SCM's
where you are not allowed to do commit such merges until you do a test
compile, or run a regression test suite (that being the only way to
catch these sorts of problems when we merge two branches like this).  

BitKeeper never caught this sort of thing, and we trusted it.  In
practice it was also rarely a problem.

I'll also note that BitKeeper doesn't restrict you from doing a
committing a changeset when you have modified files that have yet to
be checked in to the tree.  Same issue; you can accidentally check in
changesets that in trees that won't build, but if we added this kind
of SCM-by-straightjacket philosophy it would decrease our productivity
and people would simply not use such an SCM, thus negating its
effectiveness.

						- Ted

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14 23:31                         ` Petr Baudis
  2005-04-14 20:30                           ` Christopher Li
  2005-04-15  0:58                           ` Junio C Hamano
@ 2005-04-15 10:22                           ` Junio C Hamano
  2005-04-15 20:40                             ` Petr Baudis
  2 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 10:22 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git

After I re-read [*R1*], in which Linus talks about dircache,
especially this section:

 - The "current directory cache" describes some baseline. In particular,
   note the "some" part. It's not tied to any special baseline, and you
   can change your baseline any way you please.

   So it does NOT have to track any particular state in either the object 
   database _or_ in your actual current working tree. In fact, all real 
   interactions with "git" are really about updating this staging area one 
   way or the other: you might check out the state from it into your 
   working area (partially or fully), you can push your working area into 
   the staging area (again, partially or fully).

   And if you want to, you can write the thing that the staging area 
   represents as a "tree" into the object database, or you can merge a 
   tree from the object database into the staging area.

   In other words: the staging area aka "current directory cache" is 
   really how all interaction takes place. The object database never 
   interacts directly with your working directory contents. ALL 
   interactions go through the current directory cache.

I started to have more doubts on the approach of *not*
performing the merge in the dircache I set up specifically for
merging, which is the direction in which you are pushing if I
understand you correctly.  Maybe I completely misunderstand what
you want.  This message is long but I need a clear understanding
of what is expected to be useful to you, so please bear with me.

PB> 	merge-tree.pl -b $base $(tree-id) $merged | parse-your-output

Please help me understand this example you have given earlier.
Here is my understanding of your assumption when the above
pipeline takes place.  Correct me if I am mistaken.

 * The user is in a working directory $W.  It is controlled by
   git-tools and there are $W/.git/. directory and $W/.git/index
   dircache.

 * The dircache $W/.git/index started its life as a read-tree
   from some commit.  The git-tools is keeping track of which
   commit it is somewhere, presumably in $W/.git/ directory.
   Let's call it $C (commit).

 ? Question.  Is the $(tree-id) in your example the same as $C
   above?

 * The user have run [*1*] (see Footnote below) checkout-cache
   on $W/.git/index some time in the past and $W is full of
   working files.  Some of them may or may not have modified.
   There may be some additions or deletions.  So the contents of
   the working directory may not match the tree associated with
   $C.

 * The user may or may not have run [*1*] update-cache in $W.
   The contents of the dircache $W/.git/index may not match the
   tree associated with $C.

 ? Question.  Are you forbidding the user to run update-cache by
   hand, and keeping track of the changes yourself, to be
   applied all at once at "git commit" time, thereby
   guaranteeing the $W/.git/index to match the tree associated
   with $C all times?  From the description of The "GIT toolkit"
   section in README, it is not clear to me which part of his
   repository an end user is not supposed to muck with himself.

 * Now the user has some changes in his working directory and
   notices upstream or a side branch has notable changes
   desireble to be picked up.  So he runs some git-tools command
   to cause the above quoted pipeline to run.

 ? Question.  Does $merged in your example mean such an upstream
   or side branch?  Is $base in your example the common ancestor
   between $C and $merged?

Assuming that my above understanding of your model is correct,
here are my "thinking aloud".

 - "merge-trees $base $C $merged" looks only at the git object
   database for those three trees named.  The data structure of
   git object database is optimized to distinguish differences
   in those recorded trees (and hence recorded blobs they point
   at) without unpacking most of the files if the changes are
   small, because all the blobs involved are already hashed.  It
   is not very good at comparing things in git object store and
   working files in random states, which would involve unpacking
   blobs and comparing, so "merge-trees" does not bother.

 - What can come out from merge-trees is therefore one of the
   following for each path from the union of paths contained in
   $base, $C, and $merged:

   (a) Neither $C nor $merged changed it --- merge result is what
       is in $C.

   (b) $C changed it but $merged did not --- merge result is what
       is in $C.

   (c) Both $C and $merged changed it in the same way --- merge
       result is what is in $C.

   (d) $C did not change it but $merged did --- merge result is
       what is in $merged.

   (e) Both $C and $merged changed it differently --- merge is
       needed and automatically succeeds between $C and $merge.

   (f) Both $C and $merged changed it differently --- merge is
       needed but have conflicts.

 - Assuming we are dealing with the case where working files are
   dirty and do not match what is in $C, among the above,
   (a)-(c) can be ignored by SCM.  What the user has in his
   working files is exactly what he would have got if he started
   working from the merge result, although in reality the work
   was started from $C.

   Handling (d), (e) and (f) from SCM's point of view would be
   the same.  They all involve 3-way merges between the file in
   the working directory, and the file from $merged, pivoting on
   the file from $base.  In order to help SCM, merge-trees
   therefore should output SHA1 of blobs for such a file from
   $base and $merged and expect SCM to run "cat-file blob" on
   them and then merge or diff3.  Up to the point of giving
   those two SHA1 out is the business of merge-trees and after
   that it is up to SCM.

   That would work.  So I should base the design of output from
   merge-trees on the above analysis, which probably needs to be
   extended to cover differences between creation, modification,
   and deletion.

 - However, the above is quite different from the way Linus
   envisioned initially, on which my current implementation is
   based [*3*].

   My current implementation is to record the merge outcome in
   the temporary dircache $W/,,merge/.git/index for cases
   (a)-(e).  The last case (f) is problematic and needs human
   validation [*2*], so it is not recorded in that temporary
   dircache, but the files to be merged are left in that
   temporary directory and merge-trees stops there.  It is
   expected that the end-user or SCM would merge the resulting
   file and run update-cache to update $W/,,merge/.git/index.
   After that happens, $W/,,merge/.git/index has the tree
   representing the desired result of the merge.  It is expected
   that the end-user or SCM would write-tree, commit-tree there
   in the temporary directory, creating a new commit $C1.

   Then, it is expected that the SCM would make a patch file
   between $C and the user working directory, checks out $C1
   (either in the user's working directory or another temporary
   directory; at this point merge-trees does not care because it
   has already done its job and exited), applies that patch to
   bring the user edits over to $C1.  Then that directory would
   contain the desired merge of user edits.

   That is my understanding of how Linus originally wanted the
   tool to do his kernel work with to work.  My hesitation to
   suggestions from you to change it not to keep its own merge
   dircache is coming from here.  Not doing what I am currently
   doing to $W/,,merge/.git/index dircache would mean that SCM
   would have to do more, not less, to arrive at $C1 (the result
   of the clean $merge and $C merge pivoted at $base), where the
   real SCM merge begins.

Although I suspect I am misunderstanding what you want, your
messages so far suggest that what you want might be quite
different from what Linus wants.  Please do not misunderstand
what I mean by saying this.  I am not saying that Linus is
always right [*4*] and therefore you are wrong for wanting
something else.  It is just that, if what I started writing
needs to support both of those quite different needs, I need to
know what they are.  I think I understand what Linus wants well
enough [*5*], but I am not certain about yours.


[Footnotes]

*1* By "The user have run" I mean either the user directly used
the low-level plumbing command himself, or used git-tools to
cause such command to run.

*2* Strictly speaking, case (e) needs human validation as
well, because successful textual merge does not guarantee
sensible semantic merge.

*3* See [*R2*] for descriptions on the way Linus wanted merge
in git to happen.  Especially around "5) At this point you need
to MERGE" onwards.  The current implementation handles (or
attempts to handle) the `your working directory was fully
committed' case described there.

*4* According to Linus himself, he is always right ;-). [*R3*]

*5* I consider [*R1*] and [*R2*] essential read for anybody
wanting to understand merging operation in git object model (I
am saying this for others; not for Pasky --- it would be like
preaching to the choir ;-)).


[References]

*R1* <Pine.LNX.4.58.0504110928360.1267@ppc970.osdl.org>
http://marc.theaimsgroup.com/?i=%3CPine.LNX.4.58.0504110928360.1267%20()%20ppc970%20!%20osdl%20!%20org%3E

*R2* <Pine.LNX.4.58.0504121606580.4501@ppc970.osdl.org> 
http://marc.theaimsgroup.com/?i=%3CPine.LNX.4.58.0504121606580.4501%20()%20ppc970%20!%20osdl%20!%20org%3E

*R3*
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0008.3/0555.html


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  6:28                                 ` Christopher Li
@ 2005-04-15 11:11                                   ` Junio C Hamano
       [not found]                                     ` <7vaco0i3t9.fsf_-_@assigned-by-dhcp.cox.net>
  0 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 11:11 UTC (permalink / raw)
  To: Christopher Li, Linus Torvalds; +Cc: Petr Baudis, git

>>>>> "CL" == Christopher Li <git@chrisli.org> writes:

CL> Then do you emit the entry for it's parents directory?

In GIT object model, directory modes do not matter.  It is not
designed to record directories, and running "update-cache --add
foo" when foo is a directory fails.

The data model of GIT is that it associates file datablob to a
string called "pathname" that happen to contain slashes in them.
It is kinda wierd.  When you externalize it with checkout-cache,
these slashes are mapped to hierarchical UNIX filesystem paths,
relative to whereever you happened to run checkout-cache.  The
hierarchical "tree" representation in the GIT database was
started as just a space optimization thing.

CL> e.g. /foo/bar get created. foo doesn't exists. You have
CL> to create foo first. You don't have mode information for
CL> foo yet.

And you will never have that information, since it is not
recorded anywhere.  If I say you should have foo/bar (by the
way, no leading slashes are placed in the dircache either), and
if it so happens that you do not have foo yet, you'd better
create one without waiting to be told, because I will never tell
you to just create a directory.

By the way, Linus, while I was studying how the new hierarchical
trees are written out, I think I have found one small funny (I
would not call this a *bug*) there.  Here is an excerpt from
write-tree (around ll. 56; I am basing on pasky-0.4 so your line
numbers may have some offsets):

        sha1 = ce->sha1;
        mode = ntohl(ce->st_mode);

        /* Do we have _further_ subdirectories? */
        filename = pathname + baselen;
        dirname = strchr(filename, '/');
        if (dirname) {
                int subdir_written;

                subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
                nr += subdir_written;

                /* Now we need to write out the directory entry into this tree.. */
                mode |= S_IFDIR;
                pathlen = dirname - pathname;

                /* ..but the directory entry doesn't count towards the total count */
                nr--;
                sha1 = subdir_sha1;
        }

This code is going through a flat list of cache entries sorted
by pathnames.  The list is flat in the sense that the pathnames
are like "foo/bar" i.e. with slashes inside.  The if() statement
there, upon seeing "foo/bar", slurps all the entries in foo/
subhierarchy and writes into a separate tree, recursively, to
"represent" foo/.

Notice what mode the "tree" object gets in this case?  File mode
for foo/bar (or whatever happens to be sorted the first among
the stuff in dircache from foo/ directory) ORed with S_IFDIR.  I
think this is nonsense, and we should just store constant
S_IFDIR.

Another option, probably better from the SCM purist's POV, would
be to start recording directories in dircaches, so that people
can actually keep track of directory modes.  Does it matter? ---
I would say not.  GIT does not have to be tar or cpio.  


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  9:14                       ` Merge with git-pasky II David Woodhouse
  2005-04-15  9:36                         ` Ingo Molnar
@ 2005-04-15 12:03                         ` Johannes Schindelin
  2005-04-15 10:22                           ` Theodore Ts'o
  2005-04-15 14:53                         ` Linus Torvalds
  2 siblings, 1 reply; 148+ messages in thread
From: Johannes Schindelin @ 2005-04-15 12:03 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git

Hi,

On Fri, 15 Apr 2005, David Woodhouse wrote:

> On Thu, 2005-04-14 at 11:36 -0700, Linus Torvalds wrote:
> > And "merge these two trees" (which works on a _tree_ level)
> > or "find the common commit" (which works on a _commit_ level)
>
> I suspect that finding the common commit is actually a per-file thing;
> it's not just something you do for the _commit_ graph, then use for
> merging each file in the two branches you're trying to merge.

I disagree. In order to be trusted, this thing has to catch the following
scenario:

Skywalker and Solo start from the same base. They commit quite a lot to
their trees. In between, Skywalker commits a tree, where the function
"kazoom()" has been added to the file "deathstar.c", but Solo also added
this function, but to the file "moon.c". A file-based merge would have no
problem merging each file, such that in the end, "kazoom()" is defined
twice.

The same problems arise when one tries to merge line-wise, i.e. when for
each line a (possibly different) merge-parent is sought.

The concept here is a *transaction*: when going from one tree to the next
tree via a commit, a sort of integrity is maintained, which is breached
when only looking at files and commits.

Ciao,
Dscho


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15  9:14                       ` Merge with git-pasky II David Woodhouse
  2005-04-15  9:36                         ` Ingo Molnar
  2005-04-15 12:03                         ` Johannes Schindelin
@ 2005-04-15 14:53                         ` Linus Torvalds
  2005-04-15 15:29                           ` David Woodhouse
  2005-04-15 15:54                           ` Paul Jackson
  2 siblings, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 14:53 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Junio C Hamano, Petr Baudis, git



On Fri, 15 Apr 2005, David Woodhouse wrote:
> 
> I suspect that finding the common commit is actually a per-file thing;
> it's not just something you do for the _commit_ graph, then use for
> merging each file in the two branches you're trying to merge.

I disagree.

Conceptually, you should never do _anything_ on a file level. Why? Because
individual files don't matter. You shouldn't merge two files cleanly just
because they look fine - they _depend_ on the other files in the archive, 
and that's quite fundamentally why per-file tracking is really wrong from 
a project standpoint.

So if you can't merge two files cleanly because the "project" history 
ended up being further back than the "file" history, then that's a _good_ 
thing. You don't know what the hell happened to the other files that this 
file depended on. Merging one file independently of the others is WRONG.

Also, I suspect that you'll find that if you do cross-merges, you'll 
basically always end up in:

> (I think it's a coincidence that in my example the useful files 'A2' and
> 'B2' actually do end up in a single tree together at some point.)

nope, I don't think that's coincidence. I think that's the normal case. 
Your file-based history is the one that can _incorrectly_ and 
coincidentally happen to have a single file at some point, but since that 
file doesn't stand alone, that's really not a fundamentally good reason to 
merge it.

Really, this "individual files matter" approach is a _disease_. They 
don't. Individual files DO NOT EXIST. Files always exist as part of the 
project, and the _only_ time you track a single file is when the project 
is a single file (and then that will be very very obvious in a git 
archive, thank you very much).

So the single-file mentality is a disease brought on by decades of _crap_. 
And by the fact that it ends up limiting the problem scope, so you can do 
certain things easier.

For example, just doing intra-file diffs is a lot _easier_ and less 
time-consuming than doing inter-file diffs. Bit it is _absolutely_ not 
better. In fact, it is clearly inferior to anybody who spends even five 
seconds thinking about it - yet we still do it, because of the historical 
(and INCORRECT) mindset that "files matter".

Files DO NOT matter. Never have. It's an implementation limitation to 
think they do. You'll screw yourself up, and when somebody comes up with a 
half-way efficient way to generate inter-fiel diffs, your architecture is 
totally and utterly unable to handle it.

I don't care what you do at an SCM level, and if the crud you put on top
of git wants to perpetuate mistakes of yesteryear, that's _your_ issue.  
But dammit, git is designed to do the right thing, and I will fight tooth
and nail against anybody who thinks individual files matter.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 10:05                           ` David Woodhouse
@ 2005-04-15 14:53                             ` Ingo Molnar
  2005-04-15 15:09                               ` David Woodhouse
  0 siblings, 1 reply; 148+ messages in thread
From: Ingo Molnar @ 2005-04-15 14:53 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git


* David Woodhouse <dwmw2@infradead.org> wrote:

> On Fri, 2005-04-15 at 11:36 +0200, Ingo Molnar wrote:
> > do such cases occur frequently? In the kernel at least it's not too 
> > typical. 
> 
> Isn't it? I thought it was a fairly accurate representation of the 
> process "I make a whole bunch of changes to files I maintain, pulling 
> from Linus while occasionally asking him to pull from my tree. 
> Sometimes my files are changed by someone else in Linus' tree, and 
> sometimes I change files that I don't actually own.".

but the specific scenario you described would require _Linus'_ tree to 
be in limbo for a long time, and have uncommitted half-done edits. I.e.:

   (A1B2)--(A2B2)--(A2'B3)
    /  \   /            \
   /    \ /              \
 (A1B1)  X               (...)
   \    / \              /
    \  /   \            /
   (A2B1)--(A2B2)--(A3B2')

in the above scenario Linus' tree needs to 'cross' with a maintainer's 
tree.  (maintainer's tree wont cross with another maintainer's tree, as 
maintainer-to-maintainer merges rare.)

but for the scenario to occur, i think there needs to be a prolongued 
"limbo" period in Linus' tree for a 'crossing' to happen. But Linus' 
merges are typically almost atomic: they are done then they are pushed 
out. It's definitely not in the 'days, sometimes weeks' timescale as 
maintainer trees are.

so for the scenario to occur, a maintainer, from whom Linus has just 
pulled an update and Linus is merging the tree manually without 
comitting, has to pull a file from the earlier Linus tree, and then 
Linus has to modify that same file again. This does not seem to be a 
common scenario.

so i think to avoid the scenario, maintainers should not pull from each 
other - they should only pull/push to/from Linus' tree. Maybe this is an 
unacceptable limitation?

	Ingo

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 14:53                             ` Ingo Molnar
@ 2005-04-15 15:09                               ` David Woodhouse
  0 siblings, 0 replies; 148+ messages in thread
From: David Woodhouse @ 2005-04-15 15:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git

On Fri, 2005-04-15 at 16:53 +0200, Ingo Molnar wrote:
> but the specific scenario you described would require _Linus'_ tree to
> be in limbo for a long time, and have uncommitted half-done edits.
> I.e.:
> 
>    (A1B2)--(A2B2)--(A2'B3)
>     /  \   /            \
>    /    \ /              \
>  (A1B1)  X               (...)
>    \    / \              /
>     \  /   \            /
>    (A2B1)--(A2B2)--(A3B2')
> 
> in the above scenario Linus' tree needs to 'cross' with a maintainer's
> tree.  (maintainer's tree wont cross with another maintainer's tree,
> as maintainer-to-maintainer merges rare.)

Is that true? Consider (A2B1) to be a bugfixes-only tree which I make
available for Linus to pull from. I keep doing more experimental stuff
in my own private copy of the tree along the bottom branch, while Linus
_eventually_ responds to my pull request and moves on, stopping only to
add a 'static' to one of my new functions. I move on too but don't pull
from Linus again for a little while; the final merge happens when I _do_
pull again.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 14:53                         ` Linus Torvalds
@ 2005-04-15 15:29                           ` David Woodhouse
  2005-04-15 15:51                             ` Linus Torvalds
  2005-04-15 15:54                           ` Paul Jackson
  1 sibling, 1 reply; 148+ messages in thread
From: David Woodhouse @ 2005-04-15 15:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Fri, 2005-04-15 at 07:53 -0700, Linus Torvalds wrote:
> Files DO NOT matter. Never have. It's an implementation limitation to 
> think they do. You'll screw yourself up, and when somebody comes up with a 
> half-way efficient way to generate inter-fiel diffs, your architecture is 
> totally and utterly unable to handle it.
> 
> I don't care what you do at an SCM level, and if the crud you put on top
> of git wants to perpetuate mistakes of yesteryear, that's _your_ issue.  
> But dammit, git is designed to do the right thing, and I will fight tooth
> and nail against anybody who thinks individual files matter.

No, really: individual files _DO_ matter. There's a reason we split
stuff up into separate files, and if you look closely you'll find that
we don't just randomly put different functions into different files with
neither rhyme nor reason -- there's a pattern to it; usually some kind
of functional grouping.

And when I'm looking for the change that broke something, I can almost
always tell which file it's in and go looking in _that_ file. It's a
_whole_ lot easier to use the equivalent of 'bk revtool' than it is to
sift through all the unrelated commits in the whole tree. If that's an
implementation limitation, then it's an implementation limitation in my
_brain_ not just in my tools.

OK, in fact it shouldn't be 'show me the history of this file'; it's
often really 'show me the history of this function' which I want. But
that's fine. All I'm suggesting is that we should include the metadata
which says "content moved from file XXX to file YYY" along with the
commit objects.

I'm certainly not suggesting that we should implement jejb's idea of
explicit 'file revision history' objects -- the tree-based philosophy is
perfectly sane and sufficient. But we do _also_ need a little
information which allows us to track content as it moves around within
the tree, and the SCM has to have a sane way to filter out the noise
when we're looking for what broke. Yes, that's part of the SCM
functionality, and can live in an xattr-type field in the commit object
-- but it does need to be stored, and in practice I suspect it _will_ be
useful for merging too.

It's not about ditching the per-tree tracking and doing per-file
tracking instead. I agree that would be wrong. It's about storing enough
information to track what happened to given content as it moved around
within the tree.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 10:02                           ` David Woodhouse
@ 2005-04-15 15:32                             ` Linus Torvalds
  2005-04-15 16:01                               ` David Woodhouse
                                                 ` (3 more replies)
  0 siblings, 4 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 15:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Junio C Hamano, Petr Baudis, git



On Fri, 15 Apr 2005, David Woodhouse wrote:
> 
> And you're right; it shouldn't have to be for renames only. There's no
> need for us to limit it to one "source" and one "destination"; the SCM
> can use it to track content as it sees fit.

Listen to yourself, and think about the problem for a second.

First off, let's just posit that "files" do not matter. The only thing
that matters is how "content" moved in the tree. Ok? If I copy a function
from one fiel to another, the perfect SCM will notice that, and show it as
a diff that removes it from one file and adds it to another, and is
_still_ able to track authorship past the move. Agreed?

Now, you basically propose to put that information in the "commit" log, 
and that's certainly valid. You can have the commit log say "lines 50-89 
in file kernel/sched.c moved to lines 100-139 in kernel/timer.c", and then 
renames fall out of that as one very small special case.

You can even say "lines 50-89 in file kernel/sched.c copied to.." and 
allow data to be tracked past not just movement, but also duplication.

Do you agree that this is kind of what you'd want to aim for? That's a 
winning SCM concept.

How do you think the SCM _gets_ at this information? In particular, how 
are you proposing that we determine this, especially since 90% of all 
stuff comes in as patches etc? 

You propose that we spend time when generating the tree on doing so. I'm 
telling you that that is wrong, for several reasons:

 - you're ignoring different paths for the same data. For example, you 
   will make it impossible to merge two trees that have done exactly the 
   same thing, except one did it as a patch (create/delete) and one did it 
   using some other heuristic.

 - you're doing the work at the wrong point. Doing it _well_ is quite 
   expensive. So if you do it at commit time, you cannot _afford_ to do it 
   well, and you'll always fall back to doing an ass-backwards job that 
   doesn't really get you to the good state, and only gets you to a 
   not-very-interesting easy 1% of the solution (ie full file renames).

 - you're doing the work at the wrong point for _another_ reason. You're 
   freezing your (crappy) algorithm at tree creation time, and basically 
   making it pointless to ever create something better later, because even 
   if hardware and software improves, you've codified that "we have to
   have crappy information".

Now, look at my proposal: 

 - the actual information tracking tracks _nothing_ but information. You 
   have an SCM that tracks what changed at the only level that really 
   matters, namely the whole project. None of the information actually 
   makes any sense at all at a smaller granularity, since by definition, a
   "project" depends on the other files, or it wouldn't be a project, it
   would be _two_ projects or more.

 - When you're interested in the history of the information, you actually 
   track it, and you try to be _intelligent_ about it. You can actually do 
   a HELL of a lot better than whet you propose if you go the extra mile. 
   For example, let's say that you have a visualization tool that you can 
   use for finding out where a line of code came from. You start out at 
   some arbitrary point in the tree, and you drill down. That's how it 
   works, right?

   So how do you drill down? You simply go backwards in history for that 
   project, tracking when that file+line changed (a "file+line" thing is 
   actually a "sensible" tracking unit at this point, because it makes
   sense within the query you're doing - it's _not_ a sensible thing to
   track at "commit" time, but when you ask yourself "where did this line
   come from", that _question_ makes it sensible. Also note that "where 
   did this _file_ come from is not a sensible question, since the file 
   may have been the combination (or split) of several files, so there is
   no _answer_ to that question"

   So the question then becomes: "how can you reasonably _efficiently_
   find the history of one particular line", and in fact it turns out that 
   by asking the question that way, it's pretty obvious: now that you
   don't have to track the whole repository, you can always try to 
   minimize the thing you're looking for.

   So what you do is walk back the history, and look at the tree objects 
   (both sides when you hit a merge), eand see if that file ever changes. 
   That's actually a very efficient operation in GIT - it matches
   _exactly_ how git tracks things anyway. So it's not expensive at all.

   When that file changes, you need to look if that _line_ changed (and 
   here is where it comes down to usability: from a practical standpoint
   you probably don't care about a single line, you really _probably_ want
   to see changes around it too). So you diff the old state and the new 
   state, and you see if you can still find where you were. If you still 
   can, and the line (and a few lines around it) is still the same, you 
   just continue to drill down. So that's not the interesting case.

   So what happens when you found "ok, that area changed"? Your 
   visualization tool now shows it to the user, AND BECAUSE IT SEES THE 
   WHOLE TREE DIFF, it also shows where it probably came from. At _that_ 
   point, it is actually very trivial to use a modest amount of CPU time, 
   and look for probable sources within that diff. You can do it on modern 
   hardware in basically no time, so your visualization tool can actually 
   notice that
 
	"oops, that line didn't even exist in the previous version, BUT I
	 FOUND FIVE PLACES that matched almost perfectly in the same diff,
	 and here they are"

   and voila, your tool now very efficiently showed the programmer that
   the source of the line in question was actually that we had merged 5 
   copies of the same code in different archtiectures into one common
   helper function.

   And if you didn't find some source that matched, or if the old file was
   actually very similar around that line, and that line hadn't been
   "totally new"? That's the easy case again - you show the programmer the
   diff at that point in time, and you let him decide whether that diff 
   was what he was looking for, or whether he wants to continue to "zoom
   down" into the history.

The above tool is (a) fairly easy to write for git (if you can do 
visualization tools and (b) _exactly_ what I think most programmers 
actually want. Tell me I'm wrong. Honestly..

And notice? My clearly _superior_ algorithm never needed any rename
information at all. It would have been a total waste of time. It would
also have hidden the _real_ pattern, which was that a piece of code was
merged from several other matching pieces of code into one new helper
function. But if it _had_ been a pure rename, my superior tool would have
trivially found that _too_. So rename infomation really really doesn't
matter.

So I'm claiming that any SCM that tries to track renames is fundamentally
broken unless it does so for internal reasons (ie to allow efficient
deltas), exactly because renames do not matter. They don't help you, and 
they aren't what you were interested in _anyway_.

What matters is finding "where did this come from", and the git
architecture does that very well indeed - much better than anything else
out there. I outlined a simple algorithm that can be fairly trivially
coded up by somebody who really cares. Sure, pattern matching isn't
trivial, but you start out with just saying "let's find that exact line,
and two lines on each side", and then you start improving on that.

And that "where did this come from" decision should be done at _search_ 
time, not commit time. Because at that time it's not only trivial to do, 
but at that time you can _dynamically_ change your search criteria. For 
example, you can make the "match" algorithm be dependent on what you are 
looking at.

If it's C source code, it might want to ignore vairable names when it
searches for matching code. And if it's a OpenOffice document, you might
have some open-office-specific tools to do so. See? Also, the person doing 
the searches can say whether he is interested in that particular line (or 
even that particial _identifier_ on a line), or whether he wants to see 
the changes "around" that line.

All of which are very valid things to do, and all of which my world-view
supports very well indeed. And all of which your pitiful "files matter" 
world-view totally doesn't get at all.

In other words, I'm right. I'm always right, but sometimes I'm more right 
than other times. And dammit, when I say "files don't matter", I'm really 
really Right(tm).

Please stop this "track files" crap. Git tracks _exactly_ what matters, 
namely "collections of files". Nothing else is relevant, and even 
_thinking_ that it is relevant only limits your world-view. Notice how the 
notion of CVS "annotate" always inevitably ends up limiting how people use 
it. I think it's a totally useless piece of crap, and I've described 
something that I think is a million times more useful, and it all fell out 
_exactly_ because I'm not limiting my thinking to the wrong model of the 
world.

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 15:29                           ` David Woodhouse
@ 2005-04-15 15:51                             ` Linus Torvalds
  0 siblings, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 15:51 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Junio C Hamano, Petr Baudis, git



On Fri, 15 Apr 2005, David Woodhouse wrote:
> 
> And when I'm looking for the change that broke something, I can almost
> always tell which file it's in and go looking in _that_ file.

Read my email about finding "what changed" that I sent out a minute ago.

I claim that my algorithm for finding "what changed" handles your "single
file" case as a very small (and usually quite uninteresting) special case.

I claim (and if you just look at my proposal I think you'll agree) that I
can track single functions, and do it efficiently. WITHOUT adding any
meta-data at all.

The thing is, if the question is "I have this piece of code, and I want to 
see what changed", you fundamentally _can_ do that efficiently. That's 
really what git was designed for. It's the whole _point_ of having history 
in the first place. If git didn't care, it wouldn't have a back-pointer to 
the tree it came from, and we'd all be just merging pure trees.

But you mix that question up with "how do I save that information in the 
commit", which is a totally unnecessary mix-up, and which makes things 
MUCH more complicated, for absolutely zero gain.

In fact, because you mixed up those two issues, the problem now became so
complicated that you can no longer solve it, so you start doing hacks like
"the user has to tell us what he did" (aka "bk mv" or "svn rename"), and 
you start mentally to limit yourself to files, because you realize that 
you _have_ to limit your intractable problem to make it at all solvable.

And I'm telling you that your problem is STUPID. You made it stupid by 
thinking that every question about the source tree should be answered at 
commit time. Which just clearly isn't true!

If you just drop the tying-together, and accept that "what changed" is a
valid question _regardless_ of trying to track it at commit time, now your
whole world opens up. Birds sing, the sun is shining on you, and beautiful
scantily clad women (or men) dance around you. The world is suddenly a 
good place, just _filled_ with possibilities.

Suddenly you realize that if the question is just "what changed in this
piece of code" (and let's face it, that _is_ the question), you can track 
it afterwards. Trying to tie in "commit time" into the question was what 
made it hard. If you do _not_ due that (totally unnecessary) tie-in, the 
question suddenly becomes easy to answer, and several obvious and simple 
answers spring to mind pretty immediately.

> It's not about ditching the per-tree tracking and doing per-file
> tracking instead. I agree that would be wrong. It's about storing enough
> information to track what happened to given content as it moved around
> within the tree.

No. Git absolutely does have everything you need already. You just aren't 
realizing that it's already there - in the data - and that you can do much 
more intelligent searches for changes if you accept that undeniable fact.

The fact that you can NOT do those searches at commit-time (which is a 
global op), and can only do them if you have a specific question in mind 
("what changed _here_"), is the big issue.

The thing is, at commit-time you'd need to answer every possible question
("what changed here, and here, and here, and in this function, and in this
file, and in this directory and why did this identifier get renamed and 
why is the sky blue"). AND YOU FUNDAMENTALLY CANNOT DO THAT. It's 
impossible.

But once you _know_ the question (which is the only time when the answer
is actually relevant, so why care about if before that time?), you can
find out the answer by just automating the job of looking at the _data_. 
It's easy. The question makes it obvious by its nature. The question is 
the thing that gives you the specifics that makes the search possible in 
the first place.

And _this_ is why the data matters. Renames and file boundaries do not.
And until you accept that, you just limit yourself.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 14:53                         ` Linus Torvalds
  2005-04-15 15:29                           ` David Woodhouse
@ 2005-04-15 15:54                           ` Paul Jackson
  2005-04-15 16:30                             ` C. Scott Ananian
  1 sibling, 1 reply; 148+ messages in thread
From: Paul Jackson @ 2005-04-15 15:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dwmw2, junkio, pasky, git

Linus wrote:
> For example, just doing intra-file diffs is a lot _easier_ and less 
> time-consuming than doing inter-file diffs. 

Um ah ... could you explain what you mean by inter and intra file diffs?

Google found a three year old message by Andrew Morton, discussing
inter and intra file fragmentation on ext2/ext3 file systems and the
find_group_dir() routine.  I don't think that's what you had in mind ;).

When I run the 'diff' command, it usually between two files, not between
two parts of a file.  So I'd have thought inter file diffs were easier.

Clearly, I don't git it.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 15:32                             ` Linus Torvalds
@ 2005-04-15 16:01                               ` David Woodhouse
  2005-04-15 16:31                                 ` C. Scott Ananian
  2005-04-16 15:33                                 ` Johannes Schindelin
  2005-04-15 19:20                               ` Paul Jackson
                                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 148+ messages in thread
From: David Woodhouse @ 2005-04-15 16:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Fri, 2005-04-15 at 08:32 -0700, Linus Torvalds wrote:
>  - you're doing the work at the wrong point. Doing it _well_ is quite 
>    expensive. So if you do it at commit time, you cannot _afford_ to do it 
>    well, and you'll always fall back to doing an ass-backwards job that 
>    doesn't really get you to the good state, and only gets you to a 
>    not-very-interesting easy 1% of the solution (ie full file renames).
> 
>  - you're doing the work at the wrong point for _another_ reason. You're 
>    freezing your (crappy) algorithm at tree creation time, and basically 
>    making it pointless to ever create something better later, because even 
>    if hardware and software improves, you've codified that "we have to
>    have crappy information".

OK, I'm inclined to agree. The only thing that prevents me from
capitulating entirely and resubscribing to the "Torvalds is always
right" school is the concern that it _is_ expensive, and that's why I
originally wanted to do it at commit time because then it's a one-off
cost rather than recurring every time we want to track the history of a
given piece of content. Also because we actually have the developer's
attention at commit time, and we can get _real_ answers from the user
about what she was doing, instead of having to guess.

But if it can be done cheaply enough at a later date even though we end
up repeating ourselves, and if it can be done _well_ enough that we
shouldn't have just asked the user in the first place, then yes, OK I
agree.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: ls-tree enhancements
  2005-04-15  2:21                       ` [Patch] ls-tree enhancements Junio C Hamano
@ 2005-04-15 16:13                         ` Petr Baudis
  2005-04-15 18:25                           ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Petr Baudis @ 2005-04-15 16:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Dear diary, on Fri, Apr 15, 2005 at 04:21:30AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> +static void _usage(void)
> +{
> +	usage("ls-tree [-r] [-z] <key>");
> +}

(namespace-nazi-hat
 This infriges the system namespaces. FWIW, I prefer to add the
underscore at the end of the identifier if wanting to do stuff like
this. Or just call it my_usage().
)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 15:54                           ` Paul Jackson
@ 2005-04-15 16:30                             ` C. Scott Ananian
  2005-04-15 18:29                               ` Paul Jackson
  0 siblings, 1 reply; 148+ messages in thread
From: C. Scott Ananian @ 2005-04-15 16:30 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Linus Torvalds, dwmw2, junkio, pasky, git

On Fri, 15 Apr 2005, Paul Jackson wrote:

> Um ah ... could you explain what you mean by inter and intra file diffs?

intra file diffs: here are two versions of the same file.  what changed? 
inter file diffs: here is a new file, and here are *all the files in the 
current committed version*.  Where did the contents of this new file come 
from?  (Note that the new file is often a slightly changed version of an 
existing file in the current committed version.  But we don't assume that 
must be true.)
  --scott

supercomputer Pakistan WSHOOFS SECANT LCPANGS SDI assassination ZPSECANT 
SEQUIN AEBARMAN ESCOBILLA bomb mustard STANDEL ESGAIN Nazi FJDEFLECT
                          ( http://cscott.net/ )

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 16:01                               ` David Woodhouse
@ 2005-04-15 16:31                                 ` C. Scott Ananian
  2005-04-15 17:11                                   ` Linus Torvalds
  2005-04-16 15:33                                 ` Johannes Schindelin
  1 sibling, 1 reply; 148+ messages in thread
From: C. Scott Ananian @ 2005-04-15 16:31 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git

On Fri, 15 Apr 2005, David Woodhouse wrote:

> given piece of content. Also because we actually have the developer's
> attention at commit time, and we can get _real_ answers from the user
> about what she was doing, instead of having to guess.

Yes, but it's still hard to get *accurate* information.  And developers 
tend to use very short commit messages already...

> But if it can be done cheaply enough at a later date even though we end
> up repeating ourselves, and if it can be done _well_ enough that we
> shouldn't have just asked the user in the first place, then yes, OK I
> agree.

I think examining the rsync algorithms should convince you that finding 
common chunks can be fairly efficient.  (See my next message for a more 
concrete proposal.)
  --scott

Rijndael AMLASH Moscow Ft. Bragg shotgun HTKEEPER SHERWOOD overthrow 
Uzi anthrax Yeltsin Indonesia Suharto LITEMPO Dictionary Yakima KUBARK
                          ( http://cscott.net/ )

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 16:31                                 ` C. Scott Ananian
@ 2005-04-15 17:11                                   ` Linus Torvalds
  0 siblings, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 17:11 UTC (permalink / raw)
  To: C. Scott Ananian; +Cc: David Woodhouse, Junio C Hamano, Petr Baudis, git



On Fri, 15 Apr 2005, C. Scott Ananian wrote:
> 
> I think examining the rsync algorithms should convince you that finding 
> common chunks can be fairly efficient.

Note that "efficient" really depends on how good a job you want to do, so 
you can tune it to how much CPU you can afford to waste on the problem.

For example, my example had this thing where we merged five different
functions into one function, and it is truly pretty efficient to find
things like that _IF_ we only look at the files that changed (since the
set of files that change in any one particular commit tends to be small,
relative to the whole repository). There are many good algorithms for 
finding "common code", and with modern hardware that is basically 
instantaneous if you look at a few tens of files.

For example, people wrote efficient things to compare _millions_ of lines 
of code for the whole SCO saga - you can do quite well. Some googling 
comes up with for example

	http://minnie.tuhs.org/Programs

and applying those to a smallish set of files is quite efficient.

What is _not_ necessarily as easy is the situation where you notice that a 
new set of lines appeared, but you don't see any place that matches that 
set of lines in the set of CHANGED files. That's actually quite common, ie 
let's say that you have a new filesystem or a new driver, and almost 
always it's based on a template or something, and you _would_ be able to 
see where that template came from, except it's not in that _changed_ set.

And that is still doable, but now you really have to compare against the
whole tree if you want to do it. Even _that_ is actually efficient if you
cache the hashes - that's how the comparison tools compare two totally
independent trees against each other, and it makes it practically possible
to do even that expensive O(n**2) operation in reasonable time. It's
certainly possible to do exactly the same thing for the "new code got
added, does it bear any similarity to old code" case.

Note! This is a question that is relevant and actually is in the realm of
the "possible to find the answer interactively".  It may fairly expensive, 
but the point is that this is the kind of relevant question that really 
does depend on the fundamental notion that "data matters more than any 
local changes". And when you think about the problem in that form, you 
find these kinds of interesting questions that you _can_ answer.

Because the way git identifies data, the example "is there any other
relevant code that may actually be similar to the newly added code" is
actually not that hard to do in git. Remember: the way to answer that
question is to have a cache of hashes of the contents. Guess what git
_is_? You can now index your line-based hashes of contents against the
_object_ hashes that git keeps track of, and you suddenly have an
efficient way to actually look up those hashes.

NOTE! All of this is outside the scope of git itself. This is all
"visualization and comparison tools" built up on top of git. And I'm not
at all interested in writing those tools myself, and I'm absolutely not
signing up for that part. All I'm arguing for is that the git architecture
is actually a very good architecture for doing these kinds of very very
cool tools.

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: ls-tree enhancements
  2005-04-15 16:13                         ` Petr Baudis
@ 2005-04-15 18:25                           ` Junio C Hamano
  0 siblings, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 18:25 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

>> +static void _usage(void)

PB>  This infriges the system namespaces. FWIW, I prefer to add the
PB> underscore at the end of the identifier if wanting to do stuff like
PB> this. Or just call it my_usage().

Thanks.  My bad.  Noted.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 16:30                             ` C. Scott Ananian
@ 2005-04-15 18:29                               ` Paul Jackson
  0 siblings, 0 replies; 148+ messages in thread
From: Paul Jackson @ 2005-04-15 18:29 UTC (permalink / raw)
  To: C. Scott Ananian; +Cc: torvalds, dwmw2, junkio, pasky, git

> intra file diffs: here are two versions of the same file. 

Ah so.  Linus faked me out.

I was _sure_ that by "file" he meant "file" -- as in a bucket of bits
with a unique identifying <sha1>.

In that message, I guess by "file" he meant "a version controlled
file, consisting of a series of content versions and meta-data"

That's what I get for trusting Linus to always speak as a kernel
hacker, not an SCM hacker.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: write-tree is pasky-0.4
       [not found]                                     ` <7vaco0i3t9.fsf_-_@assigned-by-dhcp.cox.net>
@ 2005-04-15 18:44                                       ` Linus Torvalds
  2005-04-15 18:56                                         ` Petr Baudis
  2005-04-15 20:10                                         ` Junio C Hamano
  0 siblings, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 18:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git



On Fri, 15 Apr 2005, Junio C Hamano wrote:
> 
> Linus, sorry for bothering you with a false alarm.  The problem
> turns out to be introduced in pasky-0.4 and does not exist in
> your HEAD.

Hey, all the code I write is always perfect, of course ;)

That said, I'm having some trouble merging with your perfect code,
especially since I decided that Russell's "always big-endian" thing was
definitely the right way to go (but ended up doing it slightly
differently).

I did my own version of "upcate-cache --cacheinfo", although mine is a bit 
more anal, and if you add a new filename it wants that "--add" flag in 
there first (why? I really like to make sure that people who add or remove 
files from the cache say so explicitly, so that there are no surprises). 
Otherwise it should be compatible with yours.

And I merged your "Add -z option to show-files", but you had based your 
other patches on Petr's tree which due to my other changes is not going to 
merge totally cleanly with mine, so I'm wondering if you might want to try 
to re-merge your mergepoint stuff against my current tree? That way I can 
continue to maintain a set of "core files", and Pasky can maintain the 
"usable interfaces" part..

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: write-tree is pasky-0.4
  2005-04-15 18:44                                       ` write-tree is pasky-0.4 Linus Torvalds
@ 2005-04-15 18:56                                         ` Petr Baudis
  2005-04-15 20:13                                           ` Linus Torvalds
  2005-04-15 20:10                                         ` Junio C Hamano
  1 sibling, 1 reply; 148+ messages in thread
From: Petr Baudis @ 2005-04-15 18:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Dear diary, on Fri, Apr 15, 2005 at 08:44:02PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> And I merged your "Add -z option to show-files", but you had based your 
> other patches on Petr's tree which due to my other changes is not going to 
> merge totally cleanly with mine, so I'm wondering if you might want to try 
> to re-merge your mergepoint stuff against my current tree? That way I can 
> continue to maintain a set of "core files", and Pasky can maintain the 
> "usable interfaces" part..

Actually, I wanted to ask about this. :-)

So, I assume that you don't want to merge my "SCM layer" (which is
perfectly fine by me). However, I also apply plenty of patches
concerning the "core git" - be it portability, leak fixes, argument
parsing fixes and so on.

Would it be of any benefit if I maintained two trees, one with just your
core git but what I merge (I think I'd call this branch git-pb), and one
with my git-pasky (to be renamed to Cogito) layer. I'd then put the
"core git" changes to the git-pb branch and pull from it to the Cogito
branch regularily, but it should be safe for you to pull from it too.

In fact, in that case I might even end up entirely separating the Cogito
tools from the core git and distributing them independently.

BTW, just out of interest, are you personally planning to use Cogito for
your kernel and sparse (and possibly even git) work, or will you stay
with your lowlevel plumbing for that?

Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 15:32                             ` Linus Torvalds
  2005-04-15 16:01                               ` David Woodhouse
@ 2005-04-15 19:20                               ` Paul Jackson
  2005-04-16  1:44                               ` Simon Fowler
  2005-04-16 20:29                               ` Sanjoy Mahajan
  3 siblings, 0 replies; 148+ messages in thread
From: Paul Jackson @ 2005-04-15 19:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dwmw2, junkio, pasky, git

These notions that one can always best answer questions by looking at
the content, and that "Individual files DO NOT EXIST" seem over stated,
to me.

Granted, overstated for a good reason.  A couple sticks of dynamite are
needed to shake loose some old SCM thinking habits.

===

Ingo has a point when he states:

> i believe the fundamental thing to think about is not file or line or 
> namespace, but 'tracking developer intent'.

He too overstates - it's not _the_ (as in one and only) thing.
But it's useful.  Given the traditional terseness of many engineers,
it's certainly not the _only_ thing.  The code speaks too.

===

The above two are related in this way.  Traditional SCM uses per
file (versioned controlled file, as in s.* or *,v files) metadata
to track 'developer intent'.

I'm afraid we are at risk for confusing baby (developer intent)
and bathwater (version controlled file structure of classic SCM's).

===

But we already have a pretty damn good way of tracking developer
intent that needs to fit naturally with whatever we build on top
of git.

     Mr. McGuire: I just want to say one word to you - just one word.
     Ben: Yes sir.
     Mr. McGuire: Are you listening?
     Ben: Yes I am.
     Mr. McGuire: 'Patches.'
     # the original word was 'Plastics' - The Graduate (1967)

Andrew and the other maintainers do a pretty good job of 'encouraging'
developers to provide useful statements of 'intent' in their patch
headers.

The patch series in something like *-mm, including per-patch
commentary, are a valuable part of this project.

===

I have not looked closely at what is being done here, on top of
git, for SCM like capabilities.  Hopefully the next two questions
are not too stupid:

 1) How do we track the patch header commentary?

 2) Why can't we have a quilt like SCM, not bk/rcs/cvs/sccs/... like?

For (2), anyone publishing a Linux source would periodically announce an
<sha1> value, attached to some name suitable for public consumption.

For example, sometime in the next month or so, Linus would announce
that the <sha1> of 2.6.12 is so-and-so.  That would identify the
result of applying a specific set of patches, resulting in a specific
source tree contents.  He would announce a few 2.6.12-rc* <sha1>'s
between now and then.

Between now and then, Andrew would (if using these tools) have published
several <sha1> values, one each for various 2.6.12-rc*-mm* versions.

If you explode such a <sha1> all out into a working directory, you get
both the source contents in the appropriately named files, and the
quilt-style patches subdirectory, of the patch series that gets you
here, starting from some Time Zero for that series of published kernel
versions.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-15  0:58                           ` Junio C Hamano
  2005-04-14 22:30                             ` Christopher Li
@ 2005-04-15 19:54                             ` Petr Baudis
  1 sibling, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-15 19:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Dear diary, on Fri, Apr 15, 2005 at 02:58:25AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> >> I think the above would result in what SCM person would call
> >> "merge upstream/sidestream changes into my working directory".
> 
> PB> And that's exactly what I'm doing now with git merge. ;-) In fact,
> PB> ideally the whole change in my scripts when your script is finished
> PB> would be replacing
> 
> PB> 	checkout-cache `diff-tree` # symbolic
> PB> 	git diff $base $merged | git apply
> 
> PB> with
> 
> PB> 	merge-tree.pl -b $base $(tree-id) $merged | parse-your-output
> 
> In the above I presume by $merged you mean the tree ID (or
> commit ID) the user's working directory is based upon?  Well,
> merge-trees (Linus has a single directory merge-tree already)
> looks at tree IDs (or commit IDs); it would never involve
> working files in random state that is not recorded as part of a
> tree (committed or not).  Given that constraints I am not sure
> how well that would pan out.  I have to think about this a bit.

No, $(tree-id) is the "destination branhc", what the user directory is
based upon; $merged is the branch you are merging now, relative to
$base. When I throw away the useless "-b" argument, in practice it would
look like

	merge-trees abcd 1234 5678

for doing

      /------ 1234 -+-
abcd <             /
      \------ 5678

(not that the order of 1234 and 5678 would actually really matter)

I fear I don't understand the rest of your paragraph. :-(

> I do like, however, the idea of separating the step of doing any
> checkout/merge etc. and actually doing them.  So the command set
> of parse-your-output needs to be defined.  Based on what I have
> done so far, it would consist of the following:
> 
>  - Result is this object $SHA1 with mode $mode at $path (takes
>    one of the trees); you can do update-cache --cacheinfo (if
>    you want to muck with dircache) or cat-file blob (if you want
>    to get the file) or both.
> 
>  - Result is to delete $path.
> 
>  - Result is a merge between object $SHA1-1 and $SHA1-2 with
>    mode $mode-1 or $mode-2 at $path.
> 
> Would this be a good enough command set?

What about the conflicts? Like one tree deleting, other tree modifying?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-14  8:06         ` Linus Torvalds
  2005-04-14  8:39           ` Junio C Hamano
@ 2005-04-15 19:57           ` Junio C Hamano
  2005-04-15 20:45             ` Linus Torvalds
  1 sibling, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Christopher Li, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> In the meantime I wrote a very stupid "merge-tree" which
LT> does things slightly differently, but I really think your
LT> approach (aka my original approach) is actually a lot
LT> faster. I was just starting to worry that the ball didn't
LT> start, so I wrote an even hackier one.

LT> ... This "one directory at a time with very explicit output"
LT> thing is much more down-to-earth, but it's also likely
LT> slower because it will need script help more often.

I was looking at merge-tree.c last night to add recursive
behaviour (my favorite these days ;-) to it [*1*].

But then I started thinking.

LT> ... For each entry in the directory it says either
LT> 	select <mode> <sha1> path
LT> or
LT> 	merge <mode>-><mode>,<mode> <sha1>-><sha1>,<sha1> path
LT> depending on whether it could directly select the right object or not.

Given that the case you are primarily interested in is the one
that affects only small parts of a huge tree (i.e. common kernel
merge pattern I understand from your previous messages), your
"hacky version" [*2*], extended for recursive operation, would
spit out 98% select and 2% merge, and probably the origin of
these selects are distributed across ancestor=90%, his=4%,
my=4%, or something similar.  Am I misestimating grossly?

Assuming I am correct in the above, this would not scale for a
huge project.  We need to cut down the number of "90% select"
part of the output to make it manageable.

I am thinking about:

 - adding recursive behaviour (I am almost done with this);

 - adding another command line argument to merge-tree.c, to 
   tell "do not output anything for the path if the resulting
   merge is the same as what is in this tree";

 - adding another output type, "delete" to make the output type
   repertoire these three:

    delete path
    select <mode> <sha1> path
    merge <mode>-><mode>,<mode> <sha1>-><sha1>,<sha1> path

When the user of the output of 

  $ merge-tree <ancestor-sha1> <my-sha1> <his-sha1> <result-base-sha1>

want to get a dircache populated with the merged result, he can:

  1. read-tree <result-base-sha1>
  2. for each output:
     a) "delete" -- delete path from dircache
     b) "select" -- register mode-sha1 at path
     c) "merge"  -- do the 3-way merge and register result at path

Do you think this is sensible?

The reason I have the separate <result-base-sha1> instead of
always using <ancestor-sha1> is because the user may be thinking
of patching an existing base which is different from "my" or
"his" or "ancestor" and doing it in place.  That way, probably
Pasky's SCM can use it to patch the dircache it creates in its
own ,,merge/ directory, which would most likely be initially
populated from the dircache in the user's working directory---
which may or may not match "my-sha1" if the user has uncommitted
update-cache there.

Pasky, do you think this is workable?  If so do you think this
would make your life easier?


[Footnotes]

*1* That's how I found the S_IFDIR problem (not in your tree but
in the copy I had).

*2* I did not find it quite "hacky".  It was a pleasant read.
Especially I liked "smaller()" part.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: write-tree is pasky-0.4
  2005-04-15 18:44                                       ` write-tree is pasky-0.4 Linus Torvalds
  2005-04-15 18:56                                         ` Petr Baudis
@ 2005-04-15 20:10                                         ` Junio C Hamano
  2005-04-15 20:58                                           ` C. Scott Ananian
  2005-04-15 21:48                                           ` [PATCH 1/2] merge-trees script for Linus git Junio C Hamano
  1 sibling, 2 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 20:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Hey, all the code I write is always perfect, of course ;)

And you are always right ;-)  Liked that blast-from-the-past?

LT> That said, I'm having some trouble merging with your perfect code,
LT> especially since I decided that Russell's "always big-endian" thing was
LT> definitely the right way to go (but ended up doing it slightly
LT> differently).

LT> I did my own version of "upcate-cache --cacheinfo", although
LT> mine is a bit more anal, and if you add a new filename it
LT> wants that "--add" flag in there first (why? I really like
LT> to make sure that people who add or remove files from the
LT> cache say so explicitly, so that there are no surprises).
LT> Otherwise it should be compatible with yours.

Thanks.  Not honoring "--add" was an oversight on my part.

LT> And I merged your "Add -z option to show-files", but you had
LT> based your other patches on Petr's tree which due to my
LT> other changes is not going to merge totally cleanly with
LT> mine, so I'm wondering if you might want to try to re-merge
LT> your mergepoint stuff against my current tree?

My pleasure.  I am currently not that interested in toilet part
than I am interested in plumbing part, so rebasing my tree back
to yours is no problem for me.  Currently I see your HEAD is at
461aef08823a18a6c69d472499ef5257f8c7f6c8, so I will generate a
set of patches against it.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: write-tree is pasky-0.4
  2005-04-15 18:56                                         ` Petr Baudis
@ 2005-04-15 20:13                                           ` Linus Torvalds
  2005-04-15 22:36                                             ` Petr Baudis
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 20:13 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git



On Fri, 15 Apr 2005, Petr Baudis wrote:
> 
> So, I assume that you don't want to merge my "SCM layer" (which is
> perfectly fine by me). However, I also apply plenty of patches
> concerning the "core git" - be it portability, leak fixes, argument
> parsing fixes and so on.

I'm actually perfectly happy to merge your SCM layer too eventually, but 
I'm nervous at this point.  Especially while people are discussing some 
SCM options that I'm personally very leery of, and think that may make 
sense for others, but that I personally distrust.

> BTW, just out of interest, are you personally planning to use Cogito for
> your kernel and sparse (and possibly even git) work, or will you stay
> with your lowlevel plumbing for that?

I'm really really hoping I'd use cogito, and that it ends up being just 
one project. In particular, I'm hoping that in a few days, I'll have done 
enough plumbing that I don't even care any more, and then I'd not even 
maintain a tree of my own. 

I'm really not that much of an SCM guy. I detest pretty much all SCM's out
there, and while it's been interesting to do 'git', I've done it because I
was forced to, and because I really wanted to put _my_ needs and opinions
first in an SCM, and see how that works. That's why I've been so adamant
about having a "philosophy", because otherwise I'd probably just end up
with yet another SCM that I'd despise.

So for me, the "optimal" situation really ends up that you guys end up as
the maintainers. I don't even _want_ to maintain it, although I'd be more
than happy to be part of the engineering team. I just want to mark out the
direction well enough and get it to a point where I can _use_ it, that I
feel like I'm done.

But before I can do that, I need to feel like I can live with the end 
result. The only missing part is merges, and I think you and Junio are 
getting pretty close (with Daniel's parent finder, Junio's merger etc). 

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: Merge with git-pasky II.
       [not found] <000d01c541ed$32241fd0$6400a8c0@gandalf>
@ 2005-04-15 20:31 ` Linus Torvalds
  2005-04-15 23:00   ` Barry Silverman
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 20:31 UTC (permalink / raw)
  To: Barry Silverman; +Cc: git


[ I'm cc'ing the git list even though Barry's question wasn't cc'd. 
  Because I think his question is interesting and astute per se, even
  if I disagree with the proposal ]

On Fri, 15 Apr 2005, Barry Silverman wrote:
>
> If git is totally project based, and each commit represents total state
> of the project, then how important is the intermediate commit
> information between two states. 

You need it in order to do further merges.

> IE, Area maintainer has A1->A2->A3->A4->A5 in a repository with 5
> commits, and 5 comments. And I last synced with A1.
> 
> A few days later I sync again. Couldn't I just pull the "diff-tree A5
> A1" and then commit to my tree just the record A1->A5. Why does MY
> repository need trees A2,A3,A4?

Because that second merge needs the first merge to work well. The first 
merge might have had some small differences that ended up auto-merging (or 
even needing some manual help from you). The second time you sync, there 
migth be some more automatic merging. And so on.

If you don't keep track of the incremental merges, you end up with one 
really _difficult_ merge that may not be mergable at all. Not 
automatically, and perhaps not even with help.

So in order to keep things mergable, you need to not diverge. And the 
intermediate merges are the "anchor-points" for the next merge, keeping 
the divergences minimal. 

I'm personally convinced that one of the reasons CVS is a pain to merge is 
obviously that it doesn't do a good job of finding parents, but also 
exactly _because_ it makes merges so painful that people wait longer to do 
them, so you never end up fixing the simple stuff. In contrast, if you
have all these small merges going on all the time, the hope is that 
there's never any really painful nasty "final merge".

So you're right - the small merges do end up cluttering up the revision 
history. But it's a small price to pay if it means that you avoid having 
the painful ones.

> Isn't preserving the A1,A2,A3,A4,A5 a legacy of BK, which required all
> the changesets to be loaded in order, and so is a completely "file"
> driven concept? 

Nope. In fact, to some degree git will need this even _more_, since the
git merger is likely to be _weaker_ than BK, and thus more easily
confused.

I do believe that BK has these things for the same reason.

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-15 10:22                           ` Junio C Hamano
@ 2005-04-15 20:40                             ` Petr Baudis
  2005-04-15 22:41                               ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Petr Baudis @ 2005-04-15 20:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Dear diary, on Fri, Apr 15, 2005 at 12:22:26PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> After I re-read [*R1*], in which Linus talks about dircache,
> especially this section:
> 
>  - The "current directory cache" describes some baseline. In particular,
>    note the "some" part. It's not tied to any special baseline, and you
>    can change your baseline any way you please.
> 
>    So it does NOT have to track any particular state in either the object 
>    database _or_ in your actual current working tree. In fact, all real 
>    interactions with "git" are really about updating this staging area one 
>    way or the other: you might check out the state from it into your 
>    working area (partially or fully), you can push your working area into 
>    the staging area (again, partially or fully).
> 
>    And if you want to, you can write the thing that the staging area 
>    represents as a "tree" into the object database, or you can merge a 
>    tree from the object database into the staging area.
> 
>    In other words: the staging area aka "current directory cache" is 
>    really how all interaction takes place. The object database never 
>    interacts directly with your working directory contents. ALL 
>    interactions go through the current directory cache.
> 
> I started to have more doubts on the approach of *not*
> performing the merge in the dircache I set up specifically for
> merging, which is the direction in which you are pushing if I
> understand you correctly.  Maybe I completely misunderstand what
> you want.  This message is long but I need a clear understanding
> of what is expected to be useful to you, so please bear with me.



> PB> 	merge-tree.pl -b $base $(tree-id) $merged | parse-your-output
> 
> Please help me understand this example you have given earlier.
> Here is my understanding of your assumption when the above
> pipeline takes place.  Correct me if I am mistaken.
> 
>  * The user is in a working directory $W.  It is controlled by
>    git-tools and there are $W/.git/. directory and $W/.git/index
>    dircache.
> 
>  * The dircache $W/.git/index started its life as a read-tree
>    from some commit.  The git-tools is keeping track of which
>    commit it is somewhere, presumably in $W/.git/ directory.
>    Let's call it $C (commit).
> 
>  ? Question.  Is the $(tree-id) in your example the same as $C
>    above?

Yes. Actually $(tree-id) returns ID of the tree object, not the commit
object; but that doesn't matter here, probably - let's ignore that
distinction for simplicity.

>  * The user have run [*1*] (see Footnote below) checkout-cache
>    on $W/.git/index some time in the past and $W is full of
>    working files.  Some of them may or may not have modified.
>    There may be some additions or deletions.  So the contents of
>    the working directory may not match the tree associated with
>    $C.
> 
>  * The user may or may not have run [*1*] update-cache in $W.
>    The contents of the dircache $W/.git/index may not match the
>    tree associated with $C.
> 
>  ? Question.  Are you forbidding the user to run update-cache by
>    hand, and keeping track of the changes yourself, to be
>    applied all at once at "git commit" time, thereby
>    guaranteeing the $W/.git/index to match the tree associated
>    with $C all times?  From the description of The "GIT toolkit"
>    section in README, it is not clear to me which part of his
>    repository an end user is not supposed to muck with himself.

Ideally, he shouldn't be using *any* of the low-level plumbing by now.
The only exception is update-cache --refresh, which he can do at will
(I'm yet thinking what to do with it :-).

The git-tools always assume that index basically contains the state as
of the last commit. (Actually the only time when this matters *now*
might be git diff - the user would get confused from the results.)

>  * Now the user has some changes in his working directory and
>    notices upstream or a side branch has notable changes
>    desireble to be picked up.  So he runs some git-tools command
>    to cause the above quoted pipeline to run.
> 
>  ? Question.  Does $merged in your example mean such an upstream
>    or side branch?  Is $base in your example the common ancestor
>    between $C and $merged?

Correct.

*HOWEVER* what is not correct is that git-tools would let you merge in
your working directory while you have local changes there.

In the past, the merge would happen in your working tree, but git-tools
wouldn't let you go for it unless your working tree has no local
changes. It would complain loudly and refuse to, since it's *not* what
you want to do and it was most likely a mistake.

Currently, git merge just creates a ,,merge/ subdirectory sharing the
object database with your working tree, but with an independent checkout
of it; it will do the merge there, and when you commit it there, it will
update your working tree with the merged changes.

I'm describing both behaviors since I might revert back to the first
one, based on what (if anything) will Linus reply to my mail about
out-of-tree merges.

But either way, when a merge is about to happen upon us, the working
tree is "clean".

> Assuming that my above understanding of your model is correct,
> here are my "thinking aloud".
> 
>  - "merge-trees $base $C $merged" looks only at the git object
>    database for those three trees named.  The data structure of
>    git object database is optimized to distinguish differences
>    in those recorded trees (and hence recorded blobs they point
>    at) without unpacking most of the files if the changes are
>    small, because all the blobs involved are already hashed.  It
>    is not very good at comparing things in git object store and
>    working files in random states, which would involve unpacking
>    blobs and comparing, so "merge-trees" does not bother.
> 
>  - What can come out from merge-trees is therefore one of the
>    following for each path from the union of paths contained in
>    $base, $C, and $merged:
> 
>    (a) Neither $C nor $merged changed it --- merge result is what
>        is in $C.

(Or in $base, if you don't want to give $C "unfair advantage", since it
does not matter. ;-)

>    (b) $C changed it but $merged did not --- merge result is what
>        is in $C.
> 
>    (c) Both $C and $merged changed it in the same way --- merge
>        result is what is in $C.
> 
>    (d) $C did not change it but $merged did --- merge result is
>        what is in $merged.
> 
>    (e) Both $C and $merged changed it differently --- merge is
>        needed and automatically succeeds between $C and $merge.
> 
>    (f) Both $C and $merged changed it differently --- merge is
>        needed but have conflicts.
> 
>  - Assuming we are dealing with the case where working files are
>    dirty and do not match what is in $C, among the above,
>    (a)-(c) can be ignored by SCM.  What the user has in his
>    working files is exactly what he would have got if he started
>    working from the merge result, although in reality the work
>    was started from $C.

Yes. Actually they can be ignored by git-tools in any case since what is
in the directory cache is $C. So it never needs to do any special
action.

>    Handling (d), (e) and (f) from SCM's point of view would be
>    the same.  They all involve 3-way merges between the file in
>    the working directory, and the file from $merged, pivoting on
>    the file from $base.  In order to help SCM, merge-trees
>    therefore should output SHA1 of blobs for such a file from
>    $base and $merged and expect SCM to run "cat-file blob" on
>    them and then merge or diff3.  Up to the point of giving
>    those two SHA1 out is the business of merge-trees and after
>    that it is up to SCM.
> 
>    That would work.  So I should base the design of output from
>    merge-trees on the above analysis, which probably needs to be
>    extended to cover differences between creation, modification,
>    and deletion.

Yes, it sounds sensible.

Actually, you don't even need to make $C more special than $merged; I
can filter out only the $merged changes on the SCM level. I guess that
would add no complexity to your tool and make it usable even for more
exotic kinds of merges (like the floating-in-the-void merge of two
"equally important" trees).

>  - However, the above is quite different from the way Linus
>    envisioned initially, on which my current implementation is
>    based [*3*].
> 
>    My current implementation is to record the merge outcome in
>    the temporary dircache $W/,,merge/.git/index for cases
>    (a)-(e).  The last case (f) is problematic and needs human
>    validation [*2*], so it is not recorded in that temporary
>    dircache, but the files to be merged are left in that
>    temporary directory and merge-trees stops there.  It is
>    expected that the end-user or SCM would merge the resulting
>    file and run update-cache to update $W/,,merge/.git/index.
>    After that happens, $W/,,merge/.git/index has the tree
>    representing the desired result of the merge.  It is expected
>    that the end-user or SCM would write-tree, commit-tree there
>    in the temporary directory, creating a new commit $C1.
> 
>    Then, it is expected that the SCM would make a patch file
>    between $C and the user working directory, checks out $C1
>    (either in the user's working directory or another temporary
>    directory; at this point merge-trees does not care because it
>    has already done its job and exited), applies that patch to
>    bring the user edits over to $C1.  Then that directory would
>    contain the desired merge of user edits.
> 
>    That is my understanding of how Linus originally wanted the
>    tool to do his kernel work with to work.  My hesitation to
>    suggestions from you to change it not to keep its own merge
>    dircache is coming from here.  Not doing what I am currently
>    doing to $W/,,merge/.git/index dircache would mean that SCM
>    would have to do more, not less, to arrive at $C1 (the result
>    of the clean $merge and $C merge pivoted at $base), where the
>    real SCM merge begins.

Well. Currently, apart from the directory cache part, I do it like you
describe. I create a new directory, after commit I apply the diff back
to the original tree etc.

The only problem is really the dircache, and that's because it would be
done totally differently than in the original tree, and I would
unnecessarily have to introduce crowds of special cases to my tools in
order for them to be usable in the merge tree (I call the ",,merge"
temporary directory a "merge tree").

And the user would still lose the capability of easily seeing the
changes being committed. I admit that I'm using this largely as an
excuse and there *could* be a tool made which would compare the given
tree with the cache, but it would be clumsy to use, violate Linus' "ALL
interactions go through the current directory cache" paradigm (whew, the
first time in my life I used this word), and we could do just fine with
our current tools.

> Although I suspect I am misunderstanding what you want, your
> messages so far suggest that what you want might be quite
> different from what Linus wants.  Please do not misunderstand
> what I mean by saying this.  I am not saying that Linus is
> always right [*4*] and therefore you are wrong for wanting
> something else.  It is just that, if what I started writing
> needs to support both of those quite different needs, I need to
> know what they are.  I think I understand what Linus wants well
> enough [*5*], but I am not certain about yours.

I can't see the conflicts between what I want and what Linus wants.
After all, Linus says that I can use the directory cache in any way I
please (well, the user can, but I'm speaking for him ;-). So I'm doing
so, and with your tool I would get into problems, since it is suddenly
imposing a policy on what should be in the index.

..snip..
> *2* Strictly speaking, case (e) needs human validation as
> well, because successful textual merge does not guarantee
> sensible semantic merge.
..snip..

Actually, I think _all_ the caches should make the human validation
_possible_ (by showing the diff of what would be merged), and it is
trivial to do so by having pristine index.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 19:57           ` Junio C Hamano
@ 2005-04-15 20:45             ` Linus Torvalds
  0 siblings, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-15 20:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Christopher Li, git



On Fri, 15 Apr 2005, Junio C Hamano wrote:
> 
> I was looking at merge-tree.c last night to add recursive
> behaviour (my favorite these days ;-) to it [*1*].
> 
> But then I started thinking.

Always good.

> LT> ... For each entry in the directory it says either
> LT> 	select <mode> <sha1> path
> LT> or
> LT> 	merge <mode>-><mode>,<mode> <sha1>-><sha1>,<sha1> path
> LT> depending on whether it could directly select the right object or not.
> 
> Given that the case you are primarily interested in is the one
> that affects only small parts of a huge tree (i.e. common kernel
> merge pattern I understand from your previous messages), your
> "hacky version" [*2*], extended for recursive operation, would
> spit out 98% select and 2% merge, and probably the origin of
> these selects are distributed across ancestor=90%, his=4%,
> my=4%, or something similar.  Am I misestimating grossly?

No. That's _exactly_ right. You do not want a recursive merge-tree. 

The "diff-tree" thing is different, exactly because it prunes out all the 
differences early on.

> I am thinking about:
> 
>  - adding recursive behaviour (I am almost done with this);

I think your suggestion sounds perfectly reasonable.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: write-tree is pasky-0.4
  2005-04-15 20:10                                         ` Junio C Hamano
@ 2005-04-15 20:58                                           ` C. Scott Ananian
  2005-04-15 21:22                                             ` Petr Baudis
  2005-04-15 23:16                                             ` Junio C Hamano
  2005-04-15 21:48                                           ` [PATCH 1/2] merge-trees script for Linus git Junio C Hamano
  1 sibling, 2 replies; 148+ messages in thread
From: C. Scott Ananian @ 2005-04-15 20:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Petr Baudis, git

On Fri, 15 Apr 2005, Junio C Hamano wrote:

> to yours is no problem for me.  Currently I see your HEAD is at
> 461aef08823a18a6c69d472499ef5257f8c7f6c8, so I will generate a
> set of patches against it.

Have you considered using an s/key-like system to make these hashes more 
human-readable?  Using the S/Key translation (11-bit chunks map to a 1-4 
letter word), Linus' HEAD is at:
   WOW-SCAN-NAVE-AUK-JILL-BASH-HI-LACE-LID-RIDE-RUSE-LINE-GLEE-WICK-A
...which is a little longer, but speaking of branch "wow-scan" (which 
gives 22 bits of disambiguation) is probably less error-prone than 
discussing branch '461...' (only 12 bits).

You could supercharge this algorithm by using (say) 
/usr/dict/american-english-large (>2^17 words; 160 bits of hash = 10 
dictionary words), or mixing upper and lower case (likely to reduce the 15 
word s/key phrase to ~11 words) to give something like
    RiDe-Rift-rIMe-rOSy-ScaR-sCat-ShiN-sIde-Sine-seeK-TIEd-TINT
My personal feeling is that case is likely to be dropped in casual 
conversation, so speaking of branch 'wow', 'wow-scan', or 'wow-scan-nave' 
is likely to be significantly more useful than trying to pronounce 
mixed-cased versions of these.

This is obviously a cogito issue, rather than a git-fs thing.
  --scott

[More info is in RFCs 2289 and 1760, although all I'm really using from 
these is the word dictionary in the appendix.]
     http://www.faqs.org/rfcs/rfc1760.html
     http://www.faqs.org/rfcs/rfc2289.html

SKIMMER MKOFTEN Ft. Bragg Sabana Seca ESMERALDITE NORAD HTAUTOMAT 
radar interception Pakistan BOND Kennedy postcard corporate globalization
                          ( http://cscott.net/ )

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: write-tree is pasky-0.4
  2005-04-15 20:58                                           ` C. Scott Ananian
@ 2005-04-15 21:22                                             ` Petr Baudis
  2005-04-15 23:16                                             ` Junio C Hamano
  1 sibling, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-15 21:22 UTC (permalink / raw)
  To: C. Scott Ananian; +Cc: Junio C Hamano, Linus Torvalds, git

Dear diary, on Fri, Apr 15, 2005 at 10:58:10PM CEST, I got a letter
where "C. Scott Ananian" <cscott@cscott.net> told me that...
> On Fri, 15 Apr 2005, Junio C Hamano wrote:
> 
> >to yours is no problem for me.  Currently I see your HEAD is at
> >461aef08823a18a6c69d472499ef5257f8c7f6c8, so I will generate a
> >set of patches against it.
> 
> Have you considered using an s/key-like system to make these hashes more 
> human-readable?  Using the S/Key translation (11-bit chunks map to a 1-4 
> letter word), Linus' HEAD is at:
>   WOW-SCAN-NAVE-AUK-JILL-BASH-HI-LACE-LID-RIDE-RUSE-LINE-GLEE-WICK-A
> ...which is a little longer, but speaking of branch "wow-scan" (which 
> gives 22 bits of disambiguation) is probably less error-prone than 
> discussing branch '461...' (only 12 bits).
> 
> You could supercharge this algorithm by using (say) 
> /usr/dict/american-english-large (>2^17 words; 160 bits of hash = 10 
> dictionary words), or mixing upper and lower case (likely to reduce the 15 
> word s/key phrase to ~11 words) to give something like
>    RiDe-Rift-rIMe-rOSy-ScaR-sCat-ShiN-sIde-Sine-seeK-TIEd-TINT
> My personal feeling is that case is likely to be dropped in casual 
> conversation, so speaking of branch 'wow', 'wow-scan', or 'wow-scan-nave' 
> is likely to be significantly more useful than trying to pronounce 
> mixed-cased versions of these.
> 
> This is obviously a cogito issue, rather than a git-fs thing.

I kind of like it, the only thing I fear is possible conflict with
branch names; it is not very likely though, I think. I believe (at
least) the first three words should be used if possible.

I'm not sure in what cases do you think we should use those "verbal"
names, though. Of course we should accept them as IDs, but I don't think
we should ever show them automatically. Probably provide a trivial to
use tool to convert to them, and parameters for *-id tools to show them.

I assume we would have a custom tool for the translation?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 1/2] merge-trees script for Linus git
  2005-04-15 20:10                                         ` Junio C Hamano
  2005-04-15 20:58                                           ` C. Scott Ananian
@ 2005-04-15 21:48                                           ` Junio C Hamano
  2005-04-15 21:54                                             ` [PATCH 2/2] " Junio C Hamano
  2005-04-15 23:33                                             ` [PATCH 3/2] " Junio C Hamano
  1 sibling, 2 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 21:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus,

    what you have in 461aef08823a18a6c69d472499ef5257f8c7f6c8 is fine
by me for the essential support for merge-trees (sorry for the
confusing name, but this is a stop-gap Q&D script until I do the real
merge-tree.c conversion).

This patch contains the merge-trees script itself and Makefile entry
for it.  I have some more fixes to merge-trees in the works but that
will follow later.

I have an optional patch to add '-q' option to show-diff so that
complaints for missing files can be squelched, which I will be sending
you in a separate message.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 Makefile    |    2 
 merge-trees |  302 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 303 insertions(+), 1 deletion(-)

Makefile:  b39b4ea37586693dd707d1d0750a9b580350ec50
--- Makefile
+++ Makefile	2005-04-15 13:32:06.000000000 -0700
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-	check-files ls-tree merge-tree
+	check-files ls-tree merge-tree merge-trees
 
 all: $(PROG)
 

--- /dev/null	2005-03-19 15:28:25.000000000 -0800
+++ merge-trees	2005-04-15 13:32:20.000000000 -0700
@@ -0,0 +1,302 @@
+#!/usr/bin/perl -w
+
+use strict;
+use Cwd;
+use Getopt::Long;
+
+my $full_checkout = 0;
+my $partial_checkout = 0;
+my $output_directory = ',,merge~tree';
+
+GetOptions("full-checkout" => \$full_checkout,
+	   "partial-checkout" => \$partial_checkout,
+	   "output-directory=s" => \$output_directory)
+    or die;
+
+
+if (@ARGV != 3) {
+    die "Usage: $0 -o [output-directory] [-f] [-p] ancestor A B\n";
+}
+
+if ($full_checkout) {
+    $partial_checkout = 1;
+}
+
+################################################################
+# UI helper -- although it is encouraged to give tree ID, 
+# it is OK to give commit ID.
+sub possibly_commit_to_tree {
+    my ($commit_or_tree_id) = @_;
+    my $type = read_cat_file_t($commit_or_tree_id);
+    if ($type eq 'tree') { return $commit_or_tree_id }
+    if ($type ne 'commit') {
+	die "Tree ID (or commit ID) required, given $type.";
+    }
+
+    my ($fhi);
+    open $fhi, '-|', 'cat-file', 'commit', $commit_or_tree_id
+	or die "$!: cat-file commit $commit_or_tree_id";
+    my ($tree) = <$fhi>;
+    close $fhi;
+    ($tree =~ s/^tree (.*)$/$1/)
+	or die "$tree: Linus says the first line is guaranteed to be tree.";
+    return $tree;
+}
+
+sub read_cat_file_t {
+    my ($id) = @_;
+    my ($fhi);
+    open $fhi, '-|', 'cat-file', '-t', $id
+	or die "$!: cat-file -t $id";
+    my ($t) = <$fhi>;
+    close $fhi;
+    chomp($t);
+    return $t;
+}
+
+################################################################
+# Reads diff-tree -r output and gives a hash that maps a path
+# to 4-tuple (old-mode new-mode old-oid new-oid).
+# When creating, old-* are undef.  When removing, new-* are undef.
+
+sub OLD_MODE () { 0 }
+sub NEW_MODE () { 1 }
+sub OLD_OID ()  { 2 }
+sub NEW_OID ()  { 3 }
+
+sub read_diff_tree {
+    my (@tree) = @_;
+    my ($fhi);
+
+    # Regular expression piece for mode
+    my $reM  = '[0-7]+';
+
+    # Regular expression piece for object ID.
+    # There is a talk about base-64 so better make it easier to modify...
+    my $reID = '[0-9a-f]{40}';
+
+    local ($_, $/);
+    $/ = "\0"; 
+    my %path;
+    open $fhi, '-|', 'diff-tree', '-r', @tree
+	or die "$!: diff-tree -r @tree";
+    while (<$fhi>) {
+	chomp;
+	if (/^\*($reM)->($reM)\tblob\t($reID)->($reID)\t(.*)$/so) {
+	    $path{$5} = [$1, $2, $3, $4]; # modified
+	}
+	elsif (/^\+($reM)\tblob\t($reID)\t(.*)$/so) {
+	    $path{$3} = [undef, $1, undef, $2]; # added
+	}
+	elsif (/^\-($reM)\tblob\t($reID)\t(.*)$/so) {
+	    $path{$3} = [$1, undef, $2, undef]; # deleted
+	}
+	else {
+	    die "cannot parse diff-tree output: $_";
+	}
+    }
+    close $fhi;
+    return %path;
+}
+
+################################################################
+# Read show-files output to figure out the set of files contained
+# in the tree.  This is used to figure out what ancestor had.
+sub read_show_files {
+    my ($fhi);
+    local ($_, $/);
+    $/ = "\0"; 
+    open $fhi, '-|', 'show-files', '-z', '--cached'
+	or die "$!: show-files -z --cached";
+    my (@path) = map { chomp; $_ } <$fhi>;
+    close $fhi;
+    return @path;
+}
+
+################################################################
+# Given path and info (typically returned from read_diff_tree),
+# create the file in the working directory to match the NEW tree.
+# This does not touch dircache.
+sub checkout_file {
+    my ($path, $info) = @_;
+    my (@elt) = split(/\//, $path);
+    my $j = '';
+    my $tail = pop @elt;
+    my ($fhi, $fho);
+    for (@elt) {
+	mkdir "$j$_";
+	$j = "$j$_/";
+    }
+    open $fho, '>', "$path";
+    open $fhi, '-|', 'cat-file', 'blob', $info->[NEW_OID]
+	or die "$!: cat-file blob $info->[NEW_OID]";
+    while (<$fhi>) {
+	print $fho $_;
+    }
+    close $fhi;
+    close $fho;
+    chmod oct("0$info->[NEW_MODE]"), "$path";
+}
+
+################################################################
+# Given path and info record the file in the dircache without
+# affecting working directory.
+sub record_file {
+    my ($path, $info) = @_;
+    system ('update-cache', '--add', '--cacheinfo',
+	    $info->[NEW_MODE], $info->[NEW_OID], $path);
+}
+
+################################################################
+# Merge info from two trees and leave it in path, without
+# affecting dircache.
+sub merge_tree {
+    my ($path, $infoA, $infoB) = @_;
+    checkout_file("$path~A~", $infoA);
+    checkout_file("$path~B~", $infoB);
+    system 'checkout-cache', $path;
+    rename $path, "$path~O~";
+    my ($fhi, $fho);
+    open $fhi, '-|', 'merge', '-p', "$path~A~", "$path~O~", "$path~B~";
+    open $fho, '>', $path;
+    local ($/);
+    while (<$fhi>) { print $fho $_; }
+    close $fhi;
+    close $fho;
+    # There is no reason to prefer infoA over infoB but
+    # we need to pick one.
+    chmod oct("0$infoA->[NEW_MODE]"), $path;
+}
+
+################################################################
+
+# O stands for "the original".  A and B are being merged.
+my ($treeO, $treeA, $treeB) = map { possibly_commit_to_tree $_ } @ARGV;
+
+# Create a temporary directory and go there.
+system('rm', '-rf', $output_directory) == 0 &&
+system('mkdir', '-p', "$output_directory/.git") == 0 &&
+symlink(Cwd::getcwd . "/.git/objects", "$output_directory/.git/objects") &&
+chdir $output_directory &&
+system('read-tree', $treeO) == 0
+    or die "$!: Failed to set up merge working area $output_directory";
+
+# Find out edits done in each branch.
+my %treeA = read_diff_tree($treeO, $treeA);
+my %treeB = read_diff_tree($treeO, $treeB);
+
+# The list of files that was in the ancestor.
+my @ancestor_file = read_show_files();
+my %ancestor_file = map { $_ => 1 } @ancestor_file;
+
+# Report output is formated as follows:
+#
+# The first letter shows the origin of the result.
+#   O - original
+#   A - treeA
+#   B - treeB
+#   M - both treeA and treeB
+#   * - treeA and treeB conflicts; needs human action.
+#
+# The second and third letter shows what each tree did.
+#   . - no change
+#   A - created
+#   M - modified
+#   D - deleted
+
+for (@ancestor_file) {
+    if (! exists $treeA{$_} && ! exists $treeB{$_}) {
+	if ($full_checkout) {
+	    system 'checkout-cache', $_;
+	}
+	print STDERR "O.. $_\n"; # keep original
+    }
+}
+
+for my $set ([\%treeA, \%treeB, 'A'], [\%treeB, \%treeA, 'B']) {
+    my ($this, $other, $side) = @$set;
+    my $delete_sign = ($side eq 'A') ? 'D.' : '.D';
+    my $create_sign = ($side eq 'A') ? 'A.' : '.A';
+    my $modify_sign = ($side eq 'A') ? 'M.' : '.M';
+    while (my ($path, $info) = each %$this) {
+	# In this loop we do not deal with overlaps.
+	next if (exists $other->{$path});
+
+	if (! defined $info->[NEW_OID]) {
+	    # deleted in this tree only.
+	    unlink $path;
+	    system 'update-cache', '--remove', $path;
+	    print STDERR "${side}${delete_sign} $path\n";
+	}
+	else {
+	    # modified or created in this tree only.
+	    my $create_or_modify =
+		(! defined $info->[OLD_OID]) ? $create_sign : $modify_sign;
+	    print STDERR "${side}${create_or_modify} $path\n";
+	    if ($partial_checkout) {
+		checkout_file($path, $info);
+		system 'update-cache', '--add', $path;
+	    } else {
+		record_file($path, $info);
+	    }
+	}
+    }
+}
+
+my @warning = ();
+
+while (my ($path, $infoA) = each %treeA) {
+    # We need to deal only with overlaps.
+    next if (!exists $treeB{$path});
+
+    my $infoB = $treeB{$path};
+    if (! defined $infoA->[NEW_OID]) {
+	# Deleted in tree A.
+	if (! defined $infoB->[NEW_OID]) {
+	    # Deleted in both trees (obvious).
+	    print STDERR "MDD $path\n";
+	    unlink $path;
+	    system 'update-cache', '--remove', $path;
+	}
+	else {
+	    # TreeA wants to remove but TreeB wants to modify it.
+	    print STDERR "*DM $path\n";
+	    checkout_file("$path~B~", $infoB);
+	    push @warning, $path;
+	}
+    }
+    else {
+	# Modified or created in tree A
+	if (! defined $infoB->[NEW_OID]) {
+	    # TreeA wants to modify but treeB wants to remove it.
+	    print STDERR "*MD $path\n";
+	    checkout_file("$path~A~", $infoA);
+	    push @warning, $path;
+	}
+	else {
+	    # Modified both in treeA and treeB.
+	    # Are they modifying to the same contents?
+	    if ($infoA->[NEW_OID] eq $infoB->[NEW_OID]) {
+		# No changes or just the mode.
+		# we prefer TreeA over TreeB for no particular reason.
+		print STDERR "MMM $path\n";
+		record_file($path, $infoA);
+	    }
+	    else {
+		# Modified in both.  Needs merge.
+		print STDERR "*MM $path\n";
+		merge_tree($path, $infoA, $infoB);
+	    }
+	}
+    }
+}
+
+if (@warning) {
+    print "\nThere are some files that were deleted in one branch and\n"
+	. "modified in another.  Please examine them carefully:\n";
+    for (@warning) {
+	print "$_\n";
+    }
+}
+
+# system 'show-diff', '-q';


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 2/2] merge-trees script for Linus git
  2005-04-15 21:48                                           ` [PATCH 1/2] merge-trees script for Linus git Junio C Hamano
@ 2005-04-15 21:54                                             ` Junio C Hamano
  2005-04-15 23:33                                             ` [PATCH 3/2] " Junio C Hamano
  1 sibling, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 21:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus,

This is the '-q' option for show-diff.c to squelch complaints for
missing files.  It is handy if you want to run it in the merge
temporary directory after running merge-trees with its minimum
checkout mode, which is the default, because you would not find any
files other than the ones that needs human validation after the merge
there.

It also fixes the argument parsing bug Paul Mackerras noticed in
<16991.42305.118284.139777@cargo.ozlabs.ibm.com> but slightly
differently.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 
 show-diff.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

show-diff.c:  3f7acd2a692a03026784a18f28521b9af322b71e
--- show-diff.c
+++ show-diff.c	2005-04-15 14:14:53.000000000 -0700
@@ -58,15 +58,20 @@
 int main(int argc, char **argv)
 {
 	int silent = 0;
+	int silent_on_nonexisting_files = 0;
 	int entries = read_cache();
 	int i;
 
-	while (argc-- > 1) {
-		if (!strcmp(argv[1], "-s")) {
-			silent = 1;
+	for (i = 1; i < argc; i++) {
+		if (!strcmp(argv[i], "-s")) {
+			silent_on_nonexisting_files = silent = 1;
 			continue;
 		}
-		usage("show-diff [-s]");
+		if (!strcmp(argv[i], "-q")) {
+			silent_on_nonexisting_files = 1;
+			continue;
+		}
+		usage("show-diff [-s] [-q]");
 	}
 
 	if (entries < 0) {
@@ -82,8 +87,10 @@
 		void *new;
 
 		if (stat(ce->name, &st) < 0) {
+			if (errno == ENOENT && silent_on_nonexisting_files)
+				continue;
 			printf("%s: %s\n", ce->name, strerror(errno));
-			if (errno == ENOENT && !silent)
+			if (errno == ENOENT)
 				show_diff_empty(ce);
 			continue;
 		}



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: write-tree is pasky-0.4
  2005-04-15 20:13                                           ` Linus Torvalds
@ 2005-04-15 22:36                                             ` Petr Baudis
  2005-04-16  0:22                                               ` Linus Torvalds
  0 siblings, 1 reply; 148+ messages in thread
From: Petr Baudis @ 2005-04-15 22:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Dear diary, on Fri, Apr 15, 2005 at 10:13:21PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Fri, 15 Apr 2005, Petr Baudis wrote:
> > 
> > So, I assume that you don't want to merge my "SCM layer" (which is
> > perfectly fine by me). However, I also apply plenty of patches
> > concerning the "core git" - be it portability, leak fixes, argument
> > parsing fixes and so on.
> 
> I'm actually perfectly happy to merge your SCM layer too eventually, but 
> I'm nervous at this point.  Especially while people are discussing some 
> SCM options that I'm personally very leery of, and think that may make 
> sense for others, but that I personally distrust.

You mean the renames tracking and similar yet mostly theoretical
discussions? Or do you dislike something already implemented? I'd be
happy to hear about it in that case. (To argue about it and likely get
persuaded... ;-)

But otherwise it is great news to me. Actually, in that case, is it
worth renaming it to Cogito and using cg to invoke it? Wouldn't be that
actually more confusing after it gets merged? IOW, should I stick to
"git" or feel free to rename it to "cg"?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 20:40                             ` Petr Baudis
@ 2005-04-15 22:41                               ` Junio C Hamano
  0 siblings, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 22:41 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> I can't see the conflicts between what I want and what Linus wants.
PB> After all, Linus says that I can use the directory cache in any way I
PB> please (well, the user can, but I'm speaking for him ;-). So I'm doing
PB> so, and with your tool I would get into problems, since it is suddenly
PB> imposing a policy on what should be in the index.

I think our misunderstanding is coming from the use of the word
"merge tree".  I think you have been assuming that I wanted you
to run "merge-trees -o ,,merge" --- which would certainly cause
me to muck with your dircache there.  I totally agree with you
that that is a *BAD* *THING*.  No question there.

However, my assumption has been different.  I was assuming that
you would run "merge-trees -o merge~tree" (i.e. different from
your "merge tree"), so that you can get the merge results in a
form parsable by you.  And then, using that information, you can
make your changes in ,,merge.  After you are done with that
information, you can remove "merge~trees", of course.

The format I chose for the "merge result in a form parsable by
you" happens to be a dircache in "merge~tree", with minimum
number of files checked out when merge cannot be automatically
done safely.  In the simplest case of not having any conflicting
merge between $C and $merged, Cogito can immediately run
write-tree in "merge~tree" (not ,,merge) to obtain its tree-ID
$T, so that it can feed it to diff-tree to compare it with
whatever tree state Cogito wants to apply the merges between $C
and $merged to.

I still do not understand what you do in ,,merge directory, but
here is one way you can update the user working directory
in-place without having a ,,merge directory [*2*].  You can run
your "git diff" between $C and $T [*1*].  The result is the diff
you need to apply on top of your user's working files.  If the
user does not like the result of running that diff, it can
easily be reversed.

If a manual merge were needed between $C and $merged, Cogito
could guide the user through that manual edit in "merge~tree",
and run update-cache on those hand merged files in "merge~tree",
before running write-tree in "merge~tree" to obtain $T; after
that, everything else is the same.

You make interesting points in other parts of your message I
need to regurgitate for a while, so I would not comment on them
in this message.

[Footnote]

*1* I really like the convenience of being able to use tree-ID
and commit-ID interchangeably there.  Thanks.

*2* I understand that this would change the user's "git-tools"
experience a bit.  The user will not be told to "go to ,,merge
and commit there which will reflected back to your working tree"
anymore.  Instead the merge happens in-place.  Committing, not
committing, or further hand-fixing the merge is up to the user.
I suspect this change might even be for the better.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: Merge with git-pasky II.
  2005-04-15 20:31 ` Merge with git-pasky II Linus Torvalds
@ 2005-04-15 23:00   ` Barry Silverman
  2005-04-16  0:32     ` Linus Torvalds
  0 siblings, 1 reply; 148+ messages in thread
From: Barry Silverman @ 2005-04-15 23:00 UTC (permalink / raw)
  To: 'Linus Torvalds'; +Cc: git

The issue I am trying to come to grips with in the current design, is
that the git repository of a number of interrelated projects will soon
become the logical OR of all blobs, commits, and trees in ALL the
projects. 

This will involve horrendous amounts of replication, as developers end
interchanging objects that originated from third parties who are not
party to the merge (and which happen to be in the repository because of
previous merge activity).

It would be really nice if the repositories for each project stay
distinct (and maybe even living on different servers). Merges should the
one of the few points of contact where the state be exchanged between
the repositories.

>> If you don't keep track of the incremental merges, you end up with
one 
>> really _difficult_ merge that may not be mergable at all. Not 
>> automatically, and perhaps not even with help.

No argument from me.... In my example, I didn't intend A2,A3,A4 to be
considered hash's of points where you did a merge - rather they are
hashes of individual points where you did a commit BETWEEN the points
where you merged. A1 and A5 are the "merge points"! 

It makes perfect sense to define some subset of commits as "merge-like"
commits, and only have those copied over from one repository to the
other. You could also only use merge-points in the common ancestor
calculation, and not worry about intermediate commits. 

Only small changes to the existing logic are necessary to do a merge by
"distributing" out the merge algorithm to each repository. This involves
querying each repository, and communicating the results, followed by
copying over only those blob objects necessary for the merge. 

After the merge, you would create a "merge-point commit" record that has
one of the parents pointing to a hash in the other repository!

But the BIG issue with this scheme, is that you will not be replicating
over any of the intermediate commits, trees, or blobs (not really needed
by the merge), but currently being traversed by various plumbing
components.

Hence my question....

-----Original Message-----
From: Linus Torvalds [mailto:torvalds@osdl.org] 
Sent: Friday, April 15, 2005 4:31 PM
To: Barry Silverman
Cc: git@vger.kernel.org
Subject: RE: Merge with git-pasky II.


[ I'm cc'ing the git list even though Barry's question wasn't cc'd. 
  Because I think his question is interesting and astute per se, even
  if I disagree with the proposal ]

On Fri, 15 Apr 2005, Barry Silverman wrote:
>
> If git is totally project based, and each commit represents total
state
> of the project, then how important is the intermediate commit
> information between two states. 

You need it in order to do further merges.

> IE, Area maintainer has A1->A2->A3->A4->A5 in a repository with 5
> commits, and 5 comments. And I last synced with A1.
> 
> A few days later I sync again. Couldn't I just pull the "diff-tree A5
> A1" and then commit to my tree just the record A1->A5. Why does MY
> repository need trees A2,A3,A4?

Because that second merge needs the first merge to work well. The first 
merge might have had some small differences that ended up auto-merging
(or 
even needing some manual help from you). The second time you sync, there

migth be some more automatic merging. And so on.

If you don't keep track of the incremental merges, you end up with one 
really _difficult_ merge that may not be mergable at all. Not 
automatically, and perhaps not even with help.

So in order to keep things mergable, you need to not diverge. And the 
intermediate merges are the "anchor-points" for the next merge, keeping 
the divergences minimal. 

I'm personally convinced that one of the reasons CVS is a pain to merge
is 
obviously that it doesn't do a good job of finding parents, but also 
exactly _because_ it makes merges so painful that people wait longer to
do 
them, so you never end up fixing the simple stuff. In contrast, if you
have all these small merges going on all the time, the hope is that 
there's never any really painful nasty "final merge".

So you're right - the small merges do end up cluttering up the revision 
history. But it's a small price to pay if it means that you avoid having

the painful ones.

> Isn't preserving the A1,A2,A3,A4,A5 a legacy of BK, which required all
> the changesets to be loaded in order, and so is a completely "file"
> driven concept? 

Nope. In fact, to some degree git will need this even _more_, since the
git merger is likely to be _weaker_ than BK, and thus more easily
confused.

I do believe that BK has these things for the same reason.

			Linus



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: write-tree is pasky-0.4
  2005-04-15 20:58                                           ` C. Scott Ananian
  2005-04-15 21:22                                             ` Petr Baudis
@ 2005-04-15 23:16                                             ` Junio C Hamano
  1 sibling, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 23:16 UTC (permalink / raw)
  To: C. Scott Ananian; +Cc: Linus Torvalds, Petr Baudis, git

>>>>> "CSA" == C Scott Ananian <cscott@cscott.net> writes:

CSA> On Fri, 15 Apr 2005, Junio C Hamano wrote:
>> to yours is no problem for me.  Currently I see your HEAD is at
>> 461aef08823a18a6c69d472499ef5257f8c7f6c8, so I will generate a
>> set of patches against it.

CSA> Have you considered using an s/key-like system to make these hashes
CSA> more human-readable?  Using the S/Key translation (11-bit chunks map
CSA> to a 1-4
CSA> letter word), Linus' HEAD is at:
CSA>    WOW-SCAN-NAVE-AUK-JILL-BASH-HI-LACE-LID-RIDE-RUSE-LINE-GLEE-WICK-A
CSA> ...which is a little longer, but speaking of branch "wow-scan" (which
CSA> gives 22 bits of disambiguation) is probably less error-prone than
CSA> discussing branch '461...' (only 12 bits).

I understand monotone folks have the same issue and they let you
use unambiguous prefix string.  And why do you stop counting at
"461" in your example?  To my eyes, "461aef" in this particular
string stands out and is easily typable, which gives me 24 bits
;-).

But seriously I doubt the hex format is needed to be shown to
humans very often.  E-mail communications like this one being a
very special exception.  I do not expect for people to be
talking about "Hey, Junio's patch against 461aef... from Linus
is a total crap" like that.

The only reason I mentioned his then-HEAD by hex is because I do
not have a public archive for him to pull from, and I wanted to
make it easy for him to do:

 $ export SHA1_FILE_DIRECTORY
 $ mkdir junk && cd junk && mkdir .git &&
   read-tree `cat-file commit 461aef... | sed -e 's/^tree //;q'`
 $ patch < ../stupid-patch-from-junio-01
 $ show-diff

(it might have been better if I used the tree ID for this purpose).

For Cogito users the hex format does not matter.  "git pull"
will get whatever HEAD recorded in the file on the sending end
and the end user does not even have to know about it.

CSA> This is obviously a cogito issue, rather than a git-fs thing.

Yes.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/2] merge-trees script for Linus git
  2005-04-15 21:48                                           ` [PATCH 1/2] merge-trees script for Linus git Junio C Hamano
  2005-04-15 21:54                                             ` [PATCH 2/2] " Junio C Hamano
@ 2005-04-15 23:33                                             ` Junio C Hamano
  2005-04-16  1:02                                               ` Linus Torvalds
  1 sibling, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-15 23:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus,

    the merge-trees I sent you earlier was expecting the old
diff-tree behaviour, and I did not realize that I need an
explicit -z flag now.  Here is a fix.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 merge-trees |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

--- merge-trees	2005-04-15 13:21:35.000000000 -0700
+++ merge-trees+	2005-04-15 16:27:34.000000000 -0700
@@ -78,8 +78,8 @@
     local ($_, $/);
     $/ = "\0"; 
     my %path;
-    open $fhi, '-|', 'diff-tree', '-r', @tree
-	or die "$!: diff-tree -r @tree";
+    open $fhi, '-|', 'diff-tree', '-r', '-z', @tree
+	or die "$!: diff-tree -r -z @tree";
     while (<$fhi>) {
 	chomp;
 	if (/^\*($reM)->($reM)\tblob\t($reID)->($reID)\t(.*)$/so) {


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: write-tree is pasky-0.4
  2005-04-15 22:36                                             ` Petr Baudis
@ 2005-04-16  0:22                                               ` Linus Torvalds
  2005-04-16  1:13                                                 ` Daniel Barkalow
  2005-04-16 15:34                                                 ` Re: Re: " Petr Baudis
  0 siblings, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16  0:22 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git



On Sat, 16 Apr 2005, Petr Baudis wrote:
> 
> But otherwise it is great news to me. Actually, in that case, is it
> worth renaming it to Cogito and using cg to invoke it? Wouldn't be that
> actually more confusing after it gets merged? IOW, should I stick to
> "git" or feel free to rename it to "cg"?

I'm perfectly happy for it to stay as "git", and in general I don't have
any huge preferences either way. You guys can discuss names as much as you
like, it's the "tracking renames" and "how to merge" things that worry me.

I think I've explained my name tracking worries.  When it comes to "how to 
merge", there's three issues:

 - we do commonly have merge clashes where both trees have applied the 
   exact same patch. That should merge perfectly well using the 3-way
   merge from a common parent that Junio has, but not your current "bring
   patches forward" kind of strategy.
 - I _do_ actually sometimes merge with dirty state in my working 
   directory, which is why I want the merge to take place in a separate 
   (and temporary) directory, which allows for a failed merge without 
   having any major cleanup. If the merge fails, it's not a big deal, and 
   I can just blow the merge directory away without losing the work I had 
   in my "real" working directory.
 - reliability. I care much less for "clever" than I care for "guaranteed 
   to never do the wrong thing". If I have to fix up some stuff by hand, 
   I'll happily do so. But if I can't trust the merge and have to _check_ 
   things by hand afterwards, that will make me leery of the merges, and
   _that_ is bad.

The third point is why I'm going to the ultra-conservative "three-way 
merge from the common parent". It's not fancy, but it's something I feel 
comfortable with as a merge strategy. For example, arch (and in particular 
darcs) seems to want to try to be "clever" about the merges, and I'd 
always live in fear. 

And, finally, there's obviously performance. I _think_ a normal merge with
nary a conflict and just a few tens of files changed should be possible in
a second. I realize that sounds crazy to some people, but I think it's
entirely doable. Half of that is writing the new tree out (that is a
relative costly op due to the compression). The other half is the "work".

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: Merge with git-pasky II.
  2005-04-15 23:00   ` Barry Silverman
@ 2005-04-16  0:32     ` Linus Torvalds
  0 siblings, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16  0:32 UTC (permalink / raw)
  To: Barry Silverman; +Cc: git



On Fri, 15 Apr 2005, Barry Silverman wrote:
>
> The issue I am trying to come to grips with in the current design, is
> that the git repository of a number of interrelated projects will soon
> become the logical OR of all blobs, commits, and trees in ALL the
> projects. 

Nope. I'm actually against the notion of sharing object directories
between projects. The git model _allows_ it, and I think it can be a valid 
approach, but it's absolutely not an approach that I would personally 
suggest be used for the kernel.

In other words, the default git behaviour - and the one I personally am a 
proponent of - is to have one object directory per work tree, and really 
consider all such trees independent. Then you can merge between trees if 
you want to, and bring in objects that way, but normally you would _not_ 
have tons of objects from other trees, and _especially_ not from other 
unrelated projects.

The reason git supports shared object archives is that (a) it falls out 
trivially as part of the design, so not allowing it is silly and (b) it is 
part of a merge, where you _do_ want to get the objects of the trees you 
merge, and in particular you need to generate a seperate tree that has all 
those objects without having to copy them.

(Before you do the merge, you need to bring the new objects into your 
repository of course, but that I consider to be a separate issue, not 
part of the actual technical merge process).

So normally, you'd probably have a totally pruned tree, with only the 
objects you need (and you might even consider the "commit parent links" 
less than necessary, especially if you're just a regular user and not a 
developer who wants to merge).

But the ability to have extra objects is wonderful. It makes going
backwards in time basically free (while the equivalent "bk undo" in the BK
world is a very expensive operation), and it makes it easy to
incrementally keep up-to-date with trees that you know you're _eventually_
going to merge with. But it's not an excuse to put just any random crap in 
that object directory..

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-15 23:33                                             ` [PATCH 3/2] " Junio C Hamano
@ 2005-04-16  1:02                                               ` Linus Torvalds
  2005-04-16  4:10                                                 ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16  1:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Fri, 15 Apr 2005, Junio C Hamano wrote:
> 
>     the merge-trees I sent you earlier was expecting the old
> diff-tree behaviour, and I did not realize that I need an
> explicit -z flag now.

You didn't need one - I just didn't want to merge your "ls-tree" change
without making things be consistent. Once we started using the "-z" flag 
for ls-tree, it just didn't make any sense not to do the same thing for 
diff-tree.

Just a heads-up - I'd really want to do the same thing to "merge-tree.c" 
too, but since you said that you were working on extending that to do 
recursion etc, I decided to hold off. So if you're working on it, maybe 
you can add the "-z" flag there too? 

I'm actually holding off merging the perl version exactly because you 
seemed to be working on the C version. I don't mind perl per se, but if 
there's a real solution coming down the line..

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: write-tree is pasky-0.4
  2005-04-16  0:22                                               ` Linus Torvalds
@ 2005-04-16  1:13                                                 ` Daniel Barkalow
  2005-04-16  2:18                                                   ` Linus Torvalds
  2005-04-16 15:34                                                 ` Re: Re: " Petr Baudis
  1 sibling, 1 reply; 148+ messages in thread
From: Daniel Barkalow @ 2005-04-16  1:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git

On Fri, 15 Apr 2005, Linus Torvalds wrote:

> I think I've explained my name tracking worries.  When it comes to "how to 
> merge", there's three issues:
> 
>  - we do commonly have merge clashes where both trees have applied the 
>    exact same patch. That should merge perfectly well using the 3-way
>    merge from a common parent that Junio has, but not your current "bring
>    patches forward" kind of strategy.

I think 3-way merge is probably the best starting point, but I think that
there might be value in being able to identify the commits of each side
involved in a conflict. I think this would help with cases where both
sides pick up an identical patch, and then each side makes a further
change to a different part of the changed region (you find out that the
other guy's change was supposed to follow the patch, and don't conflict
with it).

>  - I _do_ actually sometimes merge with dirty state in my working 
>    directory, which is why I want the merge to take place in a separate 
>    (and temporary) directory, which allows for a failed merge without 
>    having any major cleanup. If the merge fails, it's not a big deal, and 
>    I can just blow the merge directory away without losing the work I had 
>    in my "real" working directory.

Is there some reason you don't commit before merging? All of the current
merge theory seems to want to merge two commits, using the information git
keeps about them. It should be cheap to get a new clean working directory
to merge in, too, particularly if we add a cache of hardlinkable expanded
blobs.

>  - reliability. I care much less for "clever" than I care for "guaranteed 
>    to never do the wrong thing". If I have to fix up some stuff by hand, 
>    I'll happily do so. But if I can't trust the merge and have to _check_ 
>    things by hand afterwards, that will make me leery of the merges, and
>    _that_ is bad.
> 
> The third point is why I'm going to the ultra-conservative "three-way 
> merge from the common parent". It's not fancy, but it's something I feel 
> comfortable with as a merge strategy. For example, arch (and in particular 
> darcs) seems to want to try to be "clever" about the merges, and I'd 
> always live in fear. 

How much do you care about the situation where there is no best common
ancestor (which can happen if you're merging two main lines, each of which
has merged with both of a pair of minor trees)? I think that arch is even
more conservative, in that it doesn't look for a common ancestor, and
reports conflicts whenever changes overlap at all. Of course, reliability
by virtue of never working without help is not a big win over living in
fear; you always have to check over it, not because you're afraid, but
because it needs you to.

> And, finally, there's obviously performance. I _think_ a normal merge with
> nary a conflict and just a few tens of files changed should be possible in
> a second. I realize that sounds crazy to some people, but I think it's
> entirely doable. Half of that is writing the new tree out (that is a
> relative costly op due to the compression). The other half is the "work".

I think that the time spent on I/O will be overwhelmed by the time spent
issuing the command at that rate. It might matter if you start getting
into merging lots of things at once, but that's more like a minute for a
merge group with 600 changes rather than a second per merge; we could
potentially save a lot of time based of having a bunch of information left
over from the previous merge when starting merge number 2. So 15 seconds
plus half a second per merge might be better than a second per merge in
the case that matters.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 15:32                             ` Linus Torvalds
  2005-04-15 16:01                               ` David Woodhouse
  2005-04-15 19:20                               ` Paul Jackson
@ 2005-04-16  1:44                               ` Simon Fowler
  2005-04-16 12:19                                 ` David Lang
  2005-04-16 20:29                               ` Sanjoy Mahajan
  3 siblings, 1 reply; 148+ messages in thread
From: Simon Fowler @ 2005-04-16  1:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, Junio C Hamano, Petr Baudis, git


[-- Attachment #1.1: Type: text/plain, Size: 1189 bytes --]

On Fri, Apr 15, 2005 at 08:32:46AM -0700, Linus Torvalds wrote:
> In other words, I'm right. I'm always right, but sometimes I'm more right 
> than other times. And dammit, when I say "files don't matter", I'm really 
> really Right(tm).
> 
You're right, of course (All Hail Linus!), if you can make it work
efficiently enough.

Just to put something else on the table, here's how I'd go about
tracking renames and the like, in another world where Linus /does/
make the odd mistake - it's basically a unique id for files in the
repository, added when the file is first recognised and updated when
update-cache adds a new version to the cache. Renames copy the id
across to the new name, and add it into the cache.

This gives you an O(n) way to tell what file was what across
renames, and it might even be useful in Linus' world, or if someone
wanted to build a traditional SCM on top of a git-a-like.

Attached is a patch, and a rename-file.c to use it.

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 

[-- Attachment #1.2: guid2.patch --]
[-- Type: text/plain, Size: 19027 bytes --]

COPYING:  fe2a4177a760fd110e78788734f167bd633be8de
Makefile:  ca50293c4f211452d999b81f122e99babb9f2987
--- Makefile
+++ Makefile	2005-04-15 22:17:49.000000000 +1000
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-	check-files ls-tree
+	check-files ls-tree rename-file
 
 SCRIPT=	parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
 	gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
@@ -73,6 +73,9 @@
 ls-tree: ls-tree.o read-cache.o
 	$(CC) $(CFLAGS) -o ls-tree ls-tree.o read-cache.o $(LIBS)
 
+rename-file: rename-file.o read-cache.o
+	$(CC) $(CFLAGS) -o rename-file rename-file.o read-cache.o $(LIBS)
+
 read-cache.o: cache.h
 show-diff.o: cache.h
 
README:  ded1a3b20e9bbe1f40e487ba5f9361719a1b6b85
VERSION:  c27bd67cd632cc15dd520fbfbf807d482efa2dcf
cache.h:  4d382549041d3281f8d44aa2e52f9f8ec47dd420
--- cache.h
+++ cache.h	2005-04-14 22:35:59.000000000 +1000
@@ -55,6 +55,7 @@
 	unsigned int st_gid;
 	unsigned int st_size;
 	unsigned char sha1[20];
+	unsigned char guid[20];
 	unsigned short namelen;
 	char name[0];
 };
cat-file.c:  45be1badaa8517d4e3a69e0bf1cac2e90191e475
check-files.c:  927b0b9aca742183fc8e7ccd73d73d8d5427e98f
checkout-cache.c:  f06871cdbc1b18ea93bdf4e17126aeb4cca1373e
commit-id:  65c81756c8f10d513d073ecbd741a3244663c4c9
commit-tree.c:  12196c79f31d004dff0df1f50dda67d8204f5568
diff-tree.c:  7dcc9eb7782fa176e27f1677b161ce78ac1d2070
--- diff-tree.c
+++ diff-tree.c	2005-04-16 10:46:52.000000000 +1000
@@ -1,33 +1,144 @@
+#include <sys/param.h>
 #include "cache.h"
 
-static int recursive = 0;
+enum diff_type {
+	REMOVE,
+	ADD,
+	RENAME,
+	MODIFY,
+};
+
+struct guid_cache_entry {
+	enum diff_type diff;
+	unsigned char guid[20];
+	unsigned char sha1[20];
+	struct guid_cache_entry *old;
+	unsigned int mode;
+	unsigned int pathlen;
+	unsigned char path[0];
+};	
+
+struct guid_cache {
+	unsigned int nr;
+	unsigned int alloc;
+	struct guid_cache_entry **cache;
+};
 
-static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base);
+struct guid_cache guid_cache;
+struct guid_cache *cache = &guid_cache;
 
-static void update_tree_entry(void **bufp, unsigned long *sizep)
+int guid_cache_pos(const char *guid)
 {
-	void *buf = *bufp;
-	unsigned long size = *sizep;
-	int len = strlen(buf) + 1 + 20;
+	int first, last;
 
-	if (size < len)
-		die("corrupt tree file");
-	*bufp = buf + len;
-	*sizep = size - len;
+	first = 0;
+	last = cache->nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct guid_cache_entry *gce = cache->cache[next];
+		int cmp = memcmp(guid, gce->guid, 20);
+		if (!cmp)
+			return next;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next + 1;
+	}
+	return - first-1;
 }
 
-static const unsigned char *extract(void *tree, unsigned long size, const char **pathp, unsigned int *modep)
+int add_guid_cache_entry(struct guid_cache_entry *gce)
+{
+	int pos;
+	
+	pos = guid_cache_pos(gce->guid);
+	
+	/* if this is a rename or modify, the guid will show up a
+	 * second time */
+	if (pos >= 0) {
+		struct guid_cache_entry *old = cache->cache[pos];
+		int cmp = cache_name_compare(old->path, old->pathlen, gce->path, gce->pathlen);
+
+		if (!cmp) {
+			/* pathname matches, so this must be a
+			 * modify. */
+			gce->old = old;
+			gce->diff = MODIFY;
+			cache->cache[pos] = gce;
+		} else {
+			/* the pathnames are different, so the file
+			 * must have been renamed somewhere along the
+			 * line.  */
+			gce->old = old;
+			gce->diff = RENAME;
+			cache->cache[pos] = gce;
+		}
+		return 0;
+	}
+	pos = -pos-1;
+
+	if (cache->nr == cache->alloc) {
+		cache->alloc = alloc_nr(cache->alloc);
+		cache->cache = realloc(cache->cache, cache->alloc * sizeof(struct guid_cache_entry *));
+	}
+
+	cache->nr++;
+	if (cache->nr > pos)
+		memmove(cache->cache + pos + 1, cache->cache + pos, (cache->nr - pos - 1) * sizeof(struct guid_cache_entry *));
+	cache->cache[pos] = gce;
+	return 0;
+}
+
+static const unsigned char *extract(void *tree, unsigned long size, const char **pathp, unsigned int *modep, const unsigned char **guid)
 {
 	int len = strlen(tree)+1;
 	const unsigned char *sha1 = tree + len;
 	const char *path = strchr(tree, ' ');
 
-	if (!path || size < len + 20 || sscanf(tree, "%o", modep) != 1)
+	if (!path || size < len + 40 || sscanf(tree, "%o", modep) != 1)
 		die("corrupt tree file");
 	*pathp = path+1;
+	*guid = tree + len + 20;
 	return sha1;
 }
 
+static void guid_cache_tree_entry(void *buf, unsigned int len, const char *base, enum diff_type diff)
+{
+	unsigned mode;
+	const char *path;
+	const unsigned char *guid;
+	const unsigned char *sha1 = extract(buf, len, &path, &mode, &guid);
+	struct guid_cache_entry *gce;
+	int baselen = strlen(base);
+	
+	gce = calloc(1, sizeof(struct guid_cache_entry) + baselen + strlen(path) + 1);
+	memcpy(gce->guid, guid, 20);
+	memcpy(gce->sha1, sha1, 20);
+	gce->diff = diff;
+	gce->mode = mode;
+	gce->pathlen = snprintf(gce->path, MAXPATHLEN, "%s%s", base, path);
+	gce->path[gce->pathlen + 1] = '\0';
+
+	add_guid_cache_entry(gce);
+}
+	
+static int recursive = 0;
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base);
+
+static void update_tree_entry(void **bufp, unsigned long *sizep)
+{
+	void *buf = *bufp;
+	unsigned long size = *sizep;
+	int len = strlen(buf) + 1 + 40;
+
+	if (size < len)
+		die("corrupt tree file");
+	*bufp = buf + len;
+	*sizep = size - len;
+}
+
 static char *malloc_base(const char *base, const char *path, int pathlen)
 {
 	int baselen = strlen(base);
@@ -38,23 +149,24 @@
 	return newbase;
 }
 
-static void show_file(const char *prefix, void *tree, unsigned long size, const char *base);
+static void changed_file(void *tree, unsigned long size, const char *base, enum diff_type diff);
 
 /* A whole sub-tree went away or appeared */
-static void show_tree(const char *prefix, void *tree, unsigned long size, const char *base)
+static void changed_tree(void *tree, unsigned long size, const char *base, enum diff_type diff)
 {
 	while (size) {
-		show_file(prefix, tree, size, base);
+		changed_file(tree, size, base, diff);
 		update_tree_entry(&tree, &size);
 	}
 }
 
 /* A file entry went away or appeared */
-static void show_file(const char *prefix, void *tree, unsigned long size, const char *base)
+static void changed_file(void *tree, unsigned long size, const char *base, enum diff_type diff)
 {
 	unsigned mode;
 	const char *path;
-	const unsigned char *sha1 = extract(tree, size, &path, &mode);
+	const unsigned char *guid;
+	const unsigned char *sha1 = extract(tree, size, &path, &mode, &guid);
 
 	if (recursive && S_ISDIR(mode)) {
 		char type[20];
@@ -66,38 +178,96 @@
 		if (!tree || strcmp(type, "tree"))
 			die("corrupt tree sha %s", sha1_to_hex(sha1));
 
-		show_tree(prefix, tree, size, newbase);
+		changed_tree(tree, size, newbase, diff);
 		
 		free(tree);
 		free(newbase);
 		return;
 	}
 
-	printf("%s%o\t%s\t%s\t%s%s%c", prefix, mode,
-	       S_ISDIR(mode) ? "tree" : "blob",
-	       sha1_to_hex(sha1), base, path, 0);
+	guid_cache_tree_entry(tree, size, base, diff);
 }
 
+static void show_one_file(struct guid_cache_entry *gce)
+{
+	struct guid_cache_entry *old;
+	char old_sha1[50];
+	char old_sha2[50];
+
+	switch(gce->diff) {
+	case REMOVE:
+		sprintf(old_sha1, "%s", sha1_to_hex(gce->sha1));
+		printf("-%o\t%s\t%s\t%s\t%s%c", gce->mode,
+		       S_ISDIR(gce->mode) ? "tree" : "blob",
+		       old_sha1, sha1_to_hex(gce->guid), gce->path, 0);
+		break;
+	case ADD:
+		sprintf(old_sha1, "%s", sha1_to_hex(gce->sha1));
+		printf("+%o\t%s\t%s\t%s\t%s%c", gce->mode,
+		       S_ISDIR(gce->mode) ? "tree" : "blob",
+		       old_sha1, sha1_to_hex(gce->guid), gce->path, 0);
+		break;
+	case MODIFY:
+		old = gce->old;
+		if (old) {
+			sprintf(old_sha1, "%s", sha1_to_hex(old->sha1));
+			sprintf(old_sha2, "%s", sha1_to_hex(gce->sha1));
+			
+			printf("*%o->%o\t%s\t%s->%s\t%s\t%s%c", old->mode, gce->mode,
+			       S_ISDIR(old->mode) ? "tree" : "blob",
+			       old_sha1, old_sha2, sha1_to_hex(gce->guid), gce->path, 0);
+		} else {
+			die("diff-tree: internal error");
+		}
+		break;
+	case RENAME:
+		old = gce->old;
+		if (old) {
+			sprintf(old_sha1, "%s", sha1_to_hex(gce->sha1));
+			sprintf(old_sha2, "%s", sha1_to_hex(old->sha1));
+			
+			printf("r%o->%o\t%s\t%s->%s\t%s\t%s%c", gce->mode, old->mode,
+			       S_ISDIR(old->mode) ? "tree" : "blob",
+			       old_sha1, old_sha2, sha1_to_hex(old->guid), old->path, 0);
+		} else {
+			die("diff-tree: internal error");
+		}
+		break;
+	default:
+		die("diff-tree: internal error");
+	}
+}
+
+/* simply iterate over both caches looking for matching guids,
+ * showing all files in both caches */
+static void show_cache(void)
+{
+	int i;
+
+	for (i = 0; i < cache->nr; i++)
+		show_one_file(cache->cache[i]);
+}
+	
 static int compare_tree_entry(void *tree1, unsigned long size1, void *tree2, unsigned long size2, const char *base)
 {
 	unsigned mode1, mode2;
 	const char *path1, *path2;
 	const unsigned char *sha1, *sha2;
+	const unsigned char *guid1, *guid2;
 	int cmp, pathlen1, pathlen2;
-	char old_sha1_hex[50];
 
-	sha1 = extract(tree1, size1, &path1, &mode1);
-	sha2 = extract(tree2, size2, &path2, &mode2);
+	sha1 = extract(tree1, size1, &path1, &mode1, &guid1);
+	sha2 = extract(tree2, size2, &path2, &mode2, &guid2);
 
 	pathlen1 = strlen(path1);
 	pathlen2 = strlen(path2);
 	cmp = cache_name_compare(path1, pathlen1, path2, pathlen2);
 	if (cmp < 0) {
-		show_file("-", tree1, size1, base);
+		changed_file(tree1, size1, base, REMOVE);
 		return -1;
 	}
 	if (cmp > 0) {
-		show_file("+", tree2, size2, base);
+		changed_file(tree2, size2, base, ADD);
 		return 1;
 	}
 	if (!memcmp(sha1, sha2, 20) && mode1 == mode2)
@@ -108,8 +278,8 @@
 	 * file, we need to consider it a remove and an add.
 	 */
 	if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
-		show_file("-", tree1, size1, base);
-		show_file("+", tree2, size2, base);
+		changed_file(tree1, size1, base, REMOVE);
+		changed_file(tree2, size2, base, ADD);
 		return 0;
 	}
 
@@ -121,10 +291,14 @@
 		return retval;
 	}
 
-	strcpy(old_sha1_hex, sha1_to_hex(sha1));
-	printf("*%o->%o\t%s\t%s->%s\t%s%s%c", mode1, mode2,
-	       S_ISDIR(mode1) ? "tree" : "blob",
-	       old_sha1_hex, sha1_to_hex(sha2), base, path1, 0);
+	if (!memcmp(guid1, guid2, 20)) {
+		changed_file(tree1, size1, base, MODIFY);
+		changed_file(tree2, size2, base, MODIFY);
+		return 0;
+	}
+	
+	changed_file(tree1, size1, base, REMOVE);
+	changed_file(tree2, size2, base, ADD);
 	return 0;
 }
 
@@ -132,12 +306,12 @@
 {
 	while (size1 | size2) {
 		if (!size1) {
-			show_file("+", tree2, size2, base);
+			changed_file(tree2, size2, base, ADD);
 			update_tree_entry(&tree2, &size2);
 			continue;
 		}
 		if (!size2) {
-			show_file("-", tree1, size1, base);
+			changed_file(tree1, size1, base, REMOVE);
 			update_tree_entry(&tree1, &size1);
 			continue;
 		}
@@ -179,6 +353,7 @@
 int main(int argc, char **argv)
 {
 	unsigned char old[20], new[20];
+	int retval;
 
 	while (argc > 3) {
 		char *arg = argv[1];
@@ -193,5 +368,7 @@
 
 	if (argc != 3 || get_sha1_hex(argv[1], old) || get_sha1_hex(argv[2], new))
 		usage("diff-tree <tree sha1> <tree sha1>");
-	return diff_tree_sha1(old, new, "");
+	retval =  diff_tree_sha1(old, new, "");
+	show_cache();
+	return retval;
 }
fsck-cache.c:  9c900fe458cecd2bdb4c4571a584115b5cf24f22
--- fsck-cache.c
+++ fsck-cache.c	2005-04-15 20:39:49.000000000 +1000
@@ -165,9 +165,10 @@
 	while (size) {
 		int len = 1+strlen(data);
 		unsigned char *file_sha1 = data + len;
+		unsigned char *guid = file_sha1 + 20;
 		char *path = strchr(data, ' ');
 		unsigned int mode;
-		if (size < len + 20 || !path || sscanf(data, "%o", &mode) != 1)
+		if (size < len + 40 || !path || sscanf(data, "%o", &mode) != 1)
 			return -1;
 
 		/* Warn about trees that don't do the recursive thing.. */
@@ -176,8 +177,8 @@
 			warn_old_tree = 0;
 		}
 
-		data += len + 20;
-		size -= len + 20;
+		data += len + 40;
+		size -= len + 40;
 		mark_needs_sha1(sha1, S_ISDIR(mode) ? "tree" : "blob", file_sha1);
 	}
 	return 0;
git:  2c557dcf2032325acc265b577ee104e605fdaede
gitXnormid.sh:  a5d7a9f4a6e8d4860f35f69500965c2a493d80de
gitadd.sh:  3ed93ea0fcb995673ba9ee1982e0e7abdbe35982
gitaddremote.sh:  bf1f28823da5b5270aa8fa05b321faa514a57a11
gitapply.sh:  d0e3c46e2ce1ee74e1a87ee6137955fa9b35c27b
gitcancel.sh:  ec58f7444a42cd3cbaae919fc68c70a3866420c0
gitcommit.sh:  3629f67bbd3f171d091552814908b67af7537f4d
gitdiff-do:  d6174abceab34d22010c36a8453a6c3f3f184fe0
gitdiff.sh:  5e47c4779d73c3f2f39f6be714c0145175933197
gitexport.sh:  dad00bf251b38ce522c593ea9631f842d8ccc934
gitlntree.sh:  17c4966ea64aeced96ae4f1b00f3775c1904b0f1
gitlog.sh:  177c6d12dd9fa4b4920b08451ffe4badde544a39
gitls.sh:  b6f15d82f16c1e9982c5031f3be22eb5430273af
gitlsobj.sh:  128461d3de6a42cfaaa989fc6401bebdfa885b3f
gitmerge.sh:  23e4a3ff342c6005928ceea598a2f52de6fb9817
gitpull.sh:  0883898dda579e3fa44944b7b1d909257f6dc63e
gitrm.sh:  5c18c38a890c9fd9ad2b866ee7b529539d2f3f8f
gittag.sh:  c8cb31385d5a9622e95a4e0b2d6a4198038a659c
gittrack.sh:  03d6db1fb3a70605ef249c632c04e542457f0808
init-db.c:  aa00fbb1b95624f6c30090a17354c9c08a6ac596
ls-tree.c:  3e2a6c7d183a42e41f1073dfec6794e8f8a5e75c
--- ls-tree.c
+++ ls-tree.c	2005-04-15 15:55:40.000000000 +1000
@@ -10,6 +10,7 @@
 	void *buffer;
 	unsigned long size;
 	char type[20];
+	char old_sha1[50];
 
 	buffer = read_sha1_file(sha1, type, &size);
 	if (!buffer)
@@ -19,19 +20,21 @@
 	while (size) {
 		int len = strlen(buffer)+1;
 		unsigned char *sha1 = buffer + len;
+		unsigned char *guid = buffer + len + 20;
 		char *path = strchr(buffer, ' ')+1;
 		unsigned int mode;
 		unsigned char *type;
 
-		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+		if (size < len + 40 || sscanf(buffer, "%o", &mode) != 1)
 			die("corrupt 'tree' file");
-		buffer = sha1 + 20;
-		size -= len + 20;
+		buffer = sha1 + 40;
+		size -= len + 40;
 		/* XXX: We do some ugly mode heuristics here.
 		 * It seems not worth it to read each file just to get this
 		 * and the file size. -- pasky@ucw.cz */
 		type = S_ISDIR(mode) ? "tree" : "blob";
-		printf("%03o\t%s\t%s\t%s\n", mode, type, sha1_to_hex(sha1), path);
+		sprintf(old_sha1, sha1_to_hex(guid));
+		printf("%03o\t%s\t%s\t%s\t%s\n", mode, type, sha1_to_hex(sha1), old_sha1, path);
 	}
 	return 0;
 }
parent-id:  1801c6fe426592832e7250f8b760fb9d2e65220f
read-cache.c:  7a6ae8b9b489f6b67c82e065dedd5716a6bfc0ef
--- read-cache.c
+++ read-cache.c	2005-04-16 10:52:51.000000000 +1000
@@ -4,6 +4,8 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include <stdarg.h>
+#include <time.h>
+#include <sys/param.h>
 #include "cache.h"
 
 const char *sha1_file_directory = NULL;
@@ -233,6 +235,22 @@
 	return 0;
 }
 
+void new_guid(const char *filename, int namelen, unsigned char *returnguid)
+{
+	size_t size;
+	time_t now = time(NULL);
+	char buf[MAXPATHLEN + 20];
+	unsigned char guid[20];
+	
+	size = snprintf(buf, MAXPATHLEN + 20, "%ld%s", now, filename) + 1;
+	
+	SHA1(buf, size, guid);
+
+	if (returnguid)
+		memcpy(returnguid, guid, 20);
+	return;
+}
+
 static inline int collision_check(char *filename, void *buf, unsigned int size)
 {
 #ifdef COLLISION_CHECK
@@ -363,11 +381,14 @@
 int add_cache_entry(struct cache_entry *ce, int ok_to_add)
 {
 	int pos;
+	unsigned char guid[20];
 
 	pos = cache_name_pos(ce->name, ce->namelen);
 
 	/* existing match? Just replace it */
 	if (pos >= 0) {
+		struct cache_entry *old_ce = active_cache[pos];
+		memcpy(ce->guid, old_ce->guid, 20);
 		active_cache[pos] = ce;
 		return 0;
 	}
@@ -376,6 +397,12 @@
 	if (!ok_to_add)
 		return -1;
 
+	memset(guid, 0, 20);
+	if (!memcmp(ce->guid, guid, 20)) {
+		new_guid(ce->name, ce->namelen, guid);
+		memcpy(ce->guid, guid, 20);
+	}
+
 	/* Make sure the array is big enough .. */
 	if (active_nr == active_alloc) {
 		active_alloc = alloc_nr(active_alloc);
read-tree.c:  eb548148aa6d212f05c2c622ffbe62a06cd072f9
--- read-tree.c
+++ read-tree.c	2005-04-16 10:41:46.000000000 +1000
@@ -5,7 +5,9 @@
  */
 #include "cache.h"
 
-static int read_one_entry(unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode)
+static int read_one_entry(unsigned char *sha1, unsigned char *guid, 
+			  const char *base, int baselen, 
+			  const char *pathname, unsigned mode)
 {
 	int len = strlen(pathname);
 	unsigned int size = cache_entry_size(baselen + len);
@@ -18,6 +20,7 @@
 	memcpy(ce->name, base, baselen);
 	memcpy(ce->name + baselen, pathname, len+1);
 	memcpy(ce->sha1, sha1, 20);
+	memcpy(ce->guid, guid, 20);
 	return add_cache_entry(ce, 1);
 }
 
@@ -35,14 +38,15 @@
 	while (size) {
 		int len = strlen(buffer)+1;
 		unsigned char *sha1 = buffer + len;
+		unsigned char *guid = buffer + len + 20;
 		char *path = strchr(buffer, ' ')+1;
 		unsigned int mode;
-
-		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+		
+		if (size < len + 40 || sscanf(buffer, "%o", &mode) != 1)
 			return -1;
 
-		buffer = sha1 + 20;
-		size -= len + 20;
+		buffer = sha1 + 40;
+		size -= len + 40;
 
 		if (S_ISDIR(mode)) {
 			int retval;
@@ -57,7 +61,7 @@
 				return -1;
 			continue;
 		}
-		if (read_one_entry(sha1, base, baselen, path, mode) < 0)
+		if (read_one_entry(sha1, guid, base, baselen, path, mode) < 0)
 			return -1;
 	}
 	return 0;
rev-tree.c:  395b0b3bfadb0537ae0c62744b25ead4b487f3f6
show-diff.c:  a531ca4078525d1c8dcf84aae0bfa89fed6e5d96
show-files.c:  a9fa6767a418f870a34b39379f417bf37b17ee18
tree-id:  cb70e2c508a18107abe305633612ed702aa3ee4f
update-cache.c:  62d0a6c41560d40863c44599355af10d9e089312
write-tree.c:  1534477c91169ebddcf953e3f4d2872495477f6b
--- write-tree.c
+++ write-tree.c	2005-04-15 13:46:05.000000000 +1000
@@ -47,6 +47,7 @@
 		const char *pathname = ce->name, *filename, *dirname;
 		int pathlen = ce->namelen, entrylen;
 		unsigned char *sha1;
+		unsigned char *guid;
 		unsigned int mode;
 
 		/* Did we hit the end of the directory? Return how many we wrote */
@@ -54,6 +55,7 @@
 			break;
 
 		sha1 = ce->sha1;
+		guid = ce->guid;
 		mode = ce->st_mode;
 
 		/* Do we have _further_ subdirectories? */
@@ -86,6 +88,8 @@
 		buffer[offset++] = 0;
 		memcpy(buffer + offset, sha1, 20);
 		offset += 20;
+		memcpy(buffer + offset, guid, 20);
+		offset += 20;
 		nr++;
 	} while (nr < maxentries);
 

[-- Attachment #1.3: rename-file.c --]
[-- Type: text/x-csrc, Size: 1703 bytes --]

/*
 * rename files in a git repository, keeping the guid.
 * 
 * Copyright Simon Fowler <simon@dreamcraft.com.au>, 2005.
 */
#include <unistd.h>
#include <sys/stat.h>
#include <errno.h>
#include "cache.h"

static int remove_lock = 0;

static void remove_lock_file(void)
{
	if (remove_lock)
		unlink(".git/index.lock");
}

int main(int argc, char *argv[])
{
	struct stat stats;
	struct cache_entry *ce, *new;
	int newfd, entries, pos, pos2;

	if (argc != 3)
		usage("rename-file <old> <new>");
	if (stat(argv[1], &stats)) {
		perror("rename-file: ");
		exit(1);
	}
	if (!stat(argv[2], &stats))
		die("rename-file: destination file already exists");
	
	newfd = open(".git/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
	if (newfd < 0)
		die("unable to create new cachefile");

	atexit(remove_lock_file);
	remove_lock = 1;

	entries = read_cache();
	if (entries < 0)
		die("cache corrupted");

	pos = cache_name_pos(argv[1], strlen(argv[1]));
	pos2 = cache_name_pos(argv[2], strlen(argv[2]));
		
	if (pos < 0) 
		die("original file not in cache");
	if (pos2 >= 0)
		die("destination file already in cache");
	ce = active_cache[pos];
	new = malloc(sizeof(struct cache_entry) + strlen(argv[2]) + 1);
	memcpy(new, ce, sizeof(struct cache_entry));
	new->namelen = strlen(argv[2]);
	memcpy(new->name, argv[2], new->namelen);
	
	if (rename(argv[1], argv[2])) {
		perror("rename-file: ");
		exit(1);
	}

	remove_file_from_cache(argv[1]);
	add_cache_entry(new, 1);

	if (write_cache(newfd, active_cache, active_nr) ||
	    rename(".git/index.lock", ".git/index"))
		die("Unable to write new cachefile");
      
	remove_lock = 0;
	return 0;
}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: write-tree is pasky-0.4
  2005-04-16  1:13                                                 ` Daniel Barkalow
@ 2005-04-16  2:18                                                   ` Linus Torvalds
  2005-04-16  2:49                                                     ` Daniel Barkalow
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16  2:18 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, Junio C Hamano, git



On Fri, 15 Apr 2005, Daniel Barkalow wrote:
> 
> Is there some reason you don't commit before merging? All of the current
> merge theory seems to want to merge two commits, using the information git
> keeps about them.

Note that the 3-way merge would _only_ merge the committed state. The 
thing is, 99% of all merges end up touching files that I never touch 
myself (ie other architectures), so me being able to merge them even when 
_I_ am in the middle of something is a good thing.

So even when I have dirty state, the "merge" would only merge the clean
state. And then before the merge information is put back into my working
directory, I'd do a "check-files" on the result, making sure that nothing
that got changed by the merge isn't up-to-date.

> How much do you care about the situation where there is no best common
> ancestor

I care. Even if the best common parent is 3 months ago, I care. I'd much 
rather get a big explicit conflict than a "clean merge" that ends up being 
debatable because people played games with per-file merging or something 
questionable like that.

> I think that the time spent on I/O will be overwhelmed by the time spent
> issuing the command at that rate.

There is no time at all spent on IO.

All my email is local, and if this all ends up working out well, I can 
track the other peoples object trees in local subdirectories with some 
daily rsyncs. And I have enough memory in my machines that there is 
basically no disk IO - the only tree I normally touch is the kernel trees, 
they all stay in cache.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: write-tree is pasky-0.4
  2005-04-16  2:18                                                   ` Linus Torvalds
@ 2005-04-16  2:49                                                     ` Daniel Barkalow
  2005-04-16  3:13                                                       ` Linus Torvalds
  0 siblings, 1 reply; 148+ messages in thread
From: Daniel Barkalow @ 2005-04-16  2:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git

On Fri, 15 Apr 2005, Linus Torvalds wrote:

> On Fri, 15 Apr 2005, Daniel Barkalow wrote:
> > 
> > Is there some reason you don't commit before merging? All of the current
> > merge theory seems to want to merge two commits, using the information git
> > keeps about them.
> 
> Note that the 3-way merge would _only_ merge the committed state. The 
> thing is, 99% of all merges end up touching files that I never touch 
> myself (ie other architectures), so me being able to merge them even when 
> _I_ am in the middle of something is a good thing.
> 
> So even when I have dirty state, the "merge" would only merge the clean
> state. And then before the merge information is put back into my working
> directory, I'd do a "check-files" on the result, making sure that nothing
> that got changed by the merge isn't up-to-date.

So you want to merge someone else's tree into your committed state, and
then merge the result with your working directory to get the working
directory you continue with, provided that the second merge is trivial?

> > How much do you care about the situation where there is no best common
> > ancestor
> 
> I care. Even if the best common parent is 3 months ago, I care. I'd much 
> rather get a big explicit conflict than a "clean merge" that ends up being 
> debatable because people played games with per-file merging or something 
> questionable like that.

Are you thinking that the best common ancestor is the one that ties up
absolutely all of the chains of commits, or the closest one that the sides
have in common? I have the feeling that the former isn't going to be
useful, because there will be lines you're considering merging which go
back to ancient kernels, where they keep merging in your changes, but
they still have a lineage back to 2.6.0 or something.

For the latter, there are sometimes multiple ancestors which fit this
criterion, and different ones of them are most helpful for different
portions of the merge. I think this primarily happens when a branch you
want to merge has accepted multiple patches that you've also
accepted (and the history identifies this fact); this may or may not be a
situation you want to allow on a regular basis.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: write-tree is pasky-0.4
  2005-04-16  2:49                                                     ` Daniel Barkalow
@ 2005-04-16  3:13                                                       ` Linus Torvalds
  2005-04-16  3:56                                                         ` Daniel Barkalow
  2005-04-16  6:59                                                         ` Paul Jackson
  0 siblings, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16  3:13 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, Junio C Hamano, git



On Fri, 15 Apr 2005, Daniel Barkalow wrote:
> 
> So you want to merge someone else's tree into your committed state, and
> then merge the result with your working directory to get the working
> directory you continue with, provided that the second merge is trivial?

No, you don't even "merge" the working directory.

The low-level tools should entirely ignore the working directory. To a
low-level merge, the working directory doesn't even exist. It just gets
three commits (or trees) and merges two of them with the third as a
parent, and does all of it in it's own temporary "merge working
directory".

So on a technical level, the "plumbing" part really really doesn't care at 
all.

However, from a _usability_ part, you expect after a merge that your 
working directory has been updated to be the merged tree. And that's where 
the "if I have a working tree that is dirty, I want that part to fail" 
comes in. In other words, the final phase (after the "tree-merge" has 
actually successfully already finished) is to go back to the working 
directory, and check out the merged results.

But that checkout would be a variation on "checkout-cache -a" which first
checks that none of the files it is going to overwrite are dirty.

Don't worry about this part. It's really totally separate from the true
merge itself. The "real work" has already been done by the time we notice
that "oops, we can't actually show him the newly merged tree, because he
has got dirty data where we want to show it".

> > I care. Even if the best common parent is 3 months ago, I care. I'd much 
> > rather get a big explicit conflict than a "clean merge" that ends up being 
> > debatable because people played games with per-file merging or something 
> > questionable like that.
> 
> Are you thinking that the best common ancestor is the one that ties up
> absolutely all of the chains of commits, or the closest one that the sides
> have in common?

The closest common one.

> For the latter, there are sometimes multiple ancestors which fit this
> criterion

Yes. Let's just pick one at random (or more likely, the latest one by 
date - let's not actually be _random_ random) at first. 

There are other heuristics we can try, ie if it turns out that it's common
to have a couple of alternatives (but no more than some small number, say
five or so), we can literally just -try- to do a tree-only merge, and see
how many lines out common output you get from "diff-tree".

Because that "how mnay files do we need to merge" is the number you want
to minimize, and doing a couple of extra "diff-tree" + "join"  operations
should be so fast that nobody will notice that we actually tried five
different merges to see which one looked the best.

But hey, especially if the merge fails with real clashes (ie there are
changes in common and running "merge" leaves conflicts), and there were
other alternate parents to choose, there's nothing wrong with just
printing them out and saying "you might try to specify one of these
manually".

I really don't think we should worry too much about this until we've 
actually used the system for a while and seen what it does. So just start 
with "nearest common parent with most recent date". Which I think you 
already implemented, no?

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: write-tree is pasky-0.4
  2005-04-16  3:13                                                       ` Linus Torvalds
@ 2005-04-16  3:56                                                         ` Daniel Barkalow
  2005-04-16  6:59                                                         ` Paul Jackson
  1 sibling, 0 replies; 148+ messages in thread
From: Daniel Barkalow @ 2005-04-16  3:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git

On Fri, 15 Apr 2005, Linus Torvalds wrote:

> On Fri, 15 Apr 2005, Daniel Barkalow wrote:
> > 
> > So you want to merge someone else's tree into your committed state, and
> > then merge the result with your working directory to get the working
> > directory you continue with, provided that the second merge is trivial?
> 
> No, you don't even "merge" the working directory.
> 
> The low-level tools should entirely ignore the working directory. To a
> low-level merge, the working directory doesn't even exist. It just gets
> three commits (or trees) and merges two of them with the third as a
> parent, and does all of it in it's own temporary "merge working
> directory".

It seems like users won't expect there to be a new working directory for
the merge in which they are supposed to resolve te conflicts, but where
they don't see their uncommited changes. In any case, the low-level tools
have to care about *some* working directory, even if it isn't the parent
of .git, and the parent of .git seems like where other similar things
happen. If we're being conservative about merging, we're likely to report
a lot of conflicts, at least until we work out better techniques than a
simple 3-way merge.

> > For the latter, there are sometimes multiple ancestors which fit this
> > criterion
> 
> Yes. Let's just pick one at random (or more likely, the latest one by 
> date - let's not actually be _random_ random) at first. 

Okay; I've currently got the one where the number of generations it is
away from the further head is the smallest, and of equal ones, an
arbitrary choice. If people are generally similar in the amount they
diverge before commiting, this should be the most similar ancestor.

> There are other heuristics we can try, ie if it turns out that it's common
> to have a couple of alternatives (but no more than some small number, say
> five or so), we can literally just -try- to do a tree-only merge, and see
> how many lines out common output you get from "diff-tree".
> 
> Because that "how mnay files do we need to merge" is the number you want
> to minimize, and doing a couple of extra "diff-tree" + "join"  operations
> should be so fast that nobody will notice that we actually tried five
> different merges to see which one looked the best.
> 
> But hey, especially if the merge fails with real clashes (ie there are
> changes in common and running "merge" leaves conflicts), and there were
> other alternate parents to choose, there's nothing wrong with just
> printing them out and saying "you might try to specify one of these
> manually".

I think we should be able to get good results out of doing the 5 merges
and reporting a conflict only if there's a conflict in all of them; it
shouldn't be possible for two to succeed but give different results (if it
did, clearly our current algorithm is unsafe, since it would give some
undesired output if it happened to use the wrong ancestor).

I'm thinking of not actually calling "merge(1)" for this at all; it just
calls diff3, and diff3 is only 1745 lines including option parsing. We can
probably arrange to look around for better ancestors in case of conflicts
we'd otherwise have to report, and get this all tidy and more efficient
than having diff3 re-read files. And if we only go to other ancestors in
case of conflicts, we're going to be a lot faster total than getting a
reaction from the user, almost no matter what we do.

> I really don't think we should worry too much about this until we've 
> actually used the system for a while and seen what it does. So just start 
> with "nearest common parent with most recent date". Which I think you 
> already implemented, no?

I've got something like that (see above); did you want it in some form
other than the patch I sent you?

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-16  1:02                                               ` Linus Torvalds
@ 2005-04-16  4:10                                                 ` Junio C Hamano
  2005-04-16  5:02                                                   ` Linus Torvalds
  0 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-16  4:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Just a heads-up - I'd really want to do the same thing to "merge-tree.c" 
LT> too, but since you said that you were working on extending that to do 
LT> recursion etc, I decided to hold off. So if you're working on it, maybe 
LT> you can add the "-z" flag there too? 

Sent as a separate patch already.

LT> I'm actually holding off merging the perl version exactly because you 
LT> seemed to be working on the C version. I don't mind perl per se, but if 
LT> there's a real solution coming down the line..

I'd take the hint, but I would say the current Perl version
would be far more usable than the C version I would come up with
by the end of this weekend because:

 - the Perl version creates a new temporary directory and leaves
   a ready-to-use dircache there---the only thing needed from
   that point for you is to fix it up any conflicts and do
   update-cache on that dircache.  In that sense it is already
   usable (Linus-usable, but probably not Pasky-usable due to
   differences in phylosophy).

 - the enhancement I am planning on the C version does not do
   the real work itself, as you have originally written (the
   workings and the output from it are outlined in [*R1*]).
   Somebody has to write the executor part that does read-tree
   the base, update-cache --cacheinfo --add for the selects,
   runs 3-way merge on conflicting files and runs update-cache
   for the merges, update-cache --remove for the deletes, before
   it matches the usability of the Perl version.  I do not
   expect to have enough time this weekend to finish this.

I know of one case in Perl version I need to see if it does the
right thing but other than that it would be far better than the
C version I'm toying with.

Just to let you know, here is the plan I have for my part.

 1. I am currently writing some test cases.  The plan is first
    to make sure the Perl version works OK with the test cases
    to flush initial problems out.

 2. After that I'll see if a dumb but recursive C version I
    already have spits out the right instructions.  This step is
    to make sure that the test cases are sane, and by making
    that sure, we will be able to say that we have something
    usable in extremely short run (i.e. the Perl version) after
    this step.

 3. After that is done, I'll add the fourth argument to
    merge-tree.c to specify the base so that it can cut down 90%
    of trivial selects.  Only after this happens the executioner
    script would be useful performance wise.

[References]

*R1* <7vr7hbhky9.fsf@assigned-by-dhcp.cox.net>


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-16  4:10                                                 ` Junio C Hamano
@ 2005-04-16  5:02                                                   ` Linus Torvalds
  2005-04-16  6:26                                                     ` Linus Torvalds
  2005-04-16  8:12                                                     ` Junio C Hamano
  0 siblings, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16  5:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Fri, 15 Apr 2005, Junio C Hamano wrote:
> 
> I'd take the hint, but I would say the current Perl version
> would be far more usable than the C version I would come up with
> by the end of this weekend because:

Actually, it turns out that I have a cunning plan.

I'm full of cunning plans, in fact. It turns out that I can do merges even
more simply, if I just allow the notion of "state" into an index entry,
and allow multiple index entries with the same name as long as they differ
in "state".

And that means that I can do all the merging in the regular index tree, 
using very simple rules.

Let's see how that works out. I'm writing the code now.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-16  5:02                                                   ` Linus Torvalds
@ 2005-04-16  6:26                                                     ` Linus Torvalds
  2005-04-16  8:12                                                     ` Junio C Hamano
  1 sibling, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16  6:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Fri, 15 Apr 2005, Linus Torvalds wrote:
> 
> Actually, it turns out that I have a cunning plan.

Damn, my cunning plan is some good stuff. 

Or maybe it is _so_ cunning that I just confuse even myself. But it looks 
like it is actually working, and that it allows pretty much instantaenous 
merges.

The plan goes like this:

 - each "index" entry has two bits worth of "stage" state. stage 0 is the 
   normal one, and is the only one you'd see in any kind of normal use.

 - however, when you do "read-tree" with multiple trees, the "stage" 
   starts out at 0, but increments for each tree you read. And in 
   particular, the old "-m" flag (which used to be "merge with old state")  
   has a new meaning: it now means "start at stage 1" instead.

 - this means that you can do

	read-tree -m <tree1> <tree2> <tree3>

   and you will end up with an index with all of the <tree1> entries in 
   "stage1", all of the <tree2> entries in "stage2" and all of the <tree3>
   entries in "stage3".

 - furthermore, "read-tree" has this special-case logic that says: if you 
   see a file that matches in all respects in all three states, it 
   "collapses" back to "stage0".

 - write-tree refuses to write a nonsensical tree, so write-tree will 
   complain about unmerged entries if it sees a single entry that is not
   stage 0".

Ok, this all sounds like a collection of totally nonsensical rules, but
it's actually exactly what you want in order to do a fast merge. The 
differnt stages represent the "result tree" (stage 0, aka "merged"), the 
original tree (stage 1, aka "orig"), and the two trees you are trying to 
merge (stage 2 and 3 respectively).

In fact, the way "read-tree" works, it's entirely agnostic about how you
assign the stages, and you could really assign them any which way, and the
above is just a suggested way to do it (except since "write-tree" refuses
to write anything but stage0 entries, it makes sense to always consider
stage 0 to be the "full merge" state).

So what happens? Try it out. Select the original tree, and two trees to 
merge, and look how it works:

 - if a file exists in identical format in all three trees, it will 
   automatically collapse to "merged" state by the new read-tree.

 - a file that has _any_ difference what-so-ever in the three trees will 
   stay as separate entries in the index. It's up to "script policy" to 
   determine how to remove the non-0 stages, and insert a merged version. 
   But since the index is always sorted, they're easy to find: they'll be
   clustered together.

 - the index file saves and restores with all this information, so you can 
   merge things incrementally, but as long as it has entries in stages
   1/2/3 (ie "unmerged entries") you can't write the result.

So now the merge algorithm ends up being really simple:

 - you walk the index in order, and ignore all entries of stage 0, since 
   they've already been done.
 - if you find a "stage1", but no matching "stage2" or "stage3", you know 
   it's been removed from both trees (it only existed in the original 
   tree), and you remove that entry.
 - if you find a matching "stage2" and "stage3" tree, you remove one of 
   them, and turn the other into a "stage0" entry. Remove any matching
   "stage1" entry if it exists too.
  .. all the normal trivial rules ..

NOTE NOTE NOTE! I could make "read-tree" do some of these nontrivial 
merges, but I ended up deciding that only the "matches in all three 
states" thing collapses by default. Why? Because even though there are 
other trivial cases ("matches in both merge trees but not in the original 
one"), those cases might actually be interesting for the merge logic to 
know about, so I thought I'd leave all that information around. I expect 
it to be fairly rare anyway, so writing out a few extra index entries to 
disk so that others can decide to annotate the merge a bit more sounded 
like a fair deal.

I should make "ls-files" have a "-l" format, which shows the index and the 
mode for each file too. Right now it's very hard to see what the contents 
of the index is. But all my tests seem to say that not only does this 
work, it's pretty efficient too. And it's dead _simple_, thanks to having 
all the merge information in just one place, the same index we always use 
anyway.

Btw, it also means that you don't even have to have a separate 
subdirectory for this. All the information literally is in the index file, 
which is a temporary thing anyway. We don't need to worry about what is in 
the working directory, since we'll never show it, and we'll never need to 
use it.

Damn, I'm good.

(On the other hand, it is Friday evening at 11PM, and I'm sitting in front
of the computer. I'm a sad case. I will now go take a beer, and relax. I
think this is another of my "Really Good Ideas" (tm), and is worth the
beer.  This "feels" right).

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: write-tree is pasky-0.4
  2005-04-16  3:13                                                       ` Linus Torvalds
  2005-04-16  3:56                                                         ` Daniel Barkalow
@ 2005-04-16  6:59                                                         ` Paul Jackson
  1 sibling, 0 replies; 148+ messages in thread
From: Paul Jackson @ 2005-04-16  6:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: barkalow, pasky, junkio, git

One trick I've used to separate good automatic merges from ones that
need human interaction is to run both the 'patch' and 'merge' commands,
which use different approaches to determining the result.

If they agree, take it.  To apply the changes between file1 and file2
to filez:

	diff -au file1 file2 | patch -f filez
	merge -q filez file1 file2

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-16  5:02                                                   ` Linus Torvalds
  2005-04-16  6:26                                                     ` Linus Torvalds
@ 2005-04-16  8:12                                                     ` Junio C Hamano
  2005-04-16  9:27                                                       ` [PATCH] Byteorder fix for read-tree, new -m semantics version Junio C Hamano
                                                                         ` (3 more replies)
  1 sibling, 4 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-16  8:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Damn, my cunning plan is some good stuff. 

I really like this a lot.  It is *so* *simple*, clear, flexible
and an example of elegance.  This is one of the things I would
happily say "Sheeeeeeeeeeeeeesh!  Why didn't *I* think of *THAT*
first!!!" to.

LT> NOTE NOTE NOTE! I could make "read-tree" do some of these nontrivial 
LT> merges, but I ended up deciding that only the "matches in all three 
LT> states" thing collapses by default.

 * Understood and agreed.

LT> Damn, I'm good.

 * Agreed ;-). Wholeheartedly.

So what's next?  Certainly I'd immediately drop (and I would
imagine you would as well) both C or Perl version of
merge-tree(s).

The userland merge policies need ways to extract the stage
information and manipulate them.  Am I correct to say that you
mean by "ls-files -l" the extracting part?

LT> I should make "ls-files" have a "-l" format, which shows the
LT> index and the mode for each file too.

You probably meant "ls-tree".  You used the word "mode" but it
already shows the mode so I take it to mean "stage".  Perhaps
something like this?

$ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b
100644	blob	fe2a4177a760fd110e78788734f167bd633be8de	COPYING
100644	blob	b39b4ea37586693dd707d1d0750a9b580350ec50:1	man/frotz.6
100644	blob	b39b4ea37586693dd707d1d0750a9b580350ec50:2	man/frotz.6
100664	blob	eeed997e557fb079f38961354473113ca0d0b115:3	man/frotz.6
 ...

The above example shows that COPYING has merged successfully,
and O and A have the same contents and B has something different
at man/frotz.6.

Assuming that you would be working on that, I'd like to take the
dircache manipulation part.  Let's think about the minimally
necessary set of operations:

 * The merge policy decides to take one of the existing stage.

   In this case we need a way to register a known mode/sha1 at a
   path.  We already have this as "update-cache --cacheinfo".
   We just need to make sure that when "update-cache" puts
   things at stage 0 it clears other stages as well.

 * The merge policy comes up with a desired blob somewhere on
   the filesystem (perhaps by running an external merge
   program).  It wants to register it as the result of the
   merge.

   We could do this today by first storing the "desired blob"
   in a temporary file somewhere in the path the dircache
   controls, "update-cache --add" the temporary file, ls-tree to
   find its mode/sha1, "update-cache --remove" the temporary
   file and finally "update-cache --cacheinfo" the mode/sha1.
   This is workable but clumsy.  How about:

   $ update-cache --graft [--add] desired-blob path

   to say "I want to register mode/sha1 from desired-blob, which
   may not be of verify_path() satisfying name, at path in the
   dircache"?

 * The merge policy decides to delete the path.

   We could do this today by first stashing away the file at the
   path if it exists, "update-cache --remove" it, and restore
   if necessary.  This is again workable but clumsy.  How about:

   $ update-cache --force-remove path

   to mean "I want to remove the path from dircache even though
   it may exist in my working tree"?

So it all boils down to update-cache.  The new things to be
introduced are:

 * An explicit update-cache always removes stage 1/2/3 entries
   associated with the named path.

 * update-cache --graft

 * update-cache --force-remove

Am I on the right track?

You might want to go even lower level by letting them say
something like:

 * update-cache --register-stage mode sha1 stage path

   Registers the mode/sha1 at stage for path.  Does not look at
   the working tree.  stage is [0-3]
 
 * update-cache --delete-stage stage-list path

   Removes the entry at named stages for path.  Does not look at
   the working tree.  stage-list is either [0-3](,[0-3])+ or
   bitmask (i.e. (1 << stage-number) ORed together).  The former
   would probably be easier to work with by scripts

 * write-blob path

   Hashes and registers the file at path (regardless of what
   verify_path() says) and writes the resulting blob's mode/sha1
   to the standard output.

If you take this lower-level approach, an explicit update-cache
would not clear stage1/2/3.

My preference is the former, not so low-level, interface.
Guidance?


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH] Byteorder fix for read-tree, new -m semantics version.
  2005-04-16  8:12                                                     ` Junio C Hamano
@ 2005-04-16  9:27                                                       ` Junio C Hamano
  2005-04-16 10:35                                                       ` [PATCH 1/2] Add --stage to show-files for new stage dircache Junio C Hamano
                                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-16  9:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

The ce_namelen field has been renamed to ce_flags and split into
the top 2-bit unused, next 2-bit stage number and the lowest
12-bit name-length, stored in the network byte order.  A new
macro create_ce_flags() is defined to synthesize this value from
length and stage, but it forgets to turn the value into the
network byte order.  Here is a fix.

The patch is against 9c03bd47892d11d0bb28c442184786db3c189978.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 cache.h |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

--- cache.h
+++ cache.h	2005-04-16 02:22:05.000000000 -0700
@@ -66,7 +66,7 @@
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 
-#define create_ce_flags(len, stage) ((len) | ((stage) << 12))
+#define create_ce_flags(len, stage) htons((len) | ((stage) << 12))
 
 const char *sha1_file_directory;
 struct cache_entry **active_cache;



^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 1/2] Add --stage to show-files for new stage dircache.
  2005-04-16  8:12                                                     ` Junio C Hamano
  2005-04-16  9:27                                                       ` [PATCH] Byteorder fix for read-tree, new -m semantics version Junio C Hamano
@ 2005-04-16 10:35                                                       ` Junio C Hamano
  2005-04-16 10:42                                                         ` [PATCH 2/2] " Junio C Hamano
  2005-04-16 14:03                                                       ` Issues with higher-order stages in dircache Junio C Hamano
  2005-04-16 15:28                                                       ` [PATCH 3/2] merge-trees script for Linus git Linus Torvalds
  3 siblings, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-16 10:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "JNH" == Junio C Hamano <junkio@cox.net> writes:
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I should make "ls-files" have a "-l" format, which shows the
LT> index and the mode for each file too.

JNH> You probably meant "ls-tree".  You used the word "mode" but it
JNH> already shows the mode so I take it to mean "stage".

I was *wrong*.  Of course you meant "show-files".

Instead of sending you an apology, I am sending you the one I
wrote myself.  Please find it in the next message ;-).

Here is its sample output.  It shows file-mode, SHA1, stage and
pathname.  I am attaching this one because this is a
verification that your read-tree -m passed the test.

$ ../show-files --stage
100664 578cc900ed980b72acfbdd1eea63e688a893c458 2 AA
100664 f355077379fce072c210628691da232b59b6f25c 3 AA
100664 d698ebc45d0edfe6e5b95aebb5983cb5c760960b 2 AN
100664 0fa6a8e41814531679e1c76e968a9066fceb689d 1 DD
100664 aff448a9467a4d83b164ef969cfe92ff18eb96be 1 DM
100664 4bfe111723f11cb4a4deec7c837e12601030285f 3 DM
100664 9b0f86e5cded99b9de3bd9d234747ec2d1a4cddd 1 DN
100664 9b0f86e5cded99b9de3bd9d234747ec2d1a4cddd 3 DN
100664 a6772f2a2c15bac796d8c7bb55885891956534cf 1 MD
100664 dc2088ce13f659f2bd554b2c1b343f4966143b9b 2 MD
100664 e4310204563a9059828644464779874c3a406fee 1 MM
100664 fe5ddcd7618d26384cf98c6fcd15780c7125e6d6 2 MM
100664 53a9d14868dbe346a9f0cf01fcda742545b55987 3 MM
100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 1 MN
100664 d7600381b69b92f61bad50c5f8408e831b622ef0 2 MN
100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 3 MN
100664 67fb1517ea8d59949a8e4f5f07f0422b212f64dc 3 NA
100664 0e5842253af8881b2c9f579029d7b50a8e03d7f6 1 ND
100664 0e5842253af8881b2c9f579029d7b50a8e03d7f6 2 ND
100664 0d45c04c9d05fa9c21edf95fc2c1a43519a8c440 1 NM
100664 0d45c04c9d05fa9c21edf95fc2c1a43519a8c440 2 NM
100664 849bfa41d15951f5e97cb93e22cbcc2924ce4517 3 NM
100664 83d94b8fd056921f22ad2ca0122dd7f64974be7c 0 NN

This is taken from the dircache after I ran

    $ read-tree -m O A B

using the merge testcase I prepared earlier.  Very trivial,
single ancestor O, with two branches A & B merge case.  This
covers all possible patterns, except file vs directory
conflicts.  The filenames are all two letters, first letter
being what the first branch does to that file while the second
one encodes what the second branch does to it.  The actions are:

 - A means "Added in this branch --- did not exist in the ancestor."

 - N means "No change in this branch."

 - D means "Deleted in this branch."

 - M means "Modified in this branch."

So, for example, the first branch modified file MN while the
second one did not touch it.  Of course it existed in the
ancestor.  You can see that read-tree did the right thing
because SHA1 for stage 1 and stage 3 match, and stage 2 is
different.

    100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 1 MN
    100664 d7600381b69b92f61bad50c5f8408e831b622ef0 2 MN
    100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 3 MN

I verified all of the above result and it shows your algorithm
is doing exactly what is expected.

For those of you who are interested, this is the recipe to
reproduce this merge testcase.  NOTE! NOTE! NOTE!  Do not run
this in your working tree, because it trashes .git in its
working directory.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

--- /dev/null
+++ generate-merge-test.sh
@@ -0,0 +1,163 @@
+#!/bin/sh
+
+: Skip execution up to <<\End_of_Commentary
+
+This directory is to hold a test case for merges.
+
+There is one ancestor (called O for Original) and two branches A
+and B derived from it.  We want to do 3-way merge between A and
+B, using O as the common ancestor.
+
+    merge A O B
+    diff3 A O B
+
+Decisions are made by comparing contents of O, A and B pathname
+by pathname.  The result is determined by the following guiding
+principle:
+
+ - If only A does something to it and B does not touch it, take
+   whatever A does.
+
+ - If only B does something to it and A does not touch it, take
+   whatever B does.
+
+ - If both A and B does something but in the same way, take
+   whatever they do.
+
+ - If A and B does something but different things, we need a
+   3-way merge:
+
+   - We cannot do anything about the following cases:
+
+     * O does not have it.  A and B both must be adding to the
+       same path independently.
+
+     * A deletes it.  B must be modifying.
+
+   - Otherwise, A and B are modifying.  Run 3-way merge.
+
+
+First, the case matrix.
+
+ - Vertical axis is for A's actions.
+ - Horizontal axis is for B's actions.
+
+.----------------------------------------------------------------.
+| A        B | No Action  |   Delete   |   Modify   |    Add     |
+|------------+------------+------------+------------+------------|
+| No Action  |            |            |            |            |
+|            | select O   | delete     | select B   | select B   |
+|            |            |            |            |            |
+|------------+------------+------------+------------+------------|
+| Delete     |            |            | ********** |    can     |
+|            | delete     | delete     | merge      |    not     |
+|            |            |            |            |  happen    |
+|------------+------------+------------+------------+------------|
+| Modify     |            | ********** | ?????????? |    can     |
+|            | select A   | merge      | select A=B |    not     |
+|            |            |            | merge      |  happen    |
+|------------+------------+------------+------------+------------|
+| Add        |            |    can     |    can     | ?????????? |
+|            | select A   |    not     |    not     | select A=B |
+|            |            |  happen    |  happen    | merge      |
+.----------------------------------------------------------------.
+
+End_of_Commentary
+
+rm -fr [NDMA][NDMA] S .git Trivial
+init-db
+
+# Original tree.
+mkdir S
+for a in N D M
+do
+    for b in N D M
+    do
+        p=$a$b
+	echo This is $p from the original tree. >$p
+	echo This is S/$p from the original tree. >S/$p
+	update-cache --add $p || exit
+	update-cache --add S/$p || exit
+    done
+done
+cat >Trivial <<\EOF
+This is a trivial merge sample text.
+Branch A is expected to upcase this word.
+There are some filler words to foil diff contexts here,
+like this one,
+and this one,
+and this one is yet another one of them.
+At the very end, here comes another line, that is
+the word, expected to be upcased by Branch B.
+This concludes the trivial merge sample file.
+EOF
+update-cache --add Trivial || exit
+tree_O=$(write-tree)
+commit_O=$(echo 'Original tree for the merge test.' | commit-tree $tree_O)
+
+# Branch A and B makes the changes according to the above matrix.
+# Branch A
+to_remove=$(echo D? S/D?)
+rm -f $to_remove
+update-cache --remove $to_remove || exit
+
+for p in M? S/M?
+do
+    echo This is modified $p in the branch A. >$p
+    update-cache $p || exit
+done
+
+for p in AN AA
+do
+    echo This is added $p in the branch A. >$p
+    update-cache --add $p || exit
+done
+mv Trivial ,,Trivial
+sed -e '/Branch A/s/word/WORD/g' <,,Trivial >Trivial
+rm -f ,,Trivial
+update-cache Trivial || exit
+
+tree_A=$(write-tree)
+commit_A=$(echo 'Branch A for the merge test.' |
+           commit-tree $tree_A -p $commit_O)
+	   
+
+# Branch B
+# Start from O
+rm -rf [NDMA][NDMA] S Trivial
+mkdir S
+../read-tree $tree_O
+checkout-cache -a
+
+to_remove=$(echo ?D S/?D)
+rm -f $to_remove
+update-cache --remove $to_remove || exit
+
+for p in ?M S/?M
+do
+    echo This is modified $p in the branch B. >$p
+    update-cache $p || exit
+done
+
+for p in NA AA
+do
+    echo This is added $p in the branch B. >$p
+    update-cache --add $p || exit
+done
+mv Trivial ,,Trivial
+sed -e '/Branch B/s/word/WORD/g' <,,Trivial >Trivial
+rm -f ,,Trivial
+update-cache Trivial || exit
+
+tree_B=$(write-tree)
+commit_B=$(echo 'Branch B for the merge test.' |
+           commit-tree $tree_B -p $commit_O)
+
+for commit in $commit_O $commit_A $commit_B
+do
+    echo ================
+    echo commit $commit
+    cat-file commit $commit
+done
+echo ================
+




^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 2/2] Add --stage to show-files for new stage dircache.
  2005-04-16 10:35                                                       ` [PATCH 1/2] Add --stage to show-files for new stage dircache Junio C Hamano
@ 2005-04-16 10:42                                                         ` Junio C Hamano
  0 siblings, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-16 10:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This adds --stage option to show-files command.  It shows
file-mode, SHA1, stage and pathname.  Record separator follows
the usual convention of -z option as before.

The patch is on top of the byte order fix for create_ce_flags in my
previous message.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 cache.h      |   12 +++++++-----
 show-files.c |   22 ++++++++++++++++++----
 2 files changed, 25 insertions(+), 9 deletions(-)

--- cache.h	2005-04-16 03:02:36.000000000 -0700
+++ cache.h=show-files-stage-flags	2005-04-16 02:48:47.000000000 -0700
@@ -65,8 +65,14 @@
 
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
+#define CE_STAGESHIFT 12
 
-#define create_ce_flags(len, stage) htons((len) | ((stage) << 12))
+#define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
+#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
+#define ce_size(ce) cache_entry_size(ce_namelen(ce))
+#define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
+
+#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
 const char *sha1_file_directory;
 struct cache_entry **active_cache;
@@ -75,10 +81,6 @@
 #define DB_ENVIRONMENT "SHA1_FILE_DIRECTORY"
 #define DEFAULT_DB_ENVIRONMENT ".git/objects"
 
-#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
-#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
-#define ce_size(ce) cache_entry_size(ce_namelen(ce))
-
 #define alloc_nr(x) (((x)+16)*3/2)
 
 /* Initialize and use the cache information */



--- show-files.c
+++ show-files.c	2005-04-16 02:58:32.000000000 -0700
@@ -14,6 +14,7 @@
 static int show_cached = 0;
 static int show_others = 0;
 static int show_ignored = 0;
+static int show_stage = 0;
 static int line_terminator = '\n';
 
 static const char **dir;
@@ -108,10 +109,19 @@
 		for (i = 0; i < nr_dir; i++)
 			printf("%s%c", dir[i], line_terminator);
 	}
-	if (show_cached) {
+	if (show_cached | show_stage) {
 		for (i = 0; i < active_nr; i++) {
 			struct cache_entry *ce = active_cache[i];
-			printf("%s%c", ce->name, line_terminator);
+			if (!show_stage)
+				printf("%s%c", ce->name, line_terminator);
+			else
+				printf(/* "%06o %s %d %10d %s%c", */
+				       "%06o %s %d %s%c",
+				       ntohl(ce->ce_mode),
+				       sha1_to_hex(ce->sha1),
+				       ce_stage(ce),
+				       /* ntohl(ce->ce_size), */
+				       ce->name, line_terminator); 
 		}
 	}
 	if (show_deleted) {
@@ -156,12 +166,16 @@
 			show_ignored = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--stage")) {
+			show_stage = 1;
+			continue;
+		}
 
-		usage("show-files (--[cached|deleted|others|ignored])*");
+		usage("show-files [-z] (--[cached|deleted|others|ignored|stage])*");
 	}
 
 	/* With no flags, we default to showing the cached files */
-	if (!(show_cached | show_deleted | show_others | show_ignored))
+	if (!(show_stage | show_deleted | show_others | show_ignored))
 		show_cached = 1;
 
 	read_cache();


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-16  1:44                               ` Simon Fowler
@ 2005-04-16 12:19                                 ` David Lang
  2005-04-16 15:55                                   ` Simon Fowler
  0 siblings, 1 reply; 148+ messages in thread
From: David Lang @ 2005-04-16 12:19 UTC (permalink / raw)
  To: simon; +Cc: Linus Torvalds, David Woodhouse, Junio C Hamano, Petr Baudis, git


On Fri, Apr 15, 2005 at 08:32:46AM -0700, Linus Torvalds wrote:
> In other words, I'm right. I'm always right, but sometimes I'm more 
right
> than other times. And dammit, when I say "files don't matter", I'm 
really
> really Right(tm).
>
You're right, of course (All Hail Linus!), if you can make it work
efficiently enough.

Just to put something else on the table, here's how I'd go about
tracking renames and the like, in another world where Linus /does/
make the odd mistake - it's basically a unique id for files in the
repository, added when the file is first recognised and updated when
update-cache adds a new version to the cache. Renames copy the id
across to the new name, and add it into the cache.

This gives you an O(n) way to tell what file was what across
renames, and it might even be useful in Linus' world, or if someone
wanted to build a traditional SCM on top of a git-a-like.

Attached is a patch, and a rename-file.c to use it.

Simon

given that you have multiple machines creating files, how do you deal with 
the idea of the same 'unique id' being assigned to different files by 
different machines?

David Lang



-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Issues with higher-order stages in dircache
  2005-04-16  8:12                                                     ` Junio C Hamano
  2005-04-16  9:27                                                       ` [PATCH] Byteorder fix for read-tree, new -m semantics version Junio C Hamano
  2005-04-16 10:35                                                       ` [PATCH 1/2] Add --stage to show-files for new stage dircache Junio C Hamano
@ 2005-04-16 14:03                                                       ` Junio C Hamano
  2005-04-17  5:11                                                         ` Junio C Hamano
  2005-04-17 10:00                                                         ` Summary of "read-tree -m O A B" mechanism Junio C Hamano
  2005-04-16 15:28                                                       ` [PATCH 3/2] merge-trees script for Linus git Linus Torvalds
  3 siblings, 2 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-16 14:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes:

JCH> So what's next?

Here is my current thinking on the impact your higher-order
stage dircache entries would have to the rest of the system and
how to deal with them.

 * read-tree

   - When merging two trees, i.e. "read-tree -m A B", shouldn't
     we collapse identical stage-1/2 into stage-0?

 * update-cache

   - An explicit "update-cache [--add] [--remove] path" should
     be taken as a signal from the user (or Cogito) to tell the
     dircache layer "the merge is done and here is the result".
     So just delete higher-order stages for the path and record
     the specified path at stage 0 (or remove it altogether).

   - "update-cache --refresh" should just ignore a path that has
     not been merged,  Maybe say "needs merge", just like "needs
     update" [*1*].

   - "update-cache --cacheinfo" should get an extra "stage"
     argument.  Unmerged state is typically produced by running
     "read-tree -m", but the user or Cogito can do it by hand
     with this if he wanted to.

   - I do not think we need a separate "remove the entry for
     this path at this stage" thing.  That is only necessary if
     the user or Cogito is doing things by hand (as opposed to
     "read-tree -m"), which should be a very rare case.  He can
     always do "update-cache --remove" followed by "update-cache
     --cacheinfo" to obtain the desired result if he really
     wanted to.  For that, "update-cache --force-remove" may
     come in handy.

 * show-diff

   - What should we do about unmerged paths?  Showing diffs
     between the combinations (1->2), (1->3), and (2->3) that
     exist may not be a bad idea.  It would not be confusing
     because by definition dircache with higher-order stages is
     a merge temporary directory and the user should not have a
     working file there to begin with.

     I think the current implementation does a very bad thing:
     repeating the same diff as many times as it has
     higher-order stages for the same path.

 * checkout-cache

   - When checkout-cache is run with explicit paths that are
     unmerged, what should we do?  What does that mean in the
     first place?  One use scenario I can think of is that the
     user or Cogito wants the contents at all three stages, in
     order to run a merge tool on them.  From this point of
     view, checking out all the available stages for the path
     makes sense.

     My "cunning plan" is to drop ".1-$file", ".2-$file", and
     ".3-$file" in the working directory.  How does that sound?

   - When checkout-cache -a is run, presumably the user wants to
     check out everything to verify (e.g. build-test) the
     result.  In this case, we should skip unmerged paths, give
     a warning, and check out only the merged ones.


[Footnotes]

*1* Unrelated note.  Who is the intended consumer of this "needs
    update" message?  Should we make it machine readable with
    '-z' flag as well?  Otherwise, shouldn't it go to stderr?
    Currently it goes to stdout.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-16  8:12                                                     ` Junio C Hamano
                                                                         ` (2 preceding siblings ...)
  2005-04-16 14:03                                                       ` Issues with higher-order stages in dircache Junio C Hamano
@ 2005-04-16 15:28                                                       ` Linus Torvalds
  2005-04-16 16:36                                                         ` Linus Torvalds
  3 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16 15:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sat, 16 Apr 2005, Junio C Hamano wrote:
> 
> LT> NOTE NOTE NOTE! I could make "read-tree" do some of these nontrivial 
> LT> merges, but I ended up deciding that only the "matches in all three 
> LT> states" thing collapses by default.
> 
>  * Understood and agreed.

Having slept on it, I think I'll merge all the trivial cases that don't 
involve a file going away or being added. Ie if the file is in all three 
trees, but it's the same in two of them, we know what to do.

That way we'll leave thigns where the tree itself changed (files added or 
removed at any point) and/or cases where you actually need a 3-way merge.

> The userland merge policies need ways to extract the stage
> information and manipulate them.  Am I correct to say that you
> mean by "ls-files -l" the extracting part?

No, I meant "show-files", since we need to show the index, not a tree (no 
valid tree can ever have the "modes" information, since (a) it doesn't 
have the space for it anyway and (b) we refuse to write out a dirty index 
file.



> 
> LT> I should make "ls-files" have a "-l" format, which shows the
> LT> index and the mode for each file too.
> 
> You probably meant "ls-tree".  You used the word "mode" but it
> already shows the mode so I take it to mean "stage".  Perhaps
> something like this?
> 
> $ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b
> 100644	blob	fe2a4177a760fd110e78788734f167bd633be8de	COPYING
> 100644	blob	b39b4ea37586693dd707d1d0750a9b580350ec50:1	man/frotz.6
> 100644	blob	b39b4ea37586693dd707d1d0750a9b580350ec50:2	man/frotz.6
> 100664	blob	eeed997e557fb079f38961354473113ca0d0b115:3	man/frotz.6

Apart from the fact that it would be

	show-files -l

since there are no tree objects that can have anything but fully merged
state, yes.

> Assuming that you would be working on that, I'd like to take the
> dircache manipulation part.  Let's think about the minimally
> necessary set of operations:
> 
>  * The merge policy decides to take one of the existing stage.
> 
>    In this case we need a way to register a known mode/sha1 at a
>    path.  We already have this as "update-cache --cacheinfo".
>    We just need to make sure that when "update-cache" puts
>    things at stage 0 it clears other stages as well.
> 
>  * The merge policy comes up with a desired blob somewhere on
>    the filesystem (perhaps by running an external merge
>    program).  It wants to register it as the result of the
>    merge.
> 
>    We could do this today by first storing the "desired blob"
>    in a temporary file somewhere in the path the dircache
>    controls, "update-cache --add" the temporary file, ls-tree to
>    find its mode/sha1, "update-cache --remove" the temporary
>    file and finally "update-cache --cacheinfo" the mode/sha1.
>    This is workable but clumsy.  How about:
> 
>    $ update-cache --graft [--add] desired-blob path
> 
>    to say "I want to register mode/sha1 from desired-blob, which
>    may not be of verify_path() satisfying name, at path in the
>    dircache"?
> 
>  * The merge policy decides to delete the path.
> 
>    We could do this today by first stashing away the file at the
>    path if it exists, "update-cache --remove" it, and restore
>    if necessary.  This is again workable but clumsy.  How about:
> 
>    $ update-cache --force-remove path
> 
>    to mean "I want to remove the path from dircache even though
>    it may exist in my working tree"?

Yes.

> Am I on the right track?

Exactly.

> You might want to go even lower level by letting them say
> something like:
> 
>  * update-cache --register-stage mode sha1 stage path
> 
>    Registers the mode/sha1 at stage for path.  Does not look at
>    the working tree.  stage is [0-3]

I'd prefer not. I'd avoid playing games with the stages at any other level
than the "full tree" level until we show a real need for it.

Let's go with the known-needed minimal cases that are high-level enough to
make the scripting simple, and see if there is any reason to ever touch
the tree any other way.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 16:01                               ` David Woodhouse
  2005-04-15 16:31                                 ` C. Scott Ananian
@ 2005-04-16 15:33                                 ` Johannes Schindelin
  2005-04-17 13:14                                   ` David Woodhouse
  1 sibling, 1 reply; 148+ messages in thread
From: Johannes Schindelin @ 2005-04-16 15:33 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git

Hi,

On Fri, 15 Apr 2005, David Woodhouse wrote:

> But if it can be done cheaply enough at a later date even though we end
> up repeating ourselves, and if it can be done _well_ enough that we
> shouldn't have just asked the user in the first place, then yes, OK I
> agree.

The repetition could be helped by using a cache.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Re: Re: write-tree is pasky-0.4
  2005-04-16  0:22                                               ` Linus Torvalds
  2005-04-16  1:13                                                 ` Daniel Barkalow
@ 2005-04-16 15:34                                                 ` Petr Baudis
  1 sibling, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-16 15:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Dear diary, on Sat, Apr 16, 2005 at 02:22:45AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Sat, 16 Apr 2005, Petr Baudis wrote:
> > 
> > But otherwise it is great news to me. Actually, in that case, is it
> > worth renaming it to Cogito and using cg to invoke it? Wouldn't be that
> > actually more confusing after it gets merged? IOW, should I stick to
> > "git" or feel free to rename it to "cg"?
> 
> I'm perfectly happy for it to stay as "git", and in general I don't have
> any huge preferences either way. You guys can discuss names as much as you
> like, it's the "tracking renames" and "how to merge" things that worry me.

:-)

> I think I've explained my name tracking worries.  When it comes to "how to 
> merge", there's three issues:
> 
>  - we do commonly have merge clashes where both trees have applied the 
>    exact same patch. That should merge perfectly well using the 3-way
>    merge from a common parent that Junio has, but not your current "bring
>    patches forward" kind of strategy.

My current "bring patches forward" strategy is only very interim, to
have something working well enough for me to merge with you. I will
gladly change it to use merge-tree*, when it is done. (Or read-tree -m -
I will yet have to have a look, but it looks extremely promising.)

>  - I _do_ actually sometimes merge with dirty state in my working 
>    directory, which is why I want the merge to take place in a separate 
>    (and temporary) directory, which allows for a failed merge without 
>    having any major cleanup. If the merge fails, it's not a big deal, and 
>    I can just blow the merge directory away without losing the work I had 
>    in my "real" working directory.

Ok. But still, especially when you do some nontrivial conflicts
resolving, how do you check if it even compiles after the merge? Or do
you just commit it and possibly fix the compilation in another commit?

>  - reliability. I care much less for "clever" than I care for "guaranteed 
>    to never do the wrong thing". If I have to fix up some stuff by hand, 
>    I'll happily do so. But if I can't trust the merge and have to _check_ 
>    things by hand afterwards, that will make me leery of the merges, and
>    _that_ is bad.
> 
> The third point is why I'm going to the ultra-conservative "three-way 
> merge from the common parent". It's not fancy, but it's something I feel 
> comfortable with as a merge strategy. For example, arch (and in particular 
> darcs) seems to want to try to be "clever" about the merges, and I'd 
> always live in fear. 

I agree and I would like to achieve the same. I too think the three-way
merge from the common parent is the best way to go for now.

> And, finally, there's obviously performance. I _think_ a normal merge with
> nary a conflict and just a few tens of files changed should be possible in
> a second. I realize that sounds crazy to some people, but I think it's
> entirely doable. Half of that is writing the new tree out (that is a
> relative costly op due to the compression). The other half is the "work".

Being written in shell, there is plenty of space for optimization - from
using bash internals instead of textutils to rewriting parts of it in C.
My priority now is to get it right first, though. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-16 12:19                                 ` David Lang
@ 2005-04-16 15:55                                   ` Simon Fowler
  2005-04-16 16:03                                     ` Petr Baudis
  0 siblings, 1 reply; 148+ messages in thread
From: Simon Fowler @ 2005-04-16 15:55 UTC (permalink / raw)
  To: David Lang; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote:
> Simon
> 
> given that you have multiple machines creating files, how do you deal with 
> the idea of the same 'unique id' being assigned to different files by 
> different machines?
> 
The id is a sha1 hash of the current time and the full path of the
file being added - the chances of that being replicated without
malicious intent is extremely small. There are other things that
could be used, like the hostname, username of the person running the
program, etc, but I don't really see them being necessary.

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-16 15:55                                   ` Simon Fowler
@ 2005-04-16 16:03                                     ` Petr Baudis
  2005-04-16 16:26                                       ` Simon Fowler
  2005-04-16 16:26                                       ` Linus Torvalds
  0 siblings, 2 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-16 16:03 UTC (permalink / raw)
  To: Simon Fowler; +Cc: David Lang, git

Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter
where Simon Fowler <simon@himi.org> told me that...
> On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote:
> > Simon
> > 
> > given that you have multiple machines creating files, how do you deal with 
> > the idea of the same 'unique id' being assigned to different files by 
> > different machines?
> > 
> The id is a sha1 hash of the current time and the full path of the
> file being added - the chances of that being replicated without
> malicious intent is extremely small. There are other things that
> could be used, like the hostname, username of the person running the
> program, etc, but I don't really see them being necessary.

Why not just use UUID?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-16 16:03                                     ` Petr Baudis
@ 2005-04-16 16:26                                       ` Simon Fowler
  2005-04-16 16:26                                       ` Linus Torvalds
  1 sibling, 0 replies; 148+ messages in thread
From: Simon Fowler @ 2005-04-16 16:26 UTC (permalink / raw)
  To: Petr Baudis; +Cc: David Lang, git

[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]

On Sat, Apr 16, 2005 at 06:03:33PM +0200, Petr Baudis wrote:
> Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter
> where Simon Fowler <simon@himi.org> told me that...
> > On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote:
> > > Simon
> > > 
> > > given that you have multiple machines creating files, how do you deal with 
> > > the idea of the same 'unique id' being assigned to different files by 
> > > different machines?
> > > 
> > The id is a sha1 hash of the current time and the full path of the
> > file being added - the chances of that being replicated without
> > malicious intent is extremely small. There are other things that
> > could be used, like the hostname, username of the person running the
> > program, etc, but I don't really see them being necessary.
> 
> Why not just use UUID?
> 
Hey, everything else in git seems to use sha1, so I just copied
Linus' sha1 code ;-)

All I wanted was something that had a good chance of being unique
across any potential set of distributed repositories, to avoid the
chance of accidental clashes. A sha1 hash of something that's not
likely to be replicated is a simple way to do that.

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-16 16:03                                     ` Petr Baudis
  2005-04-16 16:26                                       ` Simon Fowler
@ 2005-04-16 16:26                                       ` Linus Torvalds
  2005-04-16 23:02                                         ` David Lang
  2005-04-17 14:52                                         ` Ingo Molnar
  1 sibling, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16 16:26 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Simon Fowler, David Lang, git



On Sat, 16 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter
> where Simon Fowler <simon@himi.org> told me that...
>
> > The id is a sha1 hash of the current time and the full path of the
> > file being added - the chances of that being replicated without
> > malicious intent is extremely small. There are other things that
> > could be used, like the hostname, username of the person running the
> > program, etc, but I don't really see them being necessary.
> 
> Why not just use UUID?

Note that using anything that isn't data-related totally destroys the 
whole point of the object database. Remember: any time we don't uniquely 
generate the same name for the same object, we'll waste disk-space.

So adding in user/machine/uuid's to the thing is always a mistake. The 
whole thing depends on the hash being as close to 1:1 with the contents as 
humanly possible. 

There's also the issue of size. Yes, I could have chosen sha256 instead of
sha1. But the keys would be almost twice as big, which in turn means that 
the "tree" objects would be bigger, and that the "index" file would be 
bigger.

Is that a huge problem? No. We can certainly move to it if sha1 ever shows
itself to be weak. But I really think we are much better off just
re-generating the whole tree and history at that point, rather than try to 
predict the future.

The fact is, with current knowledge, sha1 _is_ safe for what git uses it 
for, for the forseeable future. And we have a migration strategy if I'm 
wrong. Don't worry about it.

Almost all attacks on sha1 will depend on _replacing_ a file with a bogus
new one. So guys, instead of using sha256 or going overboard, just make 
sure that when you synchronize, you NEVER import a file you already have.

It's really that simple. Add "--ignore-existing" to your rsync scripts,
and you're pretty much done. That guarantees that a new evil blob by the
next mad scientist out to take over the world will never touch your
repository, and if we make this part of the _standard_ scripts, then
dammit, security is in good _practices_ rather than just relying blindly
on the hash being secure.

In other words, I think we could have used md5's as the hash, if we just
make sure we have good practices. And it wouldn't have been "insecure".

The fact is, you don't merge with people you don't trust. If you don't
trust them, they have a much easier time corrupting your repository by
just creating bugs in the code and checking that thing in. Who cares about
hash collisions, when you can generate a kernel root vulnerability by just
adding a single line of code and use the _correct_ hash for it.

So the sha1 hash does not replace _trust_. That comes from something else 
altogether.

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-16 15:28                                                       ` [PATCH 3/2] merge-trees script for Linus git Linus Torvalds
@ 2005-04-16 16:36                                                         ` Linus Torvalds
  2005-04-16 17:14                                                           ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16 16:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sat, 16 Apr 2005, Linus Torvalds wrote:
> 
> Having slept on it, I think I'll merge all the trivial cases that don't 
> involve a file going away or being added. Ie if the file is in all three 
> trees, but it's the same in two of them, we know what to do.

Junio, I pushed this out, along with the two patches from you. It's still
more anal than my original "tree-diff" algorithm, in that it refuses to
touch anything where the name isn't the same in all three versions
(original, new1 and new2), but now it does the "if two of them match, just
select the result directly" trivial merges.

I really cannot see any sane case where user policy might dictate doing
anything else, but if somebody can come up with an argument for a merge
algorithm that wouldn't do what that trivial merge does, we can make a
flag for "don't merge at all".

The reason I do want to merge at all in "read-tree" is that I want to
avoid having to write out a huge index-file (it's 1.6MB on the kernel, so
if you don't do _any_ trivial merges, it would be 4.8MB after reading
three trees) and then having people read it and parse it just to do stuff
that is obvious. Touching 5MB of data isn't cheap, even if you don't do a 
whole lot to it.

Anyway, with the modified read-tree, as far as I can tell it will now 
merge all the cases where one side has done something to a file, and the 
other side has left it alone (or where both sides have done the exact same 
modification). That should _really_ cut down the cases to just a few files 
for most of the kernel merges I can think of. 

Does it do the right thing for your tests?

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/2] merge-trees script for Linus git
  2005-04-16 16:36                                                         ` Linus Torvalds
@ 2005-04-16 17:14                                                           ` Junio C Hamano
  0 siblings, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-16 17:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Anyway, with the modified read-tree, as far as I can tell it will now 
LT> merge all the cases where one side has done something to a file, and the 
LT> other side has left it alone (or where both sides have done the exact same 
LT> modification). That should _really_ cut down the cases to just a few files 
LT> for most of the kernel merges I can think of. 

LT> Does it do the right thing for your tests?

Yes.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-15 15:32                             ` Linus Torvalds
                                                 ` (2 preceding siblings ...)
  2005-04-16  1:44                               ` Simon Fowler
@ 2005-04-16 20:29                               ` Sanjoy Mahajan
  2005-04-16 20:41                                 ` Linus Torvalds
  3 siblings, 1 reply; 148+ messages in thread
From: Sanjoy Mahajan @ 2005-04-16 20:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, Junio C Hamano, Petr Baudis, git

> And that "where did this come from" decision should be done at _search_
> time, not commit time.

I like this elegant approach, but clever pattern matching can help even
at commit time.  Suppose hello.c is simply:

  printf ("Hello %d\n", year);

And then developer A updates hello.c to:

  printf ("Hello %d\n", year);
  printf ("And   %d\n", year+1);

Meanwhile developer B updates hello.c to:

  printf ("Hello %d\n", yyyy);

How to merge these two changes?  The psychic solution is

  printf ("Hello %d\n", yyyy);
  printf ("And   %d\n", yyyy+1);

Darcs handles token renames specially, but it's not a general solution
so let's leave it aside.  The example does not have enough information
to make the psychic solution unique or reliable, but imagine that the
example were longer to solve that problem.  You'd want to describe the
delta A(hello.c) as
   
  1. duplicated message line
  2. changed 2nd line a bit

And B(hello.c) as

  1. Changed year to yyyy

In that representation, merging the two deltas becomes

  1. duplicated message line
  2. changed 2nd line a bit
  3. Changed year to yyyy in both lines

Or, by commuting the merge operations and adjusting for their
non-commutativity (in terminology like darcs's -- I'm also a physicist):

  1. Changed year to yyyy
  2. duplicated message line
  3. changed 2nd line a bit

So here some of the computation that Linus wants only at question time
(e.g. 'how did that line get here??') is also useful at merge time.
It's difficult (expensive, unreliable) to describe deltas in the form
above or, worse, to merge two such descriptions, but I hope it
illustrates the point.  And perhaps a robust and easier-to-compute
change-description language can be dreamt up, even if the general
problem of describing changes compactly is not computable -- it's almost
the same, or is the same, as finding the Kolmogorov complexity of a data
set.

Or have I missed a fundamental point?

-Sanjoy

`A society of sheep must in time beget a government of wolves.'
   - Bertrand de Jouvenal

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-16 20:29                               ` Sanjoy Mahajan
@ 2005-04-16 20:41                                 ` Linus Torvalds
  0 siblings, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-16 20:41 UTC (permalink / raw)
  To: Sanjoy Mahajan; +Cc: David Woodhouse, Junio C Hamano, Petr Baudis, git



On Sat, 16 Apr 2005, Sanjoy Mahajan wrote:
> 
> I like this elegant approach, but clever pattern matching can help even
> at commit time.  Suppose hello.c is simply:

Here, what you're talking about is not "commit", but "merge".

The git model very much separates the two events. You first generate a 
merged tree, an dyou commit that merge as a separate and largely totally 
independent phase.

And yes, I agree that with merging, you do end up potentially wanting to 
try different things. When I've done my "git" merges, all I've really done 
is to make sure that the trivial parts basically merge in zero time, so 
that you can afford to perhaps spend some effort on handling the _real_ 
merge conflicts.

Many systems seem to be designed around a "clever merge" algorithm, with 
darcs perhaps being the most extreme example. The problem with that design 
is that 99.9% of all the work is not at all about being clever, and if you 
try to base your design around the clever things, your performance will 
definitely suck.

So I think that with git, you can actually really try to be clever,
because when you get a merge conflict, you're now only worrying about one
file out of 17,000, and then you can go wild on that one and try different
merge algorithms (token merge, character-merge, line-based merge, you name
it).

Of course, I might not actually personally want to depend on any clever 
merges, but the git infrastructure really doesn't care. My plumbing 
doesn't merge the conflicts that arise within one single object, or the 
filename differences - you can do anything you want on that.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-16 16:26                                       ` Linus Torvalds
@ 2005-04-16 23:02                                         ` David Lang
  2005-04-17 14:52                                         ` Ingo Molnar
  1 sibling, 0 replies; 148+ messages in thread
From: David Lang @ 2005-04-16 23:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Simon Fowler, git

On Sat, 16 Apr 2005, Linus Torvalds wrote:

> Almost all attacks on sha1 will depend on _replacing_ a file with a bogus
> new one. So guys, instead of using sha256 or going overboard, just make
> sure that when you synchronize, you NEVER import a file you already have.
>
> It's really that simple. Add "--ignore-existing" to your rsync scripts,
> and you're pretty much done. That guarantees that a new evil blob by the
> next mad scientist out to take over the world will never touch your
> repository, and if we make this part of the _standard_ scripts, then
> dammit, security is in good _practices_ rather than just relying blindly
> on the hash being secure.
>
> In other words, I think we could have used md5's as the hash, if we just
> make sure we have good practices. And it wouldn't have been "insecure".
>
> The fact is, you don't merge with people you don't trust. If you don't
> trust them, they have a much easier time corrupting your repository by
> just creating bugs in the code and checking that thing in. Who cares about
> hash collisions, when you can generate a kernel root vulnerability by just
> adding a single line of code and use the _correct_ hash for it.
>
> So the sha1 hash does not replace _trust_. That comes from something else
> altogether.

What I am bringing up is not intended to be a trust thing, but instead a 
safety thing, accidents, not evil intent. makeing the rsync scripts 
--ignore-existing will avoid corrupting local data when pulling remotely, 
but it won't solve the problem of running into a collision locally (and 
won't do much to help you figure out what's wrong when you run into a 
remote collision)

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Issues with higher-order stages in dircache
  2005-04-16 14:03                                                       ` Issues with higher-order stages in dircache Junio C Hamano
@ 2005-04-17  5:11                                                         ` Junio C Hamano
  2005-04-17  5:31                                                           ` Linus Torvalds
  2005-04-17 10:00                                                         ` Summary of "read-tree -m O A B" mechanism Junio C Hamano
  1 sibling, 1 reply; 148+ messages in thread
From: Junio C Hamano @ 2005-04-17  5:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus,

    earlier I wrote [*R1*]:

   - An explicit "update-cache [--add] [--remove] path" should
     be taken as a signal from the user (or Cogito) to tell the
     dircache layer "the merge is done and here is the result".
     So just delete higher-order stages for the path and record
     the specified path at stage 0 (or remove it altogether).

and I think this commit of yours implements the adding half.

    commit be7b1f05cea8e5213ffef8f74ebdefed2aacb6fc:1
    author Linus Torvalds <torvalds@ppc970.osdl.org> 1113678345 -0700
    committer Linus Torvalds <torvalds@ppc970.osdl.org> 1113678345 -0700

    When inserting a index entry of stage 0, remove all old unmerged entries.

I am wondering if you have a particular reason not to do the
same for the removing half.  Without it, currently I do not see
a way for the user or Cogito to tell dircache layer that the
merge should result in removal.  That is, other than first
adding a phony entry there (which brings the entry down to stage
0) and then immediately doing a regular update-cache --remove.
That is two instead of one reading of 1.6MB index file for the
kernel case.

Also do you have any comments on this one from the same message?

 * read-tree

   - When merging two trees, i.e. "read-tree -m A B", shouldn't
     we collapse identical stage-1/2 into stage-0?


[References]

*R1* http://marc.theaimsgroup.com/?l=git&m=111366023126466&w=2


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Issues with higher-order stages in dircache
  2005-04-17  5:11                                                         ` Junio C Hamano
@ 2005-04-17  5:31                                                           ` Linus Torvalds
  2005-04-17  6:01                                                             ` Junio C Hamano
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-17  5:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sat, 16 Apr 2005, Junio C Hamano wrote:
> 
> I am wondering if you have a particular reason not to do the
> same for the removing half.

No. Except for me being silly.

Please just make it so.

> Also do you have any comments on this one from the same message?
> 
>  * read-tree
> 
>    - When merging two trees, i.e. "read-tree -m A B", shouldn't
>      we collapse identical stage-1/2 into stage-0?

How do you actually intend to merge two trees? 

That sounds like a total special case, and better done with "diff-tree".  
But regardless, since I assume the result is the later tree, why do a 
"read-tree -m A B", since what you really want is "read-tree B"?

The real merge always needs the base tree, and I'd hate to complicate the 
real merge with some special-case that isn't relevant for that real case.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Issues with higher-order stages in dircache
  2005-04-17  5:31                                                           ` Linus Torvalds
@ 2005-04-17  6:01                                                             ` Junio C Hamano
  0 siblings, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-17  6:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

>> - When merging two trees, i.e. "read-tree -m A B", shouldn't
>> we collapse identical stage-1/2 into stage-0?

LT> How do you actually intend to merge two trees? 

How silly of me.  *BLUSH*


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Summary of "read-tree -m O A B" mechanism
  2005-04-16 14:03                                                       ` Issues with higher-order stages in dircache Junio C Hamano
  2005-04-17  5:11                                                         ` Junio C Hamano
@ 2005-04-17 10:00                                                         ` Junio C Hamano
  1 sibling, 0 replies; 148+ messages in thread
From: Junio C Hamano @ 2005-04-17 10:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Earlier I wrote down a list of issues your recent "merge
stage" changes have introduced to the rest of the plumbing, with
a set of suggested adaptions.  I think all of them are cleared
now (you have a pile of patches from me in your mailbox).

I do not know what percentage of people on this list are using
git without the Cogito part, but I suspect that the number might
be quite small.  I also suspect, from the description Petr gave
us on how the merging in Cogito works, Cogito does not currently
use the "read-tree -m O A B" mechanism, and those majority who
do not deal with the low level tools themselves would not have
to know about the merge issues yet.  But I think it is a good
time, now things have started to settle down, to summarize how
various commands work when they see those "funny" dircache
entries created after "read-tree -m O A B" has run.  Of course,
people working on Cogito needs to know them, once they decide to
use the "reed-tree -m O A B" mechanism.

 * read-tree -m O A B

   - For description on how this works, the definitive reading
     is [*R1*].  In short:

     - unlike ordinary read-tree, "-m" form reads up to three
       trees and creates paths that are "unmerged".  

     - trivial merges are done by read-tree itself.  only
       conflicting paths will be in unmerged state when
       read-tree returns.

 * write-tree

     - write-tree refuses to give you a tree until all the
       unmerged paths are resolved.

 * show-files

   - "show-files --unmerged" and "show-files --stage" can be
     used to examine detailed information on unmerged paths.
     For an unmerged path, instead of recording a single
     mode/SHA1 pair, the dircache records up to three such
     pairs; one from tree O in stage 1, A in stage 2, and B in
     stage 3.  This information can be used by the user (or
     Cogito) to see what should eventually be recorded at the
     path.

 * update-cache

   - An explicit "update-cache [--add] path" or "update-cache
     [--add] --cacheinfo mode SHA1 path" tells the plumbing that
     the user (or Cogito) wants to resolve it by storing
     mode/SHA1 of the given working file or mode SHA1 specified
     on the command line.  The path ceases to be in unmerged
     state after this happens.

     Similarly, "update-cache --remove path" resolves the
     unmerged state and the merge result is not having anything
     at that path.

   - "update-cache --refresh", in addition to the "needs update"
     message people are now familiar with, says "needs merge"
     for unmerged paths.

 * show-diff

   - show-diff on an unmerged path simply says "unmerged" (the
     plumbing would not know what to diff with what among three
     stages and the working file).  

 * checkout-cache

   - "checkout-cache -a" warns about unmerged paths and checks
     out only the merged paths.

   - "checkout-cache [-f] path" on an unmerged path says
     "Unmerged", just like the same command on non-existent path
     says "not in the cache", and does not touch the working
     file.
 

I hope the descriptions in this summary is correct enough to be
useful to somebody.


[Reference]

*R1* http://marc.theaimsgroup.com/?l=git&m=111363270608902&w=2


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-16 15:33                                 ` Johannes Schindelin
@ 2005-04-17 13:14                                   ` David Woodhouse
  0 siblings, 0 replies; 148+ messages in thread
From: David Woodhouse @ 2005-04-17 13:14 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, Petr Baudis, git

On Sat, 2005-04-16 at 17:33 +0200, Johannes Schindelin wrote:
> > But if it can be done cheaply enough at a later date even though we end
> > up repeating ourselves, and if it can be done _well_ enough that we
> > shouldn't have just asked the user in the first place, then yes, OK I
> > agree.
> 
> The repetition could be helped by using a cache.

Perhaps. Since neither such a cache nor even the commit comments are
strictly part of the git data, they probably shouldn't be included in
the sha1 hash of the commit object. However, I don't see a fundamental
reason why we couldn't store them in the same file but omit them from
the hash calculations. That also allows us to retrospectively edit
commit comments without completely changing the entire subsequent
history.

Or is that a little too heretical a suggestion?

-- 
dwmw2


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-16 16:26                                       ` Linus Torvalds
  2005-04-16 23:02                                         ` David Lang
@ 2005-04-17 14:52                                         ` Ingo Molnar
  2005-04-17 15:08                                           ` Brad Roberts
  2005-04-17 15:28                                           ` Ingo Molnar
  1 sibling, 2 replies; 148+ messages in thread
From: Ingo Molnar @ 2005-04-17 14:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Simon Fowler, David Lang, git


* Linus Torvalds <torvalds@osdl.org> wrote:

> Almost all attacks on sha1 will depend on _replacing_ a file with a 
> bogus new one. So guys, instead of using sha256 or going overboard, 
> just make sure that when you synchronize, you NEVER import a file you 
> already have.

here is a bit complex, but still practical attack that doesnt rely on 
replacement and which can only be detected if we check the sha1 
uniqueness assumptions.

If you can generate a duplicate sha1 key for an arbitrary 'target' file, 
and Malice sends you a GIT-generated patch that introduces a new file 
(which doesnt exist in the current tree) which you review (in the email) 
and which looks safe to apply & harmless. Maybe the patch has a bit 
weird formatting and some weird comments (which in reality Malice used 
to generate the proper sha1 key) but otherwise the patch is for some 
seldom used arcane driver that no-one used for quite some time and 
no-one really cares about, so you are happy to apply the patch.

The compromise occurs when you apply the patch: the seemingly harmless 
patch has an sha1 key that Malice manufacured to match that of an 
already existing, 'dangerous' object in your database.

With tens of thousands (or hundreds of thousands) of objects expected in 
the repository sooner or later, there's quite a selection to pick from.  
Once you apply the patch, instead of the expected new file that you 
reviewed and found safe, the attacker has the other object included in 
the official kernel.

A dangerous object can be anything: e.g. a debugging hack that allows 
arbitrary kernel-space writes. Or a known-insecure module (which since 
then got fixed, but the buggy code still exists in the DB). The module 
is in a single file and is self-installing (e.g. it has __init code to 
register itself as some driver.)

Malice might even previously plant a dangerous object as some 'firmware 
module' in another arcane driver, which doesnt get compiled by default, 
but still shows up in the DB. Or Malice might plant a dangerous object 
via an innocent-looking documentation file.  (which contains some sample 
code and is called sample.txt)

this type of 'false sharing attack' can only be prevented if an object 
is only 'shared' with another object if it has been memcmp-ed with the 
object in the repository. I.e. if we trust the sharing decision! Once 
the attack has occured it cannot be detected automatically: only people 
will notice it. (why did that weird unrelated module show up in that old 
driver?)

The compromise relies on you having reviewed something harmless, while 
in reality what happened within the DB was far less harmless. And the DB 
remains self-consistent: neither fsck, nor others importing your tree 
will be able to detect the compromise. This attack can only be detected 
when you apply the patch, after that point all the information (except 
Malice's message in your inbox) is gone.

so unless we actively check for collisions, once an sha1 key can be 
generated at will on near-arbitrary input, it's not a secure system 
anymore. We might be lucky and safe, but we wont be secure.

	Ingo

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-17 14:52                                         ` Ingo Molnar
@ 2005-04-17 15:08                                           ` Brad Roberts
  2005-04-17 15:18                                             ` Ingo Molnar
  2005-04-17 15:28                                           ` Ingo Molnar
  1 sibling, 1 reply; 148+ messages in thread
From: Brad Roberts @ 2005-04-17 15:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Petr Baudis, Simon Fowler, David Lang, git

On Sun, 17 Apr 2005, Ingo Molnar wrote:

> Date: Sun, 17 Apr 2005 16:52:32 +0200
> From: Ingo Molnar <mingo@elte.hu>
> To: Linus Torvalds <torvalds@osdl.org>
> Cc: Petr Baudis <pasky@ucw.cz>, Simon Fowler <simon@himi.org>,
>      David Lang <david.lang@digitalinsight.com>, git@vger.kernel.org
> Subject: Re: Re: Merge with git-pasky II.
>
>
> * Linus Torvalds <torvalds@osdl.org> wrote:
>
> > Almost all attacks on sha1 will depend on _replacing_ a file with a
> > bogus new one. So guys, instead of using sha256 or going overboard,
> > just make sure that when you synchronize, you NEVER import a file you
> > already have.
>
> With tens of thousands (or hundreds of thousands) of objects expected in
> the repository sooner or later, there's quite a selection to pick from.
> Once you apply the patch, instead of the expected new file that you
> reviewed and found safe, the attacker has the other object included in
> the official kernel.
>
> A dangerous object can be anything: e.g. a debugging hack that allows
> arbitrary kernel-space writes. Or a known-insecure module (which since
> then got fixed, but the buggy code still exists in the DB). The module
> is in a single file and is self-installing (e.g. it has __init code to
> register itself as some driver.)

While I agree that a hash collision is bad and certainly worth preventing
during new object creation, for it to actually implant a trojan in a build
successfully it'd have to meet even more criteria than you've layed out.
It'd have to...

  - be shadowing an object that's part of an active tree
  - provide all the public symbols the shadowed object provided so that it
    would still build and link successfully

Shadowing an object that's not part of the working tree means something on
another branch or obsoleted some time in the past is still db corruption,
but not nearly as big an issue from a trojan standpoint.

Later,
Brad



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-17 15:08                                           ` Brad Roberts
@ 2005-04-17 15:18                                             ` Ingo Molnar
  0 siblings, 0 replies; 148+ messages in thread
From: Ingo Molnar @ 2005-04-17 15:18 UTC (permalink / raw)
  To: Brad Roberts; +Cc: Linus Torvalds, Petr Baudis, Simon Fowler, David Lang, git


* Brad Roberts <braddr@puremagic.com> wrote:

> While I agree that a hash collision is bad and certainly worth 
> preventing during new object creation, for it to actually implant a 
> trojan in a build successfully it'd have to meet even more criteria 
> than you've layed out. It'd have to...

>   - provide all the public symbols the shadowed object provided so that it
>     would still build and link successfully

that's not a problem. Most modules dont provide public symbols.  
Especially not drivers. Generally it's the modules that _dont_ have any 
global impact that get reviewed less stringently - an attacker would 
thus choose them for psychological reasons anyway.

>   - be shadowing an object that's part of an active tree
>
> Shadowing an object that's not part of the working tree means 
> something on another branch or obsoleted some time in the past is 
> still db corruption, but not nearly as big an issue from a trojan 
> standpoint.

it's not DB corruption, it's a feature of GIT: it's a content _cache_, 
new and old alike. Nothing in GIT says that old objects in the 
repository (which are still very much part of history) cannot be revived 
in newer trees. (in fact it regularly happens - e.g. if a fix is undone 
manually.)

	Ingo

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-17 14:52                                         ` Ingo Molnar
  2005-04-17 15:08                                           ` Brad Roberts
@ 2005-04-17 15:28                                           ` Ingo Molnar
  2005-04-17 17:34                                             ` Linus Torvalds
  1 sibling, 1 reply; 148+ messages in thread
From: Ingo Molnar @ 2005-04-17 15:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Simon Fowler, David Lang, git


* Ingo Molnar <mingo@elte.hu> wrote:

> The compromise relies on you having reviewed something harmless, while 
> in reality what happened within the DB was far less harmless. And the 
> DB remains self-consistent: neither fsck, nor others importing your 
> tree will be able to detect the compromise. This attack can only be 
> detected when you apply the patch, after that point all the 
> information (except Malice's message in your inbox) is gone.

in fact, this attack cannot even be proven to be malicious, purely via 
the email from Malice: it could be incredible bad luck that caused that 
good-looking patch to be mistakenly matching a dangerous object.

In fact this could happen even today, _accidentally_. (but i'm willing 
to bet that hell will be freezing over first, and i'll have some really 
good odds ;) There's probably a much higher likelyhood of Linus' tree 
getting corrupted in some old fashioned way and introducing a security 
hole by accident)

	Ingo

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-17 15:28                                           ` Ingo Molnar
@ 2005-04-17 17:34                                             ` Linus Torvalds
  2005-04-17 22:12                                               ` Herbert Xu
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-17 17:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Petr Baudis, Simon Fowler, David Lang, git



On Sun, 17 Apr 2005, Ingo Molnar wrote:
> 
> in fact, this attack cannot even be proven to be malicious, purely via 
> the email from Malice: it could be incredible bad luck that caused that 
> good-looking patch to be mistakenly matching a dangerous object.

I really hate theoretical discussions. 

The fact is, a lot of _crap_ engineering gets done because of the question
"what if?". It results in over-engineering, often to the point where the 
end result is quite a lot measurably worse than the sane results.

You are _literally_ arguing for the equivalent of "what if a meteorite hit
my plane while it was in flight - maybe I should add three inches of
high-tension armored steel around the plane, so that my passengers would
be protected".

That's not engineering. That's five-year-olds discussing building their
imaginary forts ("I want gun-turrets and a mechanical horse one mile high,
and my command center is 5 miles under-ground and totally encased in 5
meters of lead").

I absolutely _hate_ doing engineering on the principle of "this might be
possible in theory", and I'm violently opposed to it. So far, I have not
heard a single argument that I consider even _remotely_ likely.

The thing is, even if you can force a hash collission by sending somebody 
a patch, it's really pretty much almost guaranteed that the patch is not 
just "a few strange characters", unless sha1 is really broken to the point 
where it's not cryptographically secure _at_all_.

In other words, unless somebody finds a way to make sha1 appear as nothing
more than a complicated set of parity bits, all brute-force "get the same
sha1" is likely to be about generating a really strange blob based on the
thing you want to replace - and by "really strange" I mean total binary
crap. And likely _much_ bigger too. And by "much bigger" I mean "possibly
gigabytes of data".

And the thing is, _if_ somebody finds a way to make sha1 act as just a
complex parity bit, and comes up with generating a clashing object that
actually makes sense, then going to sha256 is likely pointless too - I
think the algorithm is basically the same, just with more bits. If you've
broken sha1 to the point where it's _that_ breakable, then you've likely
broken sha256 too. Nobody has ever proven that you couldn't break sha256 
with some really clever algorithm...

So if you start playing "what if?" games, dammit, I can play mine.

If we want to have any kind of confidence that the hash is reall
yunbreakable, we should make it not just longer than 160 bits, we should
make sure that it's two or more hashes, and that they are based on totally
different principles.

And we should all digitally sign every single object too, and we should
use 4096-bit PGP keys and unguessable passphrases that are at least 20
words in length. And we should then build a bunker 5 miles underground,
encased in lead, so that somebody cannot flip a few bits with a ray-gun, 
and make us believe that the sha1's match when they don't. Oh, and we need 
to all wear aluminum propeller beanies to make sure that they don't use 
that ray-gun to make us do the modification _outselves_.

And the thing is, that's just crazy talk. The difference between a crazy
person and an intelligent one is that the crazy one doesn't realize what
makes sense in the world. The goal of good engineering is not to ask "what
if?", but to ask "how do I make this work as well as possible".

So please stop with the theoretical sha1 attacks. It is simply NOT TRUE
that you can generate an object that looks halfway sane and still gets you
the sha1 you want. Even the "breakage" doesn't actually do that.  And if
it ever _does_ become true, it will quite possibly be thanks to some
technology that breaks other hashes too.

So until proven otherwise, I worry about accidental hashes, and in 160
bits of good hashing, that just isn't an issue either. Anybody who
compares a 128-bit md5-sum to a 160-bit sha1 doesn't understand the math.  
It didn't get "slightly less likely" to happen. It got so _unbelievably_
less likely to happen that it's not even funny.

				Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 17:34                                             ` Linus Torvalds
@ 2005-04-17 22:12                                               ` Herbert Xu
  2005-04-17 22:35                                                 ` Linus Torvalds
  2005-04-18  4:16                                               ` Sanjoy Mahajan
  2005-04-18  7:42                                               ` Ingo Molnar
  2 siblings, 1 reply; 148+ messages in thread
From: Herbert Xu @ 2005-04-17 22:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mingo, pasky, simon, david.lang, git

Linus Torvalds <torvalds@osdl.org> wrote:
> 
> If we want to have any kind of confidence that the hash is reall
> yunbreakable, we should make it not just longer than 160 bits, we should
> make sure that it's two or more hashes, and that they are based on totally
> different principles.

Sorry, it has already been shown that combining two difference hashes
doesn't necessarily provide the security that you would hope.

I think what hasn't been discussed here is the cost of actually doing
the comparisons.  In other words, what is the minimum number of
comparisons we can get away and still deal with hash collisions
successfully?

Once we know what the cost is then we can decide whether it's worthwhile
considering the odds involved.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 22:12                                               ` Herbert Xu
@ 2005-04-17 22:35                                                 ` Linus Torvalds
  2005-04-17 23:29                                                   ` Herbert Xu
  0 siblings, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-17 22:35 UTC (permalink / raw)
  To: Herbert Xu; +Cc: mingo, pasky, simon, david.lang, git



On Mon, 18 Apr 2005, Herbert Xu wrote:
> 
> Sorry, it has already been shown that combining two difference hashes
> doesn't necessarily provide the security that you would hope.

Sorry, that's not true.

Quite the reverse. Again, you bring up totally theoretical arguments. In 
_practice_ it has indeed been shown that using two hashes _does_ catch 
hash colissions.

The trivial example is using md5 sums with a length. The "length" is a 
rally bad "hash" of the file contents too. And the fact is, that simple 
combination of hashes has proven to be more resistant to attack than the 
hash itself. It clearly _does_ make a difference in practice.

So _please_, can we drop the obviously bogus "in theory" arguments. They 
do not matter. What matters is practice.

And the fact is, in _theory_ we don't know if somebody may be trivially
able to break any particular hash. But in practice we do know that it's
less likely that you can break a combination of two totally unrelated
hashes than you break one particular one.

NOTE! I'm not actually arguing that we should do that. I'm actually
arguing totally the reverse: I'm arguing that there is a fine line between
being "very very careful" and being "crazy to the point of being
incompetent".

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 22:35                                                 ` Linus Torvalds
@ 2005-04-17 23:29                                                   ` Herbert Xu
  2005-04-17 23:34                                                     ` Petr Baudis
  2005-04-17 23:50                                                     ` Linus Torvalds
  0 siblings, 2 replies; 148+ messages in thread
From: Herbert Xu @ 2005-04-17 23:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mingo, pasky, simon, david.lang, git

On Sun, Apr 17, 2005 at 03:35:17PM -0700, Linus Torvalds wrote:
> 
> Quite the reverse. Again, you bring up totally theoretical arguments. In 
> _practice_ it has indeed been shown that using two hashes _does_ catch 
> hash colissions.
> 
> The trivial example is using md5 sums with a length. The "length" is a 
> rally bad "hash" of the file contents too. And the fact is, that simple 
> combination of hashes has proven to be more resistant to attack than the 
> hash itself. It clearly _does_ make a difference in practice.

I wasn't disputing that of course.  However, the same effect can be
achieved in using a single hash with a bigger length, e.g., sha256
or sha512.

> So _please_, can we drop the obviously bogus "in theory" arguments. They 
> do not matter. What matters is practice.

I agree.  However, what is the actual cost in practice of detecting
collisions?

I get the feeling that it isn't that bad.  For example, if we did it
at the points where the blobs actually entered the tree, then the cost
is always proportional to the change size (the number of new blobs).

Is this really that bad considering that the average blob isn't very
big?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 23:29                                                   ` Herbert Xu
@ 2005-04-17 23:34                                                     ` Petr Baudis
  2005-04-17 23:53                                                       ` Kenneth Johansson
  2005-04-18  0:49                                                       ` Herbert Xu
  2005-04-17 23:50                                                     ` Linus Torvalds
  1 sibling, 2 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-17 23:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Linus Torvalds, mingo, simon, david.lang, git

Dear diary, on Mon, Apr 18, 2005 at 01:29:05AM CEST, I got a letter
where Herbert Xu <herbert@gondor.apana.org.au> told me that...
> I get the feeling that it isn't that bad.  For example, if we did it
> at the points where the blobs actually entered the tree, then the cost
> is always proportional to the change size (the number of new blobs).

No. The collision check is done in the opposite cache - when you want to
write a blob and there is already a file of the same hash in the tree.
So either the blob is already in the database, or you have a collision.

Therefore, the cost is proportional to the size of what stays unchanged.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 23:29                                                   ` Herbert Xu
  2005-04-17 23:34                                                     ` Petr Baudis
@ 2005-04-17 23:50                                                     ` Linus Torvalds
  1 sibling, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-17 23:50 UTC (permalink / raw)
  To: Herbert Xu; +Cc: mingo, pasky, simon, david.lang, git



On Mon, 18 Apr 2005, Herbert Xu wrote:
> 
> I wasn't disputing that of course.  However, the same effect can be
> achieved in using a single hash with a bigger length, e.g., sha256
> or sha512.

No it cannot.

If somebody actually literally totally breaks that hash, length won't 
matter. There are (bad) hashes where you can literally edit the content of 
the file, and make sure that the end result has the same hash.

In that case, when the hash algorithm has actually been broken, the length 
of the hash ends up being not very relevant. 

For example, you might "hash" your file by blocking it up in 16-byte
blocks, and xoring all blocks together - the result is a 16-byte hash.  
It's a terrible hash, and obviously trivially breakable, and once broken
it does _not_ help to make it use its 32-byte cousin. Not at all. You can 
just modify the breaking thing to equally cheaply make modifications to a 
file and get the 32-byte hash "right" again.

Is that kind of breakage likely for sha1? Hell no. Is it possible? In your 
"in theory" world where practice doesn't matter, yes.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 23:34                                                     ` Petr Baudis
@ 2005-04-17 23:53                                                       ` Kenneth Johansson
  2005-04-18  0:49                                                       ` Herbert Xu
  1 sibling, 0 replies; 148+ messages in thread
From: Kenneth Johansson @ 2005-04-17 23:53 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, mingo, simon, david.lang, git

Petr Baudis wrote:
> Dear diary, on Mon, Apr 18, 2005 at 01:29:05AM CEST, I got a letter
> where Herbert Xu <herbert@gondor.apana.org.au> told me that...
> 
>>I get the feeling that it isn't that bad.  For example, if we did it
>>at the points where the blobs actually entered the tree, then the cost
>>is always proportional to the change size (the number of new blobs).
> 
> 
> No. The collision check is done in the opposite cache - when you want to
> write a blob and there is already a file of the same hash in the tree.
> So either the blob is already in the database, or you have a collision.
> 
> Therefore, the cost is proportional to the size of what stays unchanged.
> 

?? now I'm confused. Surly the only cost involved is to never write over 
a file that already exist in the cache and that is already done NOW as 
far as I read the code. So there is NO extra cost in detecting an collision.






^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 23:34                                                     ` Petr Baudis
  2005-04-17 23:53                                                       ` Kenneth Johansson
@ 2005-04-18  0:49                                                       ` Herbert Xu
  2005-04-18  0:55                                                         ` Petr Baudis
  1 sibling, 1 reply; 148+ messages in thread
From: Herbert Xu @ 2005-04-18  0:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, mingo, simon, david.lang, git

On Mon, Apr 18, 2005 at 01:34:41AM +0200, Petr Baudis wrote:
>
> No. The collision check is done in the opposite cache - when you want to
> write a blob and there is already a file of the same hash in the tree.
> So either the blob is already in the database, or you have a collision.
> Therefore, the cost is proportional to the size of what stays unchanged.

This is only true if we're calling update-cache on all unchanged files.
If that's what git is doing then we're in trouble anyway.

Remember that prior to the collision check we've already spent the
effort in

1) Compressing the file.
2) Computing a SHA1 hash on the result.

These two steps together (especially the first one) is much more
expensive than a file content comparison of the blob versus what's
already in the tree.

Somehow I have a hard time seeing how this can be at all efficient if
we're compressing all checked out files including those which are
unchanged.

Therefore the only conclusion I can draw is that we're only calling
update-cache on the set of changed files, or at most a small superset
of them.  In that case, the cost of the collision check *is* proportional
to the size of the change.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-18  0:49                                                       ` Herbert Xu
@ 2005-04-18  0:55                                                         ` Petr Baudis
  0 siblings, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-18  0:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Linus Torvalds, mingo, simon, david.lang, git

Dear diary, on Mon, Apr 18, 2005 at 02:49:06AM CEST, I got a letter
where Herbert Xu <herbert@gondor.apana.org.au> told me that...
> Therefore the only conclusion I can draw is that we're only calling
> update-cache on the set of changed files, or at most a small superset
> of them.  In that case, the cost of the collision check *is* proportional
> to the size of the change.

Yes, of course, sorry for the confusion.  We only consider files you
either specify manually or which have their stat metadata changed
relative to the directory cache. (That is from the git-pasky
perspective; from the plumbing perspective, the user just does
update-cache on whatever he picks.)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-17 17:34                                             ` Linus Torvalds
  2005-04-17 22:12                                               ` Herbert Xu
@ 2005-04-18  4:16                                               ` Sanjoy Mahajan
  2005-04-18  7:42                                               ` Ingo Molnar
  2 siblings, 0 replies; 148+ messages in thread
From: Sanjoy Mahajan @ 2005-04-18  4:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Petr Baudis, Simon Fowler, David Lang, git

> So until proven otherwise, I worry about accidental hashes, and in
> 160 bits of good hashing, that just isn't an issue either...[Going
> from 128 bits to 160 bits made it] so _unbelievably_ less likely to
> happen that it's not even funny.

You are right.  Here's how I learnt to stop worrying and love the 160
bits.

A 160-bit hash requires 2^80=10^24 files before the collision
probability is roughly 0.5 (actually 1-e^{-1/2}).  Now be very
conservative: Instead of tolerating a 0.5 probability, worry about
even a 10^-8 probability of a collision anywhere, anytime.

The magic number of files for that probability is 10^20 (roughly 10^40
pairs for 2^160=10^48 boxes).

Given 10 billion people using git, each producing 1 source file per
second -- busy beavers all -- they would need 300 years to produce
10^20 files.  And to reach the 10^-8 collision probability, all 10^20
files must belong to the same project, and even OpenOffice will not be
that bloated.

-Sanjoy

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Re: Merge with git-pasky II.
  2005-04-17 17:34                                             ` Linus Torvalds
  2005-04-17 22:12                                               ` Herbert Xu
  2005-04-18  4:16                                               ` Sanjoy Mahajan
@ 2005-04-18  7:42                                               ` Ingo Molnar
  2 siblings, 0 replies; 148+ messages in thread
From: Ingo Molnar @ 2005-04-18  7:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Simon Fowler, David Lang, git


* Linus Torvalds <torvalds@osdl.org> wrote:

> On Sun, 17 Apr 2005, Ingo Molnar wrote:
> > 
> > in fact, this attack cannot even be proven to be malicious, purely via 
> > the email from Malice: it could be incredible bad luck that caused that 
> > good-looking patch to be mistakenly matching a dangerous object.
> 
> I really hate theoretical discussions.

i was only replying to your earlier point:

> > > Almost all attacks on sha1 will depend on _replacing_ a file with 
> > > a bogus new one. So guys, instead of using sha256 or going 
> > > overboard, just make sure that when you synchronize, you NEVER 
> > > import a file you already have.

which point i still believe is subtly wrong. You were suggesting to 
concentrate on file replacement to counter most of the practical 
attacks, while i pointed out an attack _using the same basic mechanism 
that your point above supposed_.

[ if you can replace a file with a known hash, with a bogus new one, and 
  you still have enough control over the contents of your bogus new file 
  that it is 1) a valid file that builds 2) compromises the kernel, then 
  you likely have the same amount of control my 'theoretical' attack
  requires. ]

> And the thing is, _if_ somebody finds a way to make sha1 act as just a 
> complex parity bit, and comes up with generating a clashing object 
> that actually makes sense, then going to sha256 is likely pointless 
> too [...]

yes, that's why i suggested to not actually trust the hash to be 
cryptographically secure, but to just assume it's a good generic hash we 
can design a DB around, and to turn -DCOLLISION_CHECK on and enforce 
consistency rules on boundaries.

[ it's not bad to keep sha1 because even my suggested enhancement still
  leaves 'content-less trust-pointers to untrusted content via email'
  vectors open against attack (maintainer sends you an email that commit
  X in Malice's repository Y is fine to pull, and you pull it blindly,
  while the attacker has replaced his content with the compromised one
  meanwhile), but it at least validates the bulk traffic that goes into
  the DB: patches via emails and trusted repositories. ]

so all i was suggesting was to extend your suggested 'overwrite 
collision check' to a stricter 'content we throw away and use the sha1 
shortcut for needs to be checked against the in-DB content as well'.

in other words, your suggested 'rename check' is checking for 'positive 
duplicate content', while my addition would also check for 'negative 
duplicate content' as well.

but as usual, i could be wrong, so dont take this too serious :-)

	Ingo

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
@ 2005-04-26 18:55 Bram Cohen
  2005-04-26 19:58 ` Linus Torvalds
  0 siblings, 1 reply; 148+ messages in thread
From: Bram Cohen @ 2005-04-26 18:55 UTC (permalink / raw)
  To: git

(my apologies for responding to old messages, I only just subscribed to
this list)

Linus Torvalds wrote:
> On Thu, 14 Apr 2005, Junio C Hamano wrote:
> >
> > You say "merge these two trees" above (I take it that you mean
> > "merge these two trees, taking account of this tree as their
> > common ancestor", so actually you are dealing with three trees),
>
> Yes. We're definitely talking three trees.

The LCA for different files might be at different points in the history.
Forcing them to all come from the same point produces very bad merges.

> The fact is, we know how to make tree merges unambiguous, by just
> totally ignoring the history between them.  Ie we know how to merge
> data. I am pretty damn sure that _nobody_ knows how to merge "data over
> time".

You're incorrect. Codeville does exactly that (history-aware merges which
do the right thing even in cases where 3-way merge can't)

> > This however opens up another set of can of worms---it would
> > involve not just three trees but all the trees in the commit
> > chain in between.
>
> Exactly.  I seriously believe that the model is _broken_, simply because
> it gets too complicated. At some point it boils down to "keep it simple,
> stupid".

The Codeville merge algorithm is also quite simple, and is already
implemented and mature.

> I've not even been convinved that renames are worth it. Nobody has
> really given a good reason why.

If one person renames a file and another person modifies it then the
changes should be applied to the moved file.

Also, there's the directory rename case where one person moves a directory
and another person adds a file to it, in which case the file should be
moved to the new directory location on merge. I gather than BK doesn't
support this functionality, but Codeville and Monotone both do.

>    I think you might as well interpret the whole object thing. Git
> _does_ tell you how the objects changed, and I actually believe that a
> diff that works in between objects (ie can show "these lines moved from
> this file X to tjhat file Y") is a _hell_ of a lot more powerful than
> "rename"  is.
>
>    So I'd seriously suggest that instead of worryign about renames,
> people think about global diffs that aren't per-file. Git is good at
> limiting the changes to a set of objects, and it should be entirely
> possible to think of diffs as ways of moving lines _between_ objects and
> not just within objects. It's quite common to move a function from one
> file to another - certainly more so than renaming the whole file.
>
> In other words, I really believe renames are just a meaningless special
> case of a much more interesting problem. Which is just one reason why
> I'm not at all interested in bothering with them other than as a "data
> moved" thing, which git already handles very well indeed.

Nothing, not eveny our beloved BitKeeper, has 'move lines between files'
functionality, and for good reason.

To begin with, it's behaviorally extremely dubious. It would be not
uncommon for the system to erroneously think that some files deleted from
one file were added to another, and then further changes down the line
would cause random unrelated files to get modified in unpredictable ways
when merges happened.

Also, it presents a completely unsolved UI problem. If one person moves
lines 5-15 of file A to file B, and another person concurrently rewrites
lines 10-20 of file A, how on earth is that supposed to be presented to
the user? Codeville can support line moves *within* files just fine, but
doesn't do it because of the UI problem of presenting all the corner
cases. Maybe someday somebody will do a PhD thesis on that topic and we'll
add it, but until then we're sticking with the basic functionality.

Honestly, that you would think of doing whole-tree three-way merges and
even consider moving lines between files shows that you haven't explored
the merge problem very deeply. This is a much harder problem than you
think it is, and one which has already been solved by other systems.

-Bram


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 18:55 Bram Cohen
@ 2005-04-26 19:58 ` Linus Torvalds
  2005-04-26 20:30   ` Tom Lord
                     ` (2 more replies)
  0 siblings, 3 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-26 19:58 UTC (permalink / raw)
  To: Bram Cohen; +Cc: git



On Tue, 26 Apr 2005, Bram Cohen wrote:
> 
> If one person renames a file and another person modifies it then the
> changes should be applied to the moved file.

Bzzt. Wrong answer.

The _right_ answer is "if one person moves a function, and another person 
modifies the function, the changes should be applied to the moved 
function".

Which is clearly a _much_ more common case than file renames.

In other words, if your algorithm doesn't handle the latter, then there is 
no point in handling the former either.

And _if_ your algorithm handles the latter, then there's no point in 
handling file renames specially, since the algorithm will have done that 
too, as a small part of it.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 19:58 ` Linus Torvalds
@ 2005-04-26 20:30   ` Tom Lord
  2005-04-26 20:31   ` Bram Cohen
  2005-04-26 20:31   ` Daniel Barkalow
  2 siblings, 0 replies; 148+ messages in thread
From: Tom Lord @ 2005-04-26 20:30 UTC (permalink / raw)
  To: torvalds; +Cc: bram, git



   From: Linus Torvalds <torvalds@osdl.org>

   On Tue, 26 Apr 2005, Bram Cohen wrote:
   > 
   > If one person renames a file and another person modifies it then the
   > changes should be applied to the moved file.

   Bzzt. Wrong answer.

You're a little bit nuts, guy.

-t

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 19:58 ` Linus Torvalds
  2005-04-26 20:30   ` Tom Lord
@ 2005-04-26 20:31   ` Bram Cohen
  2005-04-26 20:39     ` Tom Lord
                       ` (2 more replies)
  2005-04-26 20:31   ` Daniel Barkalow
  2 siblings, 3 replies; 148+ messages in thread
From: Bram Cohen @ 2005-04-26 20:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds wrote:

> On Tue, 26 Apr 2005, Bram Cohen wrote:
> >
> > If one person renames a file and another person modifies it then the
> > changes should be applied to the moved file.
>
> Bzzt. Wrong answer.

I'm trying to be polite. You're not making that easy.

> The _right_ answer is "if one person moves a function, and another person
> modifies the function, the changes should be applied to the moved
> function".

Now that you're done being dismissive, could you either (a) rebut my quite
detailed explanation of exactly why that functionality is both a dubious
idea and difficult to implement, or (b) admit that you have no plans
whatsoever for supporting any of this stuff? You can't have it both ways.

What I'd really like to hear is some explanation of why git is
reimplementing all of this stuff from scratch. Your implicit claims that
git will do more things than the other systems without having to reinvent
all of their functionality first are, honestly, naive, ill-informed
arrogance.

I'd like to reiterate that *nothing* out there supports moving lines
between files, and further predict, with total confidence, that if git
tries to support such functionality it will simply fail, either by giving
up or creating a system which can behave horribly. Before you get all
dismissive about this claim, please remember that I've spent years
thinking about merge algorithms, and have actually designed and
implemented them, and have spoken at length with other people who have
done the same, while you've merely thought about them for a few weeks.

> Which is clearly a _much_ more common case than file renames.

Even if we pretend that these are comparable features, that's far from
clearly true. Function moves within a file occur more frequently, but a
file rename moves *all* the functions within that file.

> In other words, if your algorithm doesn't handle the latter, then there is
> no point in handling the former either.

If someone offers you a dollar, no strings attached, do you turn them down
because they didn't offer you ten?

> And _if_ your algorithm handles the latter, then there's no point in
> handling file renames specially, since the algorithm will have done that
> too, as a small part of it.

In case these concepts got conflated, I'd like to point out that Codeville
merge both supports renames *and* does better than three-way merge can do
at merging a single, non-renamed file. In most cases three-way and
codeville merge give the same answer, but there are some cases where there
isn't a single appropriate LCA available, and in those cases codeville
will do the right thing while three-way can't.

-Bram


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 19:58 ` Linus Torvalds
  2005-04-26 20:30   ` Tom Lord
  2005-04-26 20:31   ` Bram Cohen
@ 2005-04-26 20:31   ` Daniel Barkalow
  2005-04-26 20:44     ` Tom Lord
  2 siblings, 1 reply; 148+ messages in thread
From: Daniel Barkalow @ 2005-04-26 20:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bram Cohen, git

On Tue, 26 Apr 2005, Linus Torvalds wrote:

> 
> 
> On Tue, 26 Apr 2005, Bram Cohen wrote:
> > 
> > If one person renames a file and another person modifies it then the
> > changes should be applied to the moved file.
> 
> Bzzt. Wrong answer.
> 
> The _right_ answer is "if one person moves a function, and another person 
> modifies the function, the changes should be applied to the moved 
> function".

I'd even go so far as to say that we need to have the user sign off on
this. The modification is reasonably likely to not work in the new
location, due to use of statics that aren't available or something of the
sort.

I suspect that we want to report a conflict in the new location between
the old version and the new version, and let the person merging check
whether the change is okay. That is, both sides modified the same section
at the same time: one side modified its content, while the other modified
its location. We can give a good suggestion, but we need a final ruling.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 20:31   ` Bram Cohen
@ 2005-04-26 20:39     ` Tom Lord
  2005-04-26 20:58     ` Linus Torvalds
  2005-04-26 21:26     ` Diego Calleja
  2 siblings, 0 replies; 148+ messages in thread
From: Tom Lord @ 2005-04-26 20:39 UTC (permalink / raw)
  To: bram; +Cc: git


  > What I'd really like to hear is some explanation of why git is
  > reimplementing all of this stuff from scratch.

Whatever Linus' reasons, it's not a bad exercise and it does help
robustify the kernel project to roll its own.  Not that he seems to be
doing *this* part especially well or anything, but that doesn't matter
from the perspective of the good reasons to do it.

It doesn't matter much -- get stuff into git and people can layer on
that pretty gently.  The low layers of git are a common idea but
context and Linus' nifty code make this instance of the idea a bit
of a gem.



  > please remember that I've spent years
  > thinking about merge algorithms

I'm surprised we haven't met sooner.  If we did and I forgot, sorry.

-t

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 20:31   ` Daniel Barkalow
@ 2005-04-26 20:44     ` Tom Lord
  0 siblings, 0 replies; 148+ messages in thread
From: Tom Lord @ 2005-04-26 20:44 UTC (permalink / raw)
  To: barkalow; +Cc: torvalds, bram, git




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 20:31   ` Bram Cohen
  2005-04-26 20:39     ` Tom Lord
@ 2005-04-26 20:58     ` Linus Torvalds
  2005-04-26 21:25       ` Linus Torvalds
  2005-04-26 21:28       ` Bram Cohen
  2005-04-26 21:26     ` Diego Calleja
  2 siblings, 2 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-26 20:58 UTC (permalink / raw)
  To: Bram Cohen; +Cc: git



On Tue, 26 Apr 2005, Bram Cohen wrote:
> 
> Now that you're done being dismissive, could you either (a) rebut my quite
> detailed explanation of exactly why that functionality is both a dubious
> idea and difficult to implement, or (b) admit that you have no plans
> whatsoever for supporting any of this stuff? You can't have it both ways.

I'm absolutely not going to do it myself, you're right about that.

I just don't like your notion that you should support the 5% problem with
ugly hacks, and then you dismiss the 95% problem with "nothing else does 
it either".

In other words, we're already merging manually for the 95%. Why do you 
think the 5% is so important?

> What I'd really like to hear is some explanation of why git is
> reimplementing all of this stuff from scratch.

Git does in ~5000 lines and two weeks of work what _I_ think is the right 
thing to do. You're welcome to disagree, but the fact is, people have 
whined and moaned about my use of BK FOR THREE YEARS without showing me 
any better alternatives.

So why are you complaining now, when I implement my own version in two
weeks?

> If someone offers you a dollar, no strings attached, do you turn them down
> because they didn't offer you ten?

"no strings attached"?

There are lots of strings attached to the "follow renames" thing. There's 
30 _years_ of strings attached, and they result in people not looking at 
the _interesting_ problem.

Exactly because people follow renames, they think that they have the 
history of the code, but then they ignore the fact that they don't - 
because it doesn't follow merging of code or splitting of code.

In other words, it's sometimes better to know that you don't know the
answer, than it is to _think_ that you know the answer.

> In case these concepts got conflated, I'd like to point out that Codeville
> merge both supports renames *and* does better than three-way merge can do
> at merging a single, non-renamed file.

And I'd like to point out (again) that git doesn't actually care what 
merge strategy the user uses. 

Me _personally_, I want to have something that is very repeatable and
non-clever. Something I understand _or_ tells me that it can't do it. And
quite frankly, merging single-file history _without_ taking all the other
files' history into account makes me go "ugh".

That's why I like the "we do _not_ look for 'nearer' parents on a per-file
basis", which is what this discussion started with, afaik. I think the 
only original source that makes sense is the least common parent for the 
whole _project_, exactly because other files have done things that may or 
may not depend on the history of the file you're merging.

I (and thus git) really takes a "whole project" approach. 

			Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 20:58     ` Linus Torvalds
@ 2005-04-26 21:25       ` Linus Torvalds
  2005-04-26 21:28       ` Bram Cohen
  1 sibling, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-26 21:25 UTC (permalink / raw)
  To: Bram Cohen; +Cc: git



On Tue, 26 Apr 2005, Linus Torvalds wrote:
> 
> > What I'd really like to hear is some explanation of why git is
> > reimplementing all of this stuff from scratch.
> 
> Git does in ~5000 lines and two weeks of work what _I_ think is the right 
> thing to do. You're welcome to disagree, but the fact is, people have 
> whined and moaned about my use of BK FOR THREE YEARS without showing me 
> any better alternatives.

Btw, I've also been pretty disgusted by SCM's apparently generally caring 
about stuff that is totally not relevant.

For example, it seems like most SCM people think that merging is about
getting the end result of two conflicting patches right.

In my opinion, that's the _least_ important part of a merge. Maybe the 
kernel is very unusual in this, but basically true _conflicts_ are not 
only rare, but they tend to be things you want a human to look at 
regardless.

The important part of a merge is not how it handles conflicts (which need
to be verified by a human anyway if they are at all interesting), but that
it should meld the history together right so that you have a new solid
base for future merges.

In other words, the important part is the _trivial_ part: the naming of
the parents, and keeping track of their relationship. Not the clashes.

For example, CVS gets this part totally wrong. Sure, it can merge the 
contents, but it totally ignores the important part, so once you've done a 
merge, you're pretty much up shit creek wrt any subsequent merges in any 
other direction. All the other CVS problems pale in comparison. Renames? 
Just a detail.

And it looks like 99% of SCM people seem to think that the solution to 
that is to be more clever about content merges. Which misses the point 
entirely.

Don't get me wrong: content merges are nice, but they are _gravy_. They
are not important. You can do them manually if you have to. What's
important is that once you _have_ done them (manually or automatically),
the system had better be able to go on, knowing that they've been done.

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 20:31   ` Bram Cohen
  2005-04-26 20:39     ` Tom Lord
  2005-04-26 20:58     ` Linus Torvalds
@ 2005-04-26 21:26     ` Diego Calleja
  2 siblings, 0 replies; 148+ messages in thread
From: Diego Calleja @ 2005-04-26 21:26 UTC (permalink / raw)
  To: Bram Cohen; +Cc: torvalds, git

El Tue, 26 Apr 2005 13:31:31 -0700 (PDT),
Bram Cohen <bram@bitconjurer.org> escribió:

> Even if we pretend that these are comparable features, that's far from
> clearly true. Function moves within a file occur more frequently, but a
> file rename moves *all* the functions within that file.

Renaming or moving files is _so_ rare and unusual that even not implementing
it (like CVS) is hardly a big issue. Even in the linux kernel people moved subsystems
around - OSS went from drivers/sound to /sound/oss in 2.6, and a USB subdirectory
moved too, I think.

The patch got bigger. People wasted 30 seconds more of their life because the .gz
file was bigger - who cares? If it's something it's going to happen every 5 years,
I'd rather move them like CVS does rather than wasting a single second
implementing file renaming/moving...

If something so uncommon like file renaming has been implemented, I don't see
why people shouldn't implement something really useful like the thing linus
proposes, in fact it doesn't looks like a bad idea at all (and you'd get
file renaming for free, too). Perhaps it would be hard to implement and get
right, but at least it would be _useful_.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 20:58     ` Linus Torvalds
  2005-04-26 21:25       ` Linus Torvalds
@ 2005-04-26 21:28       ` Bram Cohen
  2005-04-26 21:36         ` Fabian Franz
  2005-04-26 22:25         ` Linus Torvalds
  1 sibling, 2 replies; 148+ messages in thread
From: Bram Cohen @ 2005-04-26 21:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds wrote:

> On Tue, 26 Apr 2005, Bram Cohen wrote:
> >
> > Now that you're done being dismissive, could you either (a) rebut my quite
> > detailed explanation of exactly why that functionality is both a dubious
> > idea and difficult to implement, or (b) admit that you have no plans
> > whatsoever for supporting any of this stuff? You can't have it both ways.
>
> I'm absolutely not going to do it myself, you're right about that.

Now you're just being an ass. I stated, flatly, that what you're proposing
to have done (by you or whoever, for feasibility it doesn't matter which)
is not going to happen due to just plain difficulty. You obviously
disagree with me, but rather than coming out and saying so you're
pretending I didn't make that statement.

> > What I'd really like to hear is some explanation of why git is
> > reimplementing all of this stuff from scratch.
>
> Git does in ~5000 lines and two weeks of work what _I_ think is the right
> thing to do.

So you think that a system which supports snapshots and history but has no
merging functionality whatsoever is the right thing? I'm asking this
seriously. You have a magic --make-somebody-else-do-merge command, but for
everybody else the current state of things is workable as a stopgap
measure (the original plan) but very painful for anything more.

Codeville is comparable in terms of number of lines of code to Git, by the
way.

> You're welcome to disagree, but the fact is, people have whined and
> moaned about my use of BK FOR THREE YEARS without showing me any better
> alternatives.

You were happy with BitKeeper, so why should we? Monotone and Codeville
are only just about now really mature, and you aren't exactly known as a
model customer.

> So why are you complaining now, when I implement my own version in two
> weeks?

I'm trying to tell you that the amount of time between now and when a
system as nice as BitKeeper is in use by the kernel can be dramatically
reduced by either using an existing system verbatim or basing new efforts
on one.

If you think that git as it exists right now is at all comparable to
Monotone or Codeville you're completely delusional.

> > In case these concepts got conflated, I'd like to point out that Codeville
> > merge both supports renames *and* does better than three-way merge can do
> > at merging a single, non-renamed file.
>
> And I'd like to point out (again) that git doesn't actually care what
> merge strategy the user uses.
>
> Me _personally_, I want to have something that is very repeatable and
> non-clever. Something I understand _or_ tells me that it can't do it. And
> quite frankly, merging single-file history _without_ taking all the other
> files' history into account makes me go "ugh".

Now you've just gone off the deep end. This is an apples-to-apples
comparison. Please accept one of thee following two statements:

(a) Git doesn't do merging, and none of the related new tools around it do
merging.

(b) Codeville merge (sans rename functionality) would be superior for the
merging which will be done.

-Bram


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 21:28       ` Bram Cohen
@ 2005-04-26 21:36         ` Fabian Franz
  2005-04-26 22:30           ` Linus Torvalds
  2005-04-26 22:25         ` Linus Torvalds
  1 sibling, 1 reply; 148+ messages in thread
From: Fabian Franz @ 2005-04-26 21:36 UTC (permalink / raw)
  To: Bram Cohen; +Cc: git, Linus Torvalds

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am Dienstag, 26. April 2005 23:28 schrieb Bram Cohen:
> Now you've just gone off the deep end. This is an apples-to-apples
> comparison. Please accept one of thee following two statements:
>
> (a) Git doesn't do merging, and none of the related new tools around it do
> merging.
>
> (b) Codeville merge (sans rename functionality) would be superior for the
> merging which will be done.

I have one very humble question:

Why don't you write and contribute some code for git to do good merging?

This would resolve all your problems.

I think the "magic-merge" command is quite exchangable and if your way works
good and is compatible, then people will automatically start using that.

cu

Fabian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFCbrRdI0lSH7CXz7MRAkS4AJ9JEka71M0Zc6cizXhrYpHiKHhL0gCcD/3Q
j+UnPU/cXafGjGG6Bt9mZE8=
=IYk0
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 21:28       ` Bram Cohen
  2005-04-26 21:36         ` Fabian Franz
@ 2005-04-26 22:25         ` Linus Torvalds
  2005-04-28  0:42           ` Petr Baudis
  1 sibling, 1 reply; 148+ messages in thread
From: Linus Torvalds @ 2005-04-26 22:25 UTC (permalink / raw)
  To: Bram Cohen; +Cc: git



On Tue, 26 Apr 2005, Bram Cohen wrote:
> 
> So you think that a system which supports snapshots and history but has no
> merging functionality whatsoever is the right thing?

You haven't looked at git, have you?

Git already merges better than _any_ open-source SCM out there. It just 
does it so effortlessly that you didn't even realize it does that.

Today I've done four (count them) fully automated merges on the kernel
tree: serial, networking, usb and arm.

And they took a fraction of a second (plus the download of the new
objects, which is the real cost).

This is something that SVN _still_ cannot do, for example. 

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 21:36         ` Fabian Franz
@ 2005-04-26 22:30           ` Linus Torvalds
  0 siblings, 0 replies; 148+ messages in thread
From: Linus Torvalds @ 2005-04-26 22:30 UTC (permalink / raw)
  To: Fabian Franz; +Cc: Bram Cohen, git



On Tue, 26 Apr 2005, Fabian Franz wrote:
> 
> Am Dienstag, 26. April 2005 23:28 schrieb Bram Cohen:
> > Now you've just gone off the deep end. This is an apples-to-apples
> > comparison. Please accept one of thee following two statements:
> >
> > (a) Git doesn't do merging, and none of the related new tools around it do
> > merging.
> >
> > (b) Codeville merge (sans rename functionality) would be superior for the
> > merging which will be done.
> 
> I have one very humble question:
> 
> Why don't you write and contribute some code for git to do good merging?

Don't bother. Bram doesn't know what he's talking about. 

		Linus

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: Merge with git-pasky II.
  2005-04-26 22:25         ` Linus Torvalds
@ 2005-04-28  0:42           ` Petr Baudis
  0 siblings, 0 replies; 148+ messages in thread
From: Petr Baudis @ 2005-04-28  0:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bram Cohen, git

Dear diary, on Wed, Apr 27, 2005 at 12:25:58AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> On Tue, 26 Apr 2005, Bram Cohen wrote:
> > 
> > So you think that a system which supports snapshots and history but has no
> > merging functionality whatsoever is the right thing?
> 
> You haven't looked at git, have you?
> 
> Git already merges better than _any_ open-source SCM out there. It just 
> does it so effortlessly that you didn't even realize it does that.

Did you (or any other kernel developer reading this) actually try the
Codeville merge? (I admit I didn't get time to do anything real with it
yet.) SCM people keep praising it (as basically the best (at least
open-source) merge out there), so it would be interesting to compare
that with the actual real-world experience with it on the kernel.

> Today I've done four (count them) fully automated merges on the kernel
> tree: serial, networking, usb and arm.
> 
> And they took a fraction of a second (plus the download of the new
> objects, which is the real cost).
> 
> This is something that SVN _still_ cannot do, for example. 

I think SVN is just irrelevant here - it is a completely different
league. The contenders here are probably Codeville, Monotone and perhaps
GNU Arch offsprings.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 148+ messages in thread

end of thread, other threads:[~2005-04-28  0:37 UTC | newest]

Thread overview: 148+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-14  0:29 Merge with git-pasky II Petr Baudis
2005-04-13 21:25 ` Christopher Li
2005-04-14  0:45   ` Petr Baudis
2005-04-13 22:00     ` Christopher Li
2005-04-14  3:51     ` Linus Torvalds
2005-04-14  1:23       ` Christopher Li
2005-04-14  5:03         ` Paul Jackson
2005-04-14  2:16           ` Christopher Li
2005-04-14  6:16             ` Paul Jackson
2005-04-14  7:05       ` Junio C Hamano
2005-04-14  8:06         ` Linus Torvalds
2005-04-14  8:39           ` Junio C Hamano
2005-04-14  9:10             ` Linus Torvalds
2005-04-14 11:14               ` Junio C Hamano
2005-04-14 12:16                 ` Petr Baudis
2005-04-14 18:12                   ` Junio C Hamano
2005-04-14 18:36                     ` Linus Torvalds
2005-04-14 19:59                       ` Junio C Hamano
2005-04-14 20:20                         ` Petr Baudis
2005-04-15  0:42                         ` Linus Torvalds
2005-04-15  2:33                           ` Barry Silverman
2005-04-15 10:02                           ` David Woodhouse
2005-04-15 15:32                             ` Linus Torvalds
2005-04-15 16:01                               ` David Woodhouse
2005-04-15 16:31                                 ` C. Scott Ananian
2005-04-15 17:11                                   ` Linus Torvalds
2005-04-16 15:33                                 ` Johannes Schindelin
2005-04-17 13:14                                   ` David Woodhouse
2005-04-15 19:20                               ` Paul Jackson
2005-04-16  1:44                               ` Simon Fowler
2005-04-16 12:19                                 ` David Lang
2005-04-16 15:55                                   ` Simon Fowler
2005-04-16 16:03                                     ` Petr Baudis
2005-04-16 16:26                                       ` Simon Fowler
2005-04-16 16:26                                       ` Linus Torvalds
2005-04-16 23:02                                         ` David Lang
2005-04-17 14:52                                         ` Ingo Molnar
2005-04-17 15:08                                           ` Brad Roberts
2005-04-17 15:18                                             ` Ingo Molnar
2005-04-17 15:28                                           ` Ingo Molnar
2005-04-17 17:34                                             ` Linus Torvalds
2005-04-17 22:12                                               ` Herbert Xu
2005-04-17 22:35                                                 ` Linus Torvalds
2005-04-17 23:29                                                   ` Herbert Xu
2005-04-17 23:34                                                     ` Petr Baudis
2005-04-17 23:53                                                       ` Kenneth Johansson
2005-04-18  0:49                                                       ` Herbert Xu
2005-04-18  0:55                                                         ` Petr Baudis
2005-04-17 23:50                                                     ` Linus Torvalds
2005-04-18  4:16                                               ` Sanjoy Mahajan
2005-04-18  7:42                                               ` Ingo Molnar
2005-04-16 20:29                               ` Sanjoy Mahajan
2005-04-16 20:41                                 ` Linus Torvalds
2005-04-15  2:21                       ` [Patch] ls-tree enhancements Junio C Hamano
2005-04-15 16:13                         ` Petr Baudis
2005-04-15 18:25                           ` Junio C Hamano
2005-04-15  9:14                       ` Merge with git-pasky II David Woodhouse
2005-04-15  9:36                         ` Ingo Molnar
2005-04-15 10:05                           ` David Woodhouse
2005-04-15 14:53                             ` Ingo Molnar
2005-04-15 15:09                               ` David Woodhouse
2005-04-15 12:03                         ` Johannes Schindelin
2005-04-15 10:22                           ` Theodore Ts'o
2005-04-15 14:53                         ` Linus Torvalds
2005-04-15 15:29                           ` David Woodhouse
2005-04-15 15:51                             ` Linus Torvalds
2005-04-15 15:54                           ` Paul Jackson
2005-04-15 16:30                             ` C. Scott Ananian
2005-04-15 18:29                               ` Paul Jackson
2005-04-14 18:51                     ` Christopher Li
2005-04-14 19:35                     ` Petr Baudis
2005-04-14 20:01                       ` Live Merging from remote repositories Barry Silverman
2005-04-14 23:22                         ` Junio C Hamano
2005-04-15  1:07                           ` Question about git process model Barry Silverman
2005-04-14 20:23                       ` Re: Merge with git-pasky II Erik van Konijnenburg
2005-04-14 20:24                         ` Petr Baudis
2005-04-14 23:12                       ` Junio C Hamano
2005-04-14 20:24                         ` Christopher Li
2005-04-14 23:31                         ` Petr Baudis
2005-04-14 20:30                           ` Christopher Li
2005-04-14 20:37                             ` Christopher Li
2005-04-14 20:50                               ` Christopher Li
2005-04-15  0:58                           ` Junio C Hamano
2005-04-14 22:30                             ` Christopher Li
2005-04-15  7:43                               ` Junio C Hamano
2005-04-15  6:28                                 ` Christopher Li
2005-04-15 11:11                                   ` Junio C Hamano
     [not found]                                     ` <7vaco0i3t9.fsf_-_@assigned-by-dhcp.cox.net>
2005-04-15 18:44                                       ` write-tree is pasky-0.4 Linus Torvalds
2005-04-15 18:56                                         ` Petr Baudis
2005-04-15 20:13                                           ` Linus Torvalds
2005-04-15 22:36                                             ` Petr Baudis
2005-04-16  0:22                                               ` Linus Torvalds
2005-04-16  1:13                                                 ` Daniel Barkalow
2005-04-16  2:18                                                   ` Linus Torvalds
2005-04-16  2:49                                                     ` Daniel Barkalow
2005-04-16  3:13                                                       ` Linus Torvalds
2005-04-16  3:56                                                         ` Daniel Barkalow
2005-04-16  6:59                                                         ` Paul Jackson
2005-04-16 15:34                                                 ` Re: Re: " Petr Baudis
2005-04-15 20:10                                         ` Junio C Hamano
2005-04-15 20:58                                           ` C. Scott Ananian
2005-04-15 21:22                                             ` Petr Baudis
2005-04-15 23:16                                             ` Junio C Hamano
2005-04-15 21:48                                           ` [PATCH 1/2] merge-trees script for Linus git Junio C Hamano
2005-04-15 21:54                                             ` [PATCH 2/2] " Junio C Hamano
2005-04-15 23:33                                             ` [PATCH 3/2] " Junio C Hamano
2005-04-16  1:02                                               ` Linus Torvalds
2005-04-16  4:10                                                 ` Junio C Hamano
2005-04-16  5:02                                                   ` Linus Torvalds
2005-04-16  6:26                                                     ` Linus Torvalds
2005-04-16  8:12                                                     ` Junio C Hamano
2005-04-16  9:27                                                       ` [PATCH] Byteorder fix for read-tree, new -m semantics version Junio C Hamano
2005-04-16 10:35                                                       ` [PATCH 1/2] Add --stage to show-files for new stage dircache Junio C Hamano
2005-04-16 10:42                                                         ` [PATCH 2/2] " Junio C Hamano
2005-04-16 14:03                                                       ` Issues with higher-order stages in dircache Junio C Hamano
2005-04-17  5:11                                                         ` Junio C Hamano
2005-04-17  5:31                                                           ` Linus Torvalds
2005-04-17  6:01                                                             ` Junio C Hamano
2005-04-17 10:00                                                         ` Summary of "read-tree -m O A B" mechanism Junio C Hamano
2005-04-16 15:28                                                       ` [PATCH 3/2] merge-trees script for Linus git Linus Torvalds
2005-04-16 16:36                                                         ` Linus Torvalds
2005-04-16 17:14                                                           ` Junio C Hamano
2005-04-15 19:54                             ` Re: Merge with git-pasky II Petr Baudis
2005-04-15 10:22                           ` Junio C Hamano
2005-04-15 20:40                             ` Petr Baudis
2005-04-15 22:41                               ` Junio C Hamano
2005-04-15 19:57           ` Junio C Hamano
2005-04-15 20:45             ` Linus Torvalds
2005-04-14  0:30 ` Petr Baudis
2005-04-14 22:11 ` git merge Petr Baudis
     [not found] <000d01c541ed$32241fd0$6400a8c0@gandalf>
2005-04-15 20:31 ` Merge with git-pasky II Linus Torvalds
2005-04-15 23:00   ` Barry Silverman
2005-04-16  0:32     ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2005-04-26 18:55 Bram Cohen
2005-04-26 19:58 ` Linus Torvalds
2005-04-26 20:30   ` Tom Lord
2005-04-26 20:31   ` Bram Cohen
2005-04-26 20:39     ` Tom Lord
2005-04-26 20:58     ` Linus Torvalds
2005-04-26 21:25       ` Linus Torvalds
2005-04-26 21:28       ` Bram Cohen
2005-04-26 21:36         ` Fabian Franz
2005-04-26 22:30           ` Linus Torvalds
2005-04-26 22:25         ` Linus Torvalds
2005-04-28  0:42           ` Petr Baudis
2005-04-26 21:26     ` Diego Calleja
2005-04-26 20:31   ` Daniel Barkalow
2005-04-26 20:44     ` Tom Lord

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).