I want to release a "git-1.0"

git@vger.kernel.org mailing list mirror (one of many)

* I want to release a "git-1.0"
@ 2005-05-30 20:00 Linus Torvalds
  2005-05-30 20:33 ` jeff millar
                   ` (10 more replies)
  0 siblings, 11 replies; 64+ messages in thread
From: Linus Torvalds @ 2005-05-30 20:00 UTC (permalink / raw)
  To: Git Mailing List

Ok, I'm at the point where I really think it's getting close to a 1.0, and
make another tar-ball etc. I obviously feel that it's already way superior
to CVS, but I also realize that somebody who is used to CVS may not 
actually realize that very easily.

So before I do a 1.0 release, I want to write some stupid git tutorial for
a complete beginner that has only used CVS before, with a real example of
how to use raw git, and along those lines I actually want the thing to
show how to do something useful.

So before I do that, is there something people think is just too hard for
somebody coming from the CVS world to understand? I already realized that
the "git-write-tree" + "git-commit-tree" interfaces were just _too_ hard
to put into a sane tutorial.

I was showing off raw git to Steve Chamberlain yesterday, and showing it
to him made some things pretty obvious - one of them being that
"git-init-db" really needed to set up the initial refs etc). So I wrote
this silly "git-commit-script" to make it at least half-way palatable, but
what else do people feel is "too hard"?

I think I'll move the "cvs2git" script thing to git proper before the 1.0 
release (again, in order to have the tutorial able to show what to do if 
you already have an existing CVS tree), what else?

		Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
@ 2005-05-30 20:33 ` jeff millar
  2005-05-30 20:49 ` Nicolas Pitre
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: jeff millar @ 2005-05-30 20:33 UTC (permalink / raw)
  To: Linus Torvalds, git

Linus Torvalds wrote:

>So before I do that, is there something people think is just too hard for
>somebody coming from the CVS world to understand? 
>
I'm a fairly clueless cvs user, trying to use cg/git as a way to track a
single user project...using cogito, because that's easier, right?

Here is the usage pattern that's causing me problems right now:

1. cg-init a whole directory tree (trying with /etc and a software project
   directory)
2. note that too many files got included (*.cache, *.backup, *.o, binaries,
   etc)
3. want to stop tracking them, but cg-rm also removes the file, and I don't
   want that.

What's the best way to stop tracking files?

jeff

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
  2005-05-30 20:33 ` jeff millar
@ 2005-05-30 20:49 ` Nicolas Pitre
  2005-06-01  6:52   ` Junio C Hamano
  2005-05-30 20:59 ` I want to release a "git-1.0" Junio C Hamano
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Nicolas Pitre @ 2005-05-30 20:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Mon, 30 May 2005, Linus Torvalds wrote:

> 
> Ok, I'm at the point where I really think it's getting close to a 1.0, and
> make another tar-ball etc.

Any chance you could merge my latest mkdelta patch _please_ ???

I just posted it twice in the last 4 days and it still didn't appear in 
your repository.

Again, the current version of mkdelta in your tree has a bug that can 
screw things up, and it is fixed in the latest patch of course.

Nicolas

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
  2005-05-30 20:33 ` jeff millar
  2005-05-30 20:49 ` Nicolas Pitre
@ 2005-05-30 20:59 ` Junio C Hamano
  2005-05-30 21:07 ` Junio C Hamano
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-05-30 20:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

I'd really appreciate it if you would reconsider diff-* -O for inclusion
before 1.0 happens.  It is probably the lowest impact among the
diffcore family.

Don't I deserve it ;-)?


* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (2 preceding siblings ...)
  2005-05-30 20:59 ` I want to release a "git-1.0" Junio C Hamano
@ 2005-05-30 21:07 ` Junio C Hamano
  2005-05-30 22:11 ` David Greaves
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-05-30 21:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I was showing off raw git to Steve Chamberlain yesterday, and showing it
LT> to him made some things pretty obvious - one of them being that
LT> "git-init-db" really needed to set up the initial refs etc). So I wrote
LT> this silly "git-commit-script" to make it at least half-way palatable, but
LT> what else do people feel is "too hard"?

I think you need to clarify your intended audience first before
soliciting a "list of things that would help a CVS user convert
to GIT".  Specifically, which variant of GIT are you talking
about?

I think you are talking about using the bare Plumbing.  I
suspect that some of the things you called "too hard" may
come from the fact that you did not use Cogito in the "showing
off" you did.  I imagine Cogito users do not experience the
trouble you felt with git-init-db, since I presume they would
rather use cg-init, which IIUIC sets up the .git/refs structure
to its taste.

Having said that, I am in the same camp as you are in, in that
the (secondary) goal of my involvement in this project so far
has been to make the bare Plumbing comfortable enough to use, to
make the choice of Porcelain more or less irrelevant.  As such,
I am all for such a tutorial to convert CVS people to Plumbing
GIT.

Not that I'd volunteer to write a big part of such a document.  I
suck at documentation, not just math ;-).

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (3 preceding siblings ...)
  2005-05-30 21:07 ` Junio C Hamano
@ 2005-05-30 22:11 ` David Greaves
  2005-05-30 22:12 ` Dave Jones
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: David Greaves @ 2005-05-30 22:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds wrote:

>So before I do a 1.0 release, I want to write some stupid git tutorial for
>a complete beginner that has only used CVS before, with a real example of
>how to use raw git, and along those lines I actually want the thing to
>show how to do something useful.
>
>So before I do that, is there something people think is just too hard for
>somebody coming from the CVS world to understand? I already realized that
>the "git-write-tree" + "git-commit-tree" interfaces were just _too_ hard
>to put into a sane tutorial.
>
>I was showing off raw git to Steve Chamberlain yesterday, and showing it
>to him made some things pretty obvious - one of them being that
>"git-init-db" really needed to set up the initial refs etc). So I wrote
>this silly "git-commit-script" to make it at least half-way palatable, but
>what else do people feel is "too hard"?
>
>I think I'll move the "cvs2git" script thing to git proper before the 1.0 
>release (again, in order to have the tutorial able to show what to do if 
>you already have an existing CVS tree), what else?
>  
>

It seems to me that a tutorial for end users is inappropriate.
You should be writing a tutorial for porcelain implementors :)

Anyway, a while back I split the commands into manipulation and
interrogation and then into ancillary commands and scripts. Do you
actually agree with this grouping?
http://www.kernel.org/pub/software/scm/git/docs/git.html
It may help to position who should be doing what.

Also, if you're writing a git-init-script, it may be that you're simply
scripting common processes and could helpfully maintain consistency by
either pulling some of the really trivial Cogito scripts (cg-init,
cg-add, cg-rm) into the core 'ancillary' area or suggesting
modifications to Cogito as the current 'best of breed' implementation of
the low-level git usage process. Cogito also 'fixes' some usability
issues such as using "git-update-cache --add" == "cg-add"
I know you _can_ use git as an end user - but it seems that it's
designed to be used by plumbers.

Oh, I'd also like to see something along the lines of my cg-Xignore
before git hits 1.0

On the tutorial side - yesterday I started pulling together stuff from
the list about merging to complete the README where it says [ fixme:
talk about resolving merges here ]

I haven't done much other than collect some discussion from the list and
the text from git-read-tree.txt.
I do think this area needs more explanation as the whole 'stage' thing
is pretty alien to CVS.
I also noted a few people asking "so I did this merge - what do I do now?"

The working directory/cache/repository is also confusing sometimes -
especially when the cache and working-dir unexpectedly don't match.

I also see in my notes: "improve the docs around update-cache."

David

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (4 preceding siblings ...)
  2005-05-30 22:11 ` David Greaves
@ 2005-05-30 22:12 ` Dave Jones
  2005-05-30 22:55   ` Dmitry Torokhov
  2005-05-31  0:52   ` Linus Torvalds
  2005-05-30 22:19 ` Ryan Anderson
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 64+ messages in thread
From: Dave Jones @ 2005-05-30 22:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Mon, May 30, 2005 at 01:00:42PM -0700, Linus Torvalds wrote:
 > 
 > Ok, I'm at the point where I really think it's getting close to a 1.0, and
 > make another tar-ball etc. I obviously feel that it's already way superior
 > to CVS, but I also realize that somebody who is used to CVS may not 
 > actually realize that very easily.
 > 
 > So before I do a 1.0 release, I want to write some stupid git tutorial for
 > a complete beginner that has only used CVS before, with a real example of
 > how to use raw git, and along those lines I actually want the thing to
 > show how to do something useful.
 > 
 > So before I do that, is there something people think is just too hard for
 > somebody coming from the CVS world to understand? I already realized that
 > the "git-write-tree" + "git-commit-tree" interfaces were just _too_ hard
 > to put into a sane tutorial.
 > 
 > I was showing off raw git to Steve Chamberlain yesterday, and showing it
 > to him made some things pretty obvious - one of them being that
 > "git-init-db" really needed to set up the initial refs etc). So I wrote
 > this silly "git-commit-script" to make it at least half-way palatable, but
 > what else do people feel is "too hard"?

I finally got around to actually trying to use git to maintain the
cpufreq repository the last few days after reading Jeff Garzik's mini-howto[1]

It's not particularly complicated, but the number one thing that's bugged me is this..

# commit changes
GIT_AUTHOR_NAME="John Doe"		\
    GIT_AUTHOR_EMAIL="jdoe@foo.com"	\
    GIT_COMMITTER_NAME="Jeff Garzik"	\
    GIT_COMMITTER_EMAIL="jgarzik@pobox.com"	\
    git-commit-tree `git-write-tree`	\
    -p $(cat .git/HEAD )			\
    < changelog.txt			\
    > .git/HEAD

For merging a lot of csets, that's a lot of typing per cset. So my .bashrc
now sets up GIT_COMMITTER_NAME & GIT_COMMITTER_EMAIL, because I don't
foresee myself changing either of those anytime soon, which takes it down
to
    GIT_AUTHOR_NAME="John Doe"      \
    GIT_AUTHOR_EMAIL="jdoe@foo.com" \
    git-commit-tree `git-write-tree`    \
    -p $(cat .git/HEAD )            \
    < changelog.txt         \
    > .git/HEAD

per-cset.  Maybe I have early-onset dementia, but the number of times
I've typoed those two remaining environment variables is bizarre.
I must've hit every known combination possible in my merge of ~30 patches.

I could make the latter 4 lines of the above a shell alias to save some
typing, but those shell vars still bug me. Hmm, maybe I could create a
wrapper that splits a "Dave Jones <davej@redhat.com>" style string into two vars.
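
Such a wrapper could look roughly like the sketch below. It is only an
illustration: split_author is a made-up name, and it assumes the string
always has the exact form "Name <email>". It relies on nothing but POSIX
shell parameter expansion, so no git command is involved.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): split "Name <email>" into
# GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL and export both.
split_author() {
    ident=$1
    case $ident in
    *"<"*">") ;;                                   # looks like "Name <email>"
    *) echo "expected 'Name <email>', got: $ident" >&2
       return 1 ;;
    esac
    GIT_AUTHOR_NAME=${ident%%<*}                   # text before the first "<"
    GIT_AUTHOR_NAME=${GIT_AUTHOR_NAME% }           # drop the one space before "<"
    GIT_AUTHOR_EMAIL=${ident#*<}                   # text after the first "<"
    GIT_AUTHOR_EMAIL=${GIT_AUTHOR_EMAIL%>}         # drop the trailing ">"
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
}

split_author "John Doe <jdoe@foo.com>" &&
    echo "$GIT_AUTHOR_NAME / $GIT_AUTHOR_EMAIL"    # prints: John Doe / jdoe@foo.com
```

With the two variables exported this way, the remaining commit command
would no longer need them typed out per cset.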

I realise you've got a nifty bunch of tools to apply a whole mbox of
patches, but that's not ideal if all of my patches aren't in mboxes
(some I create myself and toss in my spool, some I pull from bugzilla etc..)

Typos aside, the other thing that seems non-intuitive is the splitting up
of the patch & changelog comment into separate files during the patch-apply
stage.

Maybe your new git-commit-script wonder-tool fixes up all these problems
already, I'll take a look after food.

It's pretty nifty stuff, but for merging a lot of patches in non-mbox format,
either I'm doing something wrong, or it's, well.. painful.

		Dave

[1] http://lkml.org/lkml/2005/5/26/11/index.html


* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (5 preceding siblings ...)
  2005-05-30 22:12 ` Dave Jones
@ 2005-05-30 22:19 ` Ryan Anderson
  2005-05-31  0:58   ` Linus Torvalds
  2005-05-30 22:32 ` Chris Wedgwood
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Ryan Anderson @ 2005-05-30 22:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Mon, May 30, 2005 at 01:00:42PM -0700, Linus Torvalds wrote:
> 
> I think I'll move the "cvs2git" script thing to git proper before the 1.0 
> release (again, in order to have the tutorial able to show what to do if 
> you already have an existing CVS tree), what else?

Umm, why do you maintain two separate "git" related trees?

Why not merge all of git-tools in, in a tools/ subdirectory?

I've been meaning to ask the same question about "gitweb" for that
matter.  The distributions that want separate packages for dependency
reasons can handle that easily inside one tree, anyway, I believe.

I'd guess part of this is a holdover from the fact that you needed an
independent tree for BitKeeper, but does it still make sense?

-- 

Ryan Anderson
  sometimes Pug Majere

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (6 preceding siblings ...)
  2005-05-30 22:19 ` Ryan Anderson
@ 2005-05-30 22:32 ` Chris Wedgwood
  2005-05-30 23:56   ` Chris Wedgwood
  2005-05-31  1:06   ` Linus Torvalds
  2005-05-31  0:19 ` Petr Baudis
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 64+ messages in thread
From: Chris Wedgwood @ 2005-05-30 22:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Mon, May 30, 2005 at 01:00:42PM -0700, Linus Torvalds wrote:

> So before I do that, is there something people think is just too
> hard for somebody coming from the CVS world to understand? I already
> realized that the "git-write-tree" + "git-commit-tree" interfaces
> were just _too_ hard to put into a sane tutorial.

I'm still at a loss how to do the equivalent of annotate.  I know a
couple of front ends can do this but I have no idea what command line
magic would be equivalent.

* Re: I want to release a "git-1.0"
  2005-05-30 22:12 ` Dave Jones
@ 2005-05-30 22:55   ` Dmitry Torokhov
  2005-05-30 23:15     ` Junio C Hamano
  2005-05-30 23:23     ` Dmitry Torokhov
  2005-05-31  0:52   ` Linus Torvalds
  1 sibling, 2 replies; 64+ messages in thread
From: Dmitry Torokhov @ 2005-05-30 22:55 UTC (permalink / raw)
  To: git; +Cc: Dave Jones, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

On Monday 30 May 2005 17:12, Dave Jones wrote:
> I realise you've got a nifty bunch of tools to apply a whole mbox of
> patches, but that's not ideal if all of my patches aren't in mboxes
> (some I create myself and toss in my spool, some I pull from bugzilla etc..)

I mercilessly hacked Linus's scripts from git-tools repo to work with
non-mailbox patches, maybe you can make use of them too. Note that
stripspace.c is not changed in any way whatsoever and mailsplit.c was
changed to handle my personal preference of having patch description
in the form of:

Input: make blah blah change
---
 
And Linus's script would eat that line.

-- 
Dmitry

[-- Attachment #2: applypatch --]
[-- Type: application/x-shellscript, Size: 888 bytes --]

[-- Attachment #3: apply_parsed_patch --]
[-- Type: application/x-shellscript, Size: 2123 bytes --]

[-- Attachment #4: stripspace.c --]
[-- Type: text/x-csrc, Size: 786 bytes --]

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/*
 * Remove empty lines from the beginning and end.
 *
 * Turn multiple consecutive empty lines into just one
 * empty line.
 */
static void cleanup(char *line)
{
	int len = strlen(line);

	if (len > 1 && line[len-1] == '\n') {
		do {
			unsigned char c = line[len-2];
			if (!isspace(c))
				break;
			line[len-2] = '\n';
			len--;
			line[len] = 0;
		} while (len > 1);
	}
}

int main(int argc, char **argv)
{
	int empties = -1;
	char line[1024];

	while (fgets(line, sizeof(line), stdin)) {
		cleanup(line);

		/* Not just an empty line? */
		if (line[0] != '\n') {
			if (empties > 0)
				putchar('\n');
			empties = 0;
			fputs(line, stdout);
			continue;
		}
		if (empties < 0)
			continue;
		empties++;
	}
	return 0;
}

[-- Attachment #5: mailsplit.c --]
[-- Type: text/x-csrc, Size: 2526 bytes --]

/*
 * Totally braindamaged mbox splitter program.
 *
 * It just splits a mbox into a list of files: "0001" "0002" ..
 * so you can process them further from there.
 */
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include <assert.h>

static int usage(void)
{
	fprintf(stderr, "mailsplit <mbox> <directory>\n");
	exit(1);
}

static int linelen(const char *map, unsigned long size)
{
	int len = 0, c;

	do {
		c = *map;
		map++;
		size--;
		len++;
	} while (size && c != '\n');
	return len;
}

static int is_from_line(const char *line, int len)
{
	const char *colon;

	if (len < 20 || memcmp("From ", line, 5))
		return 0;

	colon = line + len - 2;
	line += 5;
	for (;;) {
		if (colon < line)
			return 0;
		if (*--colon == ':')
			break;
	}

	if (!isdigit(colon[-4]) ||
	    !isdigit(colon[-2]) ||
	    !isdigit(colon[-1]) ||
	    !isdigit(colon[ 1]) ||
	    !isdigit(colon[ 2]))
		return 0;

	/* year */
	if (strtol(colon+3, NULL, 10) <= 90)
		return 0;

	/* Ok, close enough */
	return 1;
}

static int parse_email(const void *map, unsigned long size)
{
	unsigned long offset;

	if (size < 6 || memcmp("From ", map, 5))
		goto corrupt;

	/* Make sure we don't trigger on this first line */
	map++; size--; offset=1;

	/*
	 * Search for a line beginning with "From ", and 
	 * having something that looks like a date format.
	 */
	do {
		int len = linelen(map, size);
		if (is_from_line(map, len))
			return offset;
		map += len;
		size -= len;
		offset += len;
	} while (size);
	return offset;

corrupt:
	fprintf(stderr, "corrupt mailbox\n");
	exit(1);
}

int main(int argc, char **argv)
{
	int fd, nr;
	struct stat st;
	unsigned long size;
	void *map;

	if (argc != 3)
		usage();
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
	if (chdir(argv[2]) < 0)
		usage();
	if (fstat(fd, &st) < 0) {
		perror("stat");
		exit(1);
	}
	size = st.st_size;
	map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (-1 == (int)(long)map) {
		perror("mmap");
		exit(1);
	}
	close(fd);
	nr = 0;
	do {
		char name[10];
		unsigned long len = parse_email(map, size);
		assert(len <= size);
		sprintf(name, "%04d", ++nr);
		fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0600);
		if (fd < 0) {
			perror(name);
			exit(1);
		}
		if (write(fd, map, len) != len) {
			perror("write");
			exit(1);
		}
		close(fd);
		map += len;
		size -= len;
	} while (size > 0);
	return 0;
}

* Re: I want to release a "git-1.0"
  2005-05-30 22:55   ` Dmitry Torokhov
@ 2005-05-30 23:15     ` Junio C Hamano
  2005-05-30 23:23     ` Dmitry Torokhov
  1 sibling, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-05-30 23:15 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

On a related topic of making bare Plumbing a bit easier to use,
here is what I use to prepare patches, one patch per file, to be
sent to Linus via e-mail.

Usage:

  $ git-format-patch-script HEAD linus

Assuming that "linus" is the tip of the tree from Linus
(typically stored in .git/branches/linus if you use Cogito), and
HEAD is your additions on top of it, the above command will
produce patches in the format you have been seeing on this list
from me, one file per commit, in .patches/XXXX-patch-title.txt
file.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
sed -e 's/^X//' >git-format-patch-script <<\EOF
X#!/bin/sh
X#
X# Copyright (c) 2005 Junio C Hamano
X#
Xjunio="$1"
Xlinus="$2"
X
Xtmp=.tmp-series$$
Xtrap 'rm -f $tmp-*' 0 1 2 3 15
X
Xseries=$tmp-series
X
XtitleScript='
X	1,/^$/d
X	: loop
X	/^$/b loop
X	s/[^-a-z.A-Z_0-9]/-/g
X	s/^--*//g
X	s/--*$//g
X	s/---*/-/g
X	s/$/.txt/
X        s/\.\.\.*/\./g
X	q
X'
XO=
Xif test -f .git/patch-order
Xthen
X	O=-O.git/patch-order
Xfi
Xgit-rev-list "$junio" "$linus" >$series
Xtotal=`wc -l <$series`
Xi=$total
Xwhile read commit
Xdo
X    title=`git-cat-file commit "$commit" | sed -e "$titleScript"`
X    num=`printf "%d/%d" $i $total`
X    file=`printf '%04d-%s' $i "$title"`
X    i=`expr "$i" - 1`
X    echo "$file"
X    {
X	mailScript='
X	1,/^$/d
X	: loop
X	/^$/b loop
X	s|^|[PATCH '"$num"'] |
X	: body
X	p
X	n
X	b body'
X
X	git-cat-file commit "$commit" | sed -ne "$mailScript"
X	echo '---'
X	git-diff-tree -p $O "$commit" | diffstat -p1
X	echo
X	git-diff-tree -p $O "$commit"
X    } >".patches/$file"
Xdone <$series
EOF
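
To see what file names come out of this, the substitution steps of
titleScript can be tried on their own. The sketch below is a cut-down
illustration, not part of the script above: title_to_filename is a made-up
name, and it takes the title as an argument instead of reading a commit
object through git-cat-file.

```shell
#!/bin/sh
# Sketch: only the substitution steps of titleScript, applied to one line.
# title_to_filename is a hypothetical name used for illustration.
title_to_filename() {
    printf '%s\n' "$1" | sed \
        -e 's/[^-a-z.A-Z_0-9]/-/g' \
        -e 's/^--*//' \
        -e 's/--*$//' \
        -e 's/---*/-/g' \
        -e 's/$/.txt/' \
        -e 's/\.\.\.*/./g'
}

title_to_filename "Add diff-* -O option"    # prints: Add-diff-O-option.txt
```

Anything outside letters, digits, ".", "_" and "-" becomes a dash, runs of
dashes collapse to one, and ".txt" is appended; the real script then
prefixes the four-digit sequence number.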


* Re: I want to release a "git-1.0"
  2005-05-30 22:55   ` Dmitry Torokhov
  2005-05-30 23:15     ` Junio C Hamano
@ 2005-05-30 23:23     ` Dmitry Torokhov
  1 sibling, 0 replies; 64+ messages in thread
From: Dmitry Torokhov @ 2005-05-30 23:23 UTC (permalink / raw)
  To: git; +Cc: Dave Jones, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 776 bytes --]

On Monday 30 May 2005 17:55, Dmitry Torokhov wrote:
> On Monday 30 May 2005 17:12, Dave Jones wrote:
> > I realise you've got a nifty bunch of tools to apply a whole mbox of
> > patches, but that's not ideal if all of my patches aren't in mboxes
> > (some I create myself and toss in my spool, some I pull from bugzilla etc..)
> 
> I mercilessly hacked Linus's scripts from git-tools repo to work with
> non-mailbox patches, maybe you can make use of them too. Note that
> stripspace.c is not changed in any way whatsoever and mailsplit.c was
> changed to handle my personal preference of having patch description
> in the form of:
> 
> Input: make blah blah change
> ---
>  
> And Linus's script would eat that line.
> 

Oops, make it mailinfo.c, not mailsplit.c

-- 
Dmitry

[-- Attachment #2: mailinfo.c --]
[-- Type: text/x-csrc, Size: 5691 bytes --]

/*
 * Another stupid program, this one parsing the headers of an
 * email to figure out authorship and subject
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

static FILE *cmitmsg, *patchfile, *filelist;

static char line[1000];
static char date[1000];
static char name[1000];
static char email[1000];
static char subject[1000];

static char *sanity_check(char *name, char *email)
{
	int len = strlen(name);
	if (len < 3 || len > 60)
		return email;
	if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
		return email;
	return name;
}

static int handle_from(char *line)
{
	char *at = strchr(line, '@');
	char *dst;

	if (!at)
		return 0;

/*
* If we already have one email, don't take any confusing lines
*/
	if (*email && strchr(at + 1, '@'))
		return 0;

	while (at > line) {
		char c = at[-1];
		if (isspace(c) || c == '<')
			break;
		at--;
	}
	dst = email;
	for (;;) {
		unsigned char c = *at;
		if (!c || c == '>' || isspace(c))
			break;
		*at++ = ' ';
		*dst++ = c;
	}
	*dst++ = 0;

	at = line + strlen(line);
	while (at > line) {
		unsigned char c = *--at;
		if (isalnum(c))
			break;
		*at = 0;
	}

	at = line;
	for (;;) {
		unsigned char c = *at;
		if (!c)
			break;
		if (isalnum(c))
			break;
		at++;
	}

	at = sanity_check(at, email);

	strcpy(name, at);
	return 1;
}

static void handle_date(char *line)
{
	strcpy(date, line);
}

static void handle_subject(char *line)
{
	strcpy(subject, line);
}

static void add_subject_line(char *line)
{
	while (isspace(*line))
		line++;
	*--line = ' ';
	strcat(subject, line);
}

static int check_special_line(char *line, int len)
{
	static int cont = -1;
	if (!memcmp(line, "From:", 5) && isspace(line[5])) {
		handle_from(line + 6);
		cont = 0;
		return 1;
	}
	if (!memcmp(line, "Date:", 5) && isspace(line[5])) {
		handle_date(line + 6);
		cont = 0;
		return 1;
	}
	if (!memcmp(line, "Subject:", 8) && isspace(line[8])) {
		handle_subject(line + 9);
		cont = 1;
		return 1;
	}
	if (isspace(*line)) {
		switch (cont) {
		case 0:
			fprintf(stderr,
				"I don't do 'Date:' or 'From:' line continuations\n");
			break;
		case 1:
			add_subject_line(line);
			return 1;
		default:
			break;
		}
	}
	cont = -1;
	return 0;
}

static char *cleanup_subject(char *subject)
{
	for (;;) {
		char *p;
		int len, remove;
		switch (*subject) {
		case 'r':
		case 'R':
			if (!memcmp("e:", subject + 1, 2)) {
				subject += 3;
				continue;
			}
			break;
		case ' ':
		case '\t':
		case ':':
			subject++;
			continue;

		case '[':
			p = strchr(subject, ']');
			if (!p) {
				subject++;
				continue;
			}
			len = strlen(p);
			remove = p - subject;
			if (remove <= len * 2) {
				subject = p + 1;
				continue;
			}
			break;
		}
		return subject;
	}
}

static void cleanup_space(char *buf)
{
	unsigned char c;
	while ((c = *buf) != 0) {
		buf++;
		if (isspace(c)) {
			buf[-1] = ' ';
			c = *buf;
			while (isspace(c)) {
				int len = strlen(buf);
				memmove(buf, buf + 1, len);
				c = *buf;
			}
		}
	}
}

/*
* Hacky hacky. This depends not only on -p1, but on
* filenames not having some special characters in them,
* like tilde.
*/
static void show_filename(char *line)
{
	int len;
	char *name = strchr(line, '/');

	if (!name || !isspace(*line))
		return;
	name++;
	len = 0;
	for (;;) {
		unsigned char c = name[len];
		switch (c) {
		default:
			len++;
			continue;

		case 0:
		case ' ':
		case '\t':
		case '\n':
			break;

/* patch tends to special-case these things.. */
		case '~':
			break;
		}
		break;
	}
/* remove ".orig" from the end - common patch behaviour */
	if (len > 5 && !memcmp(name + len - 5, ".orig", 5))
		len -= 5;
	if (!len)
		return;
	fprintf(filelist, "%.*s\n", len, name);
}

static void handle_rest(void)
{
	char *sub = cleanup_subject(subject);
	cleanup_space(name);
	cleanup_space(date);
	cleanup_space(email);
	cleanup_space(sub);
	printf("Author: %s\nEmail: %s\nSubject: %s\nDate: %s\n\n", name, email,
	       sub, date);
	FILE *out = cmitmsg;

	do {
/* Track filename information from the patch.. */
		if (!memcmp("---", line, 3)) {
			out = patchfile;
			show_filename(line + 3);
		}

		if (!memcmp("+++", line, 3))
			show_filename(line + 3);

		fputs(line, out);
	} while (fgets(line, sizeof(line), stdin) != NULL);

	if (out == cmitmsg) {
		fprintf(stderr, "No patch found\n");
		exit(1);
	}

	fclose(cmitmsg);
	fclose(patchfile);
}

static int eatspace(char *line)
{
	int len = strlen(line);
	while (len > 0 && isspace(line[len - 1]))
		line[--len] = 0;
	return len;
}

static void handle_body(void)
{
	int has_from = 0;

/* First line of body can be a From: */
	while (fgets(line, sizeof(line), stdin) != NULL) {
		int len = eatspace(line);
		if (!len)
			continue;
		if (!memcmp("From:", line, 5) && isspace(line[5])) {
			if (!has_from && handle_from(line + 6)) {
				has_from = 1;
				continue;
			}
		}
		line[len] = '\n';
		handle_rest();
		break;
	}
}

static void usage(void)
{
	fprintf(stderr, "mailinfo msg-file patch-file filelist-file < email\n");
	exit(1);
}

int main(int argc, char **argv)
{
	int mail_patch = 0;

	if (argc != 4)
		usage();
	cmitmsg = fopen(argv[1], "w");
	if (!cmitmsg) {
		perror(argv[1]);
		exit(1);
	}
	patchfile = fopen(argv[2], "w");
	if (!patchfile) {
		perror(argv[2]);
		exit(1);
	}
	filelist = fopen(argv[3], "w");
	if (!filelist) {
		perror(argv[3]);
		exit(1);
	}
	while (fgets(line, sizeof(line), stdin) != NULL) {
		int len = eatspace(line);
		if (!len) {
			if (!mail_patch)
				fputs("\n", cmitmsg);

			handle_body();
			break;
		}
		if (check_special_line(line, len)) {
			mail_patch = 1;
			rewind(cmitmsg);
		}
		if (!mail_patch) {
			line[len] = '\n';
			fputs(line, cmitmsg);
		}
	}
	return 0;
}

* Re: I want to release a "git-1.0"
  2005-05-30 22:32 ` Chris Wedgwood
@ 2005-05-30 23:56   ` Chris Wedgwood
  2005-05-31  1:06   ` Linus Torvalds
  1 sibling, 0 replies; 64+ messages in thread
From: Chris Wedgwood @ 2005-05-30 23:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Mon, May 30, 2005 at 03:32:42PM -0700, Chris Wedgwood wrote:

> I'm still at a loss how to do the equivalent of annotate.  I know a
> couple of front ends can do this but I have no idea what command line
> magic would be equivalent.

A few people asked what tool does this now.  Git Tracker does; a (random)
example of which might be:

  http://www.tglx.de/cgi-bin/gittracker/annotate/tracker-linux/torvalds/linux-2.6.git/mm/mmap.c?blob=de54acd9942f9929004921042721df5cdfe2b6c7

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (7 preceding siblings ...)
  2005-05-30 22:32 ` Chris Wedgwood
@ 2005-05-31  0:19 ` Petr Baudis
  2005-05-31 13:45 ` Eric W. Biederman
  2005-06-02 19:43 ` CVS migration section to the tutorial Junio C Hamano
  10 siblings, 0 replies; 64+ messages in thread
From: Petr Baudis @ 2005-05-31  0:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Dear diary, on Mon, May 30, 2005 at 10:00:42PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> Ok, I'm at the point where I really think it's getting close to a 1.0, and
> make another tar-ball etc. I obviously feel that it's already way superior
> to CVS, but I also realize that somebody who is used to CVS may not 
> actually realize that very easily.

Can we (well, me) count on the output format of the git commands being
stabilized now and not change in a backwards-incompatible way from now
on? I would like to finally remove the git itself from Cogito, but for
that I have to be able to rely on the fact that as long as the user has
git version >=N, it will work (assuming that Cogito is bugless ;-).

> So before I do a 1.0 release, I want to write some stupid git tutorial for
> a complete beginner that has only used CVS before, with a real example of
> how to use raw git, and along those lines I actually want the thing to
> show how to do something useful.

Is there actually much point in using raw git directly? You don't
usually invoke the syscalls directly from the user programs either (and
you usually actually use stdio for the casual stuff). I guess the
raw git usage can get quite long and tiresome sometimes.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-30 22:12 ` Dave Jones
  2005-05-30 22:55   ` Dmitry Torokhov
@ 2005-05-31  0:52   ` Linus Torvalds
  1 sibling, 0 replies; 64+ messages in thread
From: Linus Torvalds @ 2005-05-31  0:52 UTC (permalink / raw)
  To: Dave Jones; +Cc: Git Mailing List



On Mon, 30 May 2005, Dave Jones wrote:
>
>     GIT_AUTHOR_NAME="John Doe"      \
>     GIT_AUTHOR_EMAIL="jdoe@foo.com" \
>     git-commit-tree `git-write-tree`    \
>     -p $(cat .git/HEAD )            \
>     < changelog.txt         \
>     > .git/HEAD

You _really_ want to script this.

Also, I'd seriously suggest you avoid using ".git/HEAD" _and_ writing to 
.git/HEAD in the same command. Maybe it works, maybe it doesn't.

So script it with something like

	#!/bin/sh
	export GIT_AUTHOR_NAME="$1"
	export GIT_AUTHOR_EMAIL="$2"
	tree=$(git-write-tree) || exit 1
	commit=$(git-commit-tree $tree -p HEAD) || exit 1
	echo $commit > .git/HEAD

and now you can just do

	commit-as "John Doe" "jdoe@foo.com" < changelog.txt

or something like that.

The git commands really are designed to be scripted.
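
To illustrate the .git/HEAD point: the shell opens and truncates a
redirect target before the command runs, so if the commit step fails,
HEAD is left empty. A minimal sketch of one way around that, assuming a
hypothetical helper called safe_update (not a git command) that writes to
a temporary file and renames it into place only on success:

```shell
#!/bin/sh
# safe_update TARGET COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with stdout going to a temporary file,
# and replace TARGET only if COMMAND exits successfully.
safe_update() {
    target=$1
    shift
    tmp="$target.tmp.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$target"
    else
        # Leave the old TARGET untouched and clean up the partial output.
        rm -f "$tmp"
        return 1
    fi
}

# Possible use, assuming $tree and $parent were computed earlier:
#   safe_update .git/HEAD git-commit-tree "$tree" -p "$parent" < changelog.txt
```

Stdin passes through to the wrapped command, so feeding the changelog on
stdin should still work.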

		Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-30 22:19 ` Ryan Anderson
@ 2005-05-31  0:58   ` Linus Torvalds
  0 siblings, 0 replies; 64+ messages in thread
From: Linus Torvalds @ 2005-05-31  0:58 UTC (permalink / raw)
  To: Ryan Anderson; +Cc: Git Mailing List

On Mon, 30 May 2005, Ryan Anderson wrote:
> 
> Umm, why do you maintain two separate "git" related trees?

Well, my "tools" thing really isn't git proper, and may not make much 
sense in the git distribution.

That said, I'm actually moving things into git as they turn out useful. For 
example, I moved the "stripspace" program into git (which means it got 
renamed to "git-stripspace"), since it ended up being useful for the 
stand-alone git-commit-scripts too.

But how many non-Linux projects really apply mailboxes of patches? It 
doesn't seem to be very "core".

> Why not merge all of git-tools in, in a tools/ subdirectory?

I'll think about it. It does look like at least about half of the git
tools end up being pretty core.

> I've been meaning to ask the same question about "gitweb" for that
> matter.

Well, there the issue definitely boils down to "different maintainers". I 
don't want to connect things that don't need to be connected. 

> I'd guess part of this is a holdover from the fact that you needed an
> independent tree for BitKeeper, but does it still make sense?

Well, I see the "tools" thing really as my private tools that may or may
not make sense for anybody else. Even the cvs2git thing is just so
_stupid_, since I bet you can do it quite cleanly in perl without having
that strange "convert cvsps output into a shellscript" stage (admittedly,
it was _really_ convenient for debugging to have that separate stage, so
while it looks a bit hacky, it ended up being very powerful).

		Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-30 22:32 ` Chris Wedgwood
  2005-05-30 23:56   ` Chris Wedgwood
@ 2005-05-31  1:06   ` Linus Torvalds
  2005-06-01  2:11     ` Junio C Hamano
  1 sibling, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2005-05-31  1:06 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Git Mailing List

On Mon, 30 May 2005, Chris Wedgwood wrote:
> 
> I'm still at a loss how to do the equivalent of annotate.  I know a
> couple of front ends can do this but I have no idea what command line
> magic would be equivalent.

There isn't any. It's actually pretty nasty to do, following history 
backwards and keeping track of lines as they are added. I know how, I'm 
just really lazy and hoping somebody else will do it, since I really end 
up not caring that much myself.

I notice that Thomas Gleixner seems to have one, but that one is based on 
a database, and doesn't look usable as a standalone command..

		Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (8 preceding siblings ...)
  2005-05-31  0:19 ` Petr Baudis
@ 2005-05-31 13:45 ` Eric W. Biederman
  2005-06-01  3:04   ` Linus Torvalds
  2005-06-02 19:43 ` CVS migration section to the tutorial Junio C Hamano
  10 siblings, 1 reply; 64+ messages in thread
From: Eric W. Biederman @ 2005-05-31 13:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@osdl.org> writes:

> Ok, I'm at the point where I really think it's getting close to a 1.0, and
> make another tar-ball etc. I obviously feel that it's already way superior
> to CVS, but I also realize that somebody who is used to CVS may not 
> actually realize that very easily.

I am way behind the power curve on learning git at this point but
one piece of the puzzle that CVS has that I don't believe git does
is multiple people committing to the same repository, especially
remotely.  I don't see that as a down side of git but it is a common
way people use CVS so it is worth documenting.

Eric

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-31  1:06   ` Linus Torvalds
@ 2005-06-01  2:11     ` Junio C Hamano
  2005-06-01  2:25       ` David Lang
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01  2:11 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Mon, 30 May 2005, Chris Wedgwood wrote:
>> 
>> I'm still at a loss how to do the equivalent of annotate.  I know a
>> couple of front ends can do this but I have no idea what command line
>> magic would be equivalent.

LT> There isn't any. It's actually pretty nasty to do, following history 
LT> backwards and keeping track of lines as they are added. I know how, I'm 
LT> just really lazy and hoping somebody else will do it, since I really end 
LT> up not caring that much myself.

LT> I notice that Thomas Gleixner seems to have one, but that one is based on 
LT> a database, and doesn't look usable as a standalone command..

Here is my quick-and-dirty one done in Perl.  This is dog-slow
and not suited for interactive use, but its algorithm should
handle the merges, renames and complete rewrites correctly.

Its sample output for:

    $ blame.perl HEAD git-commit-script

looks like this (I've edited the SHA1 and names to make it a bit
shorter, but it still does not fit on my 80-column terminal X-<).

For each line in the version in the HEAD, it outputs (TAB
separated) the SHA1 of the commit that is responsible for the
line being there, author, committer, line number in the version
of the guilty commit and the filename in the guilty commit (this
file could have been renamed in which case this may not match
the name of the file the script was originally asked to
annotate).  It shows, for example, that the 9th line was the
11th line in the a3e870f2... commit, done by Linus.

:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	1	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	2	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	3	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	4	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	5	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	6	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	7	git-commit-script
:2036d841...	Junio C....@cox.net>	Linus T....osdl.org>	8	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	11	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	12	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	13	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	14	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	15	git-commit-script
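
Since the output is TAB separated, it should be easy to post-process. A
rough sketch, assuming the column layout described above and a
hypothetical helper name blame_counts, that counts how many lines each
commit is blamed for:

```shell
#!/bin/sh
# blame_counts: read blame.perl-style output on stdin and print
# "<count><TAB><commit>" for each commit, sorted by commit id.
blame_counts() {
    awk -F '\t' '
    {
        # Field 1 is the commit id prefixed with ":" (found) or "?" (unknown).
        sub(/^[:?]/, "", $1)
        n[$1]++
    }
    END {
        for (c in n) print n[c] "\t" c
    }' | LC_ALL=C sort -k2
}

# Possible use:
#   blame.perl HEAD git-commit-script | blame_counts
```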

------------
A blame script for use by higher-level annotate tools.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff -u a/blame.perl b/blame.perl
--- /dev/null
+++ b/blame.perl
@@ -0,0 +1,400 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+package main;
+$::debug = 0;
+
+sub read_blob {
+    my $sha1 = shift;
+    my $fh = undef;
+    my $result;
+    local ($/) = undef;
+    open $fh, '-|', 'git-cat-file', 'blob', $sha1
+	or die "cannot read blob $sha1";
+    $result = join('', <$fh>);
+    close $fh
+	or die "failure while closing pipe to git-cat-file";
+    return $result;
+}
+
+sub read_diff_raw {
+    my ($parent, $filename) = @_;
+    my $fh = undef;
+    local ($/) = "\0";
+    my @result = (); 
+    my ($meta, $status, $sha1_1, $sha1_2, $file1, $file2);
+    print STDERR "* diff-cache $parent\n" if $::debug;
+    open $fh, '-|', 'git-diff-cache', '-B', '-C', '--cached', '-z', $parent
+	or die "cannot read git-diff-cache with $parent";
+    while (defined ($meta = <$fh>)) {
+	chomp($meta);
+	(undef, undef, $sha1_1, $sha1_2, $status) = split(/ /, $meta);
+	$file1 = <$fh>;
+	chomp($file1);
+	if ($status =~ /^[CR]/) {
+	    $file2 = <$fh>;
+	    chomp($file2);
+	} elsif ($status =~ /^D/) {
+	    next;
+	} else {
+	    $file2 = $file1;
+	}
+	if ($file2 eq $filename) {
+	    push @result, [$status, $sha1_1, $sha1_2, $file1, $file2];
+	}
+    }
+    close $fh
+	or die "failure while closing pipe to git-diff-cache";
+    return @result;
+}
+
+sub write_temp_blob {
+    my ($sha1, $temp) = @_;
+    my $fh = undef;
+    my $blob = read_blob($sha1);
+    open $fh, '>', $temp
+	or die "cannot open temporary file $temp";
+    print $fh $blob;
+    close($fh);
+}
+
+package Git::Patch;
+sub new {
+    my ($class, $sha1_1, $sha1_2) = @_;
+    my $self = bless [], $class;
+    my $fh = undef;
+    ::write_temp_blob($sha1_1, "/tmp/blame-$$-1");
+    ::write_temp_blob($sha1_2, "/tmp/blame-$$-2");
+    open $fh, '-|', 'diff', '-u0', "/tmp/blame-$$-1", "/tmp/blame-$$-2"
+	or die "cannot read diff";
+    while (<$fh>) {
+	if (/^\@\@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? \@\@/) {
+	    push @$self, [$1, (defined $2 ? $2 : 1),
+			  $3, (defined $4 ? $4 : 1)];
+	}
+    }
+    close $fh;
+    unlink "/tmp/blame-$$-1", "/tmp/blame-$$-2";
+    return $self;
+}
+
+sub find_parent_line {
+    my ($self, $commit_lineno) = @_;
+    my $ofs = 0;
+    for (@$self) {
+	my ($line_1, $len_1, $line_2, $len_2) = @$_;
+	if ($commit_lineno < $line_2) {
+	    return $commit_lineno + $ofs;
+	}
+	if ($line_2 <= $commit_lineno && $commit_lineno < $line_2 + $len_2) {
+	    return -1; # changed by commit.
+	}
+	$ofs += ($len_1 - $len_2);
+    }
+    return $commit_lineno + $ofs;
+}
+
+package Git::Commit;
+sub new {
+    my $class = shift;
+    my $self = bless {
+	PARENT => [],
+	TREE => undef,
+	AUTHOR => undef,
+	COMMITTER => undef,
+    }, $class;
+    my $commit_sha1 = shift;
+    $self->{SHA1} = $commit_sha1;
+    my $fh = undef;
+    open $fh, '-|', 'git-cat-file', 'commit', $commit_sha1
+	or die "cannot read commit object $commit_sha1";
+    while (<$fh>) {
+	chomp;
+	if (/^tree ([0-9a-f]{40})$/) { $self->{TREE} = $1; }
+	elsif (/^parent ([0-9a-f]{40})$/) { push @{$self->{PARENT}}, $1; }
+	elsif (/^author ([^>]+>)/) { $self->{AUTHOR} = $1; }
+	elsif (/^committer ([^>]+>)/) { $self->{COMMITTER} = $1; }
+    }
+    close $fh
+	or die "failure while closing pipe to git-cat-file";
+    return $self;
+}
+
+sub find_file {
+    my ($commit, $path) = @_;
+    my $result = undef;
+    my $fh = undef;
+    local ($/) = "\0";
+    open $fh, '-|', 'git-ls-tree', '-z', '-r', '-d', $commit->{TREE}, $path
+	or die "cannot read git-ls-tree $commit->{TREE}";
+    while (<$fh>) {
+	chomp;
+	if (/^[0-7]{6} blob ([0-9a-f]{40})	(.*)$/) {
+	    if ($2 ne $path) {
+		die "$2 ne $path???";
+	    }
+	    $result = $1;
+	    last;
+	}
+    }
+    close $fh
+	or die "failure while closing pipe to git-ls-tree";
+    return $result;
+}
+
+package Git::Blame;
+sub new {
+    my $class = shift;
+    my $self = bless {
+	LINE => [],
+	UNKNOWN => undef,
+	WORK => [],
+    }, $class;
+    my $commit = shift;
+    my $filename = shift;
+    my $sha1 = $commit->find_file($filename);
+    my $blob = ::read_blob($sha1);
+    my @blob = (split(/\n/, $blob));
+    for (my $i = 0; $i < @blob; $i++) {
+	$self->{LINE}[$i] = +{
+	    COMMIT => $commit,
+	    FOUND => undef,
+	    FILENAME => $filename,
+	    LINENO => ($i + 1),
+	};
+    }
+    $self->{UNKNOWN} = scalar @blob;
+    push @{$self->{WORK}}, [$commit, $filename];
+    return $self;
+}
+
+sub print {
+    my $self = shift;
+    my $line_termination = shift;
+    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+	my $l = $self->{LINE}[$i];
+	print ($l->{FOUND} ? ':' : '?');;
+	print "$l->{COMMIT}->{SHA1}	";
+	print "$l->{COMMIT}->{AUTHOR}	";
+	print "$l->{COMMIT}->{COMMITTER}	";
+	print "$l->{LINENO}	$l->{FILENAME}";
+	print $line_termination;
+    }
+}
+
+sub take_responsibility {
+    my ($self, $commit) = @_;
+    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+	my $l = $self->{LINE}[$i];
+	if (! $l->{FOUND} && ($l->{COMMIT}->{SHA1} eq $commit->{SHA1})) {
+	    $l->{FOUND} = 1;
+	    $self->{UNKNOWN}--;
+	}
+    }
+}
+
+sub blame_parent {
+    my ($self, $commit, $parent, $filename) = @_;
+    my @diff = ::read_diff_raw($parent->{SHA1}, $filename);
+    my $filename_in_parent;
+    my $passed_blame_to_parent = undef;
+    if (@diff == 0) {
+	# We have not touched anything.  Blame parent for everything
+	# that we are suspected for.
+	for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+	    my $l = $self->{LINE}[$i];
+	    if (! $l->{FOUND} && ($l->{COMMIT}->{SHA1} eq $commit->{SHA1})) {
+		$l->{COMMIT} = $parent;
+		$passed_blame_to_parent = 1;
+	    }
+	}
+	$filename_in_parent = $filename;
+    }
+    elsif (@diff != 1) {
+	# This should not happen.
+	for (@diff) {
+	    print "** @$_\n";
+	}
+	die "Oops";
+    }
+    else {
+	my ($status, $sha1_1, $sha1_2, $file1, $file2) = @{$diff[0]};
+	print STDERR "** $status $file1 $file2\n" if $::debug;
+	if ($status =~ /N/) {
+	    # Either some of other parents created it, or we did.
+	    # At this point the only thing we know is that this
+	    # parent is not responsible for it.
+	    ;
+	}
+	else {
+	    my $patch = Git::Patch->new($sha1_1, $sha1_2);
+	    $filename_in_parent = $file1;
+	    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+		my $l = $self->{LINE}[$i];
+		if (! $l->{FOUND} && $l->{COMMIT}->{SHA1} eq $commit->{SHA1}) {
+		    # We are suspected to have introduced this line.
+		    # Does it exist in the parent?
+		    my $lineno = $l->{LINENO};
+		    my $parent_line = $patch->find_parent_line($lineno);
+		    if ($parent_line < 0) {
+			# No, we may be the guilty ones, or some other
+			# parent might be.  We do not assign blame to
+			# ourselves here yet.
+			;
+		    }
+		    else {
+			# This line is coming from the parent, so pass
+			# blame to it.
+			$l->{COMMIT} = $parent;
+			$l->{FILENAME} = $file1;
+			$l->{LINENO} = $parent_line;
+			$passed_blame_to_parent = 1;
+		    }
+		}
+	    }
+	}
+    }
+    if ($passed_blame_to_parent && $self->{UNKNOWN}) {
+	unshift @{$self->{WORK}},
+	[$parent, $filename_in_parent];
+    }
+}
+
+sub assign {
+    my ($self, $commit, $filename) = @_;
+    # We do read-tree of the current commit and diff-cache
+    # with each parents, instead of running diff-tree.  This
+    # is because diff-tree does not look for copies hard enough.
+    #
+    print STDERR "* read-tree  $commit->{SHA1}\n" if $::debug;
+    system('git-read-tree', '-m', $commit->{SHA1});
+    for my $parent (@{$commit->{PARENT}}) {
+	$self->blame_parent($commit, Git::Commit->new($parent), $filename);
+    }
+    $self->take_responsibility($commit);
+}
+
+sub assign_blame {
+    my ($self) = @_;
+    while ($self->{UNKNOWN} && @{$self->{WORK}}) {
+	my $wk = shift @{$self->{WORK}};
+	my ($commit, $filename) = @$wk;
+	$self->assign($commit, $filename);
+    }
+}
+
+
+
+################################################################
+package main;
+my $usage = "blame [-z] <commit> filename";
+my $line_termination = "\n";
+
+$::ENV{GIT_INDEX_FILE} = "/tmp/blame-$$-index";
+unlink($::ENV{GIT_INDEX_FILE});
+
+if ($ARGV[0] eq '-z') {
+    $line_termination = "\0";
+    shift;
+}
+
+if (@ARGV != 2) {
+    die $usage;
+}
+
+my $head_commit = Git::Commit->new($ARGV[0]);
+my $filename = $ARGV[1];
+my $blame = Git::Blame->new($head_commit, $filename);
+
+$blame->assign_blame();
+$blame->print($line_termination);
+
+unlink($::ENV{GIT_INDEX_FILE});
+
+__END__
+
+How does this work, and what do we do about merges?
+
+The algorithm considers the first parent to be our main line of
+development and treats it somewhat differently from other parents.  So
+we pass on the blame to the first parent if a line has not changed from
+it.  For lines that have changed from the first parent, we must have
+either inherited that change from some other parent, or it could have
+been a merge conflict resolution edit we did on our own.
+
+The following picture illustrates how we pass on and assign blames.
+
+In the sample, the original O was forked into A and B and then merged
+into M.  Lines 1, 2, and 4 did not change.  Lines 3 and 5 are changed in
+A, and Lines 5 and 6 are changed in B.  M made its own decision to
+resolve merge conflicts at Line 5 to something different from A and B:
+
+                A: 1 2 T 4 T 6
+               /               \ 
+O: 1 2 3 4 5 6                  M: 1 2 T 4 M S
+               \               / 
+                B: 1 2 3 4 S S
+
+In the following picture, each line is annotated with a blame letter.
+A lowercase blame (e.g. "a" for "1") means that commit or its ancestor
+is the guilty party but we do not know which particular ancestor is
+responsible for the change yet.  An uppercase blame means that we know
+that commit is the guilty party.
+
+First we look at M (the HEAD) and initialize Git::Blame->{LINE} like
+this:
+
+             M: 1 2 T 4 M S
+                m m m m m m
+
+That is, we know all lines are results of modification made by some
+ancestor of M, so we assign lowercase 'm' to all of them.
+
+Then we examine our first parent A.  Throughout the algorithm, we are
+always only interested in the lines for which we are the suspect, but
+this being the initial round, we are the suspect for all of them.  We
+notice that 1 2 T 4 are the same as the parent A, so we pass the blame
+for these four lines to A.  M and S are different from A, so we leave
+them as they are (note that we do not immediately take the blame for them):
+
+             M: 1 2 T 4 M S
+                a a a a m m
+
+Next we go on to examine parent B.  Again, we are only interested in
+the lines for which we are still the suspect (i.e. M and S).  We notice
+S is something we inherited from B, so we pass the blame on to it, like
+this:
+
+             M: 1 2 T 4 M S
+                a a a a m b
+
+Once we have exhausted the parents, we look at the results and take
+responsibility for the remaining lines for which we are still the suspect:
+
+             M: 1 2 T 4 M S
+                a a a a M b
+
+We are done with M.  And we know commits A and B need to be examined
+further, so we do them recursively.  When we look at A, we again only
+look at the lines for which A is the suspect:
+
+             A: 1 2 T 4 T 6
+                a a a a M b
+
+Among 1 2 T 4, comparing against its parent O, we notice 1 2 4 are
+the same so pass the blame for those lines to O:
+
+             A: 1 2 T 4 T 6
+                o o a o M b
+
+A is a non-merge commit; we have already exhausted the parents and
+take responsibility for the remaining lines for which A is the suspect:
+
+             A: 1 2 T 4 T 6
+                o o A o M b
+
+We go on like this and the final result would become:
+
+             O: 1 2 3 4 5 6
+                O O A O M B

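The walkthrough in the notes at the end of the script can be replayed
with a toy model. This is only a sketch, not the script's implementation:
it assumes every version has the same number of lines, so line i in a
commit corresponds to line i in its parents, and it hard-codes the
O/A/B/M example. The function name blame_toy is made up for illustration.

```shell
#!/bin/sh
# blame_toy: replay the O/A/B/M example and print who is blamed per line.
blame_toy() {
    awk 'BEGIN {
        # Toy history: each commit is a row of equal-length "lines".
        text["O"] = "1 2 3 4 5 6"; parents["O"] = ""
        text["A"] = "1 2 T 4 T 6"; parents["A"] = "O"
        text["B"] = "1 2 3 4 S S"; parents["B"] = "O"
        text["M"] = "1 2 T 4 M S"; parents["M"] = "A B"

        n = split(text["M"], line, " ")
        # Initially the head commit is the suspect for every line.
        for (i = 1; i <= n; i++) suspect[i] = "M"
        queue[1] = "M"; head = 1; tail = 1

        while (head <= tail) {
            c = queue[head++]
            split(text[c], mine, " ")
            np = split(parents[c], p, " ")
            for (k = 1; k <= np; k++) {
                split(text[p[k]], theirs, " ")
                passed = 0
                # Pass blame to this parent for unchanged lines still pinned on c.
                for (i = 1; i <= n; i++)
                    if (suspect[i] == c && !found[i] && mine[i] == theirs[i]) {
                        suspect[i] = p[k]
                        passed = 1
                    }
                if (passed) queue[++tail] = p[k]
            }
            # Lines still pinned on c after all parents: c takes responsibility.
            for (i = 1; i <= n; i++)
                if (suspect[i] == c) found[i] = 1
        }

        for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") suspect[i]
        print out
    }'
}

blame_toy   # prints: O O A O M B
```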

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01  2:11     ` Junio C Hamano
@ 2005-06-01  2:25       ` David Lang
  2005-06-01  4:53         ` Junio C Hamano
  0 siblings, 1 reply; 64+ messages in thread
From: David Lang @ 2005-06-01  2:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List

On Tue, 31 May 2005, Junio C Hamano wrote:

>>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
>
> LT> On Mon, 30 May 2005, Chris Wedgwood wrote:
>>>
>>> I'm still at a loss how to do the equivalent of annotate.  I know a
>>> couple of front ends can do this but I have no idea what command line
>>> magic would be equivalent.
>
> LT> There isn't any. It's actually pretty nasty to do, following history
> LT> backwards and keeping track of lines as they are added. I know how, I'm
> LT> just really lazy and hoping somebody else will do it, since I really end
> LT> up not caring that much myself.
>
> LT> I notice that Thomas Gleixner seems to have one, but that one is based on
> LT> a database, and doesn't look usable as a standalone command..
>
> Here is my quick-and-dirty one done in Perl.  This is dog-slow
> and not suited for interactive use, but its algorithm should
> handle the merges, renames and complete rewrites correctly.

Hmm, thinking out loud: would it help to look at the deltify scripts and 
let them find the major chunks and then look in detail only when that 
fails?

David Lang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-31 13:45 ` Eric W. Biederman
@ 2005-06-01  3:04   ` Linus Torvalds
  2005-06-01  4:06     ` Junio C Hamano
                       ` (5 more replies)
  0 siblings, 6 replies; 64+ messages in thread
From: Linus Torvalds @ 2005-06-01  3:04 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Git Mailing List

On Tue, 31 May 2005, Eric W. Biederman wrote:
> 
> I am way behind the power curve on learning git at this point but
> one piece of the puzzle that CVS has that I don't believe git does
> is multiple people committing to the same repository, especially
> remotely.  I don't see that as a down side of git but it is a common
> way people use CVS so it is worth documenting.

It's actually one thing git doesn't do per se.

You have to do a "git-pull-script" from the common repository side, 
there's no "git-push-script". Ugly.

Anyway, I wrote just a _very_ introductory thing in
Documentation/tutorial.txt, I'll try to update and expand on it later. It
basically has a really stupid example of "how to set up a new project".

		Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01  3:04   ` Linus Torvalds
@ 2005-06-01  4:06     ` Junio C Hamano
  2005-06-02 23:54       ` [PATCH] Fix -B "very-different" logic Junio C Hamano
  2005-06-01  6:28     ` I want to release a "git-1.0" Junio C Hamano
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01  4:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Anyway, I wrote just a _very_ introductory thing in
LT> Documentation/tutorial.txt, I'll try to update and expand on
LT> it later. It basically has a really stupid example of "how
LT> to set up a new project".

I've spotted a couple of typos which I will leave for others to fix,
but there is one thing for which I am to blame.

    (Btw, current versions of git will consider the change in question to be
    so big that it's considered a whole new file, since the diff is actually
    bigger than the file.  So the helpful comments that git-commit-script
    tells you for this example will say that you deleted and re-created the
    file "a".  For a less contrieved example, these things are usually more
    obvious). 

Do you want me to do something about this with -B (and possibly
-C/-M), like skipping the comparison altogether if the file size
is smaller than, say, 1k bytes or something silly like that?  Or
is not having a special case for this kind of "contrived example"
preferable?


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01  2:25       ` David Lang
@ 2005-06-01  4:53         ` Junio C Hamano
  2005-06-01 20:06           ` David Lang
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01  4:53 UTC (permalink / raw)
  To: David Lang; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List

>>>>> "DL" == David Lang <david.lang@digitalinsight.com> writes:

DL> Hmm, thinking out loud: would it help to look at the deltify scripts
DL> and let them find the major chunks and then look in detail only when
DL> that fails?

It's unclear to me which part you are trying to help with the
deltify algorithm [*1*].

Internally, git-diff-cache -B -C is used, which does use the
deltify algorithm to locate complete rewrites, renames and copies
(that's why the script is so slow).  For passing on and assigning blames
line by line, parsing "diff --unified=0" output was a lot easier
for this script and that was what I did in this quick-and-dirty
version.
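
As an illustration of that line-by-line step, here is a minimal sketch
(not the script's code) that reads "diff -u0" style hunk headers on stdin
and maps a line number in the commit back to its parent. The helper name
parent_line is hypothetical. It relies on the unified diff convention
that an empty range is numbered by the line just before it.

```shell
#!/bin/sh
# parent_line N: read "diff -u0" output on stdin and print the parent line
# number for line N of the commit, or -1 if the commit changed that line.
parent_line() {
    awk -v n="$1" '
    /^@@ / {
        # $2 looks like "-START[,LEN]" and $3 like "+START[,LEN]"; LEN defaults to 1.
        c1 = split(substr($2, 2), od, ",")
        c2 = split(substr($3, 2), nw, ",")
        len1 = (c1 > 1) ? od[2] : 1
        len2 = (c2 > 1) ? nw[2] : 1
        # An empty new-side range covers no lines; it sits after line START.
        start = nw[1] + (len2 == 0 ? 1 : 0)
        if (n < start)        { print n + off; done = 1; exit }
        if (n < start + len2) { print -1;      done = 1; exit }
        # Lines after this hunk shift by (old length - new length).
        off += len1 - len2
    }
    END { if (!done) print n + off }'
}

# Possible use, assuming two versions saved as old.txt and new.txt:
#   diff -u0 old.txt new.txt | parent_line 3
```

For example, with parent "a b c d e" and commit "a c d X e" the hunk
headers are "@@ -2 +1,0 @@" and "@@ -4,0 +4 @@"; commit line 3 ("d") maps
to parent line 4, and commit line 4 ("X") maps to -1.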

[Footnotes]

*1* David says "deltify" and Nico calls it "deltafy".  I am not
a native speaker so I cannot tell, but which one is correct?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01  3:04   ` Linus Torvalds
  2005-06-01  4:06     ` Junio C Hamano
@ 2005-06-01  6:28     ` Junio C Hamano
  2005-06-01 22:00     ` Daniel Barkalow
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01  6:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Anyway, I wrote just a _very_ introductory thing in
LT> Documentation/tutorial.txt, I'll try to update and expand on it later. It
LT> basically has a really stupid example of "how to set up a new project".

Linus,

        I was following your "tutorial" and saw the last step
(git-whatchanged) showing the HEAD commit and diff _twice_.

You got me _WORRIED_!!!

I knew it uses your favorite diff-tree command and I was the
most likely suspect who broke it.  And I remember you were
understandably unhappy last time I broke it (the "diff-tree -s"
problem).

It turns out that the example in the tutorial was bad.  Here is
a fix.  It is so obvious that I do not think it deserves a
sign-off or credit.  Please just fold it into your edit next
time you update the tutorial.

---
cd /opt/packrat/playpen/public/in-place/git/git.junio/
jit-diff : Documentation
# - linus: git-apply --stat: limit lines to 79 characters
# + (working tree)
diff --git a/Documentation/tutorial.txt b/Documentation/tutorial.txt
--- a/Documentation/tutorial.txt
+++ b/Documentation/tutorial.txt
@@ -401,7 +401,7 @@ activity.
 To see the whole history of our pitiful little git-tutorial project, we
 can do

-	git-whatchanged -p --root HEAD
+	git-whatchanged -p --root

 (the "--root" flag is a flag to git-diff-tree to tell it to show the
 initial aka "root" commit as a diff too), and you will see exactly what

Compilation finished at Tue May 31 23:12:32

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-05-30 20:49 ` Nicolas Pitre
@ 2005-06-01  6:52   ` Junio C Hamano
  2005-06-01  8:24     ` [PATCH] Add -d flag to git-pull-* family Junio C Hamano
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01  6:52 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List

I just remembered that I mentioned potential problems with
non-rsync pulls with delta objects, especially when the git-*-pull
commands are used in "things only close to the tip" mode,
i.e. without "-a" option.  Do you think we should do something
about it before GIT 1.0 happens?  

It may be enough if we just tell people not to deltify their
public non-rsync repositories in the documentation.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH] Add -d flag to git-pull-* family.
  2005-06-01  6:52   ` Junio C Hamano
@ 2005-06-01  8:24     ` Junio C Hamano
  2005-06-01 14:39       ` Nicolas Pitre
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01  8:24 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List

When a remote repository is deltified, we need to get the
objects that a deltified object we want to obtain is based upon.
Since checking the representation type of all objects we retrieve
from the remote side may be costly, this is made into a separate
option -d; -a implies it for convenience and safety.

Rsync transport does not have this problem since it fetches
everything the remote side has.
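
A toy model of the idea, for illustration only: objects are plain files
named by id, and a deltified object is assumed to start with a line
"delta <base-id>". Neither that layout nor the helper name
fetch_with_bases comes from git; the real check is in pull.c in the patch
below.

```shell
#!/bin/sh
# fetch_with_bases REMOTE_DIR LOCAL_DIR ID
# Toy sketch: copy object ID from REMOTE_DIR to LOCAL_DIR, then keep
# following "delta <base-id>" headers until a non-delta object is reached.
fetch_with_bases() {
    remote=$1
    local_dir=$2
    id=$3
    while [ -n "$id" ]; do
        # Fetch the object unless we already have it.
        [ -f "$local_dir/$id" ] || cp "$remote/$id" "$local_dir/$id" || return 1
        # A deltified object names the object it is based upon on line 1;
        # for anything else this yields an empty string and the loop ends.
        id=$(sed -n '1s/^delta //p' "$local_dir/$id")
    done
}
```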

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Documentation/git-http-pull.txt  |    4 +++-
Documentation/git-local-pull.txt |    4 +++-
Documentation/git-rpull.txt      |    4 +++-
http-pull.c                      |    5 ++++-
local-pull.c                     |    5 ++++-
pull.c                           |   15 +++++++++++++++
pull.h                           |    3 +++
rpull.c                          |    5 ++++-
8 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-http-pull.txt b/Documentation/git-http-pull.txt
--- a/Documentation/git-http-pull.txt
+++ b/Documentation/git-http-pull.txt
@@ -9,7 +9,7 @@ git-http-pull - Downloads a remote GIT r
 
 SYNOPSIS
 --------
-'git-http-pull' [-c] [-t] [-a] [-v] commit-id url
+'git-http-pull' [-c] [-t] [-a] [-v] [-d] commit-id url
 
 DESCRIPTION
 -----------
@@ -17,6 +17,8 @@ Downloads a remote GIT repository via HT
 
 -c::
 	Get the commit objects.
+-d::
+	Get objects that deltified objects are based upon.
 -t::
 	Get trees associated with the commit objects.
 -a::
diff --git a/Documentation/git-local-pull.txt b/Documentation/git-local-pull.txt
--- a/Documentation/git-local-pull.txt
+++ b/Documentation/git-local-pull.txt
@@ -9,7 +9,7 @@ git-local-pull - Duplicates another GIT 
 
 SYNOPSIS
 --------
-'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path
+'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path
 
 DESCRIPTION
 -----------
@@ -19,6 +19,8 @@ OPTIONS
 -------
 -c::
 	Get the commit objects.
+-d::
+	Get objects that deltified objects are based upon.
 -t::
 	Get trees associated with the commit objects.
 -a::
diff --git a/Documentation/git-rpull.txt b/Documentation/git-rpull.txt
--- a/Documentation/git-rpull.txt
+++ b/Documentation/git-rpull.txt
@@ -10,7 +10,7 @@ git-rpull - Pulls from a remote reposito
 
 SYNOPSIS
 --------
-'git-rpull' [-c] [-t] [-a] [-v] commit-id url
+'git-rpull' [-c] [-t] [-a] [-v] [-d] commit-id url
 
 DESCRIPTION
 -----------
@@ -21,6 +21,8 @@ OPTIONS
 -------
 -c::
 	Get the commit objects.
+-d::
+	Get objects that deltified objects are based upon.
 -t::
 	Get trees associated with the commit objects.
 -a::
diff --git a/http-pull.c b/http-pull.c
--- a/http-pull.c
+++ b/http-pull.c
@@ -103,17 +103,20 @@ int main(int argc, char **argv)
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;
+		} else if (argv[arg][1] == 'd') {
+			get_delta = 1;
 		} else if (argv[arg][1] == 'a') {
 			get_all = 1;
 			get_tree = 1;
 			get_history = 1;
+			get_delta = 1;
 		} else if (argv[arg][1] == 'v') {
 			get_verbosely = 1;
 		}
 		arg++;
 	}
 	if (argc < arg + 2) {
-		usage("git-http-pull [-c] [-t] [-a] [-v] commit-id url");
+		usage("git-http-pull [-c] [-t] [-a] [-d] [-v] commit-id url");
 		return 1;
 	}
 	commit_id = argv[arg];
diff --git a/local-pull.c b/local-pull.c
--- a/local-pull.c
+++ b/local-pull.c
@@ -74,7 +74,7 @@ int fetch(unsigned char *sha1)
 }
 
 static const char *local_pull_usage = 
-"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path";
+"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path";
 
 /* 
  * By default we only use file copy.
@@ -92,10 +92,13 @@ int main(int argc, char **argv)
 			get_tree = 1;
 		else if (argv[arg][1] == 'c')
 			get_history = 1;
+		else if (argv[arg][1] == 'd')
+			get_delta = 1;
 		else if (argv[arg][1] == 'a') {
 			get_all = 1;
 			get_tree = 1;
 			get_history = 1;
+			get_delta = 1;
 		}
 		else if (argv[arg][1] == 'l')
 			use_link = 1;
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -6,6 +6,7 @@
 
 int get_tree = 0;
 int get_history = 0;
+int get_delta = 0;
 int get_all = 0;
 int get_verbosely = 0;
 static unsigned char current_commit_sha1[20];
@@ -37,6 +38,20 @@ static int make_sure_we_have_it(const ch
 	status = fetch(sha1);
 	if (status && what)
 		report_missing(what, sha1);
+	if (get_delta) {
+		unsigned long mapsize, size;
+		void *map, *buf;
+		char type[20];
+
+		map = map_sha1_file(sha1, &mapsize);
+		if (map) {
+			buf = unpack_sha1_file(map, mapsize, type, &size);
+			munmap(map, mapsize);
+			if (buf && !strcmp(type, "delta"))
+				status = make_sure_we_have_it(what, buf);
+			free(buf);
+		}
+	}
 	return status;
 }
 
diff --git a/pull.h b/pull.h
--- a/pull.h
+++ b/pull.h
@@ -13,6 +13,9 @@ extern int get_history;
 /** Set to fetch the trees in the commit history. **/
 extern int get_all;
 
+/* Set to fetch the base of delta objects.*/
+extern int get_delta;
+
 /* Set to be verbose */
 extern int get_verbosely;
 
diff --git a/rpull.c b/rpull.c
--- a/rpull.c
+++ b/rpull.c
@@ -27,17 +27,20 @@ int main(int argc, char **argv)
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;
+		} else if (argv[arg][1] == 'd') {
+			get_delta = 1;
 		} else if (argv[arg][1] == 'a') {
 			get_all = 1;
 			get_tree = 1;
 			get_history = 1;
+			get_delta = 1;
 		} else if (argv[arg][1] == 'v') {
 			get_verbosely = 1;
 		}
 		arg++;
 	}
 	if (argc < arg + 2) {
-		usage("git-rpull [-c] [-t] [-a] [-v] commit-id url");
+		usage("git-rpull [-c] [-t] [-a] [-v] [-d] commit-id url");
 		return 1;
 	}
 	commit_id = argv[arg];
------------------------------------------------


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] Add -d flag to git-pull-* family.
  2005-06-01  8:24     ` [PATCH] Add -d flag to git-pull-* family Junio C Hamano
@ 2005-06-01 14:39       ` Nicolas Pitre
  2005-06-01 16:00         ` Junio C Hamano
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolas Pitre @ 2005-06-01 14:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Wed, 1 Jun 2005, Junio C Hamano wrote:

> When a remote repository is deltified, we need to get the
> objects that a deltified object we want to obtain is based upon.
> Since checking representation type of all objects we retrieve
> from remote side may be costly, this is made into a separate
> option -d; -a implies it for convenience and safety.

I wonder if making this optional makes sense.  In fact, if you believe 
having the option is useful then it should probably be the other 
way around i.e. to _not_ look at deltas when it is specified.  Otherwise 
you'll end up with an incoherent repository.

To minimize the cost a lot it could be possible to uncompress just the 
first 40 bytes or so which is enough to determine if the object is a 
delta and if so what object it is against.
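
A rough sketch of that check, under two assumptions: the inflated object starts with an ASCII header of the form "<type> <size>" terminated by a NUL byte, and a delta object's body begins with the 20-byte SHA1 of its base (which is what the recursive call on buf in the patch relies on). The function name is made up, and it works on an already-inflated prefix so that the example does not depend on zlib:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: inspect the first inflated
 * bytes of an object and report whether it is a delta.
 *
 * Returns  1 and copies the 20-byte base SHA1 into base_sha1 if the
 *            object is a delta,
 *          0 if it is some other type,
 *         -1 if the prefix is too short or malformed to tell.
 */
static int delta_base_from_prefix(const unsigned char *prefix, size_t len,
				  unsigned char base_sha1[20])
{
	/* The header ends at the first NUL byte. */
	const unsigned char *nul = memchr(prefix, '\0', len);
	size_t hdrlen;

	if (!nul)
		return -1;
	hdrlen = (size_t)(nul - prefix) + 1;	/* include the NUL */

	/* "delta <size>" plus the NUL is at least 8 bytes long. */
	if (hdrlen < 7 || memcmp(prefix, "delta ", 6) != 0)
		return 0;

	/* A delta needs its 20-byte base SHA1 right after the header. */
	if (len - hdrlen < 20)
		return -1;
	memcpy(base_sha1, prefix + hdrlen, 20);
	return 1;
}
```

With a header such as "delta 12345" the base SHA1 ends well within the first 40 bytes, so a small fixed-size inflate buffer would be enough for this test.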

What do you think?

Nicolas

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] Add -d flag to git-pull-* family.
  2005-06-01 14:39       ` Nicolas Pitre
@ 2005-06-01 16:00         ` Junio C Hamano
       [not found]           ` <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net>
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01 16:00 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

NP> What do you think?

What you say makes a lot more sense than my quick hack on both
counts.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01  4:53         ` Junio C Hamano
@ 2005-06-01 20:06           ` David Lang
  2005-06-01 20:16             ` C. Scott Ananian
  2005-06-01 23:03             ` Junio C Hamano
  0 siblings, 2 replies; 64+ messages in thread
From: David Lang @ 2005-06-01 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List

On Tue, 31 May 2005, Junio C Hamano wrote:

>>>>>> "DL" == David Lang <david.lang@digitalinsight.com> writes:
>
> DL> Hmm, thinking out loud. would it help to look at the deltify scripts
> DL> and let them find the major chunks and then look in detail only when
> DL> that fails?
>
> It's unclear to me which part you are trying to help with
> deltify algorithm [*1*].

I was thinking that the speedups (only look for similar sized files, etc) 
would help narrow the search. Also each chunk that's different should be 
able to be annotated as a chunk, instead of by individual line.

> Internally, git-diff-cache -B -C is used which does use the
> deltify to locate complete rewrites, renames and copies (that's
> why the script is so slow).  For passing on and assigning blames
> line by line, parsing "diff --unified=0" output was a lot easier
> for this script and that was what I did in this quick-and-dirty
> version.

I was under the impression that the deltafy stuff was significantly faster 
than you are suggesting that it is here.

> [Footnotes]
>
> *1* David says "deltify" and Nico calls it "deltafy".  I am not
> a native speaker so I cannot tell, but which one is correct?

Nico is correct

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01 20:06           ` David Lang
@ 2005-06-01 20:16             ` C. Scott Ananian
  2005-06-02  0:43               ` Nicolas Pitre
  2005-06-01 23:03             ` Junio C Hamano
  1 sibling, 1 reply; 64+ messages in thread
From: C. Scott Ananian @ 2005-06-01 20:16 UTC (permalink / raw)
  To: David Lang
  Cc: Junio C Hamano, Chris Wedgwood, Linus Torvalds, Git Mailing List

On Wed, 1 Jun 2005, David Lang wrote:

>> *1* David says "deltify" and Nico calls it "deltafy".  I am not
>> a native speaker so I cannot tell, but which one is correct?
>
> Nico is correct

Au contraire.  The common *pronunciation* may be 'delta-fy', but the 
correct spelling should be 'deltify'.  The google oracle agrees (1,440 vs 
54) as does the spelling of the svnadmin command.  (Of course, what google 
is really measuring is relative frequency of 'git' vs 'svn'.)

$ grep '[^if]fy$' /usr/dict/american-english-large

shows that the only vowels other than 'i' which precede the '-fy' morpheme 
are 'e's, and they only appear in words like 'liquefy' where the root has 
been substantially altered.  Most sources (eg
http://www.southampton.liunet.edu/academic/pau/course/websuf.htm#IFYVERB
) list the morpheme as '-ify'.  See
     http://m-w.com/cgi-bin/dictionary?book=Dictionary&va=ify
and compare
     http://m-w.com/cgi-bin/dictionary?book=Dictionary&va=fy

Contrary to David's assertion, David is right.
  --scott

United Nations KMPLEBE AMTHUG AVBRANDY UNIFRUIT chemical agent tonight 
ZPSEMANTIC ODYOKE struggle PBCABOOSE FJDEFLECT CLOWER MKSEARCH ZRBRIEF
                          ( http://cscott.net/ )

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01  3:04   ` Linus Torvalds
  2005-06-01  4:06     ` Junio C Hamano
  2005-06-01  6:28     ` I want to release a "git-1.0" Junio C Hamano
@ 2005-06-01 22:00     ` Daniel Barkalow
  2005-06-01 23:05       ` Junio C Hamano
  2005-06-03  9:47       ` Petr Baudis
  2005-06-02  7:15     ` Eric W. Biederman
                       ` (2 subsequent siblings)
  5 siblings, 2 replies; 64+ messages in thread
From: Daniel Barkalow @ 2005-06-01 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric W. Biederman, Git Mailing List

On Tue, 31 May 2005, Linus Torvalds wrote:

> On Tue, 31 May 2005, Eric W. Biederman wrote:
> > 
> > I am way behind the power curve on learning git at this point but
> > one piece of the puzzle that CVS has that I don't believe git does
> > are multiple people committing to the same repository, especially
> > remotely.  I don't see that as a down side of git but it is a common
> > way people use CVS so it is worth documenting.
> 
> It's actually one thing git doesn't do per se.
> 
> You have to do a "git-pull-script" from the common repository side, 
> there's no "git-push-script". Ugly.

It shouldn't be hard to do one, except that locking with rsync is going to
be a pain. I had a patch to make it work with the rpush/rpull pair, but I
didn't get its dependencies in at the time. I can dust those patches off
again if you want that functionality included.

The patches are essentially:

 - make the transport protocol handle things other than objects
 - library procedure for locking atomic update of refs files
 - fetching refs in general
 - rpull/rpush that updates a specified ref file atomically

At least the first would be very nice to get in before 1.0, since it is an
incompatible change to the protocol.
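
For the "locking atomic update of refs files" item, the patches themselves are not shown here, so the following is only a generic sketch of the usual lock-file-and-rename pattern on POSIX, not the actual code. The function name and the ".lock" suffix are assumptions made for the example:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Generic lock-file pattern, not the patch discussed above: write the
 * new value to "<path>.lock", created with O_EXCL so that only one
 * writer can hold it, then rename() it over the ref file.  rename()
 * replaces the target atomically on POSIX filesystems.
 * Returns 0 on success, -1 on failure (including "already locked").
 */
static int update_ref_file(const char *path, const char *hex_sha1)
{
	char lock[4096];
	size_t len = strlen(hex_sha1);
	int n, fd;

	n = snprintf(lock, sizeof(lock), "%s.lock", path);
	if (n < 0 || (size_t)n >= sizeof(lock))
		return -1;

	/* O_EXCL makes this fail if another writer holds the lock. */
	fd = open(lock, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return -1;

	if (write(fd, hex_sha1, len) != (ssize_t)len ||
	    write(fd, "\n", 1) != 1) {
		close(fd);
		unlink(lock);
		return -1;
	}
	if (close(fd) < 0 || rename(lock, path) < 0) {
		unlink(lock);
		return -1;
	}
	return 0;
}
```

Readers of the ref file see either the old contents or the new ones, never a half-written file, and a second writer fails cleanly while the lock exists.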

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01 20:06           ` David Lang
  2005-06-01 20:16             ` C. Scott Ananian
@ 2005-06-01 23:03             ` Junio C Hamano
  1 sibling, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01 23:03 UTC (permalink / raw)
  To: David Lang; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List

>>>>> "DL" == David Lang <david.lang@digitalinsight.com> writes:

>> Internally, git-diff-cache -B -C is used which does use the
>> deltify to locate complete rewrites, renames and copies (that's
>> why the script is so slow).  For passing on and assigning blames
>> line by line, parsing "diff --unified=0" output was a lot easier
>> for this script and that was what I did in this quick-and-dirty
>> version.

DL> I was under the impression that the deltafy stuff was significantly
DL> faster than you are suggesting that it is here

I perhaps phrased it poorly.

The slow part is not a single delta operation, but having to run
many delta operations between all combinations of rename/copy
candidates, which is O(n * m) where n is the number of newly
created files (counting "broken" ones created by -B flag) and m
is the number of (deleted, modified and unmodified) files in the
original tree.
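
To make the O(n * m) cost concrete, here is a small illustration that only counts how many pairs would be compared. It is not the rename detector's code. The optional size filter follows the "only look for similar sized files" idea mentioned earlier in the thread, and its factor-of-two threshold is an arbitrary assumption:

```c
#include <stddef.h>

/*
 * Illustration only: count how many (new file, original file) pairs a
 * naive rename/copy detector would have to compare.  Without a filter
 * every pair is a candidate, i.e. n * m delta operations.  With the
 * filter, pairs whose sizes differ by more than a factor of two are
 * skipped.
 */
static size_t count_delta_candidates(const unsigned long *created, size_t n,
				     const unsigned long *original, size_t m,
				     int filter_by_size)
{
	size_t i, j, pairs = 0;

	for (i = 0; i < n; i++) {
		for (j = 0; j < m; j++) {
			unsigned long a = created[i], b = original[j];
			unsigned long lo = a < b ? a : b;
			unsigned long hi = a < b ? b : a;

			/* Skip pairs that are too different in size. */
			if (filter_by_size && hi / 2 > lo)
				continue;
			pairs++;	/* one delta operation would run here */
		}
	}
	return pairs;
}
```

With 2 created files and 3 original files the unfiltered count is 6; how much a size filter saves depends entirely on how the sizes are distributed.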



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01 22:00     ` Daniel Barkalow
@ 2005-06-01 23:05       ` Junio C Hamano
  2005-06-03  9:47       ` Petr Baudis
  1 sibling, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-01 23:05 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, Eric W. Biederman, Git Mailing List

>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:

DB> It shouldn't be hard to do one, except that locking with
DB> rsync is going to be a pain. I had a patch to make it work
DB> with the rpush/rpull pair, but I didn't get its dependencies
DB> in at the time. I can dust those patches off again if you
DB> want that functionality included.

Talking about pulls, wouldn't it be nicer to (re)name it to
git-ssh-pull, for consistency with others, especially before we
hit 1.0?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01 20:16             ` C. Scott Ananian
@ 2005-06-02  0:43               ` Nicolas Pitre
  2005-06-02  1:14                 ` Brian O'Mahoney
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolas Pitre @ 2005-06-02  0:43 UTC (permalink / raw)
  To: C. Scott Ananian
  Cc: David Lang, Junio C Hamano, Chris Wedgwood, Linus Torvalds,
	Git Mailing List

On Wed, 1 Jun 2005, C. Scott Ananian wrote:

> On Wed, 1 Jun 2005, David Lang wrote:
> 
> > > *1* David says "deltify" and Nico calls it "deltafy".  I am not
> > > a native speaker so I cannot tell, but which one is correct?
> > 
> > Nico is correct
> 
> Au contraire.  The common *pronunciation* may be 'delta-fy', but the correct
> spelling should be 'deltify'.

Ainsi soit-il alors.

I'm not a native english speaker either so I defer to anyone with better 
english knowledge.


Nicolas

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
       [not found]           ` <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net>
@ 2005-06-02  0:47             ` Nicolas Pitre
       [not found]             ` <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net>
  2005-06-02  0:58             ` [PATCH] Handle deltified object correctly in git-*-pull family Linus Torvalds
  2 siblings, 0 replies; 64+ messages in thread
From: Nicolas Pitre @ 2005-06-02  0:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds,
	Daniel Barkalow <barkalow@iabervon.org> Git Mailing List

On Wed, 1 Jun 2005, Junio C Hamano wrote:

> *** Dan and Nico, could you check this for correctness?  I've
> *** tested it with a deltified core GIT repository and pulling
> *** with local-pull from there.  I have verified that a pull
> *** that fails with -d flag retrieves the right base-object to
> *** complete a deltified ones.

The delta part looks fine to me.


Nicolas

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] Stop inflating the whole SHA1 file only to check size.
       [not found]             ` <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net>
@ 2005-06-02  0:51               ` Nicolas Pitre
  2005-06-02  1:32                 ` Junio C Hamano
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolas Pitre @ 2005-06-02  0:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds,
	Daniel Barkalow <barkalow@iabervon.org> Git Mailing List

On Wed, 1 Jun 2005, Junio C Hamano wrote:

> Using the new unpack_sha1_file_partial() function, stop
> inflating the whole SHA1 file when rename detector wants to know
> only the filesize.

Beware.  If you have delta objects you'll get the size of the delta 
itself and not the final object size, unless you recurse until a non 
delta object is found.


Nicolas

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
       [not found]           ` <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net>
  2005-06-02  0:47             ` [PATCH] Handle deltified object correctly in git-*-pull family Nicolas Pitre
       [not found]             ` <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net>
@ 2005-06-02  0:58             ` Linus Torvalds
  2005-06-02  1:43               ` Junio C Hamano
  2 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2005-06-02  0:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nicolas Pitre,
	Daniel Barkalow <barkalow@iabervon.org> Git Mailing List



On Wed, 1 Jun 2005, Junio C Hamano wrote:
> 
> *** Linus, I have a hook in sha1_file.c to let me figure out the
> *** size of the SHA1 file without fully expanding it.  This
> *** patch does not use it, but you already know where I am
> *** heading, so please leave it there ;-). 

Argh. This is just adding conceptual complexity without any real 
advantage.

Why not just split out the current "unpack_sha1_file()" into two stages: 
"unpack_sha1_header()" and the rest.

Then you can just decide to call "unpack_sha1_header()" when you want the 
header information.

Hmm. I just committed something like that. If you want to just see the 
type of an object, you can map the object in memory, and just do

	z_stream stream;
	char buffer[100];

	if (unpack_sha1_header(&stream, map, mapsize, buffer, sizeof(buffer)) < 0)
		return NULL;
	if (sscanf(buffer, "%10s %lu", type, size) != 2)
		return NULL;
	.. there you have it ..

which is a lot simpler than worrying about callbacks etc.
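
The parsing half of that can be tried on its own. The sketch below assumes the header has already been inflated into a NUL-terminated buffer; the function name is invented for the example, and the zlib step is left out so that it compiles without any library:

```c
#include <stdio.h>

/*
 * Sketch of the header-parsing step on an already-inflated header
 * buffer.  The function name is made up for this example.
 * type must have room for at least 11 bytes ("%10s" plus the NUL).
 * Returns 0 on success, -1 if the header does not parse.
 */
static int parse_object_header(const char *buffer, char *type,
			       unsigned long *size)
{
	/* sscanf() returns the number of fields it converted. */
	if (sscanf(buffer, "%10s %lu", type, size) != 2)
		return -1;
	return 0;
}
```

Checking for exactly two converted fields rejects a header that has a type but no size.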

		Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-02  0:43               ` Nicolas Pitre
@ 2005-06-02  1:14                 ` Brian O'Mahoney
  0 siblings, 0 replies; 64+ messages in thread
From: Brian O'Mahoney @ 2005-06-02  1:14 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: C. Scott Ananian, David Lang, Junio C Hamano, Chris Wedgwood,
	Linus Torvalds, Git Mailing List

Neither are _correct_, both are slang and a new word, but
English is good at that,

represent via a delta, would be traditional,

but 'deltify' sounds nicer.

Nicolas Pitre wrote:
> On Wed, 1 Jun 2005, C. Scott Ananian wrote:
> 
> 
>>On Wed, 1 Jun 2005, David Lang wrote:
>>
>>
>>>>*1* David says "deltify" and Nico calls it "deltafy".  I am not
>>>>a native speaker so I cannot tell, but which one is correct?
>>>
>>>Nico is correct
>>
>>Au contraire.  The common *pronunciation* may be 'delta-fy', but the correct
>>spelling should be 'deltify'.
> 
> 
> Ainsi soit-il alors.
> 
> I'm not a native english speaker either so I defer to anyone with better 
> english knowledge.
> 
> 
> Nicolas

-- 
mit freundlichen Grüßen, Brian.

Dr. Brian O'Mahoney
Mobile +41 (0)79 334 8035 Email: omb@bluewin.ch
Bleicherstrasse 25, CH-8953 Dietikon, Switzerland
PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D  F5 14 39 C9 6D 38 56 D5

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] Stop inflating the whole SHA1 file only to check size.
  2005-06-02  0:51               ` [PATCH] Stop inflating the whole SHA1 file only to check size Nicolas Pitre
@ 2005-06-02  1:32                 ` Junio C Hamano
  0 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-02  1:32 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Linus Torvalds,
	Daniel Barkalow <barkalow@iabervon.org> Git Mailing List

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

NP> On Wed, 1 Jun 2005, Junio C Hamano wrote:
>> Using the new unpack_sha1_file_partial() function, stop
>> inflating the whole SHA1 file when rename detector wants to know
>> only the filesize.

NP> Beware.

You are right.  I cannot believe how stupid I am, falling into
this trap _just_ _after_ looking at the delta stuff X-<.

Linus please drop that one.




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
  2005-06-02  0:58             ` [PATCH] Handle deltified object correctly in git-*-pull family Linus Torvalds
@ 2005-06-02  1:43               ` Junio C Hamano
  0 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-02  1:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre,
	Daniel Barkalow <barkalow@iabervon.org> Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Why not just split out the current "unpack_sha1_file()" into two stages: 
LT> "unpack_sha1_header()" and the rest.
LT> which is a lot simpler than worrying about callbacks etc.

Alright.



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-01  3:04   ` Linus Torvalds
                       ` (2 preceding siblings ...)
  2005-06-01 22:00     ` Daniel Barkalow
@ 2005-06-02  7:15     ` Eric W. Biederman
  2005-06-02  8:32       ` Kay Sievers
  2005-06-02 14:52       ` Linus Torvalds
  2005-06-02 12:02     ` [PATCH] several typos in tutorial Alexey Nezhdanov
  2005-06-02 23:40     ` I want to release a "git-1.0" Adam Kropelin
  5 siblings, 2 replies; 64+ messages in thread
From: Eric W. Biederman @ 2005-06-02  7:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@osdl.org> writes:

> Anyway, I wrote just a _very_ introductory thing in
> Documentation/tutorial.txt, I'll try to update and expand on it later. It
> basically has a really stupid example of "how to set up a new project".

So I need to do a git checkout of the latest version of git to
read the tutorial?  So I can figure out how to use git?

Catch 22? :)

Eric

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-02  7:15     ` Eric W. Biederman
@ 2005-06-02  8:32       ` Kay Sievers
  2005-06-02 14:52       ` Linus Torvalds
  1 sibling, 0 replies; 64+ messages in thread
From: Kay Sievers @ 2005-06-02  8:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Git Mailing List

On Thu, Jun 02, 2005 at 01:15:59AM -0600, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > Anyway, I wrote just a _very_ introductory thing in
> > Documentation/tutorial.txt, I'll try to update and expand on it later. It
> > basically has a really stupid example of "how to set up a new project".
> 
> So I need to do a git checkout of the latest version of git to
> read the tutorial?  So I can figure out how to use git?

No problem: :)
  http://www.kernel.org/git/?p=git/git.git;a=blob;f=Documentation/tutorial.txt

Kay

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH] several typos in tutorial
  2005-06-01  3:04   ` Linus Torvalds
                       ` (3 preceding siblings ...)
  2005-06-02  7:15     ` Eric W. Biederman
@ 2005-06-02 12:02     ` Alexey Nezhdanov
  2005-06-02 12:41       ` Vincent Hanquez
  2005-06-02 23:40     ` I want to release a "git-1.0" Adam Kropelin
  5 siblings, 1 reply; 64+ messages in thread
From: Alexey Nezhdanov @ 2005-06-02 12:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Alexey Nezhdanov

Signed-off-by: Alexey Nezhdanov <snake@penza-gsm.ru>
---
diff --git a/Documentation/tutorial.txt b/Documentation/tutorial.txt
--- a/Documentation/tutorial.txt
+++ b/Documentation/tutorial.txt
@@ -298,7 +298,7 @@ have committed something, we can also le
 
 Unlike "git-diff-files", which showed the difference between the index
 file and the working directory, "git-diff-cache" shows the differences
-between a committed _tree_ and the index file.  In other words,
+between a committed _tree_ and the working directory.  In other words,
 git-diff-cache wants a tree to be diffed against, and before we did the
 commit, we couldn't do that, because we didn't have anything to diff
 against. 
@@ -423,8 +423,8 @@ With that, you should now be having some
 can explore on your own.
 
 
-	Copoying archives
-	-----------------
+	Copying archives
+	----------------
 
 Git arhives are normally totally self-sufficient, and it's worth noting
 that unlike CVS, for example, there is no separate notion of


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] several typos in tutorial
  2005-06-02 12:02     ` [PATCH] several typos in tutorial Alexey Nezhdanov
@ 2005-06-02 12:41       ` Vincent Hanquez
  2005-06-02 12:45         ` Alexey Nezhdanov
  0 siblings, 1 reply; 64+ messages in thread
From: Vincent Hanquez @ 2005-06-02 12:41 UTC (permalink / raw)
  To: Alexey Nezhdanov; +Cc: Linus Torvalds, Git Mailing List

On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote:
>  Git arhives are normally totally self-sufficient, and it's worth noting
       ^^^^^^^
and one more here

-- 
Vincent Hanquez

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] several typos in tutorial
  2005-06-02 12:41       ` Vincent Hanquez
@ 2005-06-02 12:45         ` Alexey Nezhdanov
  2005-06-02 12:51           ` Vincent Hanquez
  0 siblings, 1 reply; 64+ messages in thread
From: Alexey Nezhdanov @ 2005-06-02 12:45 UTC (permalink / raw)
  To: git; +Cc: Vincent Hanquez, Linus Torvalds

On thursday, 02 June 2005 16:41 Vincent Hanquez wrote:
> On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote:
> >  Git arhives are normally totally self-sufficient, and it's worth noting
>
>        ^^^^^^^
> and one more here
Why? It's ok to speak about many [existing] archives here.

-- 
Respectfully
Alexey Nezhdanov


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] several typos in tutorial
  2005-06-02 12:45         ` Alexey Nezhdanov
@ 2005-06-02 12:51           ` Vincent Hanquez
  2005-06-02 12:56             ` Alexey Nezhdanov
  2005-06-02 13:00             ` Alexey Nezhdanov
  0 siblings, 2 replies; 64+ messages in thread
From: Vincent Hanquez @ 2005-06-02 12:51 UTC (permalink / raw)
  To: Alexey Nezhdanov; +Cc: git, Linus Torvalds

On Thu, Jun 02, 2005 at 04:45:15PM +0400, Alexey Nezhdanov wrote:
> On thursday, 02 June 2005 16:41 Vincent Hanquez wrote:
> > On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote:
> > >  Git arhives are normally totally self-sufficient, and it's worth noting
> >
> >        ^^^^^^^
> > and one more here
> Why? It's ok to speak about many [existing] archives here.

it's missing a 'c'

-- 
Vincent Hanquez

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] several typos in tutorial
  2005-06-02 12:51           ` Vincent Hanquez
@ 2005-06-02 12:56             ` Alexey Nezhdanov
  2005-06-02 13:00             ` Alexey Nezhdanov
  1 sibling, 0 replies; 64+ messages in thread
From: Alexey Nezhdanov @ 2005-06-02 12:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Vincent Hanquez

Signed-off-by: Alexey Nezhdanov <snake@penza-gsm.ru>
---
diff --git a/Documentation/tutorial.txt b/Documentation/tutorial.txt
--- a/Documentation/tutorial.txt
+++ b/Documentation/tutorial.txt
@@ -298,7 +298,7 @@ have committed something, we can also le
 
 Unlike "git-diff-files", which showed the difference between the index
 file and the working directory, "git-diff-cache" shows the differences
-between a committed _tree_ and the index file.  In other words,
+between a committed _tree_ and the working directory.  In other words,
 git-diff-cache wants a tree to be diffed against, and before we did the
 commit, we couldn't do that, because we didn't have anything to diff
 against. 
@@ -423,10 +423,10 @@ With that, you should now be having some
 can explore on your own.
 
 
-	Copoying archives
-	-----------------
+	Copying archives
+	----------------
 
-Git arhives are normally totally self-sufficient, and it's worth noting
+Git archives are normally totally self-sufficient, and it's worth noting
 that unlike CVS, for example, there is no separate notion of
 "repository" and "working tree".  A git repository normally _is_ the
 working tree, with the local git information hidden in the ".git"


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] several typos in tutorial
  2005-06-02 12:51           ` Vincent Hanquez
  2005-06-02 12:56             ` Alexey Nezhdanov
@ 2005-06-02 13:00             ` Alexey Nezhdanov
  1 sibling, 0 replies; 64+ messages in thread
From: Alexey Nezhdanov @ 2005-06-02 13:00 UTC (permalink / raw)
  To: git; +Cc: Vincent Hanquez

On thursday, 02 June 2005 16:51 Vincent Hanquez wrote:
> On Thu, Jun 02, 2005 at 04:45:15PM +0400, Alexey Nezhdanov wrote:
> > On thursday, 02 June 2005 16:41 Vincent Hanquez wrote:
> > > On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote:
> > > >  Git arhives are normally totally self-sufficient, and it's worth
> > > > noting
> > >
> > >        ^^^^^^^
> > > and one more here
> >
> > Why? It's ok to speak about many [existing] archives here.
>
> it's missing a 'c'
ok :)

-- 
Respectfully
Alexey Nezhdanov


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: I want to release a "git-1.0"
  2005-06-02  7:15     ` Eric W. Biederman
  2005-06-02  8:32       ` Kay Sievers
@ 2005-06-02 14:52       ` Linus Torvalds
  1 sibling, 0 replies; 64+ messages in thread
From: Linus Torvalds @ 2005-06-02 14:52 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Git Mailing List



On Thu, 2 Jun 2005, Eric W. Biederman wrote:
> 
> So I need to do a git checkout of the latest version of git to
> read the tutorial?  So I can figure out how to use git?

Just use the gitweb thing, it's easy to read off there..

		Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* CVS migration section to the tutorial.
  2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
                   ` (9 preceding siblings ...)
  2005-05-31 13:45 ` Eric W. Biederman
@ 2005-06-02 19:43 ` Junio C Hamano
  10 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-02 19:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

I think a section to discuss "I am used to doing 'cvs xxx' to solve
this problem, how do I do that in GIT" would be a good idea.  Here is
an example to talk about "cvs annotate".

------------

CVS annotate.

The core GIT itself does not do "cvs annotate" equivalent, but it has
something much nicer.

Let's step back a bit and think about the reason why you would want to
do "cvs annotate a-file.c" to begin with.

	- Are you really interested in _all_ the lines in that file?

	- Are you interested in lines _only_ in that file and do not
          care if the file was created by renaming from a different
          file?

You would use "cvs annotate" on a file when you have trouble with a
function (or even a single "if" statement in that function) that
happens to be defined in the file, which does not do what you want it
to do.  And you would want to find out why it was written in that way,
because you are about to modify it to suit your needs, and at the same
time you do not want to break its current callers.  For that, you want
to find out why the original author did things that way in the
original context.  That's why you want "cvs annotate".  So your answer
to the first question _should_ be "no".  You do not care about the
whole file, only a segment of it.

Also, in the original context, the same statement might have appeared
at first in a different file and later the file was renamed to
"a-file.c".  Or the entire program may have constructs similar to the
"if" statement you are having trouble with in different places, that
you are still not aware of.  So your answer to the second question
_should_ be "no" as well.

As an example, assuming that you have this piece of code that you are
interested in in the HEAD version:

	if (frotz) {
		nitfol();
	}

you would use git-rev-list and git-diff-tree like this:

	$ git-rev-list HEAD |
	  git-diff-tree --stdin -v -p -S'if (frotz) {
		nitfol();
	}'

We have already talked about the "--stdin" form of git-diff-tree
command that reads the list of commits and compares each commit with
its parents.  What the -S flag and its argument does is called
"pickaxe", a tool for software archaeologists.  When "pickaxe" is
used, git-diff-tree command outputs differences between two commits
only if one tree has the specified string in a file and the
corresponding file in the other tree does not.  The above example
looks for a commit that has the "if" statement in it in a file, but
its parent commit does not have it in the same shape in the
corresponding file (or the other way around, where the parent has it
and the commit does not), and the differences between them are shown,
along with the commit message (thanks to the -v flag).  It does not
show anything for commits that do not touch this "if" statement.
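
As a rough illustration of the rule just described (this is not the
actual pickaxe code), the decision for one pair of files can be
sketched in C.  The name pickaxe_hit and the use of NUL-terminated
strings are assumptions made for this example only; real blob
contents are not NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.  Returns 1 when the
 * needle is present in exactly one of the two file contents, i.e.
 * when "pickaxe" as described above would show the difference.
 * A NULL content stands for a file that is missing on that side.
 */
static int pickaxe_hit(const char *old_content, const char *new_content,
		       const char *needle)
{
	int in_old, in_new;

	assert(needle != NULL && *needle != '\0');
	in_old = old_content && strstr(old_content, needle) != NULL;
	in_new = new_content && strstr(new_content, needle) != NULL;
	return in_old != in_new;
}
```

A pair where both sides (or neither side) contain the string returns
0, which matches the "does not show anything" case.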

To make things more interesting, you can give the -C flag to
git-diff-tree, like this:

	$ git-rev-list HEAD |
	  git-diff-tree --stdin -v -p -C -S'if (frotz) {
		nitfol();
	}'

When the -C flag is used, file renames and copies are followed.  So if
the "if" statement in question happens to be in "a-file.c" in the
current HEAD commit, even if the file was originally called "o-file.c"
and then renamed in an earlier commit, or if the file was created by
copying an existing "o-file.c" in an earlier commit, you will not lose
track.  If the "if" statement did not change across such rename or
copy, then the commit that does rename or copy would not show in the
output, and if the "if" statement was modified while the file was
still called "o-file.c", it would find the commit that changed the
statement when it was in "o-file.c".

[ BTW, the current version of "git-diff-tree -C" is not eager enough
  to find copies, and it will miss the fact that a-file.c was created
  by copying o-file.c unless o-file.c was somehow changed in the same
  commit.]

To make things even more interesting, you can use the --pickaxe-all
flag in addition to the -S flag.  This causes the differences from all
the files contained in those two commits to be shown, not just the
differences between the files that contain this changed "if" statement:

	$ git-rev-list HEAD |
	  git-diff-tree --stdin -v -p -C -S'if (frotz) {
		nitfol();
	}' --pickaxe-all


* Re: I want to release a "git-1.0"
  2005-06-01  3:04   ` Linus Torvalds
                       ` (4 preceding siblings ...)
  2005-06-02 12:02     ` [PATCH] several typos in tutorial Alexey Nezhdanov
@ 2005-06-02 23:40     ` Adam Kropelin
  2005-06-03  0:06       ` Linus Torvalds
  5 siblings, 1 reply; 64+ messages in thread
From: Adam Kropelin @ 2005-06-02 23:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds wrote:
> Anyway, I wrote just a _very_ introductory thing in
> Documentation/tutorial.txt, I'll try to update and expand on it later. 
> It
> basically has a really stupid example of "how to set up a new 
> project".

I've been working my way thru the tutorial, trying to up my git clue 
level a bit. One part where things start to go a bit pear-shaped for me 
is in the description of git-diff-files vs. git-diff-cache. The tutorial 
takes pains to emphasize the difference between "working directory 
contents", "index file", and "committed tree", and I'm on board with 
that. What confuses me is the following:

> Unlike "git-diff-files", which showed the difference between the index
> file and the working directory, "git-diff-cache" shows the differences
> between a committed _tree_ and the index file.
> ...
> [example where git-diff-cache shows difference between working
> directory and committed tree]
> ...
> "git-diff-cache" also has a specific flag "--cached", which is used to
> tell it to show the differences purely with the index file, and ignore
> the current working directory state entirely

The example and the description of --cached seem to contradict the first 
sentence's description of the tool's purpose in life. If it shows you 
differences between a committed tree and the index file, why is it 
looking in my working directory at all? In order to get the behavior the 
first sentence describes you actually have to use --cached.

Am I on the right track?

--Adam


* [PATCH] Fix -B "very-different" logic.
  2005-06-01  4:06     ` Junio C Hamano
@ 2005-06-02 23:54       ` Junio C Hamano
  2005-06-03  0:21         ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-02 23:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

>>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes:

>     (Btw, current versions of git will consider the change
>     in question to be so big that it's considered a whole
>     new file, since the diff is actually bigger than the
>     file. 

JCH> Do you want me to do something about this with -B (and possibly
JCH> -C/-M), like skipping the comparison altogether if the file size
JCH> is smaller than, say, 1k bytes or something silly like that?  Or
JCH> not having special case for this kind of "contrived example"
JCH> preferrable?

I was looking at the -B code.  The reason it thinks the change is
too big is that, for such a small string, xdelta tells us to
reconstruct the destination entirely from new literal bytes.
There is not much I can do about it.

However, I think the diffcore-break algorithm itself was basing
its "very_different" computation on somewhat bogus numbers.  It
was taking newly inserted bytes into account, but the amount of
those bytes should not make any difference when determining if
the change is a complete rewrite.

I suspect that the -M/-C heuristics have similar (if not the same)
issues, but I would like to address that separately.

Here is a proposed fix for -B.  It also tells diffcore-break not
to break a file smaller than 400 bytes.  I did not make this
number configurable, since that would be too many knobs to
tweak.  If somebody feels strongly enough about it, it can be made
into an option later, but for now that size "feels" reasonable.
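
To make the proposed rule concrete, the arithmetic can be sketched as
below.  This only illustrates the scoring and is not the patch itself:
break_score is a made-up name, and the sizes are passed in directly
rather than computed from an xdelta.

```c
#include <assert.h>

#define MAX_SCORE            60000
#define DEFAULT_BREAK_SCORE  48000	/* 80%, the value used in the patch */
#define DIFF_MINIMUM_BREAK   400	/* bytes, the value used in the patch */

/*
 * Hypothetical helper: score a change by how much of the source was
 * NOT carried over into the destination.  Inserted bytes are ignored.
 */
static unsigned long break_score(unsigned long src_size,
				 unsigned long copied_from_source)
{
	assert(copied_from_source <= src_size);
	if (src_size < DIFF_MINIMUM_BREAK)
		return 0;	/* too small to consider breaking */
	return (src_size - copied_from_source) * MAX_SCORE / src_size;
}
```

With these numbers, a 1000-byte file of which only 30 bytes survive
scores 58200 (97%) and would be broken, no matter how large the
result is; one that keeps 400 bytes scores 36000 (60%) and would not.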

  -- >8 -- cut here -- >8 --

------------
What we are interested in here is how much the original source
material remains in the final result, and it does not really
matter how much new contents are added as part of the edit.  If
you remove 97 lines from an original 100-line document, it does
not matter if you add 47 lines of your own to make a 50-line
document, or if you add 997 lines to make a 1000-line document.
Either way, you did a complete rewrite.

Earlier code counted both new material and deletions to detect
complete rewrites.  This patch fixes it.  With its default
setting, it detects three such complete rewrites in the core-GIT
repository.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 count-delta.h    |    1 +
 diffcore.h       |    4 ++-
 count-delta.c    |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 diffcore-break.c |   55 +++++++++++++++---------------------------
 4 files changed, 92 insertions(+), 38 deletions(-)

diff --git a/count-delta.h b/count-delta.h
--- a/count-delta.h
+++ b/count-delta.h
@@ -5,5 +5,6 @@
 #define COUNT_DELTA_H
 
 unsigned long count_delta(void *, unsigned long);
+unsigned long count_excluded_source_material(void *, unsigned long);
 
 #endif
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -10,9 +10,9 @@
  */
 #define MAX_SCORE 60000
 #define DEFAULT_RENAME_SCORE 30000 /* rename/copy similarity minimum (50%) */
-#define DEFAULT_BREAK_SCORE  59400 /* minimum for break to happen (99%)*/
+#define DEFAULT_BREAK_SCORE  48000 /* minimum for break to happen    (80%) */
 
-#define RENAME_DST_MATCHED 01
+#define DIFF_MINIMUM_BREAK   400 /* minimum size of source that -B breaks */
 
 struct diff_filespec {
 	unsigned char sha1[20];
diff --git a/count-delta.c b/count-delta.c
--- a/count-delta.c
+++ b/count-delta.c
@@ -93,3 +93,73 @@ unsigned long count_delta(void *delta_bu
 		return 0;
 	return (src_size - copied_from_source) + added_literal;
 }
+
+
+/*
+ * What we are interested in here is how much the original source
+ * material remains in the final result, and it does not really matter
+ * how much new contents are added as part of the edit.  If you remove
+ * 97 lines from an original 100-line document, it does not matter if
+ * you add 47 lines of your own to make a 50-line document, or if you
+ * add 997 lines to make a 1000-line document.  Either way, you did a
+ * complete rewrite.
+ *
+ * Note.  We do not interpret delta fully.  Instead, we look at xdelta
+ * instructions that copy bytes from the source, and count those copied
+ * bytes.  Subtracting this number from the original source size yields
+ * the number of bytes not used from the source material.  In the above
+ * example, this number corresponds to 97-line (but we count in bytes).
+ */
+unsigned long count_excluded_source_material(void *delta_buf,
+					     unsigned long delta_size)
+{
+	unsigned long copied_from_source;
+	const unsigned char *data, *top;
+	unsigned char cmd;
+	unsigned long src_size, dst_size, out;
+
+	/* the smallest delta size possible is 6 bytes */
+	if (delta_size < 6)
+		return UINT_MAX;
+
+	data = delta_buf;
+	top = delta_buf + delta_size;
+
+	src_size = get_hdr_size(&data);
+	dst_size = get_hdr_size(&data);
+
+	copied_from_source = out = 0;
+	while (data < top) {
+		cmd = *data++;
+		if (cmd & 0x80) {
+			unsigned long cp_off = 0, cp_size = 0;
+			if (cmd & 0x01) cp_off = *data++;
+			if (cmd & 0x02) cp_off |= (*data++ << 8);
+			if (cmd & 0x04) cp_off |= (*data++ << 16);
+			if (cmd & 0x08) cp_off |= (*data++ << 24);
+			if (cmd & 0x10) cp_size = *data++;
+			if (cmd & 0x20) cp_size |= (*data++ << 8);
+			if (cp_size == 0) cp_size = 0x10000;
+
+			if (cmd & 0x40)
+				/* copy from dst */
+				;
+			else
+				copied_from_source += cp_size;
+			out += cp_size;
+		} else {
+			/* write literal into dst */
+			out += cmd;
+			data += cmd;
+		}
+	}
+
+	/* sanity check */
+	if (data != top || out != dst_size)
+		return UINT_MAX;
+
+	if (src_size < copied_from_source)
+		/* we ended up overcounting and underflowed; I dunno why */
+		return 0;
+	return src_size - copied_from_source;
+}
diff --git a/diffcore-break.c b/diffcore-break.c
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -13,63 +13,46 @@ static int very_different(struct diff_fi
 {
 	/* dst is recorded as a modification of src.  Are they so
 	 * different that we are better off recording this as a pair
-	 * of delete and create?  min_score is the minimum amount of
-	 * new material that must exist in the dst and not in src for
-	 * the pair to be considered a complete rewrite, and recommended
-	 * to be set to a very high value, 99% or so.
+	 * of delete and create?
 	 *
-	 * The value we return represents the amount of new material
-	 * that is in dst and not in src.  We return 0 when we do not
-	 * want to get the filepair broken.
+	 * We base the score on the amount of material originally from
+	 * src that still remains in the dst.  If src was 100-line
+	 * file among which only 3-line remains in the dst, then it is
+	 * a complete rewrite with 97% "change", and it does not
+	 * matter if the resulting file is a 15-line file or a
+	 * 2000-line file.  On the other hand, if 40-line remains
+	 * among those 100-lines, even if the resulting file is a
+	 * 2000-lines file, it still is an edit with 60% "change",
+	 * which may sound counter-intuitive at first but that is the
+	 * right number to use.
 	 */
+
 	void *delta;
-	unsigned long delta_size, base_size;
+	unsigned long delta_size;
 
 	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
 		return 0; /* leave symlink rename alone */
 
-	if (diff_populate_filespec(src, 1) || diff_populate_filespec(dst, 1))
-		return 0; /* error but caught downstream */
-
-	delta_size = ((src->size < dst->size) ?
-		      (dst->size - src->size) : (src->size - dst->size));
-
-	/* Notice that we use max of src and dst as the base size,
-	 * unlike rename similarity detection.  This is so that we do
-	 * not mistake a large addition as a complete rewrite.
-	 */
-	base_size = ((src->size < dst->size) ? dst->size : src->size);
-
-	/*
-	 * If file size difference is too big compared to the
-	 * base_size, we declare this a complete rewrite.
-	 */
-	if (base_size * min_score < delta_size * MAX_SCORE)
-		return MAX_SCORE;
-
 	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
 		return 0; /* error but caught downstream */
 
+	if (src->size < DIFF_MINIMUM_BREAK)
+		return 0; /* Too small to consider breaking */
+
 	delta = diff_delta(src->data, src->size,
 			   dst->data, dst->size,
 			   &delta_size);
 
-	/* A delta that has a lot of literal additions would have
-	 * big delta_size no matter what else it does.
-	 */
-	if (base_size * min_score < delta_size * MAX_SCORE)
-		return MAX_SCORE;
-
 	/* Estimate the edit size by interpreting delta. */
-	delta_size = count_delta(delta, delta_size);
+	delta_size = count_excluded_source_material(delta, delta_size);
 	free(delta);
 	if (delta_size == UINT_MAX)
 		return 0; /* error in delta computation */
 
-	if (base_size < delta_size)
+	if (src->size < delta_size)
 		return MAX_SCORE;
 
-	return delta_size * MAX_SCORE / base_size; 
+	return delta_size * MAX_SCORE / src->size;
 }
 
 void diffcore_break(int min_score)
------------



* Re: I want to release a "git-1.0"
  2005-06-02 23:40     ` I want to release a "git-1.0" Adam Kropelin
@ 2005-06-03  0:06       ` Linus Torvalds
  2005-06-03  0:47         ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2005-06-03  0:06 UTC (permalink / raw)
  To: Adam Kropelin; +Cc: Git Mailing List



On Thu, 2 Jun 2005, Adam Kropelin wrote:
> What confuses me is the following:

Yeah, I'll try to clarify.

git-diff-cache can show the difference between a tree and either the index 
_or_ the working directory. Will fix up.

		Linus


* Re: [PATCH] Fix -B "very-different" logic.
  2005-06-02 23:54       ` [PATCH] Fix -B "very-different" logic Junio C Hamano
@ 2005-06-03  0:21         ` Linus Torvalds
  2005-06-03  1:33           ` Junio C Hamano
  0 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2005-06-03  0:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Thu, 2 Jun 2005, Junio C Hamano wrote:
> 
> However I think the diffcore-break algorithm itself was basing
> its "very_different" computation on numbers somewhat bogus.  It
> was counting newly inserted bytes into account, but amount of
> those bytes should not make any difference when determining if
> the change is a complete rewrite.

Careful. 

I think the amount of new code _should_ matter. Otherwise, an old empty
file would always be considered the source of a new file, since the diff 
doesn't remove anything. Similarly, just because we have a boilerplate 
file shouldn't make that always be considered a "wonderful source", when 
people add the real meat to it.

So I think you're on the right track, but I don't think you should
entirely dismiss "lots of stuff added" as a reason for a "break". I think
that if the new stuff is _much_ larger than the old stuff, it might as
well be considered a rewrite.

In particular, let's say that I used to have two files:

	a.c - small helper functions
	b.c - the "meat" of the thing

and I end up deciding that I might as well collapse it all into one file, 
a.c. What happens? There's almost no deletes from a.c, but there's a lot 
of new code in it. 

Wouldn't it be _better_ if you considered the new "a.c" a new file, so 
that you might notice that it's actually _closer_ to the old removed "b.c" 
than the old "a.c"?
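
The a.c/b.c case can be put into numbers with a small sketch.  The
sizes are invented for illustration and both function names are
hypothetical: the first mirrors a score that looks at deletions only,
the second one that also counts inserted bytes, measured against the
larger of the two file sizes.

```c
#include <assert.h>

#define MAX_SCORE 60000

/* Hypothetical: only bytes dropped from the source count. */
static unsigned long score_deletions_only(unsigned long src_size,
					  unsigned long src_copied)
{
	assert(src_size > 0 && src_copied <= src_size);
	return (src_size - src_copied) * MAX_SCORE / src_size;
}

/* Hypothetical: dropped bytes and inserted bytes both count. */
static unsigned long score_with_additions(unsigned long src_size,
					  unsigned long dst_size,
					  unsigned long src_copied,
					  unsigned long literal_added)
{
	unsigned long base = src_size < dst_size ? dst_size : src_size;
	unsigned long damage;

	assert(base > 0 && src_copied <= src_size);
	damage = (src_size - src_copied) + literal_added;
	if (base < damage)
		return MAX_SCORE;
	return damage * MAX_SCORE / base;
}
```

For a 500-byte a.c that grows to 5500 bytes with nothing removed, the
first score is 0, so the file would never be broken, while the second
is 54545, roughly 91%.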

See what I'm saying?

		Linus


* Re: I want to release a "git-1.0"
  2005-06-03  0:06       ` Linus Torvalds
@ 2005-06-03  0:47         ` Linus Torvalds
  2005-06-03  1:34           ` Adam Kropelin
  0 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2005-06-03  0:47 UTC (permalink / raw)
  To: Adam Kropelin; +Cc: Git Mailing List



On Thu, 2 Jun 2005, Linus Torvalds wrote:
>
> Yeah, I'll try to clarify.

Adam, do you find the current version a bit more clear on this?

		Linus


* Re: [PATCH] Fix -B "very-different" logic.
  2005-06-03  0:21         ` Linus Torvalds
@ 2005-06-03  1:33           ` Junio C Hamano
  2005-06-03  8:32             ` [PATCH 0/4] " Junio C Hamano
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2005-06-03  1:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Careful. 

LT> I think the amount of new code _should_ matter. Otherwise, an old empty
LT> file would always be considered the source of a new file, since the diff 
LT> doesn't remove anything. Similarly, just because we have a boilerplate 
LT> file shouldn't make that always be considered a "wonderful source", when 
LT> people add the real meat to it.

Yes, I agree that rename/copy logic should use different
heuristics from the one I proposed for breaking.

It is my assumption that people in practice tend to make only
small edits after a rename/copy just to adjust things like:

 - filenames mentioned in the comment of the file itself,

 - include paths that refer to other files if the file was
   moved/copied from a different directory,

 - names of functions and variables.

and making sure there is not too much new stuff is quite
useful for detecting the rename/copy source correctly, as the
current similarity estimator in diffcore-rename does.  I do not
intend to touch that.

The boilerplate example you mention is a very good reason not
to dismiss the amount of new material when doing rename/copy
detection.

LT> In particular, let's say that I used to have two files:

LT> 	a.c - small helper functions
LT> 	b.c - the "meat" of the thing

LT> and I end up deciding that I might as well collapse it all into one file, 
LT> a.c. What happens? There's almost no deletes from a.c, but there's a lot 
LT> of new code in it. 

LT> See what I'm saying?

Yes.  I think I do.

When git-diff-tree -B -C runs your example, it feeds diffcore
with these:

  :100644 100644 sha1-a-helper-only sha1-a-and-meat M   a.c
  :100644 000000 sha1-b-stale-meat  0{40}           D   b.c

The ideal diffcore-break breaks a.c because it looks at
insertions as well:

  :100644 000000 sha1-a-helper-only 0{40}           D   a.c
  :000000 100644 0{40}              sha1-a-and-meat N   a.c
  :100644 000000 sha1-b-stale-meat  0{40}           D   b.c

Then diffcore-rename notices that sha1-b-stale-meat is a better
match than sha1-a-helper-only for producing sha1-a-and-meat, and
resolves the above to:

  :100644 100644 sha1-b-stale-meat  sha1-a-and-meat R   b.c	a.c

Up to this point is just a demonstration that I see your point.

But I still want to keep the example I gave in the original
commit message.  Suppose you did not have b.c file under version
control, and did the same operation.  I.e. a.c acquired a lot of
good stuff.  git-diff-tree -B -C feeds:

  :100644 100644 sha1-a-helper-only sha1-a-and-meat M   a.c

which is broken into:

  :100644 000000 sha1-a-helper-only 0{40}           D   a.c
  :000000 100644 0{40}              sha1-a-and-meat N   a.c

Unfortunately, in this case nobody absorbs these pairs.  I want
to allow you to add 1000 lines of new stuff to a file (which was
originally 100 lines long) without triggering the "this is a
rewrite" logic, as long as you do not remove too many of the
original 100 lines.  So after rename/copy runs, we need
to match these up and merge them back into the original.

  :100644 100644 sha1-a-helper-only sha1-a-and-meat M   a.c

We should carry a bit more information about broken entries than
we currently do.  We would break a pair based on both deletion
and insertion, just like the current code (i.e. without the
patch you are responding to) does.  But when we do break a pair,
we need to mark it if the "new" side has enough original
source material remaining.  If we have such a mark to tell us that
"these were broken but there is a good chunk of source material
remaining", the clean-up phase, which runs after diffcore-rename
finishes, should be able to notice surviving broken pairs and
merge them back accordingly.
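
A minimal sketch of that clean-up decision follows.  Nothing here is
existing diffcore code: the broken_pair structure, its fields, and the
keep-percentage threshold are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kept for each filepair that was broken. */
struct broken_pair {
	unsigned long src_size;		/* size of the original file */
	unsigned long src_copied;	/* source bytes still in the result */
	int delete_half_used;		/* rename/copy consumed the "D" half */
	int create_half_used;		/* rename/copy consumed the "N" half */
};

/*
 * Returns 1 when the two halves should be merged back into a single
 * "M" entry: neither half was picked up by rename/copy detection and
 * at least min_keep_percent of the source survives in the result.
 */
static int should_merge_back(const struct broken_pair *p,
			     unsigned int min_keep_percent)
{
	assert(p != NULL && p->src_copied <= p->src_size);
	if (p->delete_half_used || p->create_half_used)
		return 0;
	return p->src_copied * 100 >= p->src_size * min_keep_percent;
}
```

In the a.c-only case above, neither half is consumed and the whole
source survives, so the pair would be merged back; in the b.c case
the "N" half is taken by the rename and the pair stays broken.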


* Re: I want to release a "git-1.0"
  2005-06-03  0:47         ` Linus Torvalds
@ 2005-06-03  1:34           ` Adam Kropelin
  0 siblings, 0 replies; 64+ messages in thread
From: Adam Kropelin @ 2005-06-03  1:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds wrote:
> On Thu, 2 Jun 2005, Linus Torvalds wrote:
>>
>> Yeah, I'll try to clarify.
>
> Adam, do you find the current version a bit more clear on this?

Absolutely. I especially like the new digression explaining that 
the --cached flag controls where file _content_ is fetched from and 
reinforcing that the index file always governs which files are involved 
in the diff.

Thanks!

--Adam


* [PATCH 0/4] Fix -B "very-different" logic.
  2005-06-03  1:33           ` Junio C Hamano
@ 2005-06-03  8:32             ` Junio C Hamano
  2005-06-03  8:36               ` [PATCH 1/4] Tweak count-delta interface Junio C Hamano
                                 ` (3 more replies)
  0 siblings, 4 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-03  8:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

I am sending the following four-patch series:

        [PATCH 1/4] Tweak count-delta interface
        [PATCH 2/4] diff: Fix docs and add -O to diff-helper.
        [PATCH 3/4] diff: Clean up diff_scoreopt_parse().
        [PATCH 4/4] diff: Update -B heuristics.

The first three are preparations and cleanups I found necessary
while I was working on the last one, which is the gem of this
series.  It addresses the concerns you raised in your message
"Careful." while keeping the semantics I wanted to have: "if you
keep 97 lines out of original 100-line document, it does not
matter if the end result is a 110-line or 1000-line document.
You did not do a rewrite."

You may have to remove the warning about git-status with this
change, though.


* [PATCH 1/4] Tweak count-delta interface
  2005-06-03  8:32             ` [PATCH 0/4] " Junio C Hamano
@ 2005-06-03  8:36               ` Junio C Hamano
  2005-06-03  8:36               ` [PATCH 2/4] diff: Fix docs and add -O to diff-helper Junio C Hamano
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-03  8:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Make it return copied source and insertion separately, so that
later implementation of heuristics can use them more flexibly.

This does not change the heuristics implemented in
diffcore-rename nor diffcore-break in any way.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 count-delta.h     |    3 ++-
 diffcore.h        |    2 --
 count-delta.c     |   30 ++++++++++++++++--------------
 diffcore-break.c  |   15 +++++++++++----
 diffcore-rename.c |   15 +++++++++++----
 5 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/count-delta.h b/count-delta.h
--- a/count-delta.h
+++ b/count-delta.h
@@ -4,6 +4,7 @@
 #ifndef COUNT_DELTA_H
 #define COUNT_DELTA_H
 
-unsigned long count_delta(void *, unsigned long);
+int count_delta(void *, unsigned long,
+		unsigned long *src_copied, unsigned long *literal_added);
 
 #endif
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -12,8 +12,6 @@
 #define DEFAULT_RENAME_SCORE 30000 /* rename/copy similarity minimum (50%) */
 #define DEFAULT_BREAK_SCORE  59400 /* minimum for break to happen (99%)*/
 
-#define RENAME_DST_MATCHED 01
-
 struct diff_filespec {
 	unsigned char sha1[20];
 	char *path;
diff --git a/count-delta.c b/count-delta.c
--- a/count-delta.c
+++ b/count-delta.c
@@ -29,15 +29,18 @@ static unsigned long get_hdr_size(const 
 /*
  * NOTE.  We do not _interpret_ delta fully.  As an approximation, we
  * just count the number of bytes that are copied from the source, and
- * the number of literal data bytes that are inserted.  Number of
- * bytes that are _not_ copied from the source is deletion, and number
- * of inserted literal bytes are addition, so sum of them is what we
- * return.  xdelta can express an edit that copies data inside of the
- * destination which originally came from the source.  We do not count
- * that in the following routine, so we are undercounting the source
- * material that remains in the final output that way.
+ * the number of literal data bytes that are inserted.
+ *
+ * Number of bytes that are _not_ copied from the source is deletion,
+ * and number of inserted literal bytes are addition, so sum of them
+ * is the extent of damage.  xdelta can express an edit that copies
+ * data inside of the destination which originally came from the
+ * source.  We do not count that in the following routine, so we are
+ * undercounting the source material that remains in the final output
+ * that way.
  */
-unsigned long count_delta(void *delta_buf, unsigned long delta_size)
+int count_delta(void *delta_buf, unsigned long delta_size,
+		unsigned long *src_copied, unsigned long *literal_added)
 {
 	unsigned long copied_from_source, added_literal;
 	const unsigned char *data, *top;
@@ -46,7 +49,7 @@ unsigned long count_delta(void *delta_bu
 
 	/* the smallest delta size possible is 6 bytes */
 	if (delta_size < 6)
-		return UINT_MAX;
+		return -1;
 
 	data = delta_buf;
 	top = delta_buf + delta_size;
@@ -83,13 +86,12 @@ unsigned long count_delta(void *delta_bu
 
 	/* sanity check */
 	if (data != top || out != dst_size)
-		return UINT_MAX;
+		return -1;
 
 	/* delete size is what was _not_ copied from source.
 	 * edit size is that and literal additions.
 	 */
-	if (src_size + added_literal < copied_from_source)
-		/* we ended up overcounting and underflowed */
-		return 0;
-	return (src_size - copied_from_source) + added_literal;
+	*src_copied = copied_from_source;
+	*literal_added = added_literal;
+	return 0;
 }
diff --git a/diffcore-break.c b/diffcore-break.c
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -23,7 +23,7 @@ static int very_different(struct diff_fi
 	 * want to get the filepair broken.
 	 */
 	void *delta;
-	unsigned long delta_size, base_size;
+	unsigned long delta_size, base_size, src_copied, literal_added;
 
 	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
 		return 0; /* leave symlink rename alone */
@@ -61,10 +61,17 @@ static int very_different(struct diff_fi
 		return MAX_SCORE;
 
 	/* Estimate the edit size by interpreting delta. */
-	delta_size = count_delta(delta, delta_size);
+	if (count_delta(delta, delta_size, &src_copied, &literal_added)) {
+		free(delta);
+		return 0;
+	}
 	free(delta);
-	if (delta_size == UINT_MAX)
-		return 0; /* error in delta computation */
+
+	/* Extent of damage */
+	if (src->size + literal_added < src_copied)
+		delta_size = 0;
+	else
+		delta_size = (src->size - src_copied) + literal_added;
 
 	if (base_size < delta_size)
 		return MAX_SCORE;
diff --git a/diffcore-rename.c b/diffcore-rename.c
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -135,7 +135,7 @@ static int estimate_similarity(struct di
 	 * call into this function in that case.
 	 */
 	void *delta;
-	unsigned long delta_size, base_size;
+	unsigned long delta_size, base_size, src_copied, literal_added;
 	int score;
 
 	/* We deal only with regular files.  Symlink renames are handled
@@ -174,10 +174,17 @@ static int estimate_similarity(struct di
 		return 0;
 
 	/* Estimate the edit size by interpreting delta. */
-	delta_size = count_delta(delta, delta_size);
-	free(delta);
-	if (delta_size == UINT_MAX)
+	if (count_delta(delta, delta_size, &src_copied, &literal_added)) {
+		free(delta);
 		return 0;
+	}
+	free(delta);
+
+	/* Extent of damage */
+	if (src->size + literal_added < src_copied)
+		delta_size = 0;
+	else
+		delta_size = (src->size - src_copied) + literal_added;
 
 	/*
 	 * Now we will give some score to it.  100% edit gets 0 points
------------



* [PATCH 2/4] diff: Fix docs and add -O to diff-helper.
  2005-06-03  8:32             ` [PATCH 0/4] " Junio C Hamano
  2005-06-03  8:36               ` [PATCH 1/4] Tweak count-delta interface Junio C Hamano
@ 2005-06-03  8:36               ` Junio C Hamano
  2005-06-03  8:37               ` [PATCH 3/4] diff: Clean up diff_scoreopt_parse() Junio C Hamano
  2005-06-03  8:40               ` [PATCH 4/4] diff: Update -B heuristics Junio C Hamano
  3 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-03  8:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

This patch updates diff documentation and usage strings:

 - clarify the semantics of -R.  It is not "output in reverse";
   rather, it is "I will feed diff backwards".  Semantically
   they are different when -C is involved.

 - describe -O in usage strings of diff-* brothers.  It was
   implemented and documented, but not described in the usage text.

It also adds -O to diff-helper.  Like -S (and unlike -M/-C/-B),
this option can work on sanitized diff-raw output produced by
the diff-* brothers.  While we are at it, the call it makes to
diffcore is cleaned up to use diffcore_std() like everybody
else, and the declarations for the low-level diffcore routines
are moved from diff.h (public) to diffcore.h (private between
diff.c and diffcore backends).

Signed-off-by: Junio C Hamano <junkio@cox.net> 
---

 Documentation/git-diff-cache.txt  |    3 ++-
 Documentation/git-diff-files.txt  |    3 ++-
 Documentation/git-diff-helper.txt |    5 ++++-
 Documentation/git-diff-tree.txt   |    2 +-
 diff.h                            |   10 +---------
 diffcore.h                        |    6 ++++++
 diff-cache.c                      |    2 +-
 diff-files.c                      |    2 +-
 diff-helper.c                     |   25 ++++++++++++++-----------
 diff-tree.c                       |    2 +-
 10 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/Documentation/git-diff-cache.txt b/Documentation/git-diff-cache.txt
--- a/Documentation/git-diff-cache.txt
+++ b/Documentation/git-diff-cache.txt
@@ -57,7 +57,8 @@ OPTIONS
 	<orderfile>, which has one shell glob pattern per line.
 
 -R::
-	Output diff in reverse.
+	Swap two inputs; that is, show differences from cache or
+	on-disk file to tree contents.
 
 --cached::
 	do not consider the on-disk file at all
diff --git a/Documentation/git-diff-files.txt b/Documentation/git-diff-files.txt
--- a/Documentation/git-diff-files.txt
+++ b/Documentation/git-diff-files.txt
@@ -27,7 +27,8 @@ OPTIONS
 	Remain silent even on nonexisting files
 
 -R::
-	Output diff in reverse.
+	Swap two inputs; that is, show differences from on-disk files
+	to cache contents.
 
 -B::
 	Break complete rewrite changes into pairs of delete and create.
diff --git a/Documentation/git-diff-helper.txt b/Documentation/git-diff-helper.txt
--- a/Documentation/git-diff-helper.txt
+++ b/Documentation/git-diff-helper.txt
@@ -9,7 +9,7 @@ git-diff-helper - Generates patch format
 
 SYNOPSIS
 --------
-'git-diff-helper' [-z] [-S<string>]
+'git-diff-helper' [-z] [-S<string>] [-O<orderfile>]
 
 DESCRIPTION
 -----------
@@ -24,6 +24,9 @@ OPTIONS
 -S<string>::
 	Look for differences that contains the change in <string>.
 
+-O<orderfile>::
+	Output the patch in the order specified in the
+	<orderfile>, which has one shell glob pattern per line.
 
 See Also
 --------
diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt
--- a/Documentation/git-diff-tree.txt
+++ b/Documentation/git-diff-tree.txt
@@ -43,7 +43,7 @@ OPTIONS
 	Detect copies as well as renames.
 
 -R::
-	Output diff in reverse.
+	Swap two input trees.
 
 -S<string>::
 	Look for differences that contains the change in <string>.
diff --git a/diff.h b/diff.h
--- a/diff.h
+++ b/diff.h
@@ -35,21 +35,13 @@ extern int diff_scoreopt_parse(const cha
 #define DIFF_SETUP_REVERSE      	1
 #define DIFF_SETUP_USE_CACHE		2
 #define DIFF_SETUP_USE_SIZE_CACHE	4
+
 extern void diff_setup(int flags);
 
 #define DIFF_DETECT_RENAME	1
 #define DIFF_DETECT_COPY	2
 
-extern void diffcore_rename(int rename_copy, int minimum_score);
-
 #define DIFF_PICKAXE_ALL	1
-extern void diffcore_pickaxe(const char *needle, int opts);
-
-extern void diffcore_pathspec(const char **pathspec);
-
-extern void diffcore_order(const char *orderfile);
-
-extern void diffcore_break(int max_score);
 
 extern void diffcore_std(const char **paths,
 			 int detect_rename, int rename_score,
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -73,6 +73,12 @@ extern struct diff_filepair *diff_queue(
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_pathspec(const char **pathspec);
+extern void diffcore_break(int);
+extern void diffcore_rename(int rename_copy, int);
+extern void diffcore_pickaxe(const char *needle, int opts);
+extern void diffcore_order(const char *orderfile);
+
 #define DIFF_DEBUG 0
 #if DIFF_DEBUG
 void diff_debug_filespec(struct diff_filespec *, int, const char *);
diff --git a/diff-cache.c b/diff-cache.c
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -157,7 +157,7 @@ static void mark_merge_entries(void)
 }
 
 static char *diff_cache_usage =
-"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-C] [-R] [-S<string>] [--cached] <tree-ish> [<path>...]";
+"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-C] [-R] [-S<string>] [-O<orderfile>] [--cached] <tree-ish> [<path>...]";
 
 int main(int argc, const char **argv)
 {
diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -7,7 +7,7 @@
 #include "diff.h"
 
 static const char *diff_files_usage =
-"git-diff-files [-p] [-q] [-r] [-z] [-M] [-C] [-R] [-S<string>] [paths...]";
+"git-diff-files [-p] [-q] [-r] [-z] [-M] [-C] [-R] [-S<string>] [-O<orderfile>] [paths...]";
 
 static int diff_output_format = DIFF_FORMAT_HUMAN;
 static int detect_rename = 0;
diff --git a/diff-helper.c b/diff-helper.c
--- a/diff-helper.c
+++ b/diff-helper.c
@@ -7,11 +7,22 @@
 
 static const char *pickaxe = NULL;
 static int pickaxe_opts = 0;
+static const char *orderfile = NULL;
 static int line_termination = '\n';
 static int inter_name_termination = '\t';
 
+static void flush_them(int ac, const char **av)
+{
+	diffcore_std(av + 1,
+		     0, 0, /* no renames */
+		     pickaxe, pickaxe_opts,
+		     -1, /* no breaks */
+		     orderfile);
+	diff_flush(DIFF_FORMAT_PATCH, 0);
+}
+
 static const char *diff_helper_usage =
-	"git-diff-helper [-z] [-S<string>] paths...";
+	"git-diff-helper [-z] [-S<string>] [-O<orderfile>] paths...";
 
 int main(int ac, const char **av) {
 	struct strbuf sb;
@@ -131,17 +142,9 @@ int main(int ac, const char **av) {
 					  new_path);
 			continue;
 		}
-		if (1 < ac)
-			diffcore_pathspec(av + 1);
-		if (pickaxe)
-			diffcore_pickaxe(pickaxe, pickaxe_opts);
-		diff_flush(DIFF_FORMAT_PATCH, 0);
+		flush_them(ac, av);
 		printf(garbage_flush_format, sb.buf);
 	}
-	if (1 < ac)
-		diffcore_pathspec(av + 1);
-	if (pickaxe)
-		diffcore_pickaxe(pickaxe, pickaxe_opts);
-	diff_flush(DIFF_FORMAT_PATCH, 0);
+	flush_them(ac, av);
 	return 0;
 }
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -397,7 +397,7 @@ static int diff_tree_stdin(char *line)
 }
 
 static char *diff_tree_usage =
-"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-C] [-R] [-S<string>] [-m] [-s] [-v] [-t] <tree-ish> <tree-ish>";
+"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-C] [-R] [-S<string>] [-O<orderfile>] [-m] [-s] [-v] [-t] <tree-ish> <tree-ish>";
 
 int main(int argc, const char **argv)
 {
------------



* [PATCH 3/4] diff: Clean up diff_scoreopt_parse().
  2005-06-03  8:32             ` [PATCH 0/4] " Junio C Hamano
  2005-06-03  8:36               ` [PATCH 1/4] Tweak count-delta interface Junio C Hamano
  2005-06-03  8:36               ` [PATCH 2/4] diff: Fix docs and add -O to diff-helper Junio C Hamano
@ 2005-06-03  8:37               ` Junio C Hamano
  2005-06-03  8:40               ` [PATCH 4/4] diff: Update -B heuristics Junio C Hamano
  3 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-03  8:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

This cleans up the diff_scoreopt_parse() function, which parses
the fractional notation that the -B, -C and -M options take.  The
callers are modified to check for errors and complain.  Earlier
they silently ignored malformed input and fell back on the
default.
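
For readers unfamiliar with the fractional notation, here is a minimal
standalone sketch of the parsing described above.  It is an illustration
that mirrors the logic of the patch below, not the patch itself:
parse_fraction() and scoreopt_parse() are hypothetical names, and doing
the multiplication in long long is an assumption of this sketch to keep
five-digit inputs within range.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SCORE 60000 /* scale used for score values in this series */

/* Read the decimal digits at *cp_p as a fraction ("5" is 0.5, "75" is
 * 0.75) and scale it to MAX_SCORE.  Only the first five digits count;
 * *cp_p is advanced past every digit, counted or not.
 */
static int parse_fraction(const char **cp_p)
{
	const char *cp = *cp_p;
	long long num = 0, scale = 1;
	int cnt = 0;

	while ('0' <= *cp && *cp <= '9') {
		if (cnt++ < 5) {
			scale *= 10;
			num = num * 10 + (*cp - '0');
		}
		cp++;
	}
	*cp_p = cp;
	return (int)(MAX_SCORE * num / scale);
}

/* Return the scaled score for "-M<n>", "-C<n>" or "-B<n>", 0 when no
 * digits were given (the caller then uses its default), or -1 when
 * the option is malformed.
 */
static int scoreopt_parse(const char *opt)
{
	int cmd, score;

	assert(opt != NULL);
	if (*opt++ != '-')
		return -1;
	cmd = *opt++;
	if (cmd != 'M' && cmd != 'C' && cmd != 'B')
		return -1; /* not a -M, -C nor -B option */
	score = parse_fraction(&opt);
	if (*opt != '\0')
		return -1; /* trailing garbage after the digits */
	return score;
}
```

Under these assumptions "-M5" yields 30000 (50%), "-C75" yields 45000
(75%), a bare "-B" yields 0, and "-M7x" is rejected with -1, which is
the case the callers now report through usage().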

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 diff-cache.c      |    9 ++++++---
 diff-files.c      |   15 +++++++++++----
 diff-tree.c       |    9 ++++++---
 diff.c            |   39 +++++++++++++++++++++++++++++++++++++++
 diffcore-rename.c |   18 ------------------
 5 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/diff-cache.c b/diff-cache.c
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -191,17 +191,20 @@ int main(int argc, const char **argv)
 			continue;
 		}
 		if (!strncmp(arg, "-B", 2)) {
-			diff_break_opt = diff_scoreopt_parse(arg);
+			if ((diff_break_opt = diff_scoreopt_parse(arg)) == -1)
+				usage(diff_cache_usage);
 			continue;
 		}
 		if (!strncmp(arg, "-M", 2)) {
 			detect_rename = DIFF_DETECT_RENAME;
-			diff_score_opt = diff_scoreopt_parse(arg);
+			if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1)
+				usage(diff_cache_usage);
 			continue;
 		}
 		if (!strncmp(arg, "-C", 2)) {
 			detect_rename = DIFF_DETECT_COPY;
-			diff_score_opt = diff_scoreopt_parse(arg);
+			if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1)
+				usage(diff_cache_usage);
 			continue;
 		}
 		if (!strcmp(arg, "-z")) {
diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -61,14 +61,21 @@ int main(int argc, const char **argv)
 			orderfile = argv[1] + 2;
 		else if (!strcmp(argv[1], "--pickaxe-all"))
 			pickaxe_opts = DIFF_PICKAXE_ALL;
-		else if (!strncmp(argv[1], "-B", 2))
-			diff_break_opt = diff_scoreopt_parse(argv[1]);
+		else if (!strncmp(argv[1], "-B", 2)) {
+			if ((diff_break_opt =
+			     diff_scoreopt_parse(argv[1])) == -1)
+				usage(diff_files_usage);
+		}
 		else if (!strncmp(argv[1], "-M", 2)) {
-			diff_score_opt = diff_scoreopt_parse(argv[1]);
+			if ((diff_score_opt =
+			     diff_scoreopt_parse(argv[1])) == -1)
+				usage(diff_files_usage);
 			detect_rename = DIFF_DETECT_RENAME;
 		}
 		else if (!strncmp(argv[1], "-C", 2)) {
-			diff_score_opt = diff_scoreopt_parse(argv[1]);
+			if ((diff_score_opt =
+			     diff_scoreopt_parse(argv[1])) == -1)
+				usage(diff_files_usage);
 			detect_rename = DIFF_DETECT_COPY;
 		}
 		else
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -459,16 +459,19 @@ int main(int argc, const char **argv)
 		}
 		if (!strncmp(arg, "-M", 2)) {
 			detect_rename = DIFF_DETECT_RENAME;
-			diff_score_opt = diff_scoreopt_parse(arg);
+			if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1)
+				usage(diff_tree_usage);
 			continue;
 		}
 		if (!strncmp(arg, "-C", 2)) {
 			detect_rename = DIFF_DETECT_COPY;
-			diff_score_opt = diff_scoreopt_parse(arg);
+			if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1)
+				usage(diff_tree_usage);
 			continue;
 		}
 		if (!strncmp(arg, "-B", 2)) {
-			diff_break_opt = diff_scoreopt_parse(arg);
+			if ((diff_break_opt = diff_scoreopt_parse(arg)) == -1)
+				usage(diff_tree_usage);
 			continue;
 		}
 		if (!strcmp(arg, "-z")) {
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -589,6 +589,45 @@ void diff_setup(int flags)
 	
 }
 
+static int parse_num(const char **cp_p)
+{
+	int num, scale, ch, cnt;
+	const char *cp = *cp_p;
+
+	cnt = num = 0;
+	scale = 1;
+	while ('0' <= (ch = *cp) && ch <= '9') {
+		if (cnt++ < 5) {
+			/* We simply ignore more than 5 digits precision. */
+			scale *= 10;
+			num = num * 10 + ch - '0';
+		}
+		cp++;
+	}
+	*cp_p = cp;
+
+	/* user says num divided by scale and we say internally that
+	 * is MAX_SCORE * num / scale.
+	 */
+	return (MAX_SCORE * num / scale);
+}
+
+int diff_scoreopt_parse(const char *opt)
+{
+	int opt1, cmd;
+
+	if (*opt++ != '-')
+		return -1;
+	cmd = *opt++;
+	if (cmd != 'M' && cmd != 'C' && cmd != 'B')
+		return -1; /* that is not a -M, -C nor -B option */
+
+	opt1 = parse_num(&opt);
+	if (*opt != 0)
+		return -1;
+	return opt1;
+}
+
 struct diff_queue_struct diff_queued_diff;
 
 void diff_q(struct diff_queue_struct *queue, struct diff_filepair *dp)
diff --git a/diffcore-rename.c b/diffcore-rename.c
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -229,24 +229,6 @@ static int score_compare(const void *a_,
 	return b->score - a->score;
 }
 
-int diff_scoreopt_parse(const char *opt)
-{
-	int diglen, num, scale, i;
-	if (opt[0] != '-' || (opt[1] != 'M' && opt[1] != 'C' && opt[1] != 'B'))
-		return -1; /* that is not a -M, -C nor -B option */
-	diglen = strspn(opt+2, "0123456789");
-	if (diglen == 0 || strlen(opt+2) != diglen)
-		return 0; /* use default */
-	sscanf(opt+2, "%d", &num);
-	for (i = 0, scale = 1; i < diglen; i++)
-		scale *= 10;
-
-	/* user says num divided by scale and we say internally that
-	 * is MAX_SCORE * num / scale.
-	 */
-	return MAX_SCORE * num / scale;
-}
-
 void diffcore_rename(int detect_rename, int minimum_score)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
------------



* [PATCH 4/4] diff: Update -B heuristics.
  2005-06-03  8:32             ` [PATCH 0/4] " Junio C Hamano
                                 ` (2 preceding siblings ...)
  2005-06-03  8:37               ` [PATCH 3/4] diff: Clean up diff_scoreopt_parse() Junio C Hamano
@ 2005-06-03  8:40               ` Junio C Hamano
  3 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2005-06-03  8:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

As Linus pointed out in the mailing list discussion, -B should
break a file that has many inserts even if it still keeps
enough of the original contents, so that the broken pieces can
later be matched with other files by -M or -C.  However, if such
a broken pair does not get picked up by -M or -C, we would want
to apply different criteria; namely, regardless of the amount of
new material in the result, the determination of "rewrite"
should be done by looking at the amount of original material
still left in the result.  If you still have the original 97
lines from a 100-line document, it does not matter if you add
your own 13 lines to make a 110-line document, or if you add 903
lines to make a 1000-line document.  It is not a rewrite but an
in-place edit.  On the other hand, if you did lose 97 lines from
the original, it does not matter if you added 27 lines to make a
30-line document or if you added 997 lines to make a 1000-line
document.  You did a complete rewrite in either case.
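
The rule in the paragraph above can be sketched in C as follows.  This
is an illustration only, not code from the patch: removed_score() and
is_rewrite() are hypothetical names, sizes are given in lines to match
the example (the patch itself works on byte counts taken from the
delta), and the threshold is supplied by the caller.

```c
#include <assert.h>

#define MAX_SCORE 60000 /* scale used for score values in this series */

/* How much of the original is gone, on a 0..MAX_SCORE scale.
 * The size of the result is deliberately not an input.
 */
static int removed_score(unsigned long src_size, unsigned long src_copied)
{
	assert(src_size > 0);
	if (src_size <= src_copied)
		return 0; /* nothing was lost; also avoids wrap-around */
	return (int)((unsigned long long)(src_size - src_copied)
		     * MAX_SCORE / src_size);
}

/* A change counts as a rewrite when at least min_score of the
 * original was removed, however much new material was added.
 */
static int is_rewrite(unsigned long src_size, unsigned long src_copied,
		      int min_score)
{
	return min_score <= removed_score(src_size, src_copied);
}
```

With 97 of the original 100 lines kept, removed_score(100, 97) is 1800
(3%) whether the result has 110 or 1000 lines; with 97 lines lost,
removed_score(100, 3) is 58200 (97%) whether the result has 30 or 1000
lines.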

This patch introduces a post-processing phase that runs after
diffcore-rename matches up the broken pairs diffcore-break creates.
The purpose of this post-processing is to pick up these broken
pieces and merge them back into in-place modifications.  For
this, the score parameter the -B option takes is changed into a
pair of numbers, and it takes the "-B99/80" format when fully
spelled out.  The first number is the minimum amount of "edit"
(same definition as what diffcore-rename uses, which is "sum of
deletion and insertion") that a modification needs to have to be
broken, and the second number is the minimum amount of "delete"
a surviving broken pair must have to avoid being merged back
together.  It can be abbreviated to "-B" to use the defaults for
both, "-B9" or "-B9/" to use 90% for "edit" but the default (80%)
for merge avoidance, or "-B/75" to use the default (99%) "edit"
and 75% for merge avoidance.
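
The option syntax and the way the two numbers travel in one value can
be sketched as below.  This is a standalone illustration under stated
assumptions, not the code in this patch: parse_break_opt(),
break_score_of() and merge_score_of() are hypothetical names, and the
sketch packs into a uint32_t (rather than a plain int) so that shifting
a score above 32767 stays well defined.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SCORE 60000
#define DEFAULT_BREAK_SCORE 59400 /* 99% */
#define DEFAULT_MERGE_SCORE 48000 /* 80% */

/* Digits as a decimal fraction scaled to MAX_SCORE; at most five
 * digits are counted, and no digits at all gives 0.
 */
static uint32_t parse_fraction(const char **cp_p)
{
	const char *cp = *cp_p;
	uint64_t num = 0, scale = 1;
	int cnt = 0;

	while ('0' <= *cp && *cp <= '9') {
		if (cnt++ < 5) {
			scale *= 10;
			num = num * 10 + (uint64_t)(*cp - '0');
		}
		cp++;
	}
	*cp_p = cp;
	return (uint32_t)(MAX_SCORE * num / scale);
}

/* Parse "-B[<break>][/<merge>]".  On success store the break score in
 * the low 16 bits and the merge score in the high 16 bits of *packed
 * and return 0; return -1 on malformed input.
 */
static int parse_break_opt(const char *opt, uint32_t *packed)
{
	uint32_t brk, mrg = 0;

	assert(opt && packed);
	if (opt[0] != '-' || opt[1] != 'B')
		return -1;
	opt += 2;
	brk = parse_fraction(&opt);
	if (*opt == '/') {
		opt++;
		mrg = parse_fraction(&opt);
	}
	if (*opt != '\0')
		return -1;
	*packed = brk | (mrg << 16);
	return 0;
}

/* A zero field means "not given", so the default is substituted. */
static uint32_t break_score_of(uint32_t packed)
{
	uint32_t s = packed & 0xFFFF;
	return s ? s : DEFAULT_BREAK_SCORE;
}

static uint32_t merge_score_of(uint32_t packed)
{
	uint32_t s = (packed >> 16) & 0xFFFF;
	return s ? s : DEFAULT_MERGE_SCORE;
}
```

Under these assumptions "-B99/80" unpacks to 59400 and 48000, "-B9" and
"-B9/" both unpack to 54000 with the default 48000, and "-B/75" unpacks
to the default 59400 with 45000.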

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 diffcore.h       |   11 ++
 diff.c           |   18 ++++
 diffcore-break.c |  240 +++++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 225 insertions(+), 44 deletions(-)

diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -8,9 +8,19 @@
  * (e.g. diffcore-rename, diffcore-pickaxe).  Never include this header
  * in anything else.
  */
+
+/* We internally use unsigned short as the score value,
+ * and rely on an int capable of holding 32 bits.  -B can take
+ * -Bbreak_score/merge_score format and the two scores are
+ * passed around in one int (high 16-bit for merge and low 16-bit
+ * for break).
+ */
 #define MAX_SCORE 60000
 #define DEFAULT_RENAME_SCORE 30000 /* rename/copy similarity minimum (50%) */
 #define DEFAULT_BREAK_SCORE  59400 /* minimum for break to happen (99%)*/
+#define DEFAULT_MERGE_SCORE  48000 /* maximum for break-merge to happen (80%)*/
+
+#define MINIMUM_BREAK_SIZE     400 /* do not break a file smaller than this */
 
 struct diff_filespec {
 	unsigned char sha1[20];
@@ -76,6 +86,7 @@ extern void diff_q(struct diff_queue_str
 extern void diffcore_pathspec(const char **pathspec);
 extern void diffcore_break(int);
 extern void diffcore_rename(int rename_copy, int);
+extern void diffcore_merge_broken(void);
 extern void diffcore_pickaxe(const char *needle, int opts);
 extern void diffcore_order(const char *orderfile);
 
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -614,7 +614,7 @@ static int parse_num(const char **cp_p)
 
 int diff_scoreopt_parse(const char *opt)
 {
-	int opt1, cmd;
+	int opt1, opt2, cmd;
 
 	if (*opt++ != '-')
 		return -1;
@@ -623,9 +623,21 @@ int diff_scoreopt_parse(const char *opt)
 		return -1; /* that is not a -M, -C nor -B option */
 
 	opt1 = parse_num(&opt);
+	if (cmd != 'B')
+		opt2 = 0;
+	else {
+		if (*opt == 0)
+			opt2 = 0;
+		else if (*opt != '/')
+			return -1; /* we expect -B80/99 or -B80 */
+		else {
+			opt++;
+			opt2 = parse_num(&opt);
+		}
+	}
 	if (*opt != 0)
 		return -1;
-	return opt1;
+	return opt1 | (opt2 << 16);
 }
 
 struct diff_queue_struct diff_queued_diff;
@@ -955,6 +967,8 @@ void diffcore_std(const char **paths,
 		diffcore_break(break_opt);
 	if (detect_rename)
 		diffcore_rename(detect_rename, rename_score);
+	if (0 <= break_opt)
+		diffcore_merge_broken();
 	if (pickaxe)
 		diffcore_pickaxe(pickaxe, pickaxe_opts);
 	if (orderfile)
diff --git a/diffcore-break.c b/diffcore-break.c
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -7,28 +7,58 @@
 #include "delta.h"
 #include "count-delta.h"
 
-static int very_different(struct diff_filespec *src,
-			  struct diff_filespec *dst,
-			  int min_score)
+static int should_break(struct diff_filespec *src,
+			struct diff_filespec *dst,
+			int break_score,
+			int *merge_score_p)
 {
 	/* dst is recorded as a modification of src.  Are they so
 	 * different that we are better off recording this as a pair
-	 * of delete and create?  min_score is the minimum amount of
-	 * new material that must exist in the dst and not in src for
-	 * the pair to be considered a complete rewrite, and recommended
-	 * to be set to a very high value, 99% or so.
-	 *
-	 * The value we return represents the amount of new material
-	 * that is in dst and not in src.  We return 0 when we do not
-	 * want to get the filepair broken.
+	 * of delete and create?
+	 *
+	 * There are two criteria used in this algorithm.  For the
+	 * purposes of helping later rename/copy, we take both delete
+	 * and insert into account and estimate the amount of "edit".
+	 * If the edit is very large, we break this pair so that
+	 * rename/copy can pick the pieces up to match with other
+	 * files.
+	 *
+	 * On the other hand, we would want to ignore inserts for the
+	 * pure "complete rewrite" detection.  As long as most of the
+	 * existing contents were removed from the file, it is a
+	 * complete rewrite, and if sizable chunk from the original
+	 * still remains in the result, it is not a rewrite.  It does
+	 * not matter how much or how little new material is added to
+	 * the file.
+	 *
+	 * The score we leave for such a broken filepair uses the
+	 * latter definition so that later clean-up stage can find the
+	 * pieces that should not have been broken according to the
+	 * latter definition after rename/copy runs, and merge the
+	 * broken pair that have a score lower than given criteria
+	 * back together.  The break operation itself happens
+	 * according to the former definition.
+	 *
+	 * The break_score parameter tells us when to break (the
+	 * amount of "edit" required for us to consider breaking the
+	 * pair).  We leave the amount of deletion in *merge_score_p
+	 * when we return.
+	 *
+	 * The value we return is 1 if we want the pair to be broken,
+	 * or 0 if we do not.
 	 */
 	void *delta;
 	unsigned long delta_size, base_size, src_copied, literal_added;
+	int to_break = 0;
+
+	*merge_score_p = 0; /* assume no deletion --- "do not break"
+			     * is the default.
+			     */
 
 	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
 		return 0; /* leave symlink rename alone */
 
-	if (diff_populate_filespec(src, 1) || diff_populate_filespec(dst, 1))
+	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
 		return 0; /* error but caught downstream */
 
 	delta_size = ((src->size < dst->size) ?
@@ -40,53 +70,95 @@ static int very_different(struct diff_fi
 	 */
 	base_size = ((src->size < dst->size) ? dst->size : src->size);
 
-	/*
-	 * If file size difference is too big compared to the
-	 * base_size, we declare this a complete rewrite.
-	 */
-	if (base_size * min_score < delta_size * MAX_SCORE)
-		return MAX_SCORE;
-
-	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
-		return 0; /* error but caught downstream */
-
 	delta = diff_delta(src->data, src->size,
 			   dst->data, dst->size,
 			   &delta_size);
 
-	/* A delta that has a lot of literal additions would have
-	 * big delta_size no matter what else it does.
-	 */
-	if (base_size * min_score < delta_size * MAX_SCORE)
-		return MAX_SCORE;
-
 	/* Estimate the edit size by interpreting delta. */
-	if (count_delta(delta, delta_size, &src_copied, &literal_added)) {
+	if (count_delta(delta, delta_size,
+			&src_copied, &literal_added)) {
 		free(delta);
-		return 0;
+		return 0; /* we cannot tell */
 	}
 	free(delta);
 
-	/* Extent of damage */
-	if (src->size + literal_added < src_copied)
-		delta_size = 0;
+	/* Compute merge-score, which is "how much is removed
+	 * from the source material".  The clean-up stage will
+	 * merge the surviving pair together if the score is
+	 * less than the minimum, after rename/copy runs.
+	 */
+	if (src->size <= src_copied)
+		delta_size = 0; /* avoid wrapping around */
+	else
+		delta_size = src->size - src_copied;
+	*merge_score_p = delta_size * MAX_SCORE / src->size;
+	
+	/* Extent of damage, which counts both inserts and
+	 * deletes.
+	 */
+	if (src->size + literal_added <= src_copied)
+		delta_size = 0; /* avoid wrapping around */
 	else
 		delta_size = (src->size - src_copied) + literal_added;
+	
+	/* We break if the edit exceeds the minimum.
+	 * i.e. (break_score / MAX_SCORE < delta_size / base_size)
+	 */
+	if (break_score * base_size < delta_size * MAX_SCORE)
+		to_break = 1;
 
-	if (base_size < delta_size)
-		return MAX_SCORE;
-
-	return delta_size * MAX_SCORE / base_size; 
+	return to_break;
 }
 
-void diffcore_break(int min_score)
+void diffcore_break(int break_score)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	struct diff_queue_struct outq;
+
+	/* When the filepair has this much edit (insert and delete),
+	 * it is first considered to be a rewrite and broken into a
+	 * create and delete filepair.  This is to help breaking a
+	 * file that had too much new stuff added, possibly from
+	 * moving contents from another file, so that rename/copy can
+	 * match it with the other file.
+	 *
+	 * int break_score; we reuse incoming parameter for this.
+	 */
+
+	/* After a pair is broken according to break_score and
+	 * subjected to rename/copy, both of them may survive intact,
+	 * due to lack of suitable rename/copy peer.  Or, the caller
+	 * may be calling us without using rename/copy.  When that
+	 * happens, we merge the broken pieces back into one
+	 * modification together if the pair did not have more than
+	 * this much delete.  For this computation, we do not take
+	 * insert into account at all.  If you start from a 100-line
+	 * file and delete 97 lines of it, it does not matter if you
+	 * add 27 lines to it to make a new 30-line file or if you add
+	 * 997 lines to it to make a 1000-line file.  Either way what
+	 * you did was a rewrite of 97%.  On the other hand, if you
+	 * delete 3 lines, keeping 97 lines intact, it does not matter
+	 * if you add 3 lines to it to make a new 100-line file or if
+	 * you add 903 lines to it to make a new 1000-line file.
+	 * Either way you did a lot of additions and not a rewrite.
+	 * This merge happens to catch the latter case.  A merge_score
+	 * of 80% would be a good default value (a broken pair that
+	 * has score lower than merge_score will be merged back
+	 * together).
+	 */
+	int merge_score;
 	int i;
 
-	if (!min_score)
-		min_score = DEFAULT_BREAK_SCORE;
+	/* See comment on DEFAULT_BREAK_SCORE and
+	 * DEFAULT_MERGE_SCORE in diffcore.h
+	 */
+	merge_score = (break_score >> 16) & 0xFFFF;
+	break_score = (break_score & 0xFFFF);
+
+	if (!break_score)
+		break_score = DEFAULT_BREAK_SCORE;
+	if (!merge_score)
+		merge_score = DEFAULT_MERGE_SCORE;
 
 	outq.nr = outq.alloc = 0;
 	outq.queue = NULL;
@@ -101,12 +173,22 @@ void diffcore_break(int min_score)
 		if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two) &&
 		    !S_ISDIR(p->one->mode) && !S_ISDIR(p->two->mode) &&
 		    !strcmp(p->one->path, p->two->path)) {
-			score = very_different(p->one, p->two, min_score);
-			if (min_score <= score) {
+			if (should_break(p->one, p->two,
+					 break_score, &score)) {
 				/* Split this into delete and create */
 				struct diff_filespec *null_one, *null_two;
 				struct diff_filepair *dp;
 
+				/* Set score to 0 for the pair that
+				 * needs to be merged back together
+				 * should they survive rename/copy.
+				 * Also we do not want to break very
+				 * small files.
+				 */
+				if ((score < merge_score) ||
+				    (p->one->size < MINIMUM_BREAK_SIZE))
+					score = 0;
+
 				/* deletion of one */
 				null_one = alloc_filespec(p->one->path);
 				dp = diff_queue(&outq, p->one, null_one);
@@ -132,3 +214,77 @@ void diffcore_break(int min_score)
 
 	return;
 }
+
+static void merge_broken(struct diff_filepair *p,
+			 struct diff_filepair *pp,
+			 struct diff_queue_struct *outq)
+{
+	/* p and pp are broken pairs we want to merge */
+	struct diff_filepair *c = p, *d = pp;
+	if (DIFF_FILE_VALID(p->one)) {
+		/* this must be a delete half */
+		d = p; c = pp;
+	}
+	/* Sanity check */
+	if (!DIFF_FILE_VALID(d->one))
+		die("internal error in merge #1");
+	if (DIFF_FILE_VALID(d->two))
+		die("internal error in merge #2");
+	if (DIFF_FILE_VALID(c->one))
+		die("internal error in merge #3");
+	if (!DIFF_FILE_VALID(c->two))
+		die("internal error in merge #4");
+
+	diff_queue(outq, d->one, c->two);
+	diff_free_filespec_data(d->two);
+	diff_free_filespec_data(c->one);
+	free(d);
+	free(c);
+}
+
+void diffcore_merge_broken(void)
+{
+	struct diff_queue_struct *q = &diff_queued_diff;
+	struct diff_queue_struct outq;
+	int i, j;
+
+	outq.nr = outq.alloc = 0;
+	outq.queue = NULL;
+
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		if (!p)
+			/* we already merged this with its peer */
+			continue;
+		else if (p->broken_pair &&
+			 p->score == 0 &&
+			 !strcmp(p->one->path, p->two->path)) {
+			/* If the peer also survived rename/copy, then
+			 * we merge them back together.
+			 */
+			for (j = i + 1; j < q->nr; j++) {
+				struct diff_filepair *pp = q->queue[j];
+				if (pp->broken_pair &&
+				    p->score == 0 &&
+				    !strcmp(pp->one->path, pp->two->path) &&
+				    !strcmp(p->one->path, pp->two->path)) {
+					/* Peer survived.  Merge them */
+					merge_broken(p, pp, &outq);
+					q->queue[j] = NULL;
+					break;
+				}
+			}
+			if (q->nr <= j)
+				/* The peer did not survive, so we keep
+				 * it in the output.
+				 */
+				diff_q(&outq, p);
+		}
+		else
+			diff_q(&outq, p);
+	}
+	free(q->queue);
+	*q = outq;
+
+	return;
+}
------------



* Re: I want to release a "git-1.0"
  2005-06-01 22:00     ` Daniel Barkalow
  2005-06-01 23:05       ` Junio C Hamano
@ 2005-06-03  9:47       ` Petr Baudis
  2005-06-03 15:09         ` Daniel Barkalow
  1 sibling, 1 reply; 64+ messages in thread
From: Petr Baudis @ 2005-06-03  9:47 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, Eric W. Biederman, Git Mailing List

Dear diary, on Thu, Jun 02, 2005 at 12:00:55AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> It shouldn't be hard to do one, except that locking with rsync is going to
> be a pain. I had a patch to make it work with the rpush/rpull pair, but I
> didn't get its dependencies in at the time.

Was that the patch I was replying to recently? It didn't seem to have
any dependencies.

> I can dust those patches off again if you want that functionality included.
> 
> The patches are essentially:
> 
>  - make the transport protocol handle things other than objects
>  - library procedure for locking atomic update of refs files
>  - fetching refs in general
>  - rpull/rpush that updates a specified ref file atomically
> 
> At least the first would be very nice to get in before 1.0, since it is an
> incompatible change to the protocol.

I would like to have this a lot too. Pulling tags now is a PITA, and I
definitely want to go this way. So it will land at least in git-pb.
:-) (But that's a little troublesome if you say it's an incompatible
change.)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: I want to release a "git-1.0"
  2005-06-03  9:47       ` Petr Baudis
@ 2005-06-03 15:09         ` Daniel Barkalow
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Barkalow @ 2005-06-03 15:09 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, Eric W. Biederman, Git Mailing List

On Fri, 3 Jun 2005, Petr Baudis wrote:

> Dear diary, on Thu, Jun 02, 2005 at 12:00:55AM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > It shouldn't be hard to do one, except that locking with rsync is going to
> > be a pain. I had a patch to make it work with the rpush/rpull pair, but I
> > didn't get its dependencies in at the time.
> 
> Was that the patch I was replying to recently? It didn't seem to have
> any dependencies.

The rpush/rpull changes were at the end of a series that you were replying
to the beginning of.

> > I can dust those patches off again if you want that functionality included.
> > 
> > The patches are essentially:
> > 
> >  - make the transport protocol handle things other than objects
> >  - library procedure for locking atomic update of refs files
> >  - fetching refs in general
> >  - rpull/rpush that updates a specified ref file atomically
> > 
> > At least the first would be very nice to get in before 1.0, since it is an
> > incompatible change to the protocol.
> 
> I would like to have this a lot too. Pulling tags now is a PITA, and I
> definitely want to go this way. So it will land at least in git-pb.
> :-) (But that's a little troublesome if you say it's an incompatible
> change.)

The ssh-based protocol has to change, because the current version doesn't
have any way of being extended. The first patch in the new set makes the
incompatible change without adding anything new (so as to be as
uncontroversial as possible), and now also adds a version number so that
future additions should be less of a big deal. The rest of the series will
add the transfer of refs to the transfer mechanism and the protocol.

	-Daniel
*This .sig left intentionally blank*


