* I want to release a "git-1.0" @ 2005-05-30 20:00 Linus Torvalds 2005-05-30 20:33 ` jeff millar ` (10 more replies) 0 siblings, 11 replies; 64+ messages in thread From: Linus Torvalds @ 2005-05-30 20:00 UTC (permalink / raw) To: Git Mailing List Ok, I'm at the point where I really think it's getting close to a 1.0, and make another tar-ball etc. I obviously feel that it's already way superior to CVS, but I also realize that somebody who is used to CVS may not actually realize that very easily. So before I do a 1.0 release, I want to write some stupid git tutorial for a complete beginner that has only used CVS before, with a real example of how to use raw git, and along those lines I actually want the thing to show how to do something useful. So before I do that, is there something people think is just too hard for somebody coming from the CVS world to understand? I already realized that the "git-write-tree" + "git-commit-tree" interfaces were just _too_ hard to put into a sane tutorial. I was showing off raw git to Steve Chamberlain yesterday, and showing it to him made some things pretty obvious - one of them being that "git-init-db" really needed to set up the initial refs etc). So I wrote this silly "git-commit-script" to make it at least half-way palatable, but what else do people feel is "too hard"? I think I'll move the "cvs2git" script thing to git proper before the 1.0 release (again, in order to have the tutorial able to show what to do if you already have an existing CVS tree), what else? Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds @ 2005-05-30 20:33 ` jeff millar 2005-05-30 20:49 ` Nicolas Pitre ` (9 subsequent siblings) 10 siblings, 0 replies; 64+ messages in thread From: jeff millar @ 2005-05-30 20:33 UTC (permalink / raw) To: Linus Torvalds, git

Linus Torvalds wrote:

>So before I do that, is there something people think is just too hard for
>somebody coming from the CVS world to understand?
>

I'm a fairly clueless cvs user, trying to use cg/git as a way to track a single user project...using cogito, because that's easier, right?

The usage pattern that's causing me problems right now:

    cg-init a whole directory tree (trying with /etc and a software project directory)
    note that too many files got included (*.cache, *.backup, *.o, binaries, etc)
    want to stop tracking them, cg-rm also removes the file, don't want that.

What's the best way to stop tracking files?

jeff

^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds 2005-05-30 20:33 ` jeff millar @ 2005-05-30 20:49 ` Nicolas Pitre 2005-06-01 6:52 ` Junio C Hamano 2005-05-30 20:59 ` I want to release a "git-1.0" Junio C Hamano ` (8 subsequent siblings) 10 siblings, 1 reply; 64+ messages in thread From: Nicolas Pitre @ 2005-05-30 20:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Mon, 30 May 2005, Linus Torvalds wrote: > > Ok, I'm at the point where I really think it's getting close to a 1.0, and > make another tar-ball etc. Any chance you could merge my latest mkdelta patch _please_ ??? I just posted it twice in the last 4 days and it still didn't appear in your repository. Again, the current version of mkdelta in your tree has a bug that can screw things up, and it is fixed in the latest patch of course. Nicolas ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:49 ` Nicolas Pitre @ 2005-06-01 6:52 ` Junio C Hamano 2005-06-01 8:24 ` [PATCH] Add -d flag to git-pull-* family Junio C Hamano 0 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 6:52 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List I just remembered that I mentioned potential problems with non rsync pulls with delta objects, especially when the git-*-pull commands are used in "things only close to the tip" mode, i.e. without "-a" option. Do you think we should do something about it before GIT 1.0 happens? It may be enough if we just tell people not to deltify their public non-rsync repositories in the documentation. ^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH] Add -d flag to git-pull-* family. 2005-06-01 6:52 ` Junio C Hamano @ 2005-06-01 8:24 ` Junio C Hamano 2005-06-01 14:39 ` Nicolas Pitre 0 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 8:24 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List When a remote repository is deltified, we need to get the objects that a deltified object we want to obtain is based upon. Since checking representation type of all objects we retreive from remote side may be costly, this is made into a separate option -d; -a implies it for convenience and safety. Rsync transport does not have this problem since it fetches everything the remote side has. Signed-off-by: Junio C Hamano <junkio@cox.net> --- Documentation/git-http-pull.txt | 4 +++- Documentation/git-local-pull.txt | 4 +++- Documentation/git-rpull.txt | 4 +++- http-pull.c | 5 ++++- local-pull.c | 5 ++++- pull.c | 15 +++++++++++++++ pull.h | 3 +++ rpull.c | 5 ++++- 8 files changed, 39 insertions(+), 6 deletions(-) diff --git a/Documentation/git-http-pull.txt b/Documentation/git-http-pull.txt --- a/Documentation/git-http-pull.txt +++ b/Documentation/git-http-pull.txt @@ -9,7 +9,7 @@ git-http-pull - Downloads a remote GIT r SYNOPSIS -------- -'git-http-pull' [-c] [-t] [-a] [-v] commit-id url +'git-http-pull' [-c] [-t] [-a] [-v] [-d] commit-id url DESCRIPTION ----------- @@ -17,6 +17,8 @@ Downloads a remote GIT repository via HT -c:: Get the commit objects. +-d:: + Get objects that deltified objects are based upon. -t:: Get trees associated with the commit objects. 
-a:: diff --git a/Documentation/git-local-pull.txt b/Documentation/git-local-pull.txt --- a/Documentation/git-local-pull.txt +++ b/Documentation/git-local-pull.txt @@ -9,7 +9,7 @@ git-local-pull - Duplicates another GIT SYNOPSIS -------- -'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path +'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path DESCRIPTION ----------- @@ -19,6 +19,8 @@ OPTIONS ------- -c:: Get the commit objects. +-d:: + Get objects that deltified objects are based upon. -t:: Get trees associated with the commit objects. -a:: diff --git a/Documentation/git-rpull.txt b/Documentation/git-rpull.txt --- a/Documentation/git-rpull.txt +++ b/Documentation/git-rpull.txt @@ -10,7 +10,7 @@ git-rpull - Pulls from a remote reposito SYNOPSIS -------- -'git-rpull' [-c] [-t] [-a] [-v] commit-id url +'git-rpull' [-c] [-t] [-a] [-v] [-d] commit-id url DESCRIPTION ----------- @@ -21,6 +21,8 @@ OPTIONS ------- -c:: Get the commit objects. +-d:: + Get objects that deltified objects are based upon. -t:: Get trees associated with the commit objects. 
-a:: diff --git a/http-pull.c b/http-pull.c --- a/http-pull.c +++ b/http-pull.c @@ -103,17 +103,20 @@ int main(int argc, char **argv) get_tree = 1; } else if (argv[arg][1] == 'c') { get_history = 1; + } else if (argv[arg][1] == 'd') { + get_delta = 1; } else if (argv[arg][1] == 'a') { get_all = 1; get_tree = 1; get_history = 1; + get_delta = 1; } else if (argv[arg][1] == 'v') { get_verbosely = 1; } arg++; } if (argc < arg + 2) { - usage("git-http-pull [-c] [-t] [-a] [-v] commit-id url"); + usage("git-http-pull [-c] [-t] [-a] [-d] [-v] commit-id url"); return 1; } commit_id = argv[arg]; diff --git a/local-pull.c b/local-pull.c --- a/local-pull.c +++ b/local-pull.c @@ -74,7 +74,7 @@ int fetch(unsigned char *sha1) } static const char *local_pull_usage = -"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path"; +"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path"; /* * By default we only use file copy. @@ -92,10 +92,13 @@ int main(int argc, char **argv) get_tree = 1; else if (argv[arg][1] == 'c') get_history = 1; + else if (argv[arg][1] == 'd') + get_delta = 1; else if (argv[arg][1] == 'a') { get_all = 1; get_tree = 1; get_history = 1; + get_delta = 1; } else if (argv[arg][1] == 'l') use_link = 1; diff --git a/pull.c b/pull.c --- a/pull.c +++ b/pull.c @@ -6,6 +6,7 @@ int get_tree = 0; int get_history = 0; +int get_delta = 0; int get_all = 0; int get_verbosely = 0; static unsigned char current_commit_sha1[20]; @@ -37,6 +38,20 @@ static int make_sure_we_have_it(const ch status = fetch(sha1); if (status && what) report_missing(what, sha1); + if (get_delta) { + unsigned long mapsize, size; + void *map, *buf; + char type[20]; + + map = map_sha1_file(sha1, &mapsize); + if (map) { + buf = unpack_sha1_file(map, mapsize, type, &size); + munmap(map, mapsize); + if (buf && !strcmp(type, "delta")) + status = make_sure_we_have_it(what, buf); + free(buf); + } + } return status; } diff --git a/pull.h b/pull.h --- a/pull.h +++ b/pull.h @@ -13,6 +13,9 @@ 
extern int get_history; /** Set to fetch the trees in the commit history. **/ extern int get_all; +/* Set to fetch the base of delta objects.*/ +extern int get_delta; + /* Set to be verbose */ extern int get_verbosely; diff --git a/rpull.c b/rpull.c --- a/rpull.c +++ b/rpull.c @@ -27,17 +27,20 @@ int main(int argc, char **argv) get_tree = 1; } else if (argv[arg][1] == 'c') { get_history = 1; + } else if (argv[arg][1] == 'd') { + get_delta = 1; } else if (argv[arg][1] == 'a') { get_all = 1; get_tree = 1; get_history = 1; + get_delta = 1; } else if (argv[arg][1] == 'v') { get_verbosely = 1; } arg++; } if (argc < arg + 2) { - usage("git-rpull [-c] [-t] [-a] [-v] commit-id url"); + usage("git-rpull [-c] [-t] [-a] [-v] [-d] commit-id url"); return 1; } commit_id = argv[arg]; ------------------------------------------------ ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] Add -d flag to git-pull-* family. 2005-06-01 8:24 ` [PATCH] Add -d flag to git-pull-* family Junio C Hamano @ 2005-06-01 14:39 ` Nicolas Pitre 2005-06-01 16:00 ` Junio C Hamano 0 siblings, 1 reply; 64+ messages in thread From: Nicolas Pitre @ 2005-06-01 14:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List On Wed, 1 Jun 2005, Junio C Hamano wrote: > When a remote repository is deltified, we need to get the > objects that a deltified object we want to obtain is based upon. > Since checking representation type of all objects we retreive > from remote side may be costly, this is made into a separate > option -d; -a implies it for convenience and safety. I wonder if making this optional makes sense. In fact, if you believe having the option is useful then it should probably be the other way around i.e. to _not_ look at deltas when it is specified. Otherwise you'll end up with an incoherent repository. To minimize the cost a lot it could be possible to uncompress just the first 40 bytes or so which is enough to determine if the object is a delta and if so what object it is against. What do you think? Nicolas ^ permalink raw reply [flat|nested] 64+ messages in thread
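A rough sketch of the check Nicolas describes, assuming the first bytes of the object have already been inflated into a buffer. It assumes the usual "<type> <size>" header terminated by a NUL byte, with a delta object's body starting with the 20-byte SHA1 of its base, which matches how the patch above passes the unpacked buffer back into make_sure_we_have_it(). The helper name delta_base_from_prefix() is hypothetical and is not part of git; the partial inflate itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a git function.
 *
 * prefix/len: the first inflated bytes of an object (around 40 is enough).
 * Returns  1 and fills base_sha1 if the object is a delta,
 *          0 if it is some other object type,
 *         -1 if the prefix is too short or malformed to decide.
 */
static int delta_base_from_prefix(const unsigned char *prefix, size_t len,
				  unsigned char base_sha1[20])
{
	static const char tag[] = "delta ";
	const unsigned char *nul, *body;

	assert(prefix && base_sha1);

	/* The "<type> <size>" header ends at the first NUL byte. */
	nul = memchr(prefix, 0, len);
	if (!nul)
		return -1;

	/* A header that does not start with "delta " is not a delta. */
	if ((size_t)(nul - prefix) < sizeof(tag) - 1 ||
	    memcmp(prefix, tag, sizeof(tag) - 1) != 0)
		return 0;

	/* Assumed layout: the base object's SHA1 follows the header directly. */
	body = nul + 1;
	if (len - (size_t)(body - prefix) < 20)
		return -1;
	memcpy(base_sha1, body, 20);
	return 1;
}
```

A header such as "delta 12345" plus its NUL and the 20-byte base name fits in roughly 32 bytes, which is consistent with the "first 40 bytes or so" estimate. A caller that gets 1 back would fetch base_sha1 next; a caller that gets -1 would need to inflate more data before deciding.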
* Re: [PATCH] Add -d flag to git-pull-* family. 2005-06-01 14:39 ` Nicolas Pitre @ 2005-06-01 16:00 ` Junio C Hamano [not found] ` <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net> 0 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 16:00 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List >>>>> "NP" == Nicolas Pitre <nico@cam.org> writes: NP> What do you think? What you say makes a lot more sense than my quick hack on both counts. ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net>]
* Re: [PATCH] Handle deltified object correctly in git-*-pull family. [not found] ` <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net> @ 2005-06-02 0:47 ` Nicolas Pitre [not found] ` <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net> 2005-06-02 0:58 ` [PATCH] Handle deltified object correctly in git-*-pull family Linus Torvalds 2 siblings, 0 replies; 64+ messages in thread From: Nicolas Pitre @ 2005-06-02 0:47 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Daniel Barkalow <barkalow@iabervon.org> Git Mailing List On Wed, 1 Jun 2005, Junio C Hamano wrote: > *** Dan and Nico, could you check this for correctness? I've > *** tested it with a deltified core GIT repository and pulling > *** with local-pull from there. I have verified that a pull > *** that fails with -d flag retrieves the right base-object to > *** complete a deltified ones. The delta part looks fine to me. Nicolas ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net>]
* Re: [PATCH] Stop inflating the whole SHA1 file only to check size. [not found] ` <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net> @ 2005-06-02 0:51 ` Nicolas Pitre 2005-06-02 1:32 ` Junio C Hamano 0 siblings, 1 reply; 64+ messages in thread From: Nicolas Pitre @ 2005-06-02 0:51 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Daniel Barkalow <barkalow@iabervon.org> Git Mailing List On Wed, 1 Jun 2005, Junio C Hamano wrote: > Using the new unpack_sha1_file_partial() function, stop > inflating the whole SHA1 file when rename detector wants to know > only the filesize. Beware. If you have delta objects you'll get the size of the delta itself and not the final object size, unless you recurse until a non delta object is found. Nicolas ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] Stop inflating the whole SHA1 file only to check size. 2005-06-02 0:51 ` [PATCH] Stop inflating the whole SHA1 file only to check size Nicolas Pitre @ 2005-06-02 1:32 ` Junio C Hamano 0 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-02 1:32 UTC (permalink / raw) To: Nicolas Pitre Cc: Linus Torvalds, Daniel Barkalow <barkalow@iabervon.org> Git Mailing List >>>>> "NP" == Nicolas Pitre <nico@cam.org> writes: NP> On Wed, 1 Jun 2005, Junio C Hamano wrote: >> Using the new unpack_sha1_file_partial() function, stop >> inflating the whole SHA1 file when rename detector wants to know >> only the filesize. NP> Beware. You are right. I cannot believe how stupid I am, falling into this trap _just_ _after_ looking at the delta stuff X-<. Linus please drop that one. ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] Handle deltified object correctly in git-*-pull family. [not found] ` <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net> 2005-06-02 0:47 ` [PATCH] Handle deltified object correctly in git-*-pull family Nicolas Pitre [not found] ` <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net> @ 2005-06-02 0:58 ` Linus Torvalds 2005-06-02 1:43 ` Junio C Hamano 2 siblings, 1 reply; 64+ messages in thread From: Linus Torvalds @ 2005-06-02 0:58 UTC (permalink / raw) To: Junio C Hamano Cc: Nicolas Pitre, Daniel Barkalow <barkalow@iabervon.org> Git Mailing List

On Wed, 1 Jun 2005, Junio C Hamano wrote:
>
> *** Linus, I have a hook in sha1_file.c to let me figure out the
> *** size of the SHA1 file without fully expanding it. This
> *** patch does not use it, but you already know where I am
> *** heading, so please leave it there ;-).

Argh. This is just adding conceptual complexity without any real advantage.

Why not just split out the current "unpack_sha1_file()" into two stages: "unpack_sha1_header()" and the rest?

Then you can just decide to call "unpack_sha1_header()" when you want the header information.

Hmm. I just committed something like that. If you want to just see the type of an object, you can map the object in memory, and just do

	z_stream stream;
	char buffer[100];

	if (unpack_sha1_header(&stream, map, mapsize, buffer, sizeof(buffer)) < 0)
		return NULL;
	if (sscanf(buffer, "%10s %lu", type, size) != 2)
		return NULL;
	.. there you have it ..

which is a lot simpler than worrying about callbacks etc.

Linus

^ permalink raw reply [flat|nested] 64+ messages in thread
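A self-contained sketch of the second half of that snippet: parsing the "<type> <size>" text once the header has been inflated into a buffer. unpack_sha1_header() itself is not reproduced here, and parse_object_header() is a hypothetical name used only for illustration. The one detail worth spelling out is that sscanf() returns the number of fields it converted, so success for this format means a return value of 2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a git function.
 *
 * header: NUL-terminated text such as "blob 1234".
 * Returns 0 and fills type/size on success, -1 on a malformed header.
 */
static int parse_object_header(const char *header, char *type,
			       size_t type_len, unsigned long *size)
{
	char buf[11];	/* "%10s" stores at most 10 characters plus the NUL */

	assert(header && type && size);

	/* Both the type word and the size must be converted. */
	if (sscanf(header, "%10s %lu", buf, size) != 2)
		return -1;
	if (strlen(buf) >= type_len)
		return -1;
	strcpy(type, buf);
	return 0;
}
```

As pointed out elsewhere in this thread, the size obtained this way for a delta object is the size of the delta, not of the final object.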
* Re: [PATCH] Handle deltified object correctly in git-*-pull family. 2005-06-02 0:58 ` [PATCH] Handle deltified object correctly in git-*-pull family Linus Torvalds @ 2005-06-02 1:43 ` Junio C Hamano 0 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-02 1:43 UTC (permalink / raw) To: Linus Torvalds Cc: Nicolas Pitre, Daniel Barkalow <barkalow@iabervon.org> Git Mailing List >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> Why not just split out the current "unpack_sha1_file()" into two stages: LT> "unpack_sha1_header()" and the rest. LT> which is a lot simpler than worrying about callbacks etc. Alright. ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds 2005-05-30 20:33 ` jeff millar 2005-05-30 20:49 ` Nicolas Pitre @ 2005-05-30 20:59 ` Junio C Hamano 2005-05-30 21:07 ` Junio C Hamano ` (7 subsequent siblings) 10 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-05-30 20:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List I'd really appreciate if you reconsider diff-* -O for inclusion before 1.0 happens. It is probably the lowest impact among the diffcore family. Don't I deserve it ;-)? ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (2 preceding siblings ...) 2005-05-30 20:59 ` I want to release a "git-1.0" Junio C Hamano @ 2005-05-30 21:07 ` Junio C Hamano 2005-05-30 22:11 ` David Greaves ` (6 subsequent siblings) 10 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-05-30 21:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I was showing off raw git to Steve Chamberlain yesterday, and showing it
LT> to him made some things pretty obvious - one of them being that
LT> "git-init-db" really needed to set up the initial refs etc). So I wrote
LT> this silly "git-commit-script" to make it at least half-way palatable, but
LT> what else do people feel is "too hard"?

I think you need to clarify your intended audience first before soliciting "list of things that would help CVS user to convert to GIT". Specifically, which variant of GIT you are talking about.

I think you are talking about using the bare Plumbing. I suspect that some of the things you said "too hard" may be coming from the fact that you did not use Cogito in the "showing off" you did. I imagine Cogito users do not experience the trouble you felt with git-init-db, since I presume they would rather use cg-init which IIUIC sets up the .git/refs structure for its taste.

Having said that, I am in the same camp as you are in, in that the (secondary) goal of my involvement in this project so far has been to make the bare Plumbing comfortable enough to use, to make the choice of Porcelain more or less irrelevant. As such, I am all for such a tutorial to convert CVS people to Plumbing GIT.

Not that I'd volunteer writing big part of such a document. I suck at documentation, not just math ;-).

^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (3 preceding siblings ...) 2005-05-30 21:07 ` Junio C Hamano @ 2005-05-30 22:11 ` David Greaves 2005-05-30 22:12 ` Dave Jones ` (5 subsequent siblings) 10 siblings, 0 replies; 64+ messages in thread From: David Greaves @ 2005-05-30 22:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds wrote: >So before I do a 1.0 release, I want to write some stupid git tutorial for >a complete beginner that has only used CVS before, with a real example of >how to use raw git, and along those lines I actually want the thing to >show how to do something useful. > >So before I do that, is there something people think is just too hard for >somebody coming from the CVS world to understand? I already realized that >the "git-write-tree" + "git-commit-tree" interfaces were just _too_ hard >to put into a sane tutorial. > >I was showing off raw git to Steve Chamberlain yesterday, and showing it >to him made some things pretty obvious - one of them being that >"git-init-db" really needed to set up the initial refs etc). So I wrote >this silly "git-commit-script" to make it at least half-way palatable, but >what else do people feel is "too hard"? > >I think I'll move the "cvs2git" script thing to git proper before the 1.0 >release (again, in order to have the tutorial able to show what to do if >you already have an existing CVS tree), what else? > > It seems to me that a tutorial for end users is inappropriate. You should be writing a tutorial for porcelain implementors :) Anyway, a while back I split the commands into manipulation and interrogation and then into ancillary commands and scripts. Do you actually agree with this grouping? http://www.kernel.org/pub/software/scm/git/docs/git.html It may help to position who should be doing what. 
Also, if you're writing a git-init-script, it may be that you're simply scripting common processes and could helpfully maintain consistency by either pulling some of the really trivial Cogito scripts (cg-init, cg-add, cg-rm) into the core 'ancillary' area or suggesting modifications to Cogito as the current 'best of breed' implementation of the low-level git usage process. Cogito also 'fixes' some useability issues such as using "git-update-cache --add" == "cg-add" I know you _can_ use git as an end user - but it seems that it's designed to be used by plumbers. Oh, I'd also like to see something along the lines of my cg-Xignore before git hits 1.0 On the tutorial side - yesterday I started pulling together stuff from the list about merging to complete the README where it says [ fixme: talk about resolving merges here ] I haven't done much other than collect some discussion from the list and the text from git-read-tree.txt. I do think this area needs more explanation as the whole 'stage' thing is pretty alien to CVS. I also noted a few people asking "so I did this merge - what do I do now?" The working directory/cache/repository is also confusing sometimes - especially when the cache and working-dir unexpectedly don't match. I also see in my notes: "improve the docs around update-cache." David ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (4 preceding siblings ...) 2005-05-30 22:11 ` David Greaves @ 2005-05-30 22:12 ` Dave Jones 2005-05-30 22:55 ` Dmitry Torokhov 2005-05-31 0:52 ` Linus Torvalds 2005-05-30 22:19 ` Ryan Anderson ` (4 subsequent siblings) 10 siblings, 2 replies; 64+ messages in thread From: Dave Jones @ 2005-05-30 22:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List

On Mon, May 30, 2005 at 01:00:42PM -0700, Linus Torvalds wrote:
 >
 > Ok, I'm at the point where I really think it's getting close to a 1.0, and
 > make another tar-ball etc. I obviously feel that it's already way superior
 > to CVS, but I also realize that somebody who is used to CVS may not
 > actually realize that very easily.
 >
 > So before I do a 1.0 release, I want to write some stupid git tutorial for
 > a complete beginner that has only used CVS before, with a real example of
 > how to use raw git, and along those lines I actually want the thing to
 > show how to do something useful.
 >
 > So before I do that, is there something people think is just too hard for
 > somebody coming from the CVS world to understand? I already realized that
 > the "git-write-tree" + "git-commit-tree" interfaces were just _too_ hard
 > to put into a sane tutorial.
 >
 > I was showing off raw git to Steve Chamberlain yesterday, and showing it
 > to him made some things pretty obvious - one of them being that
 > "git-init-db" really needed to set up the initial refs etc). So I wrote
 > this silly "git-commit-script" to make it at least half-way palatable, but
 > what else do people feel is "too hard"?

I finally got around to actually trying to use git to maintain the cpufreq repository the last few days after reading Jeff Garzik's mini-howto[1]

It's not particularly complicated, but the number one thing that's bugged me is this..

	# commit changes
	GIT_AUTHOR_NAME="John Doe" \
	GIT_AUTHOR_EMAIL="jdoe@foo.com" \
	GIT_COMMITTER_NAME="Jeff Garzik" \
	GIT_COMMITTER_EMAIL="jgarzik@pobox.com" \
	git-commit-tree `git-write-tree` \
		-p $(cat .git/HEAD ) \
		< changelog.txt \
		> .git/HEAD

For merging a lot of csets, that's a lot of typing per cset. So my .bashrc now sets up GIT_COMMITTER_NAME & GIT_COMMITTER_EMAIL, because I don't foresee myself changing either of those anytime soon, which takes it down to

	GIT_AUTHOR_NAME="John Doe" \
	GIT_AUTHOR_EMAIL="jdoe@foo.com" \
	git-commit-tree `git-write-tree` \
		-p $(cat .git/HEAD ) \
		< changelog.txt \
		> .git/HEAD

per-cset. Maybe I have early on-set dementia, but the number of times I've typoed those two remaining environment variables is bizarre. I must've hit every known combination possible in my merge of ~30 patches.

I could make the latter 4 lines of the above a shell alias to save some typing, but those shell vars still bug me. Hmm, maybe I could create a wrapper that splits a "Dave Jones <davej@redhat.com>" style string into two vars.

I realise you've got a nifty bunch of tools to apply a whole mbox of patches, but that's not ideal if all of my patches aren't in mboxes (some I create myself and toss in my spool, some I pull from bugzilla etc..)

Typos aside, the other thing that seems non-intuitive is the splitting up of the patch & changelog comment into separate files during the patch-apply stage.

Maybe your new git-commit-script wonder-tool fixes up all these problems already, I'll take a look after food.

It's pretty nifty stuff, but for merging a lot of patches in non-mbox format, either I'm doing something wrong, or it's, well.. painful.

Dave

[1] http://lkml.org/lkml/2005/5/26/11/index.html

^ permalink raw reply [flat|nested] 64+ messages in thread
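The wrapper idea Dave mentions, splitting a "Name <email>" string into its two parts, could look roughly like the sketch below. It is written in C to match the other tools in this thread; split_ident() is a hypothetical name, and the buffer sizes and the rule of rejecting strings without both angle brackets are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git.
 *
 * Split an identity such as "Dave Jones <davej@redhat.com>" into name
 * and email. Returns 0 on success, -1 if either part is missing or
 * does not fit in the caller's buffer.
 */
static int split_ident(const char *ident, char *name, size_t name_len,
		       char *email, size_t email_len)
{
	const char *lt, *gt;
	size_t n, e;

	assert(ident && name && email);

	lt = strchr(ident, '<');
	gt = lt ? strchr(lt, '>') : NULL;
	if (!lt || !gt)
		return -1;

	/* The name is everything before '<', minus trailing whitespace. */
	n = (size_t)(lt - ident);
	while (n > 0 && isspace((unsigned char)ident[n - 1]))
		n--;

	/* The email is whatever sits between '<' and '>'. */
	e = (size_t)(gt - lt - 1);
	if (n == 0 || e == 0 || n >= name_len || e >= email_len)
		return -1;

	memcpy(name, ident, n);
	name[n] = '\0';
	memcpy(email, lt + 1, e);
	email[e] = '\0';
	return 0;
}
```

A small wrapper program could pass the two results to setenv() as GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL before running git-commit-tree, so that only one string has to be typed per changeset.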
* Re: I want to release a "git-1.0" 2005-05-30 22:12 ` Dave Jones @ 2005-05-30 22:55 ` Dmitry Torokhov 2005-05-30 23:15 ` Junio C Hamano 2005-05-30 23:23 ` Dmitry Torokhov 2005-05-31 0:52 ` Linus Torvalds 1 sibling, 2 replies; 64+ messages in thread From: Dmitry Torokhov @ 2005-05-30 22:55 UTC (permalink / raw) To: git; +Cc: Dave Jones, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

On Monday 30 May 2005 17:12, Dave Jones wrote:
> I realise you've got a nifty bunch of tools to apply a whole mbox of
> patches, but that's not ideal if all of my patches aren't in mboxes
> (some I create myself and toss in my spool, some I pull from bugzilla etc..)

I mercilessly hacked Linus's scripts from git-tools repo to work with non-mailbox patches, maybe you can make use of them too. Note that stripspace.c is not changed in any way whatsoever and mailsplit.c was changed to handle my personal preference of having patch description in the form of:

	Input: make blah blah change
	---

And Linus's script would eat that line.

--
Dmitry

[-- Attachment #2: applypatch --]
[-- Type: application/x-shellscript, Size: 888 bytes --]

[-- Attachment #3: apply_parsed_patch --]
[-- Type: application/x-shellscript, Size: 2123 bytes --]

[-- Attachment #4: stripspace.c --]
[-- Type: text/x-csrc, Size: 786 bytes --]

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/*
 * Remove empty lines from the beginning and end.
 *
 * Turn multiple consecutive empty lines into just one
 * empty line.
 */
static void cleanup(char *line)
{
	int len = strlen(line);

	if (len > 1 && line[len-1] == '\n') {
		do {
			unsigned char c = line[len-2];
			if (!isspace(c))
				break;
			line[len-2] = '\n';
			len--;
			line[len] = 0;
		} while (len > 1);
	}
}

int main(int argc, char **argv)
{
	int empties = -1;
	char line[1024];

	while (fgets(line, sizeof(line), stdin)) {
		cleanup(line);

		/* Not just an empty line? */
		if (line[0] != '\n') {
			if (empties > 0)
				putchar('\n');
			empties = 0;
			fputs(line, stdout);
			continue;
		}
		if (empties < 0)
			continue;
		empties++;
	}
	return 0;
}

[-- Attachment #5: mailsplit.c --]
[-- Type: text/x-csrc, Size: 2526 bytes --]

/*
 * Totally braindamaged mbox splitter program.
 *
 * It just splits a mbox into a list of files: "0001" "0002" ..
 * so you can process them further from there.
 */
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include <assert.h>

static int usage(void)
{
	fprintf(stderr, "mailsplit <mbox> <directory>\n");
	exit(1);
}

static int linelen(const char *map, unsigned long size)
{
	int len = 0, c;

	do {
		c = *map;
		map++;
		size--;
		len++;
	} while (size && c != '\n');
	return len;
}

static int is_from_line(const char *line, int len)
{
	const char *colon;

	if (len < 20 || memcmp("From ", line, 5))
		return 0;

	colon = line + len - 2;
	line += 5;
	for (;;) {
		if (colon < line)
			return 0;
		if (*--colon == ':')
			break;
	}

	if (!isdigit(colon[-4]) ||
	    !isdigit(colon[-2]) ||
	    !isdigit(colon[-1]) ||
	    !isdigit(colon[ 1]) ||
	    !isdigit(colon[ 2]))
		return 0;

	/* year */
	if (strtol(colon+3, NULL, 10) <= 90)
		return 0;

	/* Ok, close enough */
	return 1;
}

static int parse_email(const void *map, unsigned long size)
{
	unsigned long offset;

	if (size < 6 || memcmp("From ", map, 5))
		goto corrupt;

	/* Make sure we don't trigger on this first line */
	map++; size--; offset=1;

	/*
	 * Search for a line beginning with "From ", and
	 * having something that looks like a date format.
	 */
	do {
		int len = linelen(map, size);
		if (is_from_line(map, len))
			return offset;
		map += len;
		size -= len;
		offset += len;
	} while (size);
	return offset;

corrupt:
	fprintf(stderr, "corrupt mailbox\n");
	exit(1);
}

int main(int argc, char **argv)
{
	int fd, nr;
	struct stat st;
	unsigned long size;
	void *map;

	if (argc != 3)
		usage();
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
	if (chdir(argv[2]) < 0)
		usage();
	if (fstat(fd, &st) < 0) {
		perror("stat");
		exit(1);
	}
	size = st.st_size;
	map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (-1 == (int)(long)map) {
		perror("mmap");
		exit(1);
	}
	close(fd);
	nr = 0;
	do {
		char name[10];
		unsigned long len = parse_email(map, size);
		assert(len <= size);
		sprintf(name, "%04d", ++nr);
		fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0600);
		if (fd < 0) {
			perror(name);
			exit(1);
		}
		if (write(fd, map, len) != len) {
			perror("write");
			exit(1);
		}
		close(fd);
		map += len;
		size -= len;
	} while (size > 0);
	return 0;
}

^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 22:55 ` Dmitry Torokhov @ 2005-05-30 23:15 ` Junio C Hamano 2005-05-30 23:23 ` Dmitry Torokhov 1 sibling, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-05-30 23:15 UTC (permalink / raw) To: git; +Cc: Linus Torvalds

On a related topic of making bare Plumbing a bit easier to use, here is what I use to prepare patches, one patch per file, to be sent to Linus via e-mail.

Usage:

	$ git-format-patch-script HEAD linus

Assuming that "linus" is the tip of the tree from Linus (typically stored in .git/branches/linus if you use Cogito), and HEAD is your additions on top of it, the above command will produce patches in the format you have been seeing on this list from me, one file per commit, in .patches/XXXX-patch-title.txt file.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

sed -e 's/^X//' >git-format-patch-script <<\EOF
X#!/bin/sh
X#
X# Copyright (c) 2005 Junio C Hamano
X#
Xjunio="$1"
Xlinus="$2"
X
Xtmp=.tmp-series$$
Xtrap 'rm -f $tmp-*' 0 1 2 3 15
X
Xseries=$tmp-series
X
XtitleScript='
X	1,/^$/d
X	: loop
X	/^$/b loop
X	s/[^-a-z.A-Z_0-9]/-/g
X	s/^--*//g
X	s/--*$//g
X	s/---*/-/g
X	s/$/.txt/
X	s/\.\.\.*/\./g
X	q
X'
XO=
Xif test -f .git/patch-order
Xthen
X	O=-O.git/patch-order
Xfi
Xgit-rev-list "$junio" "$linus" >$series
Xtotal=`wc -l <$series`
Xi=$total
Xwhile read commit
Xdo
X	title=`git-cat-file commit "$commit" | sed -e "$titleScript"`
X	num=`printf "%d/%d" $i $total`
X	file=`printf '%04d-%s' $i "$title"`
X	i=`expr "$i" - 1`
X	echo "$file"
X	{
X		mailScript='
X		1,/^$/d
X		: loop
X		/^$/b loop
X		s|^|[PATCH '"$num"'] |
X		: body
X		p
X		n
X		b body'
X
X		git-cat-file commit "$commit" | sed -ne "$mailScript"
X		echo '---'
X		git-diff-tree -p $O "$commit" | diffstat -p1
X		echo
X		git-diff-tree -p $O "$commit"
X	} >".patches/$file"
Xdone <$series
EOF

^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 22:55 ` Dmitry Torokhov 2005-05-30 23:15 ` Junio C Hamano @ 2005-05-30 23:23 ` Dmitry Torokhov 1 sibling, 0 replies; 64+ messages in thread From: Dmitry Torokhov @ 2005-05-30 23:23 UTC (permalink / raw) To: git; +Cc: Dave Jones, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 776 bytes --]

On Monday 30 May 2005 17:55, Dmitry Torokhov wrote:
> On Monday 30 May 2005 17:12, Dave Jones wrote:
> > I realise you've got a nifty bunch of tools to apply a whole mbox of
> > patches, but that's not ideal if all of my patches aren't in mboxes
> > (some I create myself and toss in my spool, some I pull from bugzilla etc..)
>
> I mercilessly hacked Linus's scripts from git-tools repo to work with
> non-mailbox patches, maybe you can make use of them too. Note that
> stripspace.c is not changed in any way whatsoever and mailsplit.c was
> changed to handle my personal preference of having patch description
> in the form of:
>
> Input: make blah blah change
> ---
>
> And Linus's script would eat that line.
>

Oops, make it mailinfo.c, not mailsplit.c

--
Dmitry

[-- Attachment #2: mailinfo.c --]
[-- Type: text/x-csrc, Size: 5691 bytes --]

/*
 * Another stupid program, this one parsing the headers of an
 * email to figure out authorship and subject
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

static FILE *cmitmsg, *patchfile, *filelist;

static char line[1000];
static char date[1000];
static char name[1000];
static char email[1000];
static char subject[1000];

static char *sanity_check(char *name, char *email)
{
    int len = strlen(name);
    if (len < 3 || len > 60)
        return email;
    if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
        return email;
    return name;
}

static int handle_from(char *line)
{
    char *at = strchr(line, '@');
    char *dst;

    if (!at)
        return 0;

    /*
     * If we already have one email, don't take any confusing lines
     */
    if (*email && strchr(at + 1, '@'))
        return 0;

    while (at > line) {
        char c = at[-1];
        if (isspace(c) || c == '<')
            break;
        at--;
    }
    dst = email;
    for (;;) {
        unsigned char c = *at;
        if (!c || c == '>' || isspace(c))
            break;
        *at++ = ' ';
        *dst++ = c;
    }
    *dst++ = 0;

    at = line + strlen(line);
    while (at > line) {
        unsigned char c = *--at;
        if (isalnum(c))
            break;
        *at = 0;
    }

    at = line;
    for (;;) {
        unsigned char c = *at;
        if (!c)
            break;
        if (isalnum(c))
            break;
        at++;
    }

    at = sanity_check(at, email);

    strcpy(name, at);
    return 1;
}

static void handle_date(char *line)
{
    strcpy(date, line);
}

static void handle_subject(char *line)
{
    strcpy(subject, line);
}

static void add_subject_line(char *line)
{
    while (isspace(*line))
        line++;
    *--line = ' ';
    strcat(subject, line);
}

static int check_special_line(char *line, int len)
{
    static int cont = -1;
    if (!memcmp(line, "From:", 5) && isspace(line[5])) {
        handle_from(line + 6);
        cont = 0;
        return 1;
    }
    if (!memcmp(line, "Date:", 5) && isspace(line[5])) {
        handle_date(line + 6);
        cont = 0;
        return 1;
    }
    if (!memcmp(line, "Subject:", 8) && isspace(line[8])) {
        handle_subject(line + 9);
        cont = 1;
        return 1;
    }
    if (isspace(*line)) {
        switch (cont) {
        case 0:
            fprintf(stderr, "I don't do 'Date:' or 'From:' line continuations\n");
            break;
        case 1:
            add_subject_line(line);
            return 1;
        default:
            break;
        }
    }
    cont = -1;
    return 0;
}

static char *cleanup_subject(char *subject)
{
    for (;;) {
        char *p;
        int len, remove;
        switch (*subject) {
        case 'r': case 'R':
            if (!memcmp("e:", subject + 1, 2)) {
                subject += 3;
                continue;
            }
            break;
        case ' ': case '\t': case ':':
            subject++;
            continue;

        case '[':
            p = strchr(subject, ']');
            if (!p) {
                subject++;
                continue;
            }
            len = strlen(p);
            remove = p - subject;
            if (remove <= len * 2) {
                subject = p + 1;
                continue;
            }
            break;
        }
        return subject;
    }
}

static void cleanup_space(char *buf)
{
    unsigned char c;
    while ((c = *buf) != 0) {
        buf++;
        if (isspace(c)) {
            buf[-1] = ' ';
            c = *buf;
            while (isspace(c)) {
                int len = strlen(buf);
                memmove(buf, buf + 1, len);
                c = *buf;
            }
        }
    }
}

/*
 * Hacky hacky. This depends not only on -p1, but on
 * filenames not having some special characters in them,
 * like tilde.
 */
static void show_filename(char *line)
{
    int len;
    char *name = strchr(line, '/');

    if (!name || !isspace(*line))
        return;
    name++;
    len = 0;
    for (;;) {
        unsigned char c = name[len];
        switch (c) {
        default:
            len++;
            continue;

        case 0: case ' ':
        case '\t': case '\n':
            break;

        /* patch tends to special-case these things.. */
        case '~':
            break;
        }
        break;
    }
    /* remove ".orig" from the end - common patch behaviour */
    if (len > 5 && !memcmp(name + len - 5, ".orig", 5))
        len -= 5;
    if (!len)
        return;
    fprintf(filelist, "%.*s\n", len, name);
}

static void handle_rest(void)
{
    char *sub = cleanup_subject(subject);
    cleanup_space(name);
    cleanup_space(date);
    cleanup_space(email);
    cleanup_space(sub);
    printf("Author: %s\nEmail: %s\nSubject: %s\nDate: %s\n\n", name, email, sub, date);

    FILE *out = cmitmsg;

    do {
        /* Track filename information from the patch.. */
        if (!memcmp("---", line, 3)) {
            out = patchfile;
            show_filename(line + 3);
        }

        if (!memcmp("+++", line, 3))
            show_filename(line + 3);

        fputs(line, out);
    } while (fgets(line, sizeof(line), stdin) != NULL);

    if (out == cmitmsg) {
        fprintf(stderr, "No patch found\n");
        exit(1);
    }

    fclose(cmitmsg);
    fclose(patchfile);
}

static int eatspace(char *line)
{
    int len = strlen(line);
    while (len > 0 && isspace(line[len - 1]))
        line[--len] = 0;
    return len;
}

static void handle_body(void)
{
    int has_from = 0;

    /* First line of body can be a From: */
    while (fgets(line, sizeof(line), stdin) != NULL) {
        int len = eatspace(line);
        if (!len)
            continue;
        if (!memcmp("From:", line, 5) && isspace(line[5])) {
            if (!has_from && handle_from(line + 6)) {
                has_from = 1;
                continue;
            }
        }
        line[len] = '\n';
        handle_rest();
        break;
    }
}

static void usage(void)
{
    fprintf(stderr, "mailinfo msg-file path-file filelist-file < email\n");
    exit(1);
}

int main(int argc, char **argv)
{
    int mail_patch = 0;

    if (argc != 4)
        usage();
    cmitmsg = fopen(argv[1], "w");
    if (!cmitmsg) {
        perror(argv[1]);
        exit(1);
    }
    patchfile = fopen(argv[2], "w");
    if (!patchfile) {
        perror(argv[2]);
        exit(1);
    }
    filelist = fopen(argv[3], "w");
    if (!filelist) {
        perror(argv[3]);
        exit(1);
    }
    while (fgets(line, sizeof(line), stdin) != NULL) {
        int len = eatspace(line);
        if (!len) {
            if (!mail_patch)
                fputs("\n", cmitmsg);
            handle_body();
            break;
        }
        if (check_special_line(line, len)) {
            mail_patch = 1;
            rewind(cmitmsg);
        }
        if (!mail_patch) {
            line[len] = '\n';
            fputs(line, cmitmsg);
        }
    }
    return 0;
}

^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 22:12 ` Dave Jones 2005-05-30 22:55 ` Dmitry Torokhov @ 2005-05-31 0:52 ` Linus Torvalds 1 sibling, 0 replies; 64+ messages in thread From: Linus Torvalds @ 2005-05-31 0:52 UTC (permalink / raw) To: Dave Jones; +Cc: Git Mailing List

On Mon, 30 May 2005, Dave Jones wrote:
>
> GIT_AUTHOR_NAME="John Doe" \
> GIT_AUTHOR_EMAIL="jdoe@foo.com" \
> git-commit-tree `git-write-tree` \
>   -p $(cat .git/HEAD ) \
>   < changelog.txt \
>   > .git/HEAD

You _really_ want to script this.

Also, I'd seriously suggest you avoid using ".git/HEAD" _and_ writing to
.git/HEAD in the same command. Maybe it works, maybe it doesn't.

So script it with something like

    #!/bin/sh
    export GIT_AUTHOR_NAME="$1"
    export GIT_AUTHOR_EMAIL="$2"
    tree=$(git-write-tree) || exit 1
    commit=$(git-commit-tree $tree -p HEAD) || exit 1
    echo $commit > .git/HEAD

and now you can just do

    commit-as "John Doe" "jdoe@foo.com" < changelog.txt

or something like that.

The git commands really are designed to be scripted.

        Linus

^ permalink raw reply [flat|nested] 64+ messages in thread
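An editorial sketch of one way to sidestep the "read and write the same file in one command" hazard mentioned above: work out the new value first, write it to a scratch file, and only then rename the scratch file over the target. `update_ref_file` is a hypothetical helper, not a git command, and the scratch-file naming is an assumption.

```shell
# update_ref_file FILE VALUE
# Hypothetical helper: replaces FILE's content with VALUE without ever
# truncating FILE while something may still be reading it.
update_ref_file() {
    ref_file=$1
    new_value=$2
    tmp_file="$ref_file.tmp.$$"
    # Write the new value to a scratch file next to the target first.
    printf '%s\n' "$new_value" >"$tmp_file" || return 1
    # A rename within one directory swaps the new file into place in a
    # single step; clean up the scratch file if the rename fails.
    mv -f "$tmp_file" "$ref_file" || { rm -f "$tmp_file"; return 1; }
}
```

With such a helper, the last line of a commit script could read `update_ref_file .git/HEAD "$commit"` under the same assumptions; the old content stays readable until the rename happens.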
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (5 preceding siblings ...) 2005-05-30 22:12 ` Dave Jones @ 2005-05-30 22:19 ` Ryan Anderson 2005-05-31 0:58 ` Linus Torvalds 2005-05-30 22:32 ` Chris Wedgwood ` (3 subsequent siblings) 10 siblings, 1 reply; 64+ messages in thread From: Ryan Anderson @ 2005-05-30 22:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Mon, May 30, 2005 at 01:00:42PM -0700, Linus Torvalds wrote: > > I think I'll move the "cvs2git" script thing to git proper before the 1.0 > release (again, in order to have the tutorial able to show what to do if > you already have an existing CVS tree), what else? Umm, why do you maintain two separate "git" related trees? Why not merge all of git-tools in, in a tools/ subdirectory? I've been meaning to ask the same question about "gitweb" for that matter. The distributions that want separate packages for dependency reasons can handle that easily inside one tree, anyway, I believe. I'd guess part of this is a holdover from the fact that you needed an independent tree for BitKeeper, but does it still make sense? -- Ryan Anderson sometimes Pug Majere ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 22:19 ` Ryan Anderson @ 2005-05-31 0:58 ` Linus Torvalds 0 siblings, 0 replies; 64+ messages in thread From: Linus Torvalds @ 2005-05-31 0:58 UTC (permalink / raw) To: Ryan Anderson; +Cc: Git Mailing List On Mon, 30 May 2005, Ryan Anderson wrote: > > Umm, why do you maintain two separate "git" related trees? Well, my "tools" thing really isn't git proper, and may not make much sense in the git distribution. That said, I'm actually moving things into git as they turn out to be useful. For example, I moved the "stripspace" program into git (which means it got renamed to "git-stripspace"), since it ended up being useful for the stand-alone git-commit-scripts too. But how many non-Linux projects really apply mailboxes of patches? It doesn't seem to be very "core". > Why not merge all of git-tools in, in a tools/ subdirectory? I'll think about it. It does look like at least about half of the git tools end up being pretty core. > I've been meaning to ask the same question about "gitweb" for that > matter. Well, there the issue definitely boils down to "different maintainers". I don't want to connect things that don't need to be connected. > I'd guess part of this is a holdover from the fact that you needed an > independent tree for BitKeeper, but does it still make sense? Well, I see the "tools" thing really as my private tools that may or may not make sense for anybody else. Even the cvs2git thing is just so _stupid_, since I bet you can do it quite cleanly in perl without having that strange "convert cvsps output into a shellscript" stage (admittedly, it was _really_ convenient for debugging to have that separate stage, so while it looks a bit hacky, it ended up being very powerful). Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (6 preceding siblings ...) 2005-05-30 22:19 ` Ryan Anderson @ 2005-05-30 22:32 ` Chris Wedgwood 2005-05-30 23:56 ` Chris Wedgwood 2005-05-31 1:06 ` Linus Torvalds 2005-05-31 0:19 ` Petr Baudis ` (2 subsequent siblings) 10 siblings, 2 replies; 64+ messages in thread From: Chris Wedgwood @ 2005-05-30 22:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Mon, May 30, 2005 at 01:00:42PM -0700, Linus Torvalds wrote: > So before I do that, is there something people think is just too > hard for somebody coming from the CVS world to understand? I already > realized that the "git-write-tree" + "git-commit-tree" interfaces > were just _too_ hard to put into a sane tutorial. I'm still at a loss how to do the equivalent of annotate. I know a couple of front ends can do this but I have no idea what command line magic would be equivalent. ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 22:32 ` Chris Wedgwood @ 2005-05-30 23:56 ` Chris Wedgwood 2005-05-31 1:06 ` Linus Torvalds 1 sibling, 0 replies; 64+ messages in thread From: Chris Wedgwood @ 2005-05-30 23:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Mon, May 30, 2005 at 03:32:42PM -0700, Chris Wedgwood wrote: > I'm still at a loss how to do the equivalent of annotate. I know a > couple of front ends can do this but I have no idea what command line > magic would be equivalent. A few people asked what does this now. Git Tracker does, a (random) example of which might be: http://www.tglx.de/cgi-bin/gittracker/annotate/tracker-linux/torvalds/linux-2.6.git/mm/mmap.c?blob=de54acd9942f9929004921042721df5cdfe2b6c7 ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 22:32 ` Chris Wedgwood 2005-05-30 23:56 ` Chris Wedgwood @ 2005-05-31 1:06 ` Linus Torvalds 2005-06-01 2:11 ` Junio C Hamano 1 sibling, 1 reply; 64+ messages in thread From: Linus Torvalds @ 2005-05-31 1:06 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Git Mailing List On Mon, 30 May 2005, Chris Wedgwood wrote: > > I'm still at a loss how to do the equivalent of annotate. I know a > couple of front ends can do this but I have no idea what command line > magic would be equivalent. There isn't any. It's actually pretty nasty to do, following history backwards and keeping track of lines as they are added. I know how, I'm just really lazy and hoping somebody else will do it, since I really end up not caring that much myself. I notice that Thomas Gleixner seems to have one, but that one is based on a database, and doesn't look usable as a standalone command.. Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-31 1:06 ` Linus Torvalds @ 2005-06-01 2:11 ` Junio C Hamano 2005-06-01 2:25 ` David Lang 0 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 2:11 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Linus Torvalds, Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Mon, 30 May 2005, Chris Wedgwood wrote:
>>
>> I'm still at a loss how to do the equivalent of annotate. I know a
>> couple of front ends can do this but I have no idea what command line
>> magic would be equivalent.

LT> There isn't any. It's actually pretty nasty to do, following history
LT> backwards and keeping track of lines as they are added. I know how, I'm
LT> just really lazy and hoping somebody else will do it, since I really end
LT> up not caring that much myself.

LT> I notice that Thomas Gleixner seems to have one, but that one is based on
LT> a database, and doesn't look usable as a standalone command..

Here is my quick-and-dirty one done in Perl. This is dog-slow
and not suited for interactive use, but its algorithm should
handle the merges, renames and complete rewrites correctly.

Its sample output for:

    $ blame.perl HEAD git-commit-script

looks like this (I've edited the SHA1 and names to make it a bit
shorter, but it still does not fit on my 80-column terminal
X-<). For each line in the version in the HEAD, it outputs (TAB
separated) the SHA1 of the commit that is responsible for the
line to be there, author, committer, line number in the version
of the guilty commit and the filename in the guilty commit (this
file could have been renamed in which case this may not match
the name of the file the script was originally asked to
annotate). It shows that the 9th line is what was the 11th line
in the a3e870f2... commit, done by Linus, for example.

:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  1   git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  2   git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  3   git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  4   git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  5   git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  6   git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  7   git-commit-script
:2036d841...  Junio C....@cox.net>  Linus T....osdl.org>  8   git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  11  git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  12  git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  13  git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  14  git-commit-script
:a3e870f2...  Linus T....osdl.org>  Linus T....osdl.org>  15  git-commit-script

------------
A blame script for use by higher-level annotate tools.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff -u a/blame.perl b/blame.perl
--- /dev/null
+++ b/blame.perl
@@ -0,0 +1,400 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+package main;
+$::debug = 0;
+
+sub read_blob {
+    my $sha1 = shift;
+    my $fh = undef;
+    my $result;
+    local ($/) = undef;
+    open $fh, '-|', 'git-cat-file', 'blob', $sha1
+        or die "cannot read blob $sha1";
+    $result = join('', <$fh>);
+    close $fh
+        or die "failure while closing pipe to git-cat-file";
+    return $result;
+}
+
+sub read_diff_raw {
+    my ($parent, $filename) = @_;
+    my $fh = undef;
+    local ($/) = "\0";
+    my @result = ();
+    my ($meta, $status, $sha1_1, $sha1_2, $file1, $file2);
+    print STDERR "* diff-cache $parent\n" if $::debug;
+    open $fh, '-|', 'git-diff-cache', '-B', '-C', '--cached', '-z', $parent
+        or die "cannot read git-diff-cache with $parent";
+    while (defined ($meta = <$fh>)) {
+        chomp($meta);
+        (undef, undef, $sha1_1, $sha1_2, $status) = split(/ /, $meta);
+        $file1 = <$fh>;
+        chomp($file1);
+        if ($status =~ /^[CR]/) {
+            $file2 = <$fh>;
+            chomp($file2);
+        } elsif ($status =~ /^D/) {
+            next;
+        } else {
+            $file2 = $file1;
+        }
+        if ($file2 eq $filename) {
+            push @result, [$status, $sha1_1, $sha1_2, $file1, $file2];
+        }
+    }
+    close $fh
+        or die "failure while closing pipe to git-diff-cache";
+    return @result;
+}
+
+sub write_temp_blob {
+    my ($sha1, $temp) = @_;
+    my $fh = undef;
+    my $blob = read_blob($sha1);
+    open $fh, '>', $temp
+        or die "cannot open temporary file $temp";
+    print $fh $blob;
+    close($fh);
+}
+
+package Git::Patch;
+sub new {
+    my ($class, $sha1_1, $sha1_2) = @_;
+    my $self = bless [], $class;
+    my $fh = undef;
+    ::write_temp_blob($sha1_1, "/tmp/blame-$$-1");
+    ::write_temp_blob($sha1_2, "/tmp/blame-$$-2");
+    open $fh, '-|', 'diff', '-u0', "/tmp/blame-$$-1", "/tmp/blame-$$-2"
+        or die "cannot read diff";
+    while (<$fh>) {
+        if (/^\@\@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? \@\@/) {
+            push @$self, [$1, (defined $2 ? $2 : 1),
+                          $3, (defined $4 ? $4 : 1)];
+        }
+    }
+    close $fh;
+    unlink "/tmp/blame-$$-1", "/tmp/blame-$$-2";
+    return $self;
+}
+
+sub find_parent_line {
+    my ($self, $commit_lineno) = @_;
+    my $ofs = 0;
+    for (@$self) {
+        my ($line_1, $len_1, $line_2, $len_2) = @$_;
+        if ($commit_lineno < $line_2) {
+            return $commit_lineno + $ofs;
+        }
+        if ($line_2 <= $commit_lineno && $commit_lineno < $line_2 + $len_2) {
+            return -1; # changed by commit.
+        }
+        $ofs += ($len_1 - $len_2);
+    }
+    return $commit_lineno + $ofs;
+}
+
+package Git::Commit;
+sub new {
+    my $class = shift;
+    my $self = bless {
+        PARENT => [],
+        TREE => undef,
+        AUTHOR => undef,
+        COMMITTER => undef,
+    }, $class;
+    my $commit_sha1 = shift;
+    $self->{SHA1} = $commit_sha1;
+    my $fh = undef;
+    open $fh, '-|', 'git-cat-file', 'commit', $commit_sha1
+        or die "cannot read commit object $commit_sha1";
+    while (<$fh>) {
+        chomp;
+        if (/^tree ([0-9a-f]{40})$/) { $self->{TREE} = $1; }
+        elsif (/^parent ([0-9a-f]{40})$/) { push @{$self->{PARENT}}, $1; }
+        elsif (/^author ([^>]+>)/) { $self->{AUTHOR} = $1; }
+        elsif (/^committer ([^>]+>)/) { $self->{COMMITTER} = $1; }
+    }
+    close $fh
+        or die "failure while closing pipe to git-cat-file";
+    return $self;
+}
+
+sub find_file {
+    my ($commit, $path) = @_;
+    my $result = undef;
+    my $fh = undef;
+    local ($/) = "\0";
+    open $fh, '-|', 'git-ls-tree', '-z', '-r', '-d', $commit->{TREE}, $path
+        or die "cannot read git-ls-tree $commit->{TREE}";
+    while (<$fh>) {
+        chomp;
+        if (/^[0-7]{6} blob ([0-9a-f]{40}) (.*)$/) {
+            if ($2 ne $path) {
+                die "$2 ne $path???";
+            }
+            $result = $1;
+            last;
+        }
+    }
+    close $fh
+        or die "failure while closing pipe to git-ls-tree";
+    return $result;
+}
+
+package Git::Blame;
+sub new {
+    my $class = shift;
+    my $self = bless {
+        LINE => [],
+        UNKNOWN => undef,
+        WORK => [],
+    }, $class;
+    my $commit = shift;
+    my $filename = shift;
+    my $sha1 = $commit->find_file($filename);
+    my $blob = ::read_blob($sha1);
+    my @blob = (split(/\n/, $blob));
+    for (my $i = 0; $i < @blob; $i++) {
+        $self->{LINE}[$i] = +{
+            COMMIT => $commit,
+            FOUND => undef,
+            FILENAME => $filename,
+            LINENO => ($i + 1),
+        };
+    }
+    $self->{UNKNOWN} = scalar @blob;
+    push @{$self->{WORK}}, [$commit, $filename];
+    return $self;
+}
+
+sub print {
+    my $self = shift;
+    my $line_termination = shift;
+    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+        my $l = $self->{LINE}[$i];
+        print ($l->{FOUND} ? ':' : '?');;
+        print "$l->{COMMIT}->{SHA1}\t";
+        print "$l->{COMMIT}->{AUTHOR}\t";
+        print "$l->{COMMIT}->{COMMITTER}\t";
+        print "$l->{LINENO}\t$l->{FILENAME}";
+        print $line_termination;
+    }
+}
+
+sub take_responsibility {
+    my ($self, $commit) = @_;
+    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+        my $l = $self->{LINE}[$i];
+        if (! $l->{FOUND} && ($l->{COMMIT}->{SHA1} eq $commit->{SHA1})) {
+            $l->{FOUND} = 1;
+            $self->{UNKNOWN}--;
+        }
+    }
+}
+
+sub blame_parent {
+    my ($self, $commit, $parent, $filename) = @_;
+    my @diff = ::read_diff_raw($parent->{SHA1}, $filename);
+    my $filename_in_parent;
+    my $passed_blame_to_parent = undef;
+    if (@diff == 0) {
+        # We have not touched anything. Blame parent for everything
+        # that we are suspected for.
+        for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+            my $l = $self->{LINE}[$i];
+            if (! $l->{FOUND} && ($l->{COMMIT}->{SHA1} eq $commit->{SHA1})) {
+                $l->{COMMIT} = $parent;
+                $passed_blame_to_parent = 1;
+            }
+        }
+        $filename_in_parent = $filename;
+    }
+    elsif (@diff != 1) {
+        # This should not happen.
+        for (@diff) {
+            print "** @$_\n";
+        }
+        die "Oops";
+    }
+    else {
+        my ($status, $sha1_1, $sha1_2, $file1, $file2) = @{$diff[0]};
+        print STDERR "** $status $file1 $file2\n" if $::debug;
+        if ($status =~ /N/) {
+            # Either some of other parents created it, or we did.
+            # At this point the only thing we know is that this
+            # parent is not responsible for it.
+            ;
+        }
+        else {
+            my $patch = Git::Patch->new($sha1_1, $sha1_2);
+            $filename_in_parent = $file1;
+            for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+                my $l = $self->{LINE}[$i];
+                if (! $l->{FOUND} && $l->{COMMIT}->{SHA1} eq $commit->{SHA1}) {
+                    # We are suspected to have introduced this line.
+                    # Does it exist in the parent?
+                    my $lineno = $l->{LINENO};
+                    my $parent_line = $patch->find_parent_line($lineno);
+                    if ($parent_line < 0) {
+                        # No, we may be the guilty ones, or some other
+                        # parent might be. We do not assign blame to
+                        # ourselves here yet.
+                        ;
+                    }
+                    else {
+                        # This line is coming from the parent, so pass
+                        # blame to it.
+                        $l->{COMMIT} = $parent;
+                        $l->{FILENAME} = $file1;
+                        $l->{LINENO} = $parent_line;
+                        $passed_blame_to_parent = 1;
+                    }
+                }
+            }
+        }
+    }
+    if ($passed_blame_to_parent && $self->{UNKNOWN}) {
+        unshift @{$self->{WORK}},
+        [$parent, $filename_in_parent];
+    }
+}
+
+sub assign {
+    my ($self, $commit, $filename) = @_;
+    # We do read-tree of the current commit and diff-cache
+    # with each parents, instead of running diff-tree. This
+    # is because diff-tree does not look for copies hard enough.
+    #
+    print STDERR "* read-tree $commit->{SHA1}\n" if $::debug;
+    system('git-read-tree', '-m', $commit->{SHA1});
+    for my $parent (@{$commit->{PARENT}}) {
+        $self->blame_parent($commit, Git::Commit->new($parent), $filename);
+    }
+    $self->take_responsibility($commit);
+}
+
+sub assign_blame {
+    my ($self) = @_;
+    while ($self->{UNKNOWN} && @{$self->{WORK}}) {
+        my $wk = shift @{$self->{WORK}};
+        my ($commit, $filename) = @$wk;
+        $self->assign($commit, $filename);
+    }
+}
+
+
+
+################################################################
+package main;
+my $usage = "blame [-z] <commit> filename";
+my $line_termination = "\n";
+
+$::ENV{GIT_INDEX_FILE} = "/tmp/blame-$$-index";
+unlink($::ENV{GIT_INDEX_FILE});
+
+if ($ARGV[0] eq '-z') {
+    $line_termination = "\0";
+    shift;
+}
+
+if (@ARGV != 2) {
+    die $usage;
+}
+
+my $head_commit = Git::Commit->new($ARGV[0]);
+my $filename = $ARGV[1];
+my $blame = Git::Blame->new($head_commit, $filename);
+
+$blame->assign_blame();
+$blame->print($line_termination);
+
+unlink($::ENV{GIT_INDEX_FILE});
+
+__END__
+
+How does this work, and what do we do about merges?
+
+The algorithm considers that the first parent is our main line of
+development and treats it somewhat differently from other parents. So we
+pass on the blame to the first parent if a line has not changed from
+it. For lines that have changed from the first parent, we must have
+either inherited that change from some other parent, or it could have
+been merge conflict resolution edit we did on our own.
+
+The following picture illustrates how we pass on and assign blames.
+
+In the sample, the original O was forked into A and B and then merged
+into M. Line 1, 2, and 4 did not change. Line 3 and 5 are changed in
+A, and Line 5 and 6 are changed in B. M made its own decision to
+resolve merge conflicts at Line 5 to something different from A and B:
+
+                A: 1 2 T 4 T 6
+               /               \
+O: 1 2 3 4 5 6                  M: 1 2 T 4 M S
+               \               /
+                B: 1 2 3 4 S S
+
+In the following picture, each line is annotated with a blame letter.
+A lowercase blame (e.g. "a" for "1") means that commit or its ancestor
+is the guilty party but we do not know which particular ancestor is
+responsible for the change yet. An uppercase blame means that we know
+that commit is the guilty party.
+
+First we look at M (the HEAD) and initialize Git::Blame->{LINE} like
+this:
+
+    M: 1 2 T 4 M S
+       m m m m m m
+
+That is, we know all lines are results of modification made by some
+ancestor of M, so we assign lowercase 'm' to all of them.
+
+Then we examine our first parent A. Throughout the algorithm, we are
+always only interested in the lines we are the suspect, but this being
+the initial round, we are the suspect for all of them. We notice that
+1 2 T 4 are the same as the parent A, so we pass the blame for these
+four lines to A. M and S are different from A, so we leave them as
+they are (note that we do not immediately take the blame for them):
+
+    M: 1 2 T 4 M S
+       a a a a m m
+
+Next we go on to examine parent B. Again, we are only interested in
+the lines we are still the suspect (i.e. M and S). We notice S is
+something we inherited from B, so we pass the blame on to it, like
+this:
+
+    M: 1 2 T 4 M S
+       a a a a m b
+
+Once we exhausted the parents, we look at the results and take
+responsibility for the remaining ones that we are still the suspect:
+
+    M: 1 2 T 4 M S
+       a a a a M b
+
+We are done with M. And we know commits A and B need to be examined
+further, so we do them recursively. When we look at A, we again only
+look at the lines that A is the suspect:
+
+    A: 1 2 T 4 T 6
+       a a a a M b
+
+Among 1 2 T 4, comparing against its parent O, we notice 1 2 4 are
+the same so pass the blame for those lines to O:
+
+    A: 1 2 T 4 T 6
+       o o a o M b
+
+A is a non-merge commit; we have already exhausted the parents and
+take responsibility for the remaining ones that A is the suspect:
+
+    A: 1 2 T 4 T 6
+       o o A o M b
+
+We go on like this and the final result would become:
+
+    O: 1 2 3 4 5 6
+       O O A O M B

^ permalink raw reply [flat|nested] 64+ messages in thread
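An editorial sketch of the line-number bookkeeping the algorithm above depends on: given the hunk headers of a zero-context unified diff between parent and commit, decide whether a line of the commit also exists in the parent and, if so, at which line number. It is not the posted Perl code. The function name `parent_line` and its input format (one hunk per line on stdin, as "old_start old_len new_start new_len") are assumptions, and it assumes the GNU diff convention that an empty range names the line just before the hunk.

```shell
# parent_line LINENO
# Hypothetical helper: reads hunks as "old_start old_len new_start new_len"
# on stdin (in file order) and prints the parent's line number for commit
# line LINENO, or -1 if that line was added or changed by the commit.
parent_line() {
    lineno=$1
    ofs=0    # parent lines minus commit lines over the hunks passed so far
    while read -r old_start old_len new_start new_len; do
        if [ "$new_len" -eq 0 ]; then
            # Pure deletion: new_start is the last commit line before it.
            if [ "$lineno" -le "$new_start" ]; then break; fi
        else
            # The line lies before this hunk, so earlier offsets suffice.
            if [ "$lineno" -lt "$new_start" ]; then break; fi
            # The line lies inside the hunk: the commit introduced it.
            if [ "$lineno" -lt $((new_start + new_len)) ]; then
                echo -1
                return 0
            fi
        fi
        ofs=$((ofs + old_len - new_len))
    done
    echo $((lineno + ofs))
}
```

As a worked case, take a parent with lines a b c d e and a commit with lines a b X Y c e. The two hunks are "2 0 3 2" (X and Y inserted after parent line 2) and "4 1 5 0" (d deleted after commit line 5). Feeding those two lines to `parent_line 5` prints 3, because c was line 3 in the parent, while `parent_line 3` prints -1, because X is new in the commit.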
* Re: I want to release a "git-1.0" 2005-06-01 2:11 ` Junio C Hamano @ 2005-06-01 2:25 ` David Lang 2005-06-01 4:53 ` Junio C Hamano 0 siblings, 1 reply; 64+ messages in thread From: David Lang @ 2005-06-01 2:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List On Tue, 31 May 2005, Junio C Hamano wrote: >>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: > > LT> On Mon, 30 May 2005, Chris Wedgwood wrote: >>> >>> I'm still at a loss how to do the equivalent of annotate. I know a >>> couple of front ends can do this but I have no idea what command line >>> magic would be equivalent. > > LT> There isn't any. It's actually pretty nasty to do, following history > LT> backwards and keeping track of lines as they are added. I know how, I'm > LT> just really lazy and hoping somebody else will do it, since I really end > LT> up not caring that much myself. > > LT> I notice that Thomas Gleixner seems to have one, but that one is based on > LT> a database, and doesn't look usable as a standalone command.. > > Here is my quick-and-dirty one done in Perl. This is dog-slow > and not suited for interactive use, but its algorithm should > handle the merges, renames and complete rewrites correctly. Hmm, thinking out loud. would it help to look at the deltify scripts and let them find the major chunks and then look in detail only when that fails? David Lang ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 2:25 ` David Lang @ 2005-06-01 4:53 ` Junio C Hamano 2005-06-01 20:06 ` David Lang 0 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 4:53 UTC (permalink / raw) To: David Lang; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List >>>>> "DL" == David Lang <david.lang@digitalinsight.com> writes: DL> Hmm, thinking out loud. would it help to look at the deltify scripts DL> and let them find the major chunks and then look in detail only when DL> that fails? It's unclear to me which part you are trying to help with deltify algorithm [*1*]. Internally, git-diff-cache -B -C is used which does use the deltify to locate complete rewrites, renames and copies (that's why the script is so slow). For passing on and assigning blames line by line, parsing "diff --unified=0" output was a lot easier for this script and that was what I did in this quick-and-dirty version. [Footnotes] *1* David says "deltify" and Nico calls it "deltafy". I am not a native speaker so I cannot tell, but which one is correct? ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 4:53 ` Junio C Hamano @ 2005-06-01 20:06 ` David Lang 2005-06-01 20:16 ` C. Scott Ananian 2005-06-01 23:03 ` Junio C Hamano 0 siblings, 2 replies; 64+ messages in thread From: David Lang @ 2005-06-01 20:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List On Tue, 31 May 2005, Junio C Hamano wrote: >>>>>> "DL" == David Lang <david.lang@digitalinsight.com> writes: > > DL> Hmm, thinking out loud. would it help to look at the deltify scripts > DL> and let them find the major chunks and then look in detail only when > DL> that fails? > > It's unclear to me which part you are trying to help with > deltify algorithm [*1*]. I was thinking that the speedups (only look for similar sized files, etc) would help narrow the search. Also each chunk that's different should be able to be annotated as a chunk, instead of by individual line > Internally, git-diff-cache -B -C is used which does use the > deltify to locate complete rewrites, renames and copies (that's > why the script is so slow). For passing on and assigning blames > line by line, parsing "diff --unified=0" output was a lot easier > for this script and that was what I did in this quick-and-dirty > version. I was under the impression that the deltafy stuff was significantly faster than you are suggesting that it is here > [Footnotes] > > *1* David says "deltify" and Nico calls it "deltafy". I am not > a native speaker so I cannot tell, but which one is correct? Nico is correct David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 20:06 ` David Lang @ 2005-06-01 20:16 ` C. Scott Ananian 2005-06-02 0:43 ` Nicolas Pitre 2005-06-01 23:03 ` Junio C Hamano 1 sibling, 1 reply; 64+ messages in thread From: C. Scott Ananian @ 2005-06-01 20:16 UTC (permalink / raw) To: David Lang Cc: Junio C Hamano, Chris Wedgwood, Linus Torvalds, Git Mailing List On Wed, 1 Jun 2005, David Lang wrote: >> *1* David says "deltify" and Nico calls it "deltafy". I am not >> a native speaker so I cannot tell, but which one is correct? > > Nico is correct Au contraire. The common *pronunciation* may be 'delta-fy', but the correct spelling should be 'deltify'. The google oracle agrees (1,440 vs 54) as does the spelling of the svnadmin command. (Of course, what google is really measuring is relative frequency of 'git' vs 'svn'.) $ grep '[^if]fy$' /usr/dict/american-english-large shows that the only vowels other than 'i' which precede the '-fy' morpheme are 'e's, and they only appear in words like 'liquefy' where the root has been substantially altered. Most sources (eg http://www.southampton.liunet.edu/academic/pau/course/websuf.htm#IFYVERB ) list the morpheme as '-ify'. See http://m-w.com/cgi-bin/dictionary?book=Dictionary&va=ify and compare http://m-w.com/cgi-bin/dictionary?book=Dictionary&va=fy Contrary to David's assertion, David is right. --scott United Nations KMPLEBE AMTHUG AVBRANDY UNIFRUIT chemical agent tonight ZPSEMANTIC ODYOKE struggle PBCABOOSE FJDEFLECT CLOWER MKSEARCH ZRBRIEF ( http://cscott.net/ ) ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 20:16 ` C. Scott Ananian @ 2005-06-02 0:43 ` Nicolas Pitre 2005-06-02 1:14 ` Brian O'Mahoney 0 siblings, 1 reply; 64+ messages in thread From: Nicolas Pitre @ 2005-06-02 0:43 UTC (permalink / raw) To: C. Scott Ananian Cc: David Lang, Junio C Hamano, Chris Wedgwood, Linus Torvalds, Git Mailing List On Wed, 1 Jun 2005, C. Scott Ananian wrote: > On Wed, 1 Jun 2005, David Lang wrote: > > > > *1* David says "deltify" and Nico calls it "deltafy". I am not > > > a native speaker so I cannot tell, but which one is correct? > > > > Nico is correct > > Au contraire. The common *pronunciation* may be 'delta-fy', but the correct > spelling should be 'deltify'. Ainsi soit-il alors. I'm not a native English speaker either, so I defer to anyone with better English knowledge. Nicolas ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-02 0:43 ` Nicolas Pitre @ 2005-06-02 1:14 ` Brian O'Mahoney 0 siblings, 0 replies; 64+ messages in thread From: Brian O'Mahoney @ 2005-06-02 1:14 UTC (permalink / raw) To: Nicolas Pitre Cc: C. Scott Ananian, David Lang, Junio C Hamano, Chris Wedgwood, Linus Torvalds, Git Mailing List Neither is _correct_; both are slang and newly coined words, but English is good at that. "Represent via a delta" would be traditional, but 'deltify' sounds nicer. Nicolas Pitre wrote: > On Wed, 1 Jun 2005, C. Scott Ananian wrote: > > >>On Wed, 1 Jun 2005, David Lang wrote: >> >> >>>>*1* David says "deltify" and Nico calls it "deltafy". I am not >>>>a native speaker so I cannot tell, but which one is correct? >>> >>>Nico is correct >> >>Au contraire. The common *pronunciation* may be 'delta-fy', but the correct >>spelling should be 'deltify'. > > > Ainsi soit-il alors. > > I'm not a native English speaker either so I defer to anyone with better > English knowledge. > > > Nicolas > -- mit freundlichen Grüßen, Brian. Dr. Brian O'Mahoney Mobile +41 (0)79 334 8035 Email: omb@bluewin.ch Bleicherstrasse 25, CH-8953 Dietikon, Switzerland PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D F5 14 39 C9 6D 38 56 D5 ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 20:06 ` David Lang 2005-06-01 20:16 ` C. Scott Ananian @ 2005-06-01 23:03 ` Junio C Hamano 1 sibling, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 23:03 UTC (permalink / raw) To: David Lang; +Cc: Chris Wedgwood, Linus Torvalds, Git Mailing List >>>>> "DL" == David Lang <david.lang@digitalinsight.com> writes: >> Internally, git-diff-cache -B -C is used which does use the >> deltify to locate complete rewrites, renames and copies (that's >> why the script is so slow). For passing on and assigning blames >> line by line, parsing "diff --unified=0" output was a lot easier >> for this script and that was what I did in this quick-and-dirty >> version. DL> I was under the impression that the deltafy stuff was significantly DL> faster than you are suggesting that it is here I perhaps phrased it poorly. The slow part is not a single delta operation, but having to run many delta operations between all combinations of rename/copy candidates, which is O(n * m) where n is the number of newly created files (counting "broken" ones created by -B flag) and m is the number of (deleted, modified and unmodified) files in the original tree. ^ permalink raw reply [flat|nested] 64+ messages in thread
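The O(n * m) cost described above can be sketched as a plain nested loop. This is only an illustration, not the diffcore code: `match_candidates`, `similarity_fn` and `toy_similarity` are hypothetical names, and the toy scoring function stands in for one expensive delta computation between two blobs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one expensive delta-based comparison
 * between old file `src` and new file `dst`; higher means more similar. */
typedef int (*similarity_fn)(size_t src, size_t dst);

/* Toy scoring used only for illustration: file i matches file i. */
int toy_similarity(size_t src, size_t dst)
{
	return src == dst ? 100 : 0;
}

/*
 * For each of the n_new destination files, try every one of the m_old
 * source files and remember the best-scoring source (or -1 if none
 * scored above zero).  Returns the number of comparisons performed,
 * which is always n_new * m_old.
 */
size_t match_candidates(size_t n_new, size_t m_old,
			similarity_fn similarity, long *best_src)
{
	size_t comparisons = 0;

	assert(similarity != NULL && best_src != NULL);
	for (size_t dst = 0; dst < n_new; dst++) {
		int best_score = 0;

		best_src[dst] = -1;
		for (size_t src = 0; src < m_old; src++) {
			int score = similarity(src, dst);

			comparisons++;
			if (score > best_score) {
				best_score = score;
				best_src[dst] = (long)src;
			}
		}
	}
	return comparisons;
}
```

With 3 new files and 5 files in the original tree this performs 15 comparisons, so the total time is dominated by the number of candidate pairs rather than by any single delta.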
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (7 preceding siblings ...) 2005-05-30 22:32 ` Chris Wedgwood @ 2005-05-31 0:19 ` Petr Baudis 2005-05-31 13:45 ` Eric W. Biederman 2005-06-02 19:43 ` CVS migration section to the tutorial Junio C Hamano 10 siblings, 0 replies; 64+ messages in thread From: Petr Baudis @ 2005-05-31 0:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Dear diary, on Mon, May 30, 2005 at 10:00:42PM CEST, I got a letter where Linus Torvalds <torvalds@osdl.org> told me that... > Ok, I'm at the point where I really think it's getting close to a 1.0, and > make another tar-ball etc. I obviously feel that it's already way superior > to CVS, but I also realize that somebody who is used to CVS may not > actually realize that very easily. Can we (well, me) count on the output format of the git commands being stabilized now and not change in a backwards-incompatible way from now on? I would like to finally remove the git itself from Cogito, but for that I have to be able to rely on the fact that as long as the user has git version >=N, it will work (assuming that Cogito is bugless ;-). > So before I do a 1.0 release, I want to write some stupid git tutorial for > a complete beginner that has only used CVS before, with a real example of > how to use raw git, and along those lines I actually want the thing to > show how to do something useful. Is there actually much point in using raw git directly? You don't usually invoke the syscalls directly from the user programs either (and you usually actually use stdio for the casual stuff). I guess the raw git usage can get quite long and tiresome sometimes. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (8 preceding siblings ...) 2005-05-31 0:19 ` Petr Baudis @ 2005-05-31 13:45 ` Eric W. Biederman 2005-06-01 3:04 ` Linus Torvalds 2005-06-02 19:43 ` CVS migration section to the tutorial Junio C Hamano 10 siblings, 1 reply; 64+ messages in thread From: Eric W. Biederman @ 2005-05-31 13:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds <torvalds@osdl.org> writes: > Ok, I'm at the point where I really think it's getting close to a 1.0, and > make another tar-ball etc. I obviously feel that it's already way superior > to CVS, but I also realize that somebody who is used to CVS may not > actually realize that very easily. I am way behind the power curve on learning git at this point, but one piece of the puzzle that CVS has and that I don't believe git does is multiple people committing to the same repository, especially remotely. I don't see that as a downside of git, but it is a common way people use CVS, so it is worth documenting. Eric ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-05-31 13:45 ` Eric W. Biederman @ 2005-06-01 3:04 ` Linus Torvalds 2005-06-01 4:06 ` Junio C Hamano ` (5 more replies) 0 siblings, 6 replies; 64+ messages in thread From: Linus Torvalds @ 2005-06-01 3:04 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Git Mailing List On Tue, 31 May 2005, Eric W. Biederman wrote: > > I am way behind the power curve on learning git at this point, but > one piece of the puzzle that CVS has and that I don't believe git does > is multiple people committing to the same repository, especially > remotely. I don't see that as a downside of git, but it is a common > way people use CVS, so it is worth documenting. It's actually one thing git doesn't do per se. You have to do a "git-pull-script" from the common repository side; there's no "git-push-script". Ugly. Anyway, I wrote just a _very_ introductory thing in Documentation/tutorial.txt, I'll try to update and expand on it later. It basically has a really stupid example of "how to set up a new project". Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 3:04 ` Linus Torvalds @ 2005-06-01 4:06 ` Junio C Hamano 2005-06-02 23:54 ` [PATCH] Fix -B "very-different" logic Junio C Hamano 2005-06-01 6:28 ` I want to release a "git-1.0" Junio C Hamano ` (4 subsequent siblings) 5 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 4:06 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> Anyway, I wrote just a _very_ introductory thing in LT> Documentation/tutorial.txt, I'll try to update and expand on LT> it later. It basically has a really stupid example of "how LT> to set up a new project". I've spotted a couple of typos which I will leave others to fix, but there is one thing I am to blame for. (Btw, current versions of git will consider the change in question to be so big that it's considered a whole new file, since the diff is actually bigger than the file. So the helpful comments that git-commit-script tells you for this example will say that you deleted and re-created the file "a". For a less contrived example, these things are usually more obvious). Do you want me to do something about this with -B (and possibly -C/-M), like skipping the comparison altogether if the file size is smaller than, say, 1k bytes or something silly like that? Or is not having a special case for this kind of "contrived example" preferable? ^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH] Fix -B "very-different" logic. 2005-06-01 4:06 ` Junio C Hamano @ 2005-06-02 23:54 ` Junio C Hamano 2005-06-03 0:21 ` Linus Torvalds 0 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-02 23:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List >>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes: > (Btw, current versions of git will consider the change > in question to be so big that it's considered a whole > new file, since the diff is actually bigger than the > file. JCH> Do you want me to do something about this with -B (and possibly JCH> -C/-M), like skipping the comparison altogether if the file size JCH> is smaller than, say, 1k bytes or something silly like that? Or JCH> is not having a special case for this kind of "contrived example" JCH> preferable? I was looking at the -B code. The reason it thinks the change is too big is that xdelta tells us to reconstruct the destination from all-new literal bytes in this small string case. There is not much I can do about it. However, I think the diffcore-break algorithm itself was basing its "very_different" computation on somewhat bogus numbers. It was taking newly inserted bytes into account, but the amount of those bytes should not make any difference when determining if the change is a complete rewrite. I suspect that the -M/-C heuristics have similar (if not the same) issues, but I would like to address that separately. Here is a proposed fix for -B. It also tells diffcore-break not to break a file smaller than 400 bytes. I did not make this number configurable, since that would be too many knobs to tweak. If somebody feels strongly enough about it, it can be made into an option later, but for now that size "feels" reasonable. -- >8 -- cut here -- >8 -- ------------ What we are interested in here is how much the original source material remains in the final result, and it does not really matter how much new contents are added as part of the edit.
If you remove 97 lines from an original 100-line document, it does not matter if you add 47 lines of your own to make a 50-line document, or if you add 997 lines to make a 1000-line document. Either way, you did a complete rewrite. Earlier code counted both new material and deletions to detect complete rewrites. This patch fixes it. With its default setting, it detects three such complete rewrites in the core-GIT repository. Signed-off-by: Junio C Hamano <junkio@cox.net> --- count-delta.h | 1 + diffcore.h | 4 ++- count-delta.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ diffcore-break.c | 55 +++++++++++++++--------------------------- 4 files changed, 92 insertions(+), 38 deletions(-) diff --git a/count-delta.h b/count-delta.h --- a/count-delta.h +++ b/count-delta.h @@ -5,5 +5,6 @@ #define COUNT_DELTA_H unsigned long count_delta(void *, unsigned long); +unsigned long count_excluded_source_material(void *, unsigned long); #endif diff --git a/diffcore.h b/diffcore.h --- a/diffcore.h +++ b/diffcore.h @@ -10,9 +10,9 @@ */ #define MAX_SCORE 60000 #define DEFAULT_RENAME_SCORE 30000 /* rename/copy similarity minimum (50%) */ -#define DEFAULT_BREAK_SCORE 59400 /* minimum for break to happen (99%)*/ +#define DEFAULT_BREAK_SCORE 48000 /* minimum for break to happen (80%) */ -#define RENAME_DST_MATCHED 01 +#define DIFF_MINIMUM_BREAK 400 /* minimum size of source that -B breaks */ struct diff_filespec { unsigned char sha1[20]; diff --git a/count-delta.c b/count-delta.c --- a/count-delta.c +++ b/count-delta.c @@ -93,3 +93,73 @@ unsigned long count_delta(void *delta_bu return 0; return (src_size - copied_from_source) + added_literal; } + + +/* + * What we are interested in here is how much the original source + * material remains in the final result, and it does not really matter + * how much new contents are added as part of the edit. 
If you remove + * 97 lines from an original 100-line document, it does not matter if + * you add 47 lines of your own to make a 50-line document, or if you + * add 997 lines to make a 1000-line document. Either way, you did a + * complete rewrite. + * + * Note. We do not interprete delta fully. Instead, we look at xdelta + * instructions that copy bytes from the source, and count those copied + * bytes. Subtracting this number from the original source size yields + * the number of bytes not used from the source material. In the above + * example, this number corresponds to 97-line (but we count in bytes). + */ +unsigned long count_excluded_source_material(void *delta_buf, + unsigned long delta_size) +{ + unsigned long copied_from_source; + const unsigned char *data, *top; + unsigned char cmd; + unsigned long src_size, dst_size, out; + + /* the smallest delta size possible is 6 bytes */ + if (delta_size < 6) + return UINT_MAX; + + data = delta_buf; + top = delta_buf + delta_size; + + src_size = get_hdr_size(&data); + dst_size = get_hdr_size(&data); + + copied_from_source = out = 0; + while (data < top) { + cmd = *data++; + if (cmd & 0x80) { + unsigned long cp_off = 0, cp_size = 0; + if (cmd & 0x01) cp_off = *data++; + if (cmd & 0x02) cp_off |= (*data++ << 8); + if (cmd & 0x04) cp_off |= (*data++ << 16); + if (cmd & 0x08) cp_off |= (*data++ << 24); + if (cmd & 0x10) cp_size = *data++; + if (cmd & 0x20) cp_size |= (*data++ << 8); + if (cp_size == 0) cp_size = 0x10000; + + if (cmd & 0x40) + /* copy from dst */ + ; + else + copied_from_source += cp_size; + out += cp_size; + } else { + /* write literal into dst */ + out += cmd; + data += cmd; + } + } + + /* sanity check */ + if (data != top || out != dst_size) + return UINT_MAX; + + if (src_size < copied_from_source) + /* we ended up overcounting and underflowed; I dunno why */ + return 0; + return src_size - copied_from_source; +} diff --git a/diffcore-break.c b/diffcore-break.c --- a/diffcore-break.c +++ 
b/diffcore-break.c @@ -13,63 +13,46 @@ static int very_different(struct diff_fi { /* dst is recorded as a modification of src. Are they so * different that we are better off recording this as a pair - * of delete and create? min_score is the minimum amount of - * new material that must exist in the dst and not in src for - * the pair to be considered a complete rewrite, and recommended - * to be set to a very high value, 99% or so. + * of delete and create? * - * The value we return represents the amount of new material - * that is in dst and not in src. We return 0 when we do not - * want to get the filepair broken. + * We base the score on the amount of material originally from + * src that still remains in the dst. If src was 100-line + * file among which only 3-line remains in the dst, then it is + * a complete rewrite with 97% "change", and it does not + * matter if the resulting file is a 15-line file or a + * 2000-line file. On the other hand, if 40-line remains + * among those 100-lines, even if the resulting file is a + * 2000-lines file, it still is an edit with 60% "change", + * which may sound counter-intuitive at first but that is the + * right number to use. */ + void *delta; - unsigned long delta_size, base_size; + unsigned long delta_size; if (!S_ISREG(src->mode) || !S_ISREG(dst->mode)) return 0; /* leave symlink rename alone */ - if (diff_populate_filespec(src, 1) || diff_populate_filespec(dst, 1)) - return 0; /* error but caught downstream */ - - delta_size = ((src->size < dst->size) ? - (dst->size - src->size) : (src->size - dst->size)); - - /* Notice that we use max of src and dst as the base size, - * unlike rename similarity detection. This is so that we do - * not mistake a large addition as a complete rewrite. - */ - base_size = ((src->size < dst->size) ? dst->size : src->size); - - /* - * If file size difference is too big compared to the - * base_size, we declare this a complete rewrite. 
- */ - if (base_size * min_score < delta_size * MAX_SCORE) - return MAX_SCORE; - if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0)) return 0; /* error but caught downstream */ + if (src->size < DIFF_MINIMUM_BREAK) + return 0; /* Too small to consider breaking */ + delta = diff_delta(src->data, src->size, dst->data, dst->size, &delta_size); - /* A delta that has a lot of literal additions would have - * big delta_size no matter what else it does. - */ - if (base_size * min_score < delta_size * MAX_SCORE) - return MAX_SCORE; - /* Estimate the edit size by interpreting delta. */ - delta_size = count_delta(delta, delta_size); + delta_size = count_excluded_source_material(delta, delta_size); free(delta); if (delta_size == UINT_MAX) return 0; /* error in delta computation */ - if (base_size < delta_size) + if (src->size < delta_size) return MAX_SCORE; - return delta_size * MAX_SCORE / base_size; + return delta_size * MAX_SCORE / src->size; } void diffcore_break(int min_score) ------------ ^ permalink raw reply [flat|nested] 64+ messages in thread
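The score that the patched very_different() computes can be sketched on its own, given a count of source bytes that survive into the destination. The constants below are copied from the diffcore.h hunk of the patch; the function `break_score` and its parameters are hypothetical names for illustration, and the 64-bit multiplication is an assumption made here to avoid overflow on large files, not something the patch does.

```c
#include <assert.h>

/* Values taken from the diffcore.h hunk of the patch above. */
#define MAX_SCORE           60000UL
#define DEFAULT_BREAK_SCORE 48000UL /* 80% */
#define DIFF_MINIMUM_BREAK  400UL   /* minimum source size that -B breaks */

/*
 * Hypothetical helper: score a modification by how much of the
 * original source is gone from the result, ignoring additions.
 * 0 means "leave the pair alone"; MAX_SCORE means nothing survived.
 */
unsigned long break_score(unsigned long src_size,
			  unsigned long copied_from_source)
{
	unsigned long long removed;

	if (src_size < DIFF_MINIMUM_BREAK)
		return 0; /* too small to consider breaking */
	if (src_size < copied_from_source)
		return 0; /* overcounted copies; treat as "nothing removed" */
	removed = src_size - copied_from_source;
	return (unsigned long)(removed * MAX_SCORE / src_size);
}

/* Worked examples, counting in bytes rather than lines. */
void break_score_examples(void)
{
	/* 97% of a 10000-byte source removed: well above the 80% default. */
	assert(break_score(10000, 300) == 58200);
	/* 60% removed: an edit, however much new text was added. */
	assert(break_score(10000, 4000) == 36000);
	/* A 100-byte source is below DIFF_MINIMUM_BREAK and never broken. */
	assert(break_score(100, 3) == 0);
}
```

Note that the destination size never enters the computation, which is exactly the property questioned in the reply that follows.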
* Re: [PATCH] Fix -B "very-different" logic. 2005-06-02 23:54 ` [PATCH] Fix -B "very-different" logic Junio C Hamano @ 2005-06-03 0:21 ` Linus Torvalds 2005-06-03 1:33 ` Junio C Hamano 0 siblings, 1 reply; 64+ messages in thread From: Linus Torvalds @ 2005-06-03 0:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List On Thu, 2 Jun 2005, Junio C Hamano wrote: > > However I think the diffcore-break algorithm itself was basing > its "very_different" computation on numbers somewhat bogus. It > was counting newly inserted bytes into account, but amount of > those bytes should not make any difference when determining if > the change is a complete rewrite. Careful. I think the amount of new code _should_ matter. Otherwise, an old empty file would always be considered the source of a new file, since the diff doesn't remove anything. Similarly, just because we have a boilerplate file shouldn't make that always be considered a "wonderful source", when people add the real meat to it. So I think you're on the right track, but I don't think you should entirely dismiss "lots of stuff added" as a reason for a "break". I think that if the new stuff is _much_ larger than the old stuff, it might as well be considered a rewrite. In particular, let's say that I used to have two files: a.c - small helper functions b.c - the "meat" of the thing and I end up deciding that I might as well collapse it all into one file, a.c. What happens? There's almost no deletes from a.c, but there's a lot of new code in it. Wouldn't it be _better_ if you considered the new "a.c" a new file, so that you might notice that it's actually _closer_ to the old removed "b.c" than the old "a.c"? See what I'm saying? Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] Fix -B "very-different" logic. 2005-06-03 0:21 ` Linus Torvalds @ 2005-06-03 1:33 ` Junio C Hamano 2005-06-03 8:32 ` [PATCH 0/4] " Junio C Hamano 0 siblings, 1 reply; 64+ messages in thread From: Junio C Hamano @ 2005-06-03 1:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> Careful. LT> I think the amount of new code _should_ matter. Otherwise, an old empty LT> file would always be considered the source of a new file, since the diff LT> doesn't remove anything. Similarly, just because we have a boilerplate LT> file shouldn't make that always be considered a "wonderful source", when LT> people add the real meat to it. Yes, I agree that rename/copy logic should use different heuristics from the one I proposed for breaking. It is my assumption that people in practice tend to make only small edits after a rename/copy just to adjust things like: - filenames mentioned in the comment of the file itself, - include paths that refer to other files if the file was moved/copied from a different directory, - names of functions and variables. and making sure there would not be too much new stuff is quite useful to detect rename/copy source correctly as the current similarity estimator in diffcore-rename does. I do not intend to touch that. The boilerplate example you mention is a very good reason not to dismiss the amount of new material when doing rename/copy detection. LT> In particular, let's say that I used to have two files: LT> a.c - small helper functions LT> b.c - the "meat" of the thing LT> and I end up deciding that I might as well collapse it all into one file, LT> a.c. What happens? There's almost no deletes from a.c, but there's a lot LT> of new code in it. LT> See what I'm saying? Yes. I think I do.
When git-diff-tree -B -C runs your example, it feeds diffcore with these:

    :100644 100644 sha1-a-helper-only sha1-a-and-meat M a.c
    :100644 000000 sha1-b-stale-meat 0{40} D b.c

The ideal diffcore-break breaks a.c because it looks at insertions as well:

    :100644 000000 sha1-a-helper-only 0{40} D a.c
    :000000 100644 0{40} sha1-a-and-meat N a.c
    :100644 000000 sha1-b-stale-meat 0{40} D b.c

Then diffcore-rename notices that sha1-b-stale-meat is a better match than sha1-a-helper-only to produce sha1-a-and-meat, and resolves the above to:

    :100644 100644 sha1-b-stale-meat sha1-a-and-meat R b.c a.c

Up to this point is just a demonstration that I see your point. But I still want to keep the example I gave in the original commit message. Suppose you did not have b.c file under version control, and did the same operation. I.e. a.c acquired a lot of good stuff. git-diff-tree -B -C feeds:

    :100644 100644 sha1-a-helper-only sha1-a-and-meat M a.c

which is broken into:

    :100644 000000 sha1-a-helper-only 0{40} D a.c
    :000000 100644 0{40} sha1-a-and-meat N a.c

Unfortunately, in this case nobody absorbs these pairs. I want to allow you to add 1000 lines of new stuff to a file (which was originally 100 lines long) without triggering the "this is a rewrite" logic, as long as you do not remove too many of the original 100 lines. So after rename/copy runs, we need to match these up and merge them back into the original.

    :100644 100644 sha1-a-helper-only sha1-a-and-meat M a.c

We should carry a bit more information about broken entries than we currently do. We would break a pair based on both deletion and insertion, just like the current code (i.e. without the patch you are responding to) does. But when we do break a pair, we need to mark it if the "new" side has enough original source material remaining.
If we have such a mark to tell us that "these were broken but there is a good chunk of source material remaining", the clean-up phase, to run after diffcore-rename finishes, should be able to notice surviving broken pairs and merge them back accordingly. ^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH 0/4] Fix -B "very-different" logic. 2005-06-03 1:33 ` Junio C Hamano @ 2005-06-03 8:32 ` Junio C Hamano 2005-06-03 8:36 ` [PATCH 1/4] Tweak count-delta interface Junio C Hamano ` (3 more replies) 0 siblings, 4 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-03 8:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List I am sending the following four patch series: [PATCH 1/4] Tweak count-delta interface [PATCH 2/4] diff: Fix docs and add -O to diff-helper. [PATCH 3/4] diff: Clean up diff_scoreopt_parse(). [PATCH 4/4] diff: Update -B heuristics. The first three are preparations and cleanups I found necessary while I was working on the last one, which is the gem of this series. It addresses the concerns you raised in your message "Careful." while keeping the semantics I wanted to have "if you keep 97 lines out of original 100-line document, it does not matter if the end result is a 110-line or 1000-line document. You did not do a rewrite." You may have to remove the warning about git-status with this change, though. ^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH 1/4] Tweak count-delta interface 2005-06-03 8:32 ` [PATCH 0/4] " Junio C Hamano @ 2005-06-03 8:36 ` Junio C Hamano 2005-06-03 8:36 ` [PATCH 2/4] diff: Fix docs and add -O to diff-helper Junio C Hamano ` (2 subsequent siblings) 3 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-03 8:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Make it return copied source and insertion separately, so that later implementation of heuristics can use them more flexibly. This does not change the heuristics implemented in diffcore-rename nor diffcore-break in any way. Signed-off-by: Junio C Hamano <junkio@cox.net> --- count-delta.h | 3 ++- diffcore.h | 2 -- count-delta.c | 30 ++++++++++++++++-------------- diffcore-break.c | 15 +++++++++++---- diffcore-rename.c | 15 +++++++++++---- 5 files changed, 40 insertions(+), 25 deletions(-) diff --git a/count-delta.h b/count-delta.h --- a/count-delta.h +++ b/count-delta.h @@ -4,6 +4,7 @@ #ifndef COUNT_DELTA_H #define COUNT_DELTA_H -unsigned long count_delta(void *, unsigned long); +int count_delta(void *, unsigned long, + unsigned long *src_copied, unsigned long *literal_added); #endif diff --git a/diffcore.h b/diffcore.h --- a/diffcore.h +++ b/diffcore.h @@ -12,8 +12,6 @@ #define DEFAULT_RENAME_SCORE 30000 /* rename/copy similarity minimum (50%) */ #define DEFAULT_BREAK_SCORE 59400 /* minimum for break to happen (99%)*/ -#define RENAME_DST_MATCHED 01 - struct diff_filespec { unsigned char sha1[20]; char *path; diff --git a/count-delta.c b/count-delta.c --- a/count-delta.c +++ b/count-delta.c @@ -29,15 +29,18 @@ static unsigned long get_hdr_size(const /* * NOTE. We do not _interpret_ delta fully. As an approximation, we * just count the number of bytes that are copied from the source, and - * the number of literal data bytes that are inserted. 
Number of - * bytes that are _not_ copied from the source is deletion, and number - * of inserted literal bytes are addition, so sum of them is what we - * return. xdelta can express an edit that copies data inside of the - * destination which originally came from the source. We do not count - * that in the following routine, so we are undercounting the source - * material that remains in the final output that way. + * the number of literal data bytes that are inserted. + * + * Number of bytes that are _not_ copied from the source is deletion, + * and number of inserted literal bytes are addition, so sum of them + * is the extent of damage. xdelta can express an edit that copies + * data inside of the destination which originally came from the + * source. We do not count that in the following routine, so we are + * undercounting the source material that remains in the final output + * that way. */ -unsigned long count_delta(void *delta_buf, unsigned long delta_size) +int count_delta(void *delta_buf, unsigned long delta_size, + unsigned long *src_copied, unsigned long *literal_added) { unsigned long copied_from_source, added_literal; const unsigned char *data, *top; @@ -46,7 +49,7 @@ unsigned long count_delta(void *delta_bu /* the smallest delta size possible is 6 bytes */ if (delta_size < 6) - return UINT_MAX; + return -1; data = delta_buf; top = delta_buf + delta_size; @@ -83,13 +86,12 @@ unsigned long count_delta(void *delta_bu /* sanity check */ if (data != top || out != dst_size) - return UINT_MAX; + return -1; /* delete size is what was _not_ copied from source. * edit size is that and literal additions. 
*/ - if (src_size + added_literal < copied_from_source) - /* we ended up overcounting and underflowed */ - return 0; - return (src_size - copied_from_source) + added_literal; + *src_copied = copied_from_source; + *literal_added = added_literal; + return 0; } diff --git a/diffcore-break.c b/diffcore-break.c --- a/diffcore-break.c +++ b/diffcore-break.c @@ -23,7 +23,7 @@ static int very_different(struct diff_fi * want to get the filepair broken. */ void *delta; - unsigned long delta_size, base_size; + unsigned long delta_size, base_size, src_copied, literal_added; if (!S_ISREG(src->mode) || !S_ISREG(dst->mode)) return 0; /* leave symlink rename alone */ @@ -61,10 +61,17 @@ static int very_different(struct diff_fi return MAX_SCORE; /* Estimate the edit size by interpreting delta. */ - delta_size = count_delta(delta, delta_size); + if (count_delta(delta, delta_size, &src_copied, &literal_added)) { + free(delta); + return 0; + } free(delta); - if (delta_size == UINT_MAX) - return 0; /* error in delta computation */ + + /* Extent of damage */ + if (src->size + literal_added < src_copied) + delta_size = 0; + else + delta_size = (src->size - src_copied) + literal_added; if (base_size < delta_size) return MAX_SCORE; diff --git a/diffcore-rename.c b/diffcore-rename.c --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -135,7 +135,7 @@ static int estimate_similarity(struct di * call into this function in that case. */ void *delta; - unsigned long delta_size, base_size; + unsigned long delta_size, base_size, src_copied, literal_added; int score; /* We deal only with regular files. Symlink renames are handled @@ -174,10 +174,17 @@ static int estimate_similarity(struct di return 0; /* Estimate the edit size by interpreting delta. 
*/ - delta_size = count_delta(delta, delta_size); - free(delta); - if (delta_size == UINT_MAX) + if (count_delta(delta, delta_size, &src_copied, &literal_added)) { + free(delta); return 0; + } + free(delta); + + /* Extent of damage */ + if (src->size + literal_added < src_copied) + delta_size = 0; + else + delta_size = (src->size - src_copied) + literal_added; /* * Now we will give some score to it. 100% edit gets 0 points ------------ ^ permalink raw reply [flat|nested] 64+ messages in thread
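The reworked interface in this patch reports copied and inserted byte counts separately and leaves the arithmetic to the caller. A simplified, self-contained sketch of that split follows. It assumes the opcode layout visible in the quoted count-delta.c (high bit set means copy, bits 0x01 to 0x08 select offset bytes, 0x10 and 0x20 select size bytes, 0x40 means copy from the destination) and takes an opcode stream with the two size headers already stripped; `count_delta_ops` and `extent_of_damage` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk a headerless delta opcode stream and report how many bytes are
 * copied from the source and how many literal bytes are inserted.
 * Returns 0 on success, -1 if the stream is truncated; the outputs
 * are only written on success.
 */
int count_delta_ops(const unsigned char *ops, size_t len,
		    unsigned long *src_copied, unsigned long *literal_added)
{
	unsigned long copied = 0, added = 0;
	size_t i = 0;

	assert(ops != NULL && src_copied != NULL && literal_added != NULL);
	while (i < len) {
		unsigned char cmd = ops[i++];

		if (cmd & 0x80) {
			unsigned long cp_size = 0;
			size_t need = 0;

			/* one operand byte per flag bit 0x01..0x20 */
			for (unsigned bit = 0x01; bit <= 0x20; bit <<= 1)
				if (cmd & bit)
					need++;
			if (need > len - i)
				return -1;
			/* offset bytes are irrelevant for counting */
			for (unsigned bit = 0x01; bit <= 0x08; bit <<= 1)
				if (cmd & bit)
					i++;
			if (cmd & 0x10)
				cp_size = ops[i++];
			if (cmd & 0x20)
				cp_size |= (unsigned long)ops[i++] << 8;
			if (cp_size == 0)
				cp_size = 0x10000;
			if (!(cmd & 0x40)) /* 0x40: copy from destination */
				copied += cp_size;
		} else {
			/* literal run of `cmd` bytes */
			if (cmd > len - i)
				return -1;
			added += cmd;
			i += cmd;
		}
	}
	*src_copied = copied;
	*literal_added = added;
	return 0;
}

/* Caller-side arithmetic, as in the diffcore-break.c hunk above. */
unsigned long extent_of_damage(unsigned long src_size,
			       unsigned long src_copied,
			       unsigned long literal_added)
{
	if (src_size + literal_added < src_copied)
		return 0;
	return (src_size - src_copied) + literal_added;
}
```

Keeping the two counts separate lets diffcore-break and diffcore-rename weigh deletions and insertions differently later without touching the delta walker again.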
* [PATCH 2/4] diff: Fix docs and add -O to diff-helper. 2005-06-03 8:32 ` [PATCH 0/4] " Junio C Hamano 2005-06-03 8:36 ` [PATCH 1/4] Tweak count-delta interface Junio C Hamano @ 2005-06-03 8:36 ` Junio C Hamano 2005-06-03 8:37 ` [PATCH 3/4] diff: Clean up diff_scoreopt_parse() Junio C Hamano 2005-06-03 8:40 ` [PATCH 4/4] diff: Update -B heuristics Junio C Hamano 3 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-03 8:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List This patch updates diff documentation and usage strings: - clarify the semantics of -R. It is not "output in reverse"; rather, it is "I will feed diff backwards". Semantically they are different when -C is involved. - describe -O in usage strings of diff-* brothers. It was implemented, documented but not described in usage text. Also it adds -O to diff-helper. Like -S (and unlike -M/-C/-B), this option can work on sanitized diff-raw output produced by the diff-* brothers. While we are at it, the call it makes to diffcore is cleaned up to use the diffcore_std() like everybody else, and the declaration for the low level diffcore routines are moved from diff.h (public) to diffcore.h (private between diff.c and diffcore backends). Signed-off-by: Junio C Hamano <junkio@cox.net> --- Documentation/git-diff-cache.txt | 3 ++- Documentation/git-diff-files.txt | 3 ++- Documentation/git-diff-helper.txt | 5 ++++- Documentation/git-diff-tree.txt | 2 +- diff.h | 10 +--------- diffcore.h | 6 ++++++ diff-cache.c | 2 +- diff-files.c | 2 +- diff-helper.c | 25 ++++++++++++++----------- diff-tree.c | 2 +- 10 files changed, 33 insertions(+), 27 deletions(-) diff --git a/Documentation/git-diff-cache.txt b/Documentation/git-diff-cache.txt --- a/Documentation/git-diff-cache.txt +++ b/Documentation/git-diff-cache.txt @@ -57,7 +57,8 @@ OPTIONS <orderfile>, which has one shell glob pattern per line. -R:: - Output diff in reverse. 
+ Swap two inputs; that is, show differences from cache or + on-disk file to tree contents. --cached:: do not consider the on-disk file at all diff --git a/Documentation/git-diff-files.txt b/Documentation/git-diff-files.txt --- a/Documentation/git-diff-files.txt +++ b/Documentation/git-diff-files.txt @@ -27,7 +27,8 @@ OPTIONS Remain silent even on nonexisting files -R:: - Output diff in reverse. + Swap two inputs; that is, show differences from on-disk files + to cache contents. -B:: Break complete rewrite changes into pairs of delete and create. diff --git a/Documentation/git-diff-helper.txt b/Documentation/git-diff-helper.txt --- a/Documentation/git-diff-helper.txt +++ b/Documentation/git-diff-helper.txt @@ -9,7 +9,7 @@ git-diff-helper - Generates patch format SYNOPSIS -------- -'git-diff-helper' [-z] [-S<string>] +'git-diff-helper' [-z] [-S<string>] [-O<orderfile>] DESCRIPTION ----------- @@ -24,6 +24,9 @@ OPTIONS -S<string>:: Look for differences that contains the change in <string>. +-O<orderfile>:: + Output the patch in the order specified in the + <orderfile>, which has one shell glob pattern per line. See Also -------- diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt --- a/Documentation/git-diff-tree.txt +++ b/Documentation/git-diff-tree.txt @@ -43,7 +43,7 @@ OPTIONS Detect copies as well as renames. -R:: - Output diff in reverse. + Swap two input trees. -S<string>:: Look for differences that contains the change in <string>. 
diff --git a/diff.h b/diff.h --- a/diff.h +++ b/diff.h @@ -35,21 +35,13 @@ extern int diff_scoreopt_parse(const cha #define DIFF_SETUP_REVERSE 1 #define DIFF_SETUP_USE_CACHE 2 #define DIFF_SETUP_USE_SIZE_CACHE 4 + extern void diff_setup(int flags); #define DIFF_DETECT_RENAME 1 #define DIFF_DETECT_COPY 2 -extern void diffcore_rename(int rename_copy, int minimum_score); - #define DIFF_PICKAXE_ALL 1 -extern void diffcore_pickaxe(const char *needle, int opts); - -extern void diffcore_pathspec(const char **pathspec); - -extern void diffcore_order(const char *orderfile); - -extern void diffcore_break(int max_score); extern void diffcore_std(const char **paths, int detect_rename, int rename_score, diff --git a/diffcore.h b/diffcore.h --- a/diffcore.h +++ b/diffcore.h @@ -73,6 +73,12 @@ extern struct diff_filepair *diff_queue( struct diff_filespec *); extern void diff_q(struct diff_queue_struct *, struct diff_filepair *); +extern void diffcore_pathspec(const char **pathspec); +extern void diffcore_break(int); +extern void diffcore_rename(int rename_copy, int); +extern void diffcore_pickaxe(const char *needle, int opts); +extern void diffcore_order(const char *orderfile); + #define DIFF_DEBUG 0 #if DIFF_DEBUG void diff_debug_filespec(struct diff_filespec *, int, const char *); diff --git a/diff-cache.c b/diff-cache.c --- a/diff-cache.c +++ b/diff-cache.c @@ -157,7 +157,7 @@ static void mark_merge_entries(void) } static char *diff_cache_usage = -"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-C] [-R] [-S<string>] [--cached] <tree-ish> [<path>...]"; +"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-C] [-R] [-S<string>] [-O<orderfile>] [--cached] <tree-ish> [<path>...]"; int main(int argc, const char **argv) { diff --git a/diff-files.c b/diff-files.c --- a/diff-files.c +++ b/diff-files.c @@ -7,7 +7,7 @@ #include "diff.h" static const char *diff_files_usage = -"git-diff-files [-p] [-q] [-r] [-z] [-M] [-C] [-R] [-S<string>] [paths...]"; +"git-diff-files [-p] [-q] [-r] [-z] [-M] [-C] [-R] 
[-S<string>] [-O<orderfile>] [paths...]"; static int diff_output_format = DIFF_FORMAT_HUMAN; static int detect_rename = 0; diff --git a/diff-helper.c b/diff-helper.c --- a/diff-helper.c +++ b/diff-helper.c @@ -7,11 +7,22 @@ static const char *pickaxe = NULL; static int pickaxe_opts = 0; +static const char *orderfile = NULL; static int line_termination = '\n'; static int inter_name_termination = '\t'; +static void flush_them(int ac, const char **av) +{ + diffcore_std(av + 1, + 0, 0, /* no renames */ + pickaxe, pickaxe_opts, + -1, /* no breaks */ + orderfile); + diff_flush(DIFF_FORMAT_PATCH, 0); +} + static const char *diff_helper_usage = - "git-diff-helper [-z] [-S<string>] paths..."; + "git-diff-helper [-z] [-S<string>] [-O<orderfile>] paths..."; int main(int ac, const char **av) { struct strbuf sb; @@ -131,17 +142,9 @@ int main(int ac, const char **av) { new_path); continue; } - if (1 < ac) - diffcore_pathspec(av + 1); - if (pickaxe) - diffcore_pickaxe(pickaxe, pickaxe_opts); - diff_flush(DIFF_FORMAT_PATCH, 0); + flush_them(ac, av); printf(garbage_flush_format, sb.buf); } - if (1 < ac) - diffcore_pathspec(av + 1); - if (pickaxe) - diffcore_pickaxe(pickaxe, pickaxe_opts); - diff_flush(DIFF_FORMAT_PATCH, 0); + flush_them(ac, av); return 0; } diff --git a/diff-tree.c b/diff-tree.c --- a/diff-tree.c +++ b/diff-tree.c @@ -397,7 +397,7 @@ static int diff_tree_stdin(char *line) } static char *diff_tree_usage = -"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-C] [-R] [-S<string>] [-m] [-s] [-v] [-t] <tree-ish> <tree-ish>"; +"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-C] [-R] [-S<string>] [-O<orderfile>] [-m] [-s] [-v] [-t] <tree-ish> <tree-ish>"; int main(int argc, const char **argv) { ------------ ^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH 3/4] diff: Clean up diff_scoreopt_parse(). 2005-06-03 8:32 ` [PATCH 0/4] " Junio C Hamano 2005-06-03 8:36 ` [PATCH 1/4] Tweak count-delta interface Junio C Hamano 2005-06-03 8:36 ` [PATCH 2/4] diff: Fix docs and add -O to diff-helper Junio C Hamano @ 2005-06-03 8:37 ` Junio C Hamano 2005-06-03 8:40 ` [PATCH 4/4] diff: Update -B heuristics Junio C Hamano 3 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-03 8:37 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List This cleans up the diff_scoreopt_parse() function, which parses the fractional notation that the -B, -C and -M options take. The callers are modified to check for errors and complain. Earlier they silently ignored malformed input and fell back on the default. Signed-off-by: Junio C Hamano <junkio@cox.net> --- diff-cache.c | 9 ++++++--- diff-files.c | 15 +++++++++++---- diff-tree.c | 9 ++++++--- diff.c | 39 +++++++++++++++++++++++++++++++++++++++ diffcore-rename.c | 18 ------------------ 5 files changed, 62 insertions(+), 28 deletions(-) diff --git a/diff-cache.c b/diff-cache.c --- a/diff-cache.c +++ b/diff-cache.c @@ -191,17 +191,20 @@ int main(int argc, const char **argv) continue; } if (!strncmp(arg, "-B", 2)) { - diff_break_opt = diff_scoreopt_parse(arg); + if ((diff_break_opt = diff_scoreopt_parse(arg)) == -1) + usage(diff_cache_usage); continue; } if (!strncmp(arg, "-M", 2)) { detect_rename = DIFF_DETECT_RENAME; - diff_score_opt = diff_scoreopt_parse(arg); + if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1) + usage(diff_cache_usage); continue; } if (!strncmp(arg, "-C", 2)) { detect_rename = DIFF_DETECT_COPY; - diff_score_opt = diff_scoreopt_parse(arg); + if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1) + usage(diff_cache_usage); continue; } if (!strcmp(arg, "-z")) { diff --git a/diff-files.c b/diff-files.c --- a/diff-files.c +++ b/diff-files.c @@ -61,14 +61,21 @@ int main(int argc, const char **argv) orderfile = argv[1] + 2; else if 
(!strcmp(argv[1], "--pickaxe-all")) pickaxe_opts = DIFF_PICKAXE_ALL; - else if (!strncmp(argv[1], "-B", 2)) - diff_break_opt = diff_scoreopt_parse(argv[1]); + else if (!strncmp(argv[1], "-B", 2)) { + if ((diff_break_opt = + diff_scoreopt_parse(argv[1])) == -1) + usage(diff_files_usage); + } else if (!strncmp(argv[1], "-M", 2)) { - diff_score_opt = diff_scoreopt_parse(argv[1]); + if ((diff_score_opt = + diff_scoreopt_parse(argv[1])) == -1) + usage(diff_files_usage); detect_rename = DIFF_DETECT_RENAME; } else if (!strncmp(argv[1], "-C", 2)) { - diff_score_opt = diff_scoreopt_parse(argv[1]); + if ((diff_score_opt = + diff_scoreopt_parse(argv[1])) == -1) + usage(diff_files_usage); detect_rename = DIFF_DETECT_COPY; } else diff --git a/diff-tree.c b/diff-tree.c --- a/diff-tree.c +++ b/diff-tree.c @@ -459,16 +459,19 @@ int main(int argc, const char **argv) } if (!strncmp(arg, "-M", 2)) { detect_rename = DIFF_DETECT_RENAME; - diff_score_opt = diff_scoreopt_parse(arg); + if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1) + usage(diff_tree_usage); continue; } if (!strncmp(arg, "-C", 2)) { detect_rename = DIFF_DETECT_COPY; - diff_score_opt = diff_scoreopt_parse(arg); + if ((diff_score_opt = diff_scoreopt_parse(arg)) == -1) + usage(diff_tree_usage); continue; } if (!strncmp(arg, "-B", 2)) { - diff_break_opt = diff_scoreopt_parse(arg); + if ((diff_break_opt = diff_scoreopt_parse(arg)) == -1) + usage(diff_tree_usage); continue; } if (!strcmp(arg, "-z")) { diff --git a/diff.c b/diff.c --- a/diff.c +++ b/diff.c @@ -589,6 +589,45 @@ void diff_setup(int flags) } +static int parse_num(const char **cp_p) +{ + int num, scale, ch, cnt; + const char *cp = *cp_p; + + cnt = num = 0; + scale = 1; + while ('0' <= (ch = *cp) && ch <= '9') { + if (cnt++ < 5) { + /* We simply ignore more than 5 digits precision. */ + scale *= 10; + num = num * 10 + ch - '0'; + } + *cp++; + } + *cp_p = cp; + + /* user says num divided by scale and we say internally that + * is MAX_SCORE * num / scale. 
+ */ + return (MAX_SCORE * num / scale); +} + +int diff_scoreopt_parse(const char *opt) +{ + int opt1, cmd; + + if (*opt++ != '-') + return -1; + cmd = *opt++; + if (cmd != 'M' && cmd != 'C' && cmd != 'B') + return -1; /* that is not a -M, -C nor -B option */ + + opt1 = parse_num(&opt); + if (*opt != 0) + return -1; + return opt1; +} + struct diff_queue_struct diff_queued_diff; void diff_q(struct diff_queue_struct *queue, struct diff_filepair *dp) diff --git a/diffcore-rename.c b/diffcore-rename.c --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -229,24 +229,6 @@ static int score_compare(const void *a_, return b->score - a->score; } -int diff_scoreopt_parse(const char *opt) -{ - int diglen, num, scale, i; - if (opt[0] != '-' || (opt[1] != 'M' && opt[1] != 'C' && opt[1] != 'B')) - return -1; /* that is not a -M, -C nor -B option */ - diglen = strspn(opt+2, "0123456789"); - if (diglen == 0 || strlen(opt+2) != diglen) - return 0; /* use default */ - sscanf(opt+2, "%d", &num); - for (i = 0, scale = 1; i < diglen; i++) - scale *= 10; - - /* user says num divided by scale and we say internally that - * is MAX_SCORE * num / scale. - */ - return MAX_SCORE * num / scale; -} - void diffcore_rename(int detect_rename, int minimum_score) { struct diff_queue_struct *q = &diff_queued_diff; ------------ ^ permalink raw reply [flat|nested] 64+ messages in thread
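For readers following the fractional notation in the patch above, here is a minimal standalone sketch of the same idea: the digits after -M, -C or -B are read as a decimal fraction ("5" is 0.5, "75" is 0.75) and scaled to MAX_SCORE. The names sketch_parse_num and sketch_scoreopt_parse are hypothetical and exist only for this illustration. The 64-bit intermediate is also an assumption of the sketch (the patch multiplies in plain int), so this should not be read as the code that was applied.

```c
#define MAX_SCORE 60000 /* same value as diffcore.h in this series */

/*
 * Read a run of decimal digits as a fraction of MAX_SCORE:
 * "5" means 0.5, "75" means 0.75.  Digits past the fifth are
 * consumed but ignored, as in the patch.  The multiplication is
 * done in long long; that is a choice of this sketch, not
 * something the patch does.
 */
static int sketch_parse_num(const char **cp_p)
{
	const char *cp = *cp_p;
	long long num = 0, scale = 1;
	int cnt = 0;

	while ('0' <= *cp && *cp <= '9') {
		if (cnt++ < 5) {
			scale *= 10;
			num = num * 10 + (*cp - '0');
		}
		cp++;
	}
	*cp_p = cp;
	return (int)(MAX_SCORE * num / scale);
}

/*
 * Hypothetical stand-in for diff_scoreopt_parse() as of PATCH 3/4.
 * Returns the score, 0 when no digits were given ("use the
 * default"), or -1 on malformed input.
 */
static int sketch_scoreopt_parse(const char *opt)
{
	int cmd, score;

	if (*opt++ != '-')
		return -1;
	cmd = *opt++;
	if (cmd != 'M' && cmd != 'C' && cmd != 'B')
		return -1; /* not a -M, -C or -B option */
	score = sketch_parse_num(&opt);
	if (*opt != '\0')
		return -1; /* trailing garbage such as "-M5x" */
	return score;
}
```

Under these assumptions "-M5" yields 30000 (50%), "-C75" yields 45000 (75%), a bare "-M" yields 0 so the caller picks its default, and "-M5x" is rejected with -1, which is the case the callers now turn into a usage() message.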
* [PATCH 4/4] diff: Update -B heuristics. 2005-06-03 8:32 ` [PATCH 0/4] " Junio C Hamano ` (2 preceding siblings ...) 2005-06-03 8:37 ` [PATCH 3/4] diff: Clean up diff_scoreopt_parse() Junio C Hamano @ 2005-06-03 8:40 ` Junio C Hamano 3 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-03 8:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List As Linus pointed out on the mailing list discussion, -B should break a file that has many inserts even if it still keeps enough of the original contents, so that the broken pieces can later be matched with other files by -M or -C. However, if such a broken pair does not get picked up by -M or -C, we would want to apply different criteria; namely, regardless of the amount of new material in the result, the determination of "rewrite" should be done by looking at the amount of original material still left in the result. If you still have the original 97 lines from a 100-line document, it does not matter if you add your own 13 lines to make a 110-line document, or if you add 903 lines to make a 1000-line document. It is not a rewrite but an in-place edit. On the other hand, if you did lose 97 lines from the original, it does not matter if you added 27 lines to make a 30-line document or if you added 997 lines to make a 1000-line document. You did a complete rewrite in either case. This patch introduces a post-processing phase that runs after diffcore-rename matches up broken pairs diffcore-break creates. The purpose of this post-processing is to pick up these broken pieces and merge them back into in-place modifications. For this, the score parameter that the -B option takes is changed into a pair of numbers, and it takes the "-B99/80" format when fully spelled out. 
The first number is the minimum amount of "edit" (same definition as what diffcore-rename uses, which is "sum of deletion and insertion") that a modification needs to have to be broken, and the second number is the minimum amount of "delete" a surviving broken pair must have to avoid being merged back together. It can be abbreviated to "-B" to use default for both, "-B9" or "-B9/" to use 90% for "edit" but default (80%) for merge avoidance, or "-B/75" to use default (99%) "edit" and 75% for merge avoidance. Signed-off-by: Junio C Hamano <junkio@cox.net> --- diffcore.h | 11 ++ diff.c | 18 ++++ diffcore-break.c | 240 +++++++++++++++++++++++++++++++++++++++++++++--------- 3 files changed, 225 insertions(+), 44 deletions(-) diff --git a/diffcore.h b/diffcore.h --- a/diffcore.h +++ b/diffcore.h @@ -8,9 +8,19 @@ * (e.g. diffcore-rename, diffcore-pickaxe). Never include this header * in anything else. */ + +/* We internally use unsigned short as the score value, + * and rely on an int capable to hold 32-bits. -B can take + * -Bmerge_score/break_score format and the two scores are + * passed around in one int (high 16-bit for merge and low 16-bit + * for break). 
+ */ #define MAX_SCORE 60000 #define DEFAULT_RENAME_SCORE 30000 /* rename/copy similarity minimum (50%) */ #define DEFAULT_BREAK_SCORE 59400 /* minimum for break to happen (99%)*/ +#define DEFAULT_MERGE_SCORE 48000 /* maximum for break-merge to happen (80%)*/ + +#define MINIMUM_BREAK_SIZE 400 /* do not break a file smaller than this */ struct diff_filespec { unsigned char sha1[20]; @@ -76,6 +86,7 @@ extern void diff_q(struct diff_queue_str extern void diffcore_pathspec(const char **pathspec); extern void diffcore_break(int); extern void diffcore_rename(int rename_copy, int); +extern void diffcore_merge_broken(void); extern void diffcore_pickaxe(const char *needle, int opts); extern void diffcore_order(const char *orderfile); diff --git a/diff.c b/diff.c --- a/diff.c +++ b/diff.c @@ -614,7 +614,7 @@ static int parse_num(const char **cp_p) int diff_scoreopt_parse(const char *opt) { - int opt1, cmd; + int opt1, opt2, cmd; if (*opt++ != '-') return -1; @@ -623,9 +623,21 @@ int diff_scoreopt_parse(const char *opt) return -1; /* that is not a -M, -C nor -B option */ opt1 = parse_num(&opt); + if (cmd != 'B') + opt2 = 0; + else { + if (*opt == 0) + opt2 = 0; + else if (*opt != '/') + return -1; /* we expect -B80/99 or -B80 */ + else { + opt++; + opt2 = parse_num(&opt); + } + } if (*opt != 0) return -1; - return opt1; + return opt1 | (opt2 << 16); } struct diff_queue_struct diff_queued_diff; @@ -955,6 +967,8 @@ void diffcore_std(const char **paths, diffcore_break(break_opt); if (detect_rename) diffcore_rename(detect_rename, rename_score); + if (0 <= break_opt) + diffcore_merge_broken(); if (pickaxe) diffcore_pickaxe(pickaxe, pickaxe_opts); if (orderfile) diff --git a/diffcore-break.c b/diffcore-break.c --- a/diffcore-break.c +++ b/diffcore-break.c @@ -7,28 +7,58 @@ #include "delta.h" #include "count-delta.h" -static int very_different(struct diff_filespec *src, - struct diff_filespec *dst, - int min_score) +static int should_break(struct diff_filespec *src, + struct 
diff_filespec *dst, + int break_score, + int *merge_score_p) { /* dst is recorded as a modification of src. Are they so * different that we are better off recording this as a pair - * of delete and create? min_score is the minimum amount of - * new material that must exist in the dst and not in src for - * the pair to be considered a complete rewrite, and recommended - * to be set to a very high value, 99% or so. - * - * The value we return represents the amount of new material - * that is in dst and not in src. We return 0 when we do not - * want to get the filepair broken. + * of delete and create? + * + * There are two criteria used in this algorithm. For the + * purposes of helping later rename/copy, we take both delete + * and insert into account and estimate the amount of "edit". + * If the edit is very large, we break this pair so that + * rename/copy can pick the pieces up to match with other + * files. + * + * On the other hand, we would want to ignore inserts for the + * pure "complete rewrite" detection. As long as most of the + * existing contents were removed from the file, it is a + * complete rewrite, and if sizable chunk from the original + * still remains in the result, it is not a rewrite. It does + * not matter how much or how little new material is added to + * the file. + * + * The score we leave for such a broken filepair uses the + * latter definition so that later clean-up stage can find the + * pieces that should not have been broken according to the + * latter definition after rename/copy runs, and merge the + * broken pair that have a score lower than given criteria + * back together. The break operation itself happens + * according to the former definition. + * + * The minimum_edit parameter tells us when to break (the + * amount of "edit" required for us to consider breaking the + * pair). We leave the amount of deletion in *merge_score_p + * when we return. 
+ * + * The value we return is 1 if we want the pair to be broken, + * or 0 if we do not. */ void *delta; unsigned long delta_size, base_size, src_copied, literal_added; + int to_break = 0; + + *merge_score_p = 0; /* assume no deletion --- "do not break" + * is the default. + */ if (!S_ISREG(src->mode) || !S_ISREG(dst->mode)) return 0; /* leave symlink rename alone */ - if (diff_populate_filespec(src, 1) || diff_populate_filespec(dst, 1)) + if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0)) return 0; /* error but caught downstream */ delta_size = ((src->size < dst->size) ? @@ -40,53 +70,95 @@ static int very_different(struct diff_fi */ base_size = ((src->size < dst->size) ? dst->size : src->size); - /* - * If file size difference is too big compared to the - * base_size, we declare this a complete rewrite. - */ - if (base_size * min_score < delta_size * MAX_SCORE) - return MAX_SCORE; - - if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0)) - return 0; /* error but caught downstream */ - delta = diff_delta(src->data, src->size, dst->data, dst->size, &delta_size); - /* A delta that has a lot of literal additions would have - * big delta_size no matter what else it does. - */ - if (base_size * min_score < delta_size * MAX_SCORE) - return MAX_SCORE; - /* Estimate the edit size by interpreting delta. */ - if (count_delta(delta, delta_size, &src_copied, &literal_added)) { + if (count_delta(delta, delta_size, + &src_copied, &literal_added)) { free(delta); - return 0; + return 0; /* we cannot tell */ } free(delta); - /* Extent of damage */ - if (src->size + literal_added < src_copied) - delta_size = 0; + /* Compute merge-score, which is "how much is removed + * from the source material". The clean-up stage will + * merge the surviving pair together if the score is + * less than the minimum, after rename/copy runs. 
+ */ + if (src->size <= src_copied) + delta_size = 0; /* avoid wrapping around */ + else + delta_size = src->size - src_copied; + *merge_score_p = delta_size * MAX_SCORE / src->size; + + /* Extent of damage, which counts both inserts and + * deletes. + */ + if (src->size + literal_added <= src_copied) + delta_size = 0; /* avoid wrapping around */ else delta_size = (src->size - src_copied) + literal_added; + + /* We break if the edit exceeds the minimum. + * i.e. (break_score / MAX_SCORE < delta_size / base_size) + */ + if (break_score * base_size < delta_size * MAX_SCORE) + to_break = 1; - if (base_size < delta_size) - return MAX_SCORE; - - return delta_size * MAX_SCORE / base_size; + return to_break; } -void diffcore_break(int min_score) +void diffcore_break(int break_score) { struct diff_queue_struct *q = &diff_queued_diff; struct diff_queue_struct outq; + + /* When the filepair has this much edit (insert and delete), + * it is first considered to be a rewrite and broken into a + * create and delete filepair. This is to help breaking a + * file that had too much new stuff added, possibly from + * moving contents from another file, so that rename/copy can + * match it with the other file. + * + * int break_score; we reuse incoming parameter for this. + */ + + /* After a pair is broken according to break_score and + * subjected to rename/copy, both of them may survive intact, + * due to lack of suitable rename/copy peer. Or, the caller + * may be calling us without using rename/copy. When that + * happens, we merge the broken pieces back into one + * modification together if the pair did not have more than + * this much delete. For this computation, we do not take + * insert into account at all. If you start from a 100-line + * file and delete 97 lines of it, it does not matter if you + * add 27 lines to it to make a new 30-line file or if you add + * 997 lines to it to make a 1000-line file. Either way what + * you did was a rewrite of 97%. 
On the other hand, if you + * delete 3 lines, keeping 97 lines intact, it does not matter + * if you add 3 lines to it to make a new 100-line file or if + * you add 903 lines to it to make a new 1000-line file. + * Either way you did a lot of additions and not a rewrite. + * This merge happens to catch the latter case. A merge_score + * of 80% would be a good default value (a broken pair that + * has score lower than merge_score will be merged back + * together). + */ + int merge_score; int i; - if (!min_score) - min_score = DEFAULT_BREAK_SCORE; + /* See comment on DEFAULT_BREAK_SCORE and + * DEFAULT_MERGE_SCORE in diffcore.h + */ + merge_score = (break_score >> 16) & 0xFFFF; + break_score = (break_score & 0xFFFF); + + if (!break_score) + break_score = DEFAULT_BREAK_SCORE; + if (!merge_score) + merge_score = DEFAULT_MERGE_SCORE; outq.nr = outq.alloc = 0; outq.queue = NULL; @@ -101,12 +173,22 @@ void diffcore_break(int min_score) if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two) && !S_ISDIR(p->one->mode) && !S_ISDIR(p->two->mode) && !strcmp(p->one->path, p->two->path)) { - score = very_different(p->one, p->two, min_score); - if (min_score <= score) { + if (should_break(p->one, p->two, + break_score, &score)) { /* Split this into delete and create */ struct diff_filespec *null_one, *null_two; struct diff_filepair *dp; + /* Set score to 0 for the pair that + * needs to be merged back together + * should they survive rename/copy. + * Also we do not want to break very + * small files. 
+ */ + if ((score < merge_score) || + (p->one->size < MINIMUM_BREAK_SIZE)) + score = 0; + /* deletion of one */ null_one = alloc_filespec(p->one->path); dp = diff_queue(&outq, p->one, null_one); @@ -132,3 +214,77 @@ void diffcore_break(int min_score) return; } + +static void merge_broken(struct diff_filepair *p, + struct diff_filepair *pp, + struct diff_queue_struct *outq) +{ + /* p and pp are broken pairs we want to merge */ + struct diff_filepair *c = p, *d = pp; + if (DIFF_FILE_VALID(p->one)) { + /* this must be a delete half */ + d = p; c = pp; + } + /* Sanity check */ + if (!DIFF_FILE_VALID(d->one)) + die("internal error in merge #1"); + if (DIFF_FILE_VALID(d->two)) + die("internal error in merge #2"); + if (DIFF_FILE_VALID(c->one)) + die("internal error in merge #3"); + if (!DIFF_FILE_VALID(c->two)) + die("internal error in merge #4"); + + diff_queue(outq, d->one, c->two); + diff_free_filespec_data(d->two); + diff_free_filespec_data(c->one); + free(d); + free(c); +} + +void diffcore_merge_broken(void) +{ + struct diff_queue_struct *q = &diff_queued_diff; + struct diff_queue_struct outq; + int i, j; + + outq.nr = outq.alloc = 0; + outq.queue = NULL; + + for (i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + if (!p) + /* we already merged this with its peer */ + continue; + else if (p->broken_pair && + p->score == 0 && + !strcmp(p->one->path, p->two->path)) { + /* If the peer also survived rename/copy, then + * we merge them back together. + */ + for (j = i + 1; j < q->nr; j++) { + struct diff_filepair *pp = q->queue[j]; + if (pp->broken_pair && + p->score == 0 && + !strcmp(pp->one->path, pp->two->path) && + !strcmp(p->one->path, pp->two->path)) { + /* Peer survived. Merge them */ + merge_broken(p, pp, &outq); + q->queue[j] = NULL; + break; + } + } + if (q->nr <= j) + /* The peer did not survive, so we keep + * it in the output. 
+ */ + diff_q(&outq, p); + } + else + diff_q(&outq, p); + } + free(q->queue); + *q = outq; + + return; +} ------------ ^ permalink raw reply [flat|nested] 64+ messages in thread
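The two-threshold heuristic in the patch above can be tried out in isolation. What follows is a rough sketch under stated assumptions, not the code from the patch: all sketch_* names are hypothetical, sizes are passed in directly instead of coming from diff_delta() and count_delta(), the MINIMUM_BREAK_SIZE check and the queue handling are left out, and the packing helper uses unsigned int where the patch uses int. It follows the bit layout in the code (first number in the low 16 bits, second number in the high 16 bits).

```c
#define MAX_SCORE 60000
#define DEFAULT_BREAK_SCORE 59400 /* 99% */
#define DEFAULT_MERGE_SCORE 48000 /* 80% */

struct sketch_scores {
	int break_score; /* minimum "edit" before a pair is broken */
	int merge_score; /* minimum "delete" for a broken pair to stay broken */
};

/* Pack the two -B numbers into one word; unsigned is this sketch's choice. */
static unsigned int sketch_pack(int break_score, int merge_score)
{
	return (unsigned int)break_score | ((unsigned int)merge_score << 16);
}

/* Unpack, substituting the defaults for a zero field. */
static struct sketch_scores sketch_unpack(unsigned int packed)
{
	struct sketch_scores s;

	s.break_score = (int)(packed & 0xFFFF);
	s.merge_score = (int)((packed >> 16) & 0xFFFF);
	if (!s.break_score)
		s.break_score = DEFAULT_BREAK_SCORE;
	if (!s.merge_score)
		s.merge_score = DEFAULT_MERGE_SCORE;
	return s;
}

/* "How much of the source was removed", scaled to MAX_SCORE. */
static int sketch_merge_score(unsigned long src_size, unsigned long src_copied)
{
	unsigned long long deleted;

	if (!src_size)
		return 0; /* guard added by this sketch */
	deleted = (src_size <= src_copied) ? 0 : src_size - src_copied;
	return (int)(deleted * MAX_SCORE / src_size);
}

/* Returns 1 when the edit (deletes plus inserts) exceeds break_score. */
static int sketch_should_break(unsigned long src_size, unsigned long dst_size,
			       unsigned long src_copied,
			       unsigned long literal_added, int break_score)
{
	unsigned long long edit, base_size;

	if ((unsigned long long)src_size + literal_added <= src_copied)
		edit = 0; /* avoid wrapping around */
	else
		edit = (unsigned long long)src_size + literal_added - src_copied;
	base_size = (src_size < dst_size) ? dst_size : src_size;
	/* break_score / MAX_SCORE < edit / base_size, without division */
	return (unsigned long long)break_score * base_size < edit * MAX_SCORE;
}
```

Treating each line of the commit message's examples as 10 bytes: keeping 970 of 1000 bytes and adding 9030 gives a merge score of 1800 (3%), so a pair broken at a lower break threshold such as 50% would be given score 0 and merged back if both halves survive rename/copy. Keeping only 30 bytes and adding 9970 gives a merge score of 58200 (97%), above the 80% default, so that pair would stay broken.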
* Re: I want to release a "git-1.0" 2005-06-01 3:04 ` Linus Torvalds 2005-06-01 4:06 ` Junio C Hamano @ 2005-06-01 6:28 ` Junio C Hamano 2005-06-01 22:00 ` Daniel Barkalow ` (3 subsequent siblings) 5 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 6:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> Anyway, I wrote just a _very_ introductory thing in LT> Documentation/tutorial.txt, I'll try to update and expand on it later. It LT> basically has a really stupid example of "how to set up a new project". Linus, I was following your "tutorial" and saw the last step (git-whatchanged) showing the HEAD commit and diff _twice_. You got me _WORRIED_!!! I knew it uses your favorite diff-tree command and I was the most likely suspect who broke it. And I remember you were understandably unhappy last time I broke it (the "diff-tree -s" problem). It turns out that the example in the tutorial was bad. Here is a fix. It is so obvious that I do not think it deserves a sign-off nor credit. Please just fold it into your edit next time you update the tutorial. --- cd /opt/packrat/playpen/public/in-place/git/git.junio/ jit-diff : Documentation # - linus: git-apply --stat: limit lines to 79 characters # + (working tree) diff --git a/Documentation/tutorial.txt b/Documentation/tutorial.txt --- a/Documentation/tutorial.txt +++ b/Documentation/tutorial.txt @@ -401,7 +401,7 @@ activity. To see the whole history of our pitiful little git-tutorial project, we can do - git-whatchanged -p --root HEAD + git-whatchanged -p --root (the "--root" flag is a flag to git-diff-tree to tell it to show the initial aka "root" commit as a diff too), and you will see exactly what Compilation finished at Tue May 31 23:12:32 ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 3:04 ` Linus Torvalds 2005-06-01 4:06 ` Junio C Hamano 2005-06-01 6:28 ` I want to release a "git-1.0" Junio C Hamano @ 2005-06-01 22:00 ` Daniel Barkalow 2005-06-01 23:05 ` Junio C Hamano 2005-06-03 9:47 ` Petr Baudis 2005-06-02 7:15 ` Eric W. Biederman ` (2 subsequent siblings) 5 siblings, 2 replies; 64+ messages in thread From: Daniel Barkalow @ 2005-06-01 22:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Eric W. Biederman, Git Mailing List On Tue, 31 May 2005, Linus Torvalds wrote: > On Tue, 31 May 2005, Eric W. Biederman wrote: > > > > I am way behind the power curve on learning git at this point but > > one piece of the puzzle that CVS has that I don't believe git does > > are multiple people committing to the same repository, especially > > remotely. I don't see that as a down side of git but it is a common > > way people use CVS so it is worth documenting. > > It's actually one thing git doesn't do per se. > > You have to do a "git-pull-script" from the common repository side, > there's no "git-push-script". Ugly. It shouldn't be hard to do one, except that locking with rsync is going to be a pain. I had a patch to make it work with the rpush/rpull pair, but I didn't get its dependencies in at the time. I can dust those patches off again if you want that functionality included. The patches are essentially: - make the transport protocol handle things other than objects - library procedure for locking atomic update of refs files - fetching refs in general - rpull/rpush that updates a specified ref file atomically At least the first would be very nice to get in before 1.0, since it is an incompatible change to the protocol. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 22:00 ` Daniel Barkalow @ 2005-06-01 23:05 ` Junio C Hamano 2005-06-03 9:47 ` Petr Baudis 1 sibling, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-01 23:05 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Linus Torvalds, Eric W. Biederman, Git Mailing List >>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes: DB> It shouldn't be hard to do one, except that locking with DB> rsync is going to be a pain. I had a patch to make it work DB> with the rpush/rpull pair, but I didn't get its dependencies DB> in at the time. I can dust those patches off again if you DB> want that functionality included. Talking about pulls, wouldn't it be nicer to (re)name it to git-ssh-pull, for consistency with others, especially before we hit 1.0? ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 22:00 ` Daniel Barkalow 2005-06-01 23:05 ` Junio C Hamano @ 2005-06-03 9:47 ` Petr Baudis 2005-06-03 15:09 ` Daniel Barkalow 1 sibling, 1 reply; 64+ messages in thread From: Petr Baudis @ 2005-06-03 9:47 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Linus Torvalds, Eric W. Biederman, Git Mailing List Dear diary, on Thu, Jun 02, 2005 at 12:00:55AM CEST, I got a letter where Daniel Barkalow <barkalow@iabervon.org> told me that... > It shouldn't be hard to do one, except that locking with rsync is going to > be a pain. I had a patch to make it work with the rpush/rpull pair, but I > didn't get its dependencies in at the time. Was that the patch I was replying to recently? It didn't seem to have any dependencies. > I can dust those patches off again if you want that functionality included. > > The patches are essentially: > > - make the transport protocol handle things other than objects > - library procedure for locking atomic update of refs files > - fetching refs in general > - rpull/rpush that updates a specified ref file atomically > > At least the first would be very nice to get in before 1.0, since it is an > incompatible change to the protocol. I would like to have this a lot too. Pulling tags now is a PITA, and I definitely want to go this way. So it will land at least in git-pb. :-) (But that's a little troublesome if you say it's an incompatible change.) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-03 9:47 ` Petr Baudis @ 2005-06-03 15:09 ` Daniel Barkalow 0 siblings, 0 replies; 64+ messages in thread From: Daniel Barkalow @ 2005-06-03 15:09 UTC (permalink / raw) To: Petr Baudis; +Cc: Linus Torvalds, Eric W. Biederman, Git Mailing List On Fri, 3 Jun 2005, Petr Baudis wrote: > Dear diary, on Thu, Jun 02, 2005 at 12:00:55AM CEST, I got a letter > where Daniel Barkalow <barkalow@iabervon.org> told me that... > > It shouldn't be hard to do one, except that locking with rsync is going to > > be a pain. I had a patch to make it work with the rpush/rpull pair, but I > > didn't get its dependencies in at the time. > > Was that the patch I was replying to recently? It didn't seem to have > any dependencies. The rpush/rpull changes were at the end of a series that you were replying to the beginning of. > > I can dust those patches off again if you want that functionality included. > > > > The patches are essentially: > > > > - make the transport protocol handle things other than objects > > - library procedure for locking atomic update of refs files > > - fetching refs in general > > - rpull/rpush that updates a specified ref file atomically > > > > At least the first would be very nice to get in before 1.0, since it is an > > incompatible change to the protocol. > > I would like to have this a lot too. Pulling tags now is a PITA, and I > definitely want to go this way. So it will land at least in git-pb. > :-) (But that's a little troublesome if you say it's an incompatible > change.) The ssh-based protocol has to change, because the current version doesn't have any way of being extended. The first patch in the new set makes the incompatible change without adding anything new (so as to be as uncontroversial as possible), and now also adds a version number so that future additions should be less of a big deal. The rest of the series will add the transfer of refs to the transfer mechanism and the protocol. 
-Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 64+ messages in thread
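For readers who want a concrete picture of the "locking atomic update of refs files" item in the patch list above, here is a minimal sketch in Python. It is not Daniel's patch and does not address the rsync case he calls a pain; it only illustrates the common lock-file-then-rename pattern on a local filesystem. The function name `update_ref_atomically`, the ".lock" suffix, and the compare-before-write check are all assumptions made for illustration.

```python
import os


def update_ref_atomically(ref_path, old_value, new_value):
    """Set the ref file to new_value only if it currently holds old_value.

    Hypothetical helper: returns True if the ref was updated, False if
    another writer holds the lock or the ref no longer matches old_value.
    Pass old_value=None to require that the ref does not exist yet.
    """
    lock_path = ref_path + ".lock"
    try:
        # O_EXCL makes creation fail if the lock file already exists,
        # so at most one writer can get past this point.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    try:
        with os.fdopen(fd, "w") as lock_file:
            try:
                with open(ref_path) as f:
                    current = f.read().strip()
            except FileNotFoundError:
                current = None
            if current != old_value:
                return False  # someone else moved the ref first
            lock_file.write(new_value + "\n")
        # Renaming the fully written lock file over the ref is the commit
        # point: readers see either the old content or the new content.
        os.replace(lock_path, ref_path)
        return True
    finally:
        # Only reached while we own the lock; remove it if it was not renamed.
        if os.path.exists(lock_path):
            os.unlink(lock_path)
```

The point of writing into the lock file and renaming it, rather than writing the ref in place, is that a crash in the middle leaves the old ref intact.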
* Re: I want to release a "git-1.0" 2005-06-01 3:04 ` Linus Torvalds ` (2 preceding siblings ...) 2005-06-01 22:00 ` Daniel Barkalow @ 2005-06-02 7:15 ` Eric W. Biederman 2005-06-02 8:32 ` Kay Sievers 2005-06-02 14:52 ` Linus Torvalds 2005-06-02 12:02 ` [PATCH] several typos in tutorial Alexey Nezhdanov 2005-06-02 23:40 ` I want to release a "git-1.0" Adam Kropelin 5 siblings, 2 replies; 64+ messages in thread From: Eric W. Biederman @ 2005-06-02 7:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds <torvalds@osdl.org> writes: > Anyway, I wrote just a _very_ introductory thing in > Documentation/tutorial.txt, I'll try to update and expand on it later. It > basically has a really stupid example of "how to set up a new project". So I need to do a git checkout of the latest version of git to read the tutorial? So I can figure out how to use git? Catch 22? :) Eric ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-02 7:15 ` Eric W. Biederman @ 2005-06-02 8:32 ` Kay Sievers 2005-06-02 14:52 ` Linus Torvalds 1 sibling, 0 replies; 64+ messages in thread From: Kay Sievers @ 2005-06-02 8:32 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Git Mailing List On Thu, Jun 02, 2005 at 01:15:59AM -0600, Eric W. Biederman wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > Anyway, I wrote just a _very_ introductory thing in > > Documentation/tutorial.txt, I'll try to update and expand on it later. It > > basically has a really stupid example of "how to set up a new project". > > So I need to do a git checkout of the latest version of git to > read the tutorial? So I can figure out how to use git? No problem: :) http://www.kernel.org/git/?p=git/git.git;a=blob;f=Documentation/tutorial.txt Kay ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-02 7:15 ` Eric W. Biederman 2005-06-02 8:32 ` Kay Sievers @ 2005-06-02 14:52 ` Linus Torvalds 1 sibling, 0 replies; 64+ messages in thread From: Linus Torvalds @ 2005-06-02 14:52 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Git Mailing List On Thu, 2 Jun 2005, Eric W. Biederman wrote: > > So I need to do a git checkout of the latest version of git to > read the tutorial? So I can figure out how to use git? Just use the gitweb thing, it's easy to read off there.. Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH] several typos in tutorial 2005-06-01 3:04 ` Linus Torvalds ` (3 preceding siblings ...) 2005-06-02 7:15 ` Eric W. Biederman @ 2005-06-02 12:02 ` Alexey Nezhdanov 2005-06-02 12:41 ` Vincent Hanquez 2005-06-02 23:40 ` I want to release a "git-1.0" Adam Kropelin 5 siblings, 1 reply; 64+ messages in thread From: Alexey Nezhdanov @ 2005-06-02 12:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List, Alexey Nezhdanov Signed-off-by: Alexey Nezhdanov <snake@penza-gsm.ru> --- diff --git a/Documentation/tutorial.txt b/Documentation/tutorial.txt --- a/Documentation/tutorial.txt +++ b/Documentation/tutorial.txt @@ -298,7 +298,7 @@ have committed something, we can also le Unlike "git-diff-files", which showed the difference between the index file and the working directory, "git-diff-cache" shows the differences -between a committed _tree_ and the index file. In other words, +between a committed _tree_ and the working directory. In other words, git-diff-cache wants a tree to be diffed against, and before we did the commit, we couldn't do that, because we didn't have anything to diff against. @@ -423,8 +423,8 @@ With that, you should now be having some can explore on your own. - Copoying archives - ----------------- + Copying archives + ---------------- Git arhives are normally totally self-sufficient, and it's worth noting that unlike CVS, for example, there is no separate notion of ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] several typos in tutorial 2005-06-02 12:02 ` [PATCH] several typos in tutorial Alexey Nezhdanov @ 2005-06-02 12:41 ` Vincent Hanquez 2005-06-02 12:45 ` Alexey Nezhdanov 0 siblings, 1 reply; 64+ messages in thread From: Vincent Hanquez @ 2005-06-02 12:41 UTC (permalink / raw) To: Alexey Nezhdanov; +Cc: Linus Torvalds, Git Mailing List On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote: > Git arhives are normally totally self-sufficient, and it's worth noting ^^^^^^^ and one more here -- Vincent Hanquez ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] several typos in tutorial 2005-06-02 12:41 ` Vincent Hanquez @ 2005-06-02 12:45 ` Alexey Nezhdanov 2005-06-02 12:51 ` Vincent Hanquez 0 siblings, 1 reply; 64+ messages in thread From: Alexey Nezhdanov @ 2005-06-02 12:45 UTC (permalink / raw) To: git; +Cc: Vincent Hanquez, Linus Torvalds On thursday, 02 June 2005 16:41 Vincent Hanquez wrote: > On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote: > > Git arhives are normally totally self-sufficient, and it's worth noting > > ^^^^^^^ > and one more here Why? It's ok to speak about many [existing] archives here. -- Respectfully Alexey Nezhdanov ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] several typos in tutorial 2005-06-02 12:45 ` Alexey Nezhdanov @ 2005-06-02 12:51 ` Vincent Hanquez 2005-06-02 12:56 ` Alexey Nezhdanov 2005-06-02 13:00 ` Alexey Nezhdanov 0 siblings, 2 replies; 64+ messages in thread From: Vincent Hanquez @ 2005-06-02 12:51 UTC (permalink / raw) To: Alexey Nezhdanov; +Cc: git, Linus Torvalds On Thu, Jun 02, 2005 at 04:45:15PM +0400, Alexey Nezhdanov wrote: > On thursday, 02 June 2005 16:41 Vincent Hanquez wrote: > > On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote: > > > Git arhives are normally totally self-sufficient, and it's worth noting > > > > ^^^^^^^ > > and one more here > Why? It's ok to speak about many [existing] archives here. it's missing a 'c' -- Vincent Hanquez ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] several typos in tutorial 2005-06-02 12:51 ` Vincent Hanquez @ 2005-06-02 12:56 ` Alexey Nezhdanov 2005-06-02 13:00 ` Alexey Nezhdanov 1 sibling, 0 replies; 64+ messages in thread From: Alexey Nezhdanov @ 2005-06-02 12:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, Vincent Hanquez Signed-off-by: Alexey Nezhdanov <snake@penza-gsm.ru> --- diff --git a/Documentation/tutorial.txt b/Documentation/tutorial.txt --- a/Documentation/tutorial.txt +++ b/Documentation/tutorial.txt @@ -298,7 +298,7 @@ have committed something, we can also le Unlike "git-diff-files", which showed the difference between the index file and the working directory, "git-diff-cache" shows the differences -between a committed _tree_ and the index file. In other words, +between a committed _tree_ and the working directory. In other words, git-diff-cache wants a tree to be diffed against, and before we did the commit, we couldn't do that, because we didn't have anything to diff against. @@ -423,10 +423,10 @@ With that, you should now be having some can explore on your own. - Copoying archives - ----------------- + Copying archives + ---------------- -Git arhives are normally totally self-sufficient, and it's worth noting +Git archives are normally totally self-sufficient, and it's worth noting that unlike CVS, for example, there is no separate notion of "repository" and "working tree". A git repository normally _is_ the working tree, with the local git information hidden in the ".git" ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH] several typos in tutorial 2005-06-02 12:51 ` Vincent Hanquez 2005-06-02 12:56 ` Alexey Nezhdanov @ 2005-06-02 13:00 ` Alexey Nezhdanov 1 sibling, 0 replies; 64+ messages in thread From: Alexey Nezhdanov @ 2005-06-02 13:00 UTC (permalink / raw) To: git; +Cc: Vincent Hanquez On thursday, 02 June 2005 16:51 Vincent Hanquez wrote: > On Thu, Jun 02, 2005 at 04:45:15PM +0400, Alexey Nezhdanov wrote: > > On thursday, 02 June 2005 16:41 Vincent Hanquez wrote: > > > On Thu, Jun 02, 2005 at 04:02:07PM +0400, Alexey Nezhdanov wrote: > > > > Git arhives are normally totally self-sufficient, and it's worth > > > > noting > > > > > > ^^^^^^^ > > > and one more here > > > > Why? It's ok to speak about many [existing] archives here. > > it's missing a 'c' ok :) -- Respectfully Alexey Nezhdanov ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-01 3:04 ` Linus Torvalds ` (4 preceding siblings ...) 2005-06-02 12:02 ` [PATCH] several typos in tutorial Alexey Nezhdanov @ 2005-06-02 23:40 ` Adam Kropelin 2005-06-03 0:06 ` Linus Torvalds 5 siblings, 1 reply; 64+ messages in thread From: Adam Kropelin @ 2005-06-02 23:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds wrote: > Anyway, I wrote just a _very_ introductory thing in > Documentation/tutorial.txt, I'll try to update and expand on it later. > It basically has a really stupid example of "how to set up a new > project". I've been working my way through the tutorial, trying to up my git clue level a bit. One part where things start to go a bit pear-shaped for me is in the description of git-diff-files vs. git-diff-cache. The tutorial takes pains to emphasize the difference between "working directory contents", "index file", and "committed tree", and I'm on board with that. What confuses me is the following: > Unlike "git-diff-files", which showed the difference between the index > file and the working directory, "git-diff-cache" shows the differences > between a committed _tree_ and the index file. > ... > [example where git-diff-cache shows difference between working > directory and committed tree] > ... > "git-diff-cache" also has a specific flag "--cached", which is used to > tell it to show the differences purely with the index file, and ignore > the current working directory state entirely The example and the description of --cached seem to contradict the first sentence's description of the tool's purpose in life. If it shows you differences between a committed tree and the index file, why is it looking in my working directory at all? In order to get the behavior the first sentence describes, you actually have to use --cached. Am I on the right track? --Adam ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-02 23:40 ` I want to release a "git-1.0" Adam Kropelin @ 2005-06-03 0:06 ` Linus Torvalds 2005-06-03 0:47 ` Linus Torvalds 0 siblings, 1 reply; 64+ messages in thread From: Linus Torvalds @ 2005-06-03 0:06 UTC (permalink / raw) To: Adam Kropelin; +Cc: Git Mailing List On Thu, 2 Jun 2005, Adam Kropelin wrote: > What confuses me is the following: Yeah, I'll try to clarify. git-diff-cache can show the difference between a tree and either the index _or_ the working directory. Will fix up. Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-03 0:06 ` Linus Torvalds @ 2005-06-03 0:47 ` Linus Torvalds 2005-06-03 1:34 ` Adam Kropelin 0 siblings, 1 reply; 64+ messages in thread From: Linus Torvalds @ 2005-06-03 0:47 UTC (permalink / raw) To: Adam Kropelin; +Cc: Git Mailing List On Thu, 2 Jun 2005, Linus Torvalds wrote: > > Yeah, I'll try to clarify. Adam, do you find the current version a bit more clear on this? Linus ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: I want to release a "git-1.0" 2005-06-03 0:47 ` Linus Torvalds @ 2005-06-03 1:34 ` Adam Kropelin 0 siblings, 0 replies; 64+ messages in thread From: Adam Kropelin @ 2005-06-03 1:34 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds wrote: > On Thu, 2 Jun 2005, Linus Torvalds wrote: >> >> Yeah, I'll try to clarify. > > Adam, do you find the current version a bit more clear on this? Absolutely. I especially like the new digression explaining that the --cached flag controls where file _content_ is fetched from and reinforcing that the index file always governs which files are involved in the diff. Thanks! --Adam ^ permalink raw reply [flat|nested] 64+ messages in thread
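The distinction settled in this sub-thread (a tree compared against either the index or the working directory, with the index always deciding which files take part) can be sketched with a toy model. The Python below is only an illustration under simplifying assumptions: the committed tree, the index, and the working directory are plain dicts mapping path to content, and the function names merely echo the commands being discussed. It does not reproduce the real tools' output format or their stat-based shortcuts.

```python
def _changes(names, old, new):
    """Return {path: (old_content, new_content)} for paths that differ."""
    result = {}
    for name in sorted(names):
        before, after = old.get(name), new.get(name)
        if before != after:
            result[name] = (before, after)
    return result


def diff_files(index, workdir):
    """Model of git-diff-files: index versus working directory.

    Only paths known to the index are considered, so untracked files
    in the working directory never show up.
    """
    return _changes(index.keys(), index, workdir)


def diff_cache(tree, index, workdir, cached=False):
    """Model of git-diff-cache: committed tree versus index or workdir.

    The index always decides WHICH paths are involved; the cached flag
    only decides where the file CONTENT is taken from.
    """
    if cached:
        new_side = dict(index)
    else:
        new_side = {name: workdir.get(name) for name in index}
    return _changes(set(tree) | set(index), tree, new_side)


if __name__ == "__main__":
    tree = {"a": "Hello World\n"}
    index = dict(tree)  # nothing staged since the commit
    workdir = {"a": "Hello World\nIt's a new day\n", "junk.o": "object code"}
    print(diff_files(index, workdir))                     # edit to "a"
    print(diff_cache(tree, index, workdir))               # same edit
    print(diff_cache(tree, index, workdir, cached=True))  # empty
```

In this model, an edited but not yet staged file appears in `diff_files` and in the default `diff_cache`, but not with `cached=True`; once the index is updated, the situation flips.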
* CVS migration section to the tutorial. 2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds ` (9 preceding siblings ...) 2005-05-31 13:45 ` Eric W. Biederman @ 2005-06-02 19:43 ` Junio C Hamano 10 siblings, 0 replies; 64+ messages in thread From: Junio C Hamano @ 2005-06-02 19:43 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List

I think a section to discuss "I am used to doing 'cvs xxx' to solve this problem, how do I do that in GIT" would be a good idea. Here is an example to talk about "cvs annotate".

------------
CVS annotate.

The core GIT itself does not have a "cvs annotate" equivalent, but it has something much nicer.

Let's step back a bit and think about the reason why you would want to do "cvs annotate a-file.c" to begin with.

 - Are you really interested in _all_ the lines in that file?

 - Are you interested in lines _only_ in that file and do not care if the file was created by renaming from a different file?

You would use "cvs annotate" on a file when you have trouble with a function (or even a single "if" statement in that function) that happens to be defined in the file, which does not do what you want it to do. And you would want to find out why it was written in that way, because you are about to modify it to suit your needs, and at the same time you do not want to break its current callers. For that, you want to find out why the original author did things that way in the original context. That's why you want "cvs annotate".

So your answer to the first question _should_ be "no". You do not care about the whole file, only a segment of it.

Also, in the original context, the same statement might have appeared at first in a different file and later the file was renamed to "a-file.c". Or the entire program may have constructs similar to the "if" statement you are having trouble with in different places, that you are still not aware of. So your answer to the second question _should_ be "no" as well.

As an example, assuming that you have this piece of code that you are interested in, in the HEAD version:

    if (frotz) {
            nitfol();
    }

you would use git-rev-list and git-diff-tree like this:

    $ git-rev-list HEAD |
      git-diff-tree --stdin -v -p -S'if (frotz) {
            nitfol();
    }'

We have already talked about the "--stdin" form of git-diff-tree command that reads the list of commits and compares each commit with its parents. What the -S flag and its argument do is called "pickaxe", a tool for software archaeologists. When "pickaxe" is used, git-diff-tree command outputs differences between two commits only if one tree has the specified string in a file and the corresponding file in the other tree does not. The above example looks for a commit that has the "if" statement in it in a file, but its parent commit does not have it in the same shape in the corresponding file (or the other way around, where the parent has it and the commit does not), and the differences between them are shown, along with the commit message (thanks to the -v flag). It does not show anything for commits that do not touch this "if" statement.

To make things more interesting, you can give the -C flag to git-diff-tree, like this:

    $ git-rev-list HEAD |
      git-diff-tree --stdin -v -p -C -S'if (frotz) {
            nitfol();
    }'

When the -C flag is used, file renames and copies are followed. So if the "if" statement in question happens to be in "a-file.c" in the current HEAD commit, even if the file was originally called "o-file.c" and then renamed in an earlier commit, or if the file was created by copying an existing "o-file.c" in an earlier commit, you will not lose track. If the "if" statement did not change across such rename or copy, then the commit that does rename or copy would not show in the output, and if the "if" statement was modified while the file was still called "o-file.c", it would find the commit that changed the statement when it was in "o-file.c".

[ BTW, the current version of "git-diff-tree -C" is not eager enough to find copies, and it will miss the fact that a-file.c was created by copying o-file.c unless o-file.c was somehow changed in the same commit.]

To make things even more interesting, you can use the --pickaxe-all flag in addition to the -S flag. This causes the differences from all the files contained in those two commits to be shown, not just the differences between the files that contain this changed "if" statement:

    $ git-rev-list HEAD |
      git-diff-tree --stdin -v -p -C -S'if (frotz) {
            nitfol();
    }' --pickaxe-all

^ permalink raw reply [flat|nested] 64+ messages in thread
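The "pickaxe" rule described in the message above (report a commit only when the string is present in a file on one side of the comparison and absent from the corresponding file on the other) can be sketched as a small model. The Python below is an illustration only, not the diffcore implementation: it assumes a linear history given oldest first as `(commit_id, files)` pairs, where `files` is a hypothetical dict of path to content, and it does no rename or copy detection, so it corresponds to running without -C.

```python
def pickaxe(history, needle):
    """Return [(commit_id, [paths])] for commits that add or remove needle.

    history: list of (commit_id, files) tuples, oldest first, where files
    maps a path to that file's full content in the commit. The first
    commit has no parent in this model and is never reported.
    """
    hits = []
    for (_, parent_files), (commit_id, files) in zip(history, history[1:]):
        touched = []
        for path in sorted(set(parent_files) | set(files)):
            in_parent = needle in parent_files.get(path, "")
            in_commit = needle in files.get(path, "")
            # Report the path only if presence of the string flipped.
            if in_parent != in_commit:
                touched.append(path)
        if touched:
            hits.append((commit_id, touched))
    return hits


if __name__ == "__main__":
    needle = "if (frotz) {\n\tnitfol();\n}"
    history = [
        ("c1", {"a-file.c": "int main(void) { return 0; }\n"}),
        ("c2", {"a-file.c": "if (frotz) {\n\tnitfol();\n}\n"}),
        ("c3", {"a-file.c": "if (frotz) {\n\tnitfol();\n}\n", "README": "docs\n"}),
        ("c4", {"a-file.c": "if (frotz && xyzzy) {\n\tnitfol();\n}\n", "README": "docs\n"}),
    ]
    for commit_id, paths in pickaxe(history, needle):
        print(commit_id, paths)
```

Because the model compares files path by path, a pure rename of the file holding the statement would be reported as a removal in one path and an addition in another; following renames and copies, as -C does, is what avoids that noise.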