From mboxrd@z Thu Jan 1 00:00:00 1970 From: Junio C Hamano Subject: Re: What's in git.git Date: Mon, 11 Sep 2006 09:31:30 -0700 Message-ID: <7v7j0ajrfh.fsf@assigned-by-dhcp.cox.net> References: <7vk64bnnxl.fsf@assigned-by-dhcp.cox.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: git@vger.kernel.org X-From: git-owner@vger.kernel.org Mon Sep 11 18:30:51 2006 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by ciao.gmane.org with esmtp (Exim 4.43) id 1GMogN-0003Ya-08 for gcvg-git@gmane.org; Mon, 11 Sep 2006 18:30:51 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965029AbWIKQan (ORCPT ); Mon, 11 Sep 2006 12:30:43 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S965034AbWIKQan (ORCPT ); Mon, 11 Sep 2006 12:30:43 -0400 Received: from fed1rmmtao10.cox.net ([68.230.241.29]:49792 "EHLO fed1rmmtao10.cox.net") by vger.kernel.org with ESMTP id S965029AbWIKQam (ORCPT ); Mon, 11 Sep 2006 12:30:42 -0400 Received: from fed1rmimpo01.cox.net ([70.169.32.71]) by fed1rmmtao10.cox.net (InterMail vM.6.01.06.01 201-2131-130-101-20060113) with ESMTP id <20060911163040.XJGT18985.fed1rmmtao10.cox.net@fed1rmimpo01.cox.net>; Mon, 11 Sep 2006 12:30:40 -0400 Received: from assigned-by-dhcp.cox.net ([68.5.247.80]) by fed1rmimpo01.cox.net with bizsmtp id M4WW1V00K1kojtg0000000 Mon, 11 Sep 2006 12:30:30 -0400 To: Jakub Narebski In-Reply-To: (Jakub Narebski's message of "Mon, 11 Sep 2006 13:29:19 +0200") User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux) Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: Jakub Narebski writes: > Perhaps those two options, --stdin to feed arguments from standard input, > and -revs to run rev-list logic internally should be used whenever possible > in all the git commands? This would allow to avoid forks and/or command > line length limit. Some make more sense than others, from usability point of view. It all depends on how much sense it makes to be able to run them on sequence of commits. > * git-rev-list: --stdin to feed list of ; it is not clear from > the manpage if one can use ^, and commit related options > and shortcuts like --not, .., ... > * git-pack-objects: --revs to provide arguments to rev-list from stdin, > instead of list of objects. UNDOCUMENTED. Time spent whining about it is better spent finding it out yourself (UTSL) and writing it I suspect. You'll learn how things actually work while doing so ;-). > It would be nice if the following commands had --stdin or had it's --stdin > usage extended: > * git-diff-tree: --stdin to allow to provide path limits, separated > by ' -- ' from or pair of Something like this could be used to follow renames and do more interesting stuff. The caller would create a pair of pipes, throw first tree-pair at it, receive and examine the output, and decide what pathspec to use for the next one and continue. If you do not limit yourself to pathspec but e.g. allow -S to be also specified per invocation, then you can make it to follow line movements, not just renames. So in the bigger picture, I like what something like this can offer to the Porcelain writers. It is a separate story if it is worth building that _into_ diff-tree itself. A sane way to see if it is is to start by writing a rough equivalent that is: #!/bin/sh while read stuff; do # Note: this really needs to do shell quote git-diff-tree $stuff echo I am done with one record. done Then, write such a driver program to drive it and see if per-line invocation of diff-tree is really the bottleneck. For any useful and interesting application of this pattern, I suspect the process that drives this diff-tree loop will have enough computation in it, and the only thing you are saving is the start-up cost of diff-tree (i.e. fork+exec). I am not sure how much of the bottleneck it would be in the while thing. > * git-ls-tree: --stdin to loop over , one tree per line. Likewise. Also you would need to decide the record delimiter. > * git-cat-object: --stdin to loop over objects, plus -z to change separator > between records to NULL (or have it turned on by default). I would suggest TYPE SP BYTE-COUNT LF followed by payload instead. You cannot otherwise handle binary files with NUL (lesson of the day: NULL is a pointer, the character's name is NUL) in it. If you do not mind redundancy to help the consumer of this output, TYPE SP SHA-1 SP BYTE-COUNT LF followed by payload might even be a better choice. > For all "loop" --stdin, the output should begin with the line which was > arguments, like git-diff-tree outputs first used for diff. That's something you cannot decide on a whim without knowing how they are used and what convention is the most useful. I suspect that single record delimiter without frill might turn out to be more useful. Parrotting the arguments means you would need to make sure the output format can easily parsable -- the issues include that you need deal with embedded newlines in them. To quote them in the output routine is easy but the consumer now needs to know they need to be dequoted. > I think it is quite often to use git-rev-list ...| git-diff-tree ... > pipeline, so it might be worth to add --revs option to git-diff-tree. Are you talking about "git rev-list | git diff-tree --stdin"??? > P.S. does git-merge take -F option? No; it is not a Porcelain so ease of typing is not its goal. You can say longhand: git-merge "`cat $file`" ... Merge messages are supposed to be mostly automated and short, so command line length limit is not much of an issue here.