From: Junio C Hamano
Subject: Re: Handling large files with GIT
Date: Tue, 14 Feb 2006 17:39:51 -0800
Message-ID: <7vslqlo0wo.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <43F27878.50701@vilain.net> (Sam Vilain's message of "Wed, 15 Feb 2006 13:40:24 +1300")
To: Sam Vilain
Cc: git@vger.kernel.org

Sam Vilain writes:

> ... Clearly, it needs to be out of the "tree".

OK.

> 2. forensic - extra stuff at the end of the commit object?
(except "extra at the end of commit", which does not make it out of the tree).

> eg
> Copied: /new/path from /old/path:commit:c0bb171d..
>    (for SVN case where history matters)
> Copied: /new/path from blob:b10b1d..
>    (for general pre-caching case)
> Merged: /new/path from /old/path:commit:C0bb171d..
>    (for an SVK clone, so we know that subsequent merges on
>    /new/path need only merge from /old/path starting at commit
>    C0bb171d..)

I am not sure recording the bare SVN ``copied'' is very useful. You would need to infer from what SVN did whether the copy is a tree copy inside a project (e.g. cp -r i386 x86_64), tagging (e.g. svn-cp rHEAD trunk tags/v1.2), or branching, wouldn't you? The SVK merge ticket is a bit more useful in that sense.

So far, the git philosophy has been to record the things you _know_ about and defer such guesswork to the future, so limiting what you record to what you can actually see from the foreign SCM would be more in line with it. For the same reason, if you are talking about a maildir managed under git, you should not record anything other than what git already records: "we used to have these files, now we have these instead".

But I thought you were talking about caching what an earlier inference concluded had happened, so that you do not have to redo the same inference every time. If that is the case, SVN-level "Copied:" is not what you would want to record, I suspect. You would do some inference with the given information ("SVN says it copied this tree to that tree; what did it really want to do? Was it a copy, or was it creating a branch that was implemented as a copy?"), and record that, hoping the information would help your other operations this time and later.

So I think the questions you should be asking, in order, are:

 - what operations are you trying to help?
 - what information would you need to do those operations better?
 - of that information, what needs to be set in stone (IOW, cannot be computed later), and what is computable but expensive to recompute every time?

An example from an ancient thread: with criss-cross merges between renamed trees, it was conjectured that recording renames detected earlier would help later merges. I think you should arrive at the list of "what we should record" by thinking things through in this order:

 (1) currently, criss-cross merge between renamed trees does not work well (realization of the status quo);

 (2) if we had this kind of information it would work better; here are the things we need to record when a new commit is made, here is how to compute the other information that can be inferred, and here is how to use that information to make the merge work better (solution without caching);

 (3) but it is expensive to recompute the information we said was computable in (2) if we were to do so every time. Let's cache it.

I am getting the impression that you are doing only the first half of (2) without the other parts, which somewhat bothers me.
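To make the split between (2) and (3) concrete, here is a minimal Python sketch under loudly stated assumptions: a "tree" is modeled as a plain dict from path to blob id, rename detection is exact-match only (git's own rename detection scores content similarity, which this does not attempt), and `detect_renames` and `RenameCache` are hypothetical names, not git code. Only the trees play the role of recorded facts; the rename map is an inference computed from them, and the cache in front of it is disposable.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a git tree: path -> blob id.
Tree = Dict[str, str]


def detect_renames(old: Tree, new: Tree) -> Dict[str, str]:
    """Step (2): infer renames (old path -> new path) from two recorded trees.

    Only exact renames are found: a path that vanished from `old` whose
    blob id reappears under a path that is new in `new`.
    """
    removed = {p: b for p, b in old.items() if p not in new}
    added_by_blob: Dict[str, List[str]] = {}
    for path in sorted(p for p in new if p not in old):
        added_by_blob.setdefault(new[path], []).append(path)

    renames: Dict[str, str] = {}
    for path, blob in sorted(removed.items()):
        candidates = added_by_blob.get(blob)
        if candidates:
            # Pair each added path with at most one removed path.
            renames[path] = candidates.pop(0)
    return renames


class RenameCache:
    """Step (3): memoize the inference, keyed by a pair of tree ids.

    The trees are the recorded facts; the cache can be thrown away at
    any time because every entry can be recomputed from them.
    """

    def __init__(self, trees: Dict[str, Tree]) -> None:
        self.trees = trees
        self._cache: Dict[Tuple[str, str], Dict[str, str]] = {}
        self.misses = 0  # how many times the inference actually ran

    def renames(self, old_id: str, new_id: str) -> Dict[str, str]:
        key = (old_id, new_id)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = detect_renames(self.trees[old_id], self.trees[new_id])
        return self._cache[key]

    def clear(self) -> None:
        """Dropping the cache loses nothing that cannot be recomputed."""
        self._cache.clear()


if __name__ == "__main__":
    trees = {
        "t1": {"lib/util.c": "b1", "README": "b2"},
        "t2": {"src/util.c": "b1", "README": "b2"},
    }
    cache = RenameCache(trees)
    print(cache.renames("t1", "t2"))  # {'lib/util.c': 'src/util.c'}
    cache.renames("t1", "t2")         # served from the cache
    print(cache.misses)               # 1
```

In this sketch nothing produced by `RenameCache` needs to be written into a commit object; only the trees are "set in stone", which is the ordering of questions argued for above.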