git@vger.kernel.org mailing list mirror (one of many)
From: Sam Vilain <sam@vilain.net>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Handling large files with GIT
Date: Wed, 15 Feb 2006 17:03:52 +1300
Message-ID: <43F2A828.2050102@vilain.net>
In-Reply-To: <7vslqlo0wo.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> So I think the order of questions you should be asking is:
> 
>   1 - what operations are you trying to help?

Primarily, tracing history when dealing with history/changeset based
revision systems like SVN or darcs, and doing this in a way that lets us
guarantee the same behaviour as those systems would give.

>   2 - what information you would need to achieve those operations
>       better?

Minimally, this tuple:

   ( merge|copy, source_path, source_tree|source_commit,
     destination_path, destination_commit )

It makes sense to record this with commits, as conceptually it is a part
of the intended commit history along with the change comment.
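
To make the tuple above concrete, here is a minimal sketch of how it might
be modelled.  The class name, field names and the placeholder ids are
hypothetical, chosen for illustration only; nothing like this exists in
git itself.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The two operations named in the tuple."""
    MERGE = "merge"
    COPY = "copy"


@dataclass(frozen=True)
class HistoryHint:
    """One recorded hint; field names are illustrative, not a git format."""
    kind: Kind
    source_path: str
    source_ref: str            # id of the source tree or source commit
    destination_path: str
    destination_commit: str


# Placeholder ids, not real object names.
hint = HistoryHint(Kind.COPY, "/old/path", "SRC_COMMIT",
                   "/new/path", "DST_COMMIT")
```

Making the record immutable mirrors the idea that it is part of the
commit: once written, it is not meant to change.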

>   3 - among the second one, what will be necessary to be set in
>       stone (IOW, cannot be computed later), and what are
>       computable but expensive to recompute every time?

The only thing you cannot detect automatically and with certainty is a
rename combined with a content change, unless a dummy commit is inserted
between the name change and the content change.  But in a sense this is
the same as my suggestion - using the commit object history to record
information that normally doesn't matter when you are doing content-keyed
operations.

> I am getting an impression that you are doing only the first
> half of (2) without other parts, which somewhat bothers me.

Well, thank you for spending so much time to reply to me given that was
your assessment.  I think the best direction from here would be to start
molding some porcelain, then I can cross this bridge when I come to it
rather than simply speculating and hand-waving.

Besides, I can always prototype it for discussion using the commit
description as a surrogate container for the information.
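
A rough sketch of what such a prototype could look like: the hint lines
are simply appended to the commit description as a trailing block.  The
function name is hypothetical, and the "Copied:" line reuses the example
notation quoted later in this message, abbreviated id and all.

```python
from typing import Iterable


def embed_hints(description: str, hint_lines: Iterable[str]) -> str:
    """Return the description with hint lines appended as a trailing block."""
    body = description.rstrip()
    # Drop empty entries and stray whitespace so the block stays parsable.
    block = "\n".join(line.strip() for line in hint_lines if line.strip())
    if not block:
        return body + "\n"
    # A blank line separates the human-readable text from the hints.
    return f"{body}\n\n{block}\n"


msg = embed_hints(
    "Move parser into lib/\n",
    ["Copied: /new/path from /old/path:commit:c0bb171d.."],
)
```

The resulting text could then be used as the message for an ordinary
commit; no change to the commit object format is needed for a prototype.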

Sam.

ps I also responded to the rest of your e-mail, but decided that the 
answers to the above questions were more important.

 >>  2. forensic - extra stuff at the end of the commit object?
 > (except "extra at the end of commit", which does not make it out
 > of the tree).

It is a part of the repository, but more a property of the commit itself
- like the commit description.  Like somebody writing "I renamed this
file to that file and changed its contents", but in a parsable form
that can _optionally_ be used to prevent the relevant git-core tools
from having to do content comparison, or perhaps something subtler like
increasing the score of the recorded history branch when scoring
alternatives looking for history.

 >>     eg
 >>        Copied: /new/path from /old/path:commit:c0bb171d..
 >>          (for SVN case where history matters)
 >>        Copied: /new/path from blob:b10b1d..
 >>          (for general pre-caching case)
 >>        Merged: /new/path from /old/path:commit:C0bb171d..
 >>          (for an SVK clone, so we know that subsequent merges on
 >>           /new/path need only merge from /old/path starting at commit
 >>           C0bb171d..)
 > I am not sure if recording the bare SVN ``copied'' is very
 > useful.  You would need to infer things from what SVN did to
 > tell if the copy is a tree copy inside a project (e.g. cp -r
 > i386 x86_64), tagging (e.g. svn-cp rHEAD trunk tags/v1.2), or
 > branching, wouldn't you?  SVK merge ticket is a bit more useful
 > in that sense.

In the SVN model there really is no difference between these cases.  Of
course the actual representation of these in the object does not matter;
the above is the what, not the how.  But in general, SVN only records
copying; it has no repository concept of merge, branch, tag, rename.
SVK adds merging to the picture.
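
As an illustration of the "parsable form" idea, the example lines quoted
above could be read back with something like the sketch below.  The
regular expression, the names and the assumption of exactly one hint per
line are mine; the notation was only ever an example, not a settled
format.

```python
import re
from typing import List, NamedTuple, Optional

# Matches e.g. "Copied: /new/path from /old/path:commit:<id>"
# and        "Copied: /new/path from blob:<id>"
HINT_RE = re.compile(
    r"^(?P<kind>Copied|Merged): (?P<dest>\S+) from "
    r"(?:(?P<src>\S+?):)?(?P<objtype>commit|blob):(?P<objid>\S+)$"
)


class Hint(NamedTuple):
    kind: str
    dest_path: str
    src_path: Optional[str]   # None when only a blob id was recorded
    obj_type: str
    obj_id: str


def parse_hints(message: str) -> List[Hint]:
    """Collect every hint line found in a commit description."""
    hints = []
    for line in message.splitlines():
        m = HINT_RE.match(line.strip())
        if m:
            hints.append(Hint(m["kind"], m["dest"], m["src"],
                              m["objtype"], m["objid"]))
    return hints
```

Lines that do not match are ignored, so the hints stay optional: a tool
that does not understand them loses nothing.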

Representing an SVN tree copy as a new sub-tree in a git repository
should still be a "cheap copy", it's just that all the tools will not
(and probably should not) see it as a branch but a copy.

 > So far, git philosophy is to record things you _know_ about and
 > defer such guesswork to the future, so limiting what you record
 > to what you can actually see from the foreign SCM would be more
 > in line with it.

Yes, and if I am mirroring an SVN repository, then I only know that in
that repository, the history /was recorded/ as such.  Not that the
history /is/ as such; that is a different question, and is the guesswork
worth deferring to the future.

 > For the same reason, if you are talking about
 > maildir managed under git, you should not have record anything
 > other than what git already records: "we used to have these
 > files, now we have these instead".

Ok.  As Martin pointed out, the Maildir situation is actually a simple
case.  In a sense, I hijacked a vaguely related thread to resolve my
Warnock dilemma :)

 > But I thought you were talking about caching what earlier
 > inference declared what happened, so that you do not have to do
 > the same inference every time.  If that is the case, SVN level
 > "Copied:" is probably not what you would want to record, I
 > suspect.  You would do some inference with the given information
 > ("SVN says it copied this tree to that tree, what was it that it
 > really wanted to do?  Was it a copy, or was it to create a
 > branch which was implemented as a copy?"), and record that,
 > hoping that information would help your other operations this
 > time and later.

Well, this is already guesswork deferred to the future that the
Subversion authors inflict on the users of Subversion repositories.  If
you read the Subversion manual you will find recommendations to
studiously record this information and to use a standard repository
layout so that other people will understand what your copies were
intended to be.
