git@vger.kernel.org mailing list mirror (one of many)
From: Rogan Dawes <lists@dawes.za.net>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jon Smirl <jonsmirl@gmail.com>,
	Martin Langhoff <martin.langhoff@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: Figured out how to get Mozilla into git
Date: Sat, 10 Jun 2006 10:36:12 +0200
Message-ID: <448A847C.20105@dawes.za.net>
In-Reply-To: <Pine.LNX.4.64.0606092001590.5498@g5.osdl.org>

Linus Torvalds wrote:
> 
> On Fri, 9 Jun 2006, Carl Worth wrote:
> 
>> On Fri, 9 Jun 2006 22:21:17 -0400, "Jon Smirl" wrote:
>>> Could you clone the repo and delete changesets earlier than 2004? Then
>>> I would clone the small repo and work with it. Later I decide I want
>>> full history, can I pull from a full repository at that point and get
>>> updated? That would need a flag to trigger it since I don't want full
>>> history to come over if I am just getting updates from someone else's
>>> tree that has a full history.
>> This is clearly a desirable feature, and has been requested by several
>> people (including myself) looking to switch some large-ish histories
>> from an existing system to git.
> 
> The thing is, to some degree it's really fundamentally hard.
> 
> It's easy for a linear history. What you do for a linear history is to 
> just get the top commit, and the tree associated with it, and then you 
> cauterize the parent by just grafting it to go away. Boom. You're done.
> 
> The problems are that if the preceding history _wasn't_ linear (or, in 
> fact, _subsequent_ development refers to it by having branched off at an 
> earlier point), and you try to pull your updates, the other end (that 
> knows about all the history) will assume you have all the history that you 
> don't have, and will send you a pack assuming that.
> 
> Which won't even necessarily have all the tree/blob objects (it assumed 
> you already had them), but more annoyingly, the history won't be 
> cauterized, and you'll have dangling commits. Which you can cauterize by 
> hand, of course, but you literally _will_ have to get the objects and 
> cauterize the thing by hand.
> 
> You're right that it's not "fundamentally impossible" to do: the git 
> format certainly _allows_ it. But the git protocol handshake really does 
> end up optimizing away all the unnecessary work by knowing that the other 
> side will have all the shared history, so lacking the shared history will 
> mean that you're a bit screwed.

Here's an idea. How about separating trees and commits from the actual 
blobs (e.g. in separate packs)? My reasoning is that the commits and 
trees should only be a small portion of the overall repository size, and 
should not be that expensive to transfer. (Of course, this is only a 
guess, and needs some numbers to back it up.)
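
One rough way to get such numbers is to total object sizes by type. The
sketch below assumes input lines of the form "<sha> <type> <size>", as
printed by "git cat-file --batch-check --batch-all-objects" in current
git. Those sizes are uncompressed and ignore delta compression, so the
result is only a first indication of how a real pack would split. The
names tally_sizes and metadata_fraction are made up for illustration.

```python
from collections import defaultdict


def tally_sizes(batch_check_output):
    """Sum object sizes per type from "<sha> <type> <size>" lines."""
    totals = defaultdict(int)
    for line in batch_check_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _sha, obj_type, size = parts
        totals[obj_type] += int(size)
    return dict(totals)


def metadata_fraction(totals):
    """Share of bytes held by commits and trees, relative to commits, trees and blobs."""
    meta = totals.get("commit", 0) + totals.get("tree", 0)
    total = meta + totals.get("blob", 0)
    return meta / total if total else 0.0
```

Feeding the command's output to tally_sizes and then calling
metadata_fraction gives the fraction that commits and trees contribute.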

So, a shallow clone would receive all of the tree objects, and all of 
the commit objects, and could then request a pack containing the blobs 
represented by the current HEAD.
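
As a minimal sketch of that split, using a toy in-memory object store
rather than git's real pack format: objects are (type, payload) pairs
keyed by an id, commits carry a "tree" id, and trees map names to ids.
All of these shapes, and the names split_packs and blobs_needed, are
assumptions for illustration. A real tree entry carries a mode that
says whether it points at a tree or a blob; here an id that is missing
from the metadata set is simply taken to be a blob.

```python
def split_packs(objects):
    """Separate commits and trees (metadata) from blobs."""
    meta = {oid: obj for oid, obj in objects.items()
            if obj[0] in ("commit", "tree")}
    blobs = {oid: obj for oid, obj in objects.items() if obj[0] == "blob"}
    return meta, blobs


def blobs_needed(meta, commit_id):
    """Walk one commit's tree using metadata only; return the blob ids to request."""
    _type, commit = meta[commit_id]
    wanted = set()
    stack = [commit["tree"]]
    while stack:
        _type, entries = meta[stack.pop()]
        for child in entries.values():
            if child in meta:      # known tree: descend into it
                stack.append(child)
            else:                  # not in the metadata set: treat as a blob
                wanted.add(child)
    return wanted


# Toy history: c1 -> c2, where only README changed.
objects = {
    "b1": ("blob", b"old readme"),
    "b2": ("blob", b"new readme"),
    "b3": ("blob", b"int main(void) { return 0; }"),
    "t_src": ("tree", {"main.c": "b3"}),
    "t1": ("tree", {"README": "b1", "src": "t_src"}),
    "t2": ("tree", {"README": "b2", "src": "t_src"}),
    "c1": ("commit", {"tree": "t1", "parents": []}),
    "c2": ("commit", {"tree": "t2", "parents": ["c1"]}),
}
meta, blobs = split_packs(objects)
head_blobs = blobs_needed(meta, "c2")
```

With c2 as HEAD, the client would ask only for b2 and b3; the old
README blob b1 never has to be transferred.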

In this way, the user has a history that will show all of the commit 
messages, and would be able to see _which_ files have changed over time. 
For example, gitk and "git log" should still work, except for the actual 
file-level diffs.
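
To illustrate why the list of changed files survives without blobs:
comparing two trees only needs the blob ids recorded in them, not the
blob contents. The sketch below reuses a toy model in which commits and
trees are (type, payload) pairs keyed by id; that model and the names
flatten and changed_paths are assumptions for illustration, not git
internals.

```python
def flatten(meta, tree_id, prefix=""):
    """Map every file path under tree_id to its blob id, reading trees only."""
    paths = {}
    _type, entries = meta[tree_id]
    for name, child in entries.items():
        path = prefix + name
        if child in meta:          # subtree: recurse
            paths.update(flatten(meta, child, path + "/"))
        else:                      # blob id; the blob itself is not needed
            paths[path] = child
    return paths


def changed_paths(meta, old_commit, new_commit):
    """Paths whose blob id differs between two commits (added, removed or edited)."""
    old = flatten(meta, meta[old_commit][1]["tree"])
    new = flatten(meta, meta[new_commit][1]["tree"])
    return sorted(p for p in old.keys() | new.keys()
                  if old.get(p) != new.get(p))


# Metadata only: no blob objects are present in this store.
meta = {
    "t_src": ("tree", {"main.c": "b3"}),
    "t1": ("tree", {"README": "b1", "src": "t_src"}),
    "t2": ("tree", {"README": "b2", "src": "t_src", "NEWS": "b4"}),
    "c1": ("commit", {"tree": "t1", "parents": []}),
    "c2": ("commit", {"tree": "t2", "parents": ["c1"]}),
}
changes = changed_paths(meta, "c1", "c2")
```

The content-level diff is the one thing this cannot produce, since
that needs both blobs.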

This would also enable other optimisations.

For example, documentation people would only need to get the objects 
under the doc/ tree, and would not need to actually check out the 
source. Git could detect any actual changes by checking whether it has 
the previous blob in its local repository, and whether the file exists 
locally. Creating a patch would obviously require that the person check 
out the previous version, but one could theoretically commit a new blob 
to a repo without having the previous one (not saying that this would be 
a good idea, of course).
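
One way such a check could work, offered as an assumption rather than
what is proposed above: hash each local file the way git hashes a blob
(SHA-1 over "blob <length>\0" followed by the content) and compare the
result with the id recorded in the tree. That needs neither the previous
blob nor anything outside doc/. The function modified_paths and its
dict-based inputs are hypothetical.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def modified_paths(recorded, working_files, prefix="doc/"):
    """List files under prefix whose content no longer matches the tree.

    recorded:      path -> blob id taken from the tree objects
    working_files: path -> bytes currently on disk
    """
    changed = []
    for path, data in sorted(working_files.items()):
        if not path.startswith(prefix):
            continue  # outside the subtree this user cares about
        if recorded.get(path) != git_blob_id(data):
            changed.append(path)
    return changed
```

A file that is new under doc/ has no recorded id, so it shows up as
changed as well.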

This would probably require Eric Biederman's "direct access to blob" 
patches, I guess, in order to be feasible.

Regards,

Rogan

