From: Rogan Dawes <lists@dawes.za.net>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jon Smirl <jonsmirl@gmail.com>,
	Martin Langhoff <martin.langhoff@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: Figured out how to get Mozilla into git
Date: Sat, 10 Jun 2006 20:36:32 +0200
Message-ID: <448B1130.8020005@dawes.za.net>
In-Reply-To: <Pine.LNX.4.64.0606101041490.5498@g5.osdl.org>

Linus Torvalds wrote:
> 
> On Sat, 10 Jun 2006, Rogan Dawes wrote:
>> Here's an idea. How about separating trees and commits from the actual blobs
>> (e.g. in separate packs)? My reasoning is that the commits and trees should
>> only be a small portion of the overall repository size, and should not be that
>> expensive to transfer. (Of course, this is only a guess, and needs some
>> numbers to back it up.)
> 
> The trees in particular are actually a pretty big part of the history. 
> 
> More importantly, the blobs compress horribly badly in the absence of 
> history - a _lot_ of the compression in git packing comes very much from 
> the fact that we do a good job at delta-compression.
> 
> So if you get all of the commit/tree history, but none of the blob 
> history, you're actually not going to win that much space. As already 
> discussed, the _whole_ history packed with git is usually not insanely 
> bigger than just the whole unpacked tree (with no history at all).
> 
> So you'd think that getting just the top version of the tree would be a 
> much bigger space-saving than it actually is. If you _also_ get all the 
> tree and commit objects, the space saving is even less.
> 

One possibility, given that the full commit and tree history is so
large, is simply to get the HEAD commit and the trees that it directly
depends on, rather than fetching the whole history up front.
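
To make that concrete, here is a rough sketch of such a walk. It is only
an illustration: the in-memory REMOTE store, the short object names and
the fetch_head_skeleton() helper are all made up for this example and
are not part of git. It starts at HEAD, follows the commit to its tree
and subtrees, and leaves blobs and parent commits unfetched.

```python
# Hypothetical in-memory "remote": object id -> (type, payload).
# Real git objects are named by SHA-1; short names are used for readability.
# Tree entries record the entry type, so the walker knows what to skip
# without having to fetch the object first.
REMOTE = {
    "c2": ("commit", {"tree": "t2", "parents": ["c1"]}),
    "c1": ("commit", {"tree": "t1", "parents": []}),
    "t2": ("tree", {"README": ("blob", "b2"), "src": ("tree", "t3")}),
    "t3": ("tree", {"main.c": ("blob", "b3")}),
    "t1": ("tree", {"README": ("blob", "b1")}),
    "b1": ("blob", b"old readme"),
    "b2": ("blob", b"new readme"),
    "b3": ("blob", b"int main(void) { return 0; }"),
}


def fetch_head_skeleton(remote, head):
    """Fetch only the HEAD commit and the trees reachable from it."""
    kind, commit = remote[head]
    if kind != "commit":
        raise ValueError(f"{head} is not a commit")
    local = {head: (kind, commit)}

    pending = [commit["tree"]]
    while pending:
        oid = pending.pop()
        if oid in local:
            continue
        local[oid] = remote[oid]          # one "fetch" per tree object
        for entry_kind, entry_oid in remote[oid][1].values():
            if entry_kind == "tree":      # descend into subtrees only
                pending.append(entry_oid)
            # blobs are left for later, on-demand fetching
    return local


local = fetch_head_skeleton(REMOTE, "c2")
print(sorted(local))
```

Parent commits (c1 here) and every blob stay on the remote until
something actually asks for them.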

> I actually suspect that the most realistic way to handle this is to use 
> the "fetch.c" logic (ie the incremental fetcher used by http), and add 
> some mode to the git daemon where you fetch literally one object at a time 
> (ie this would be totally _separate_ from the pack-file thing: you'd not 
> ask for "git-upload-pack", you'd ask for something like 
> "git-serve-objects" instead).
> 
> The fetch.c logic really does allow for on-demand object fetching, and is 
> thus much more suitable for incomplete repositories.
> 
> HOWEVER. The fetch.c logic - by necessity - works on an object-by-object 
> level. That means that you'd get no delta compression AT ALL, and I 
> suspect that the downside of that would be a factor of ten expansion or 
> more, which means that it would really not work that well in practice.

Would it be possible to add a mode where fetch.c is given a list of 
desired objects, and returns a list of pointers to those objects? 
Callers that already have such a list could then be modified to pass the 
whole list at once, allowing at least SOME compression and fewer round 
trips. There would be a tradeoff in memory use, though, I 
guess.
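
As a rough illustration of why batching might help, the sketch below
compares compressing several similar objects one at a time against
compressing them together in a single zlib stream. This is NOT git's
delta compression and not how fetch.c works; make_versions(),
pack_batch() and unpack_batch() are hypothetical helpers, and the
length-prefixed framing is just an assumption made for the example.

```python
import struct
import zlib


def make_versions(n):
    """Build n similar 'file versions' that differ by a few lines."""
    base = [f"line {i}: value {i * i}\n" for i in range(100)]
    versions = []
    for v in range(n):
        lines = base + [f"change {k}\n" for k in range(v + 1)]
        versions.append("".join(lines).encode())
    return versions


def pack_batch(objects):
    """Frame each object with a 4-byte length, then compress the lot."""
    framed = b"".join(struct.pack(">I", len(o)) + o for o in objects)
    return zlib.compress(framed)


def unpack_batch(data):
    """Inverse of pack_batch: split the stream back into objects."""
    framed = zlib.decompress(data)
    objects, pos = [], 0
    while pos < len(framed):
        (size,) = struct.unpack_from(">I", framed, pos)
        pos += 4
        objects.append(framed[pos:pos + size])
        pos += size
    return objects


objects = make_versions(5)
one_by_one = sum(len(zlib.compress(o)) for o in objects)
batched = len(pack_batch(objects))
print(f"one at a time: {one_by_one} bytes, batched: {batched} bytes")
```

The batched stream is smaller because later objects can refer back to
data already seen in earlier ones, and the whole list costs one round
trip instead of one per object. The price is that the complete batch
has to be held in memory on both ends.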

Rogan
