git@vger.kernel.org mailing list mirror (one of many)
From: Derek Fawcus <dfawcus@cisco.com>
To: git@vger.kernel.org
Subject: Re: space compression (again)
Date: Fri, 15 Apr 2005 19:50:38 +0100
Message-ID: <20050415195038.E6735@mrwint.cisco.com>
In-Reply-To: <Pine.LNX.4.61.0504151232160.27637@cag.csail.mit.edu>; from cscott@cscott.net on Fri, Apr 15, 2005 at 01:19:30PM -0400

On Fri, Apr 15, 2005 at 01:19:30PM -0400, C. Scott Ananian wrote:
> Why are blobs per-file?  [After all, Linus insists that files are an 
> illusion.]  Why not just have 'chunks', and assemble *these* 
> into blobs (read, 'files')?  A good chunk size would fit evenly into some 
> number of disk blocks (no wasted space!).

[ I've only been earwigging,  not paying a lot of attention,  however ...]

Funny, I was just thinking of this, having read Linus' discourse on
"files don't matter": the obvious chunking factor would be, say,
a function.

The problem is that this tends towards very small files - I know
I tend to prefer small functions.  Hmm - an underlying filesystem that
efficiently stores small files - why does that ring a bell :-)

However, the simple answer is to have a preparser for a file / tree
checkin which splits, say, a .c file into its associated chunks,  and
represents each one in git as a signed/hashed object,  i.e. an
automatically created extra level of indirection (as I seem to recall
was added somewhere else?).

  So say fred.c:

  /*
   * File boiler
   */
  #include <guff>
  #include <more guff>

  /*
   * Fn a boiler
   */
  int fn_a(args) {
  }

  /*
   * Fn b boiler
   */
  long fn_b(args) {
  }

This would be split into 4 parts within git:  the 'file object',  which
simply points to the content objects,  and 3 content objects,  being the
stuff before 'Fn a boiler',  fn_a and its boiler,  and fn_b and its boiler.

The interesting bit is needing a preprocessor which can roughly parse
the code - i.e. detect where to place the boiler blocks.
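
A rough sketch of such a preprocessor, in Python, assuming the simplest
possible heuristic: any line starting with "/*" in column 0 opens a boiler
block, and a boiler block begins a new chunk.  The name split_at_boilers
and the heuristic itself are illustrative assumptions, not anything git
provides; real code would have to cope with other comment styles and with
column-0 comments that are not boilers.

```python
def split_at_boilers(source: str) -> list[str]:
    """Split source text into chunks, each beginning at a column-0 block comment."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for line in source.splitlines(keepends=True):
        # Assumed heuristic: "/*" in column 0 marks the start of a boiler
        # block, so close the chunk collected so far and start a new one.
        if line.startswith("/*") and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    # Joining the returned chunks in order reproduces the input exactly.
    return ["".join(lines) for lines in chunks]
```

Applied to fred.c above (without the email indentation), this yields the
three content chunks described: the file boiler plus includes, fn_a with
its boiler, and fn_b with its boiler.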

You would then do most of your tree operations upon the file objects,
but get the space savings from the content objects being shared.
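
To make the sharing concrete, here is a minimal sketch in Python of a
content-addressed store with the extra level of indirection.  Everything
in it is an assumption for illustration: the names store, check_in and
check_out, the use of a plain dict as the object database, and the file
object format (one hex SHA-1 per line).  It is not git's actual object
format.

```python
import hashlib


def store(objects: dict[str, bytes], data: bytes) -> str:
    """Store data under its SHA-1 hex digest and return that digest."""
    key = hashlib.sha1(data).hexdigest()
    objects[key] = data  # identical content hashes to the same key: stored once
    return key


def check_in(objects: dict[str, bytes], chunks: list[bytes]) -> str:
    """Store each chunk as a content object, then a file object listing them."""
    chunk_ids = [store(objects, chunk) for chunk in chunks]
    # Hypothetical file object layout: one content-object id per line.
    file_object = "\n".join(chunk_ids).encode("ascii") + b"\n"
    return store(objects, file_object)


def check_out(objects: dict[str, bytes], file_id: str) -> bytes:
    """Rebuild the file by concatenating the content objects it points to."""
    chunk_ids = objects[file_id].decode("ascii").split()
    return b"".join(objects[chunk_id] for chunk_id in chunk_ids)
```

Checking in two versions of a three-chunk file that differ only in one
function leaves six objects in the store rather than eight: two file
objects, the two unchanged chunks stored once each, and the two variants
of the changed function.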

I suspect that,  simply to prevent pathological conditions,  you'd have to
arrange that the content objects have a minimal size,  irrespective
of the number of desired chunks (functions) they would naturally
contain,  i.e. for compression efficiency,  you may choose something like
2K as the minimal pre-compression content object size.
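
One way the minimal size could be enforced, sketched in Python: greedily
merge adjacent chunks until each content object reaches the threshold.
The name coalesce, the greedy strategy, and folding a short tail into the
previous object are all assumptions for illustration; only the 2K figure
comes from the text above.

```python
MIN_CHUNK_BYTES = 2048  # the "something like 2K" figure, before compression


def coalesce(chunks: list[bytes], min_size: int = MIN_CHUNK_BYTES) -> list[bytes]:
    """Merge adjacent chunks so that each content object is at least min_size bytes.

    If the whole input is smaller than min_size, it comes back as one object.
    """
    merged: list[bytes] = []
    pending = b""
    for chunk in chunks:
        pending += chunk
        if len(pending) >= min_size:
            merged.append(pending)
            pending = b""
    if pending:
        if merged:
            # A short tail is folded into the previous object.
            merged[-1] += pending
        else:
            # Nothing reached the threshold: keep the file as a single object.
            merged.append(pending)
    return merged
```

The trade-off is that a change to one small function now alters the whole
merged object it sits in, so larger minimum sizes mean less sharing
between versions.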

DF
