git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "C. Scott Ananian" <cscott@cscott.net>
To: Martin Uecker <muecker@gmx.de>
Cc: git@vger.kernel.org
Subject: Re: space compression (again)
Date: Sat, 16 Apr 2005 11:11:00 -0400 (EDT)	[thread overview]
Message-ID: <Pine.LNX.4.61.0504161101470.29343@cag.csail.mit.edu> (raw)
In-Reply-To: <20050416143905.GA10370@macavity>

On Sat, 16 Apr 2005, Martin Uecker wrote:

> The right thing (TM) is to switch from SHA1 of compressed
> content for the complete monolithic file to a merkle hash tree
> of the uncompressed content. This would make the hash
> independent of the actual storage method (chunked or not).

It would certainly be nice to change to a hash of the uncompressed 
content, rather than a hash of the compressed content, but it's not 
strictly necessary, since files are fetched all at once: there's not 'read 
subrange' operation on blobs.

I assume 'merkle hash tree' is talking about:
   http://www.open-content.net/specs/draft-jchapweske-thex-02.html
..which is very interesting, but not quite what I was thinking.
The merkle hash approach seems to require fixed chunk boundaries.
The rsync approach does not use fixed chunk boundaries; this is necessary 
to ensure good storage reuse for the expected case (ie; inserting a single 
line at the start or in the middle of the file, which changes all the 
chunk boundaries).

Further, in the absence of subrange reads on blobs, it's not entirely 
clear what using a merkle hash would buy you.
  --scott

WASHTUB supercomputer security Mk 48 justice ODUNIT radar COBRA JANE 
SSBN 731 BATF KUJUMP SECANT operation class struggle SYNCARP KGB ODACID
                          ( http://cscott.net/ )

  reply	other threads:[~2005-04-16 15:07 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-15 17:19 space compression (again) C. Scott Ananian
2005-04-15 18:34 ` Linus Torvalds
2005-04-15 18:45   ` C. Scott Ananian
2005-04-15 19:00     ` Derek Fawcus
2005-04-15 19:11     ` Linus Torvalds
2005-04-16 14:39       ` Martin Uecker
2005-04-16 15:11         ` C. Scott Ananian [this message]
2005-04-16 17:37           ` Martin Uecker
2005-04-19 12:39             ` Martin Uecker
2005-04-15 18:50 ` Derek Fawcus
  -- strict thread matches above, loose matches on Subject: below --
2005-04-15 19:33 Ray Heasman
2005-04-16 12:29 ` David Lang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.61.0504161101470.29343@cag.csail.mit.edu \
    --to=cscott@cscott.net \
    --cc=git@vger.kernel.org \
    --cc=muecker@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).