git@vger.kernel.org mailing list mirror (one of many)
From: Linus Torvalds <torvalds@osdl.org>
To: sf <sf-gmane@stephan-feder.de>
Cc: git@vger.kernel.org
Subject: Re: [RFC]: Pack-file object format for individual objects (Was:   Revisiting large binary files issue.)
Date: Tue, 11 Jul 2006 15:26:49 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0607111520020.5623@g5.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0607111512420.5623@g5.osdl.org>



On Tue, 11 Jul 2006, Linus Torvalds wrote:
> 
>  - four low bits: CM (compression method):
> 
>         "This identifies the compression method used in the file. CM = 8
>          denotes the "deflate" compression method with a window size up
>          to 32K.  This is the method used by gzip and PNG (see
>          references [1] and [2] in Chapter 3, below, for the reference
>          documents).  CM = 15 is reserved.  It might be used in a future
>          version of this specification to indicate the presence of an
>          extra field before the compressed data."
> 
>  - four high bits are CINFO: 
> 
>         "For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
>          size, minus eight (CINFO=7 indicates a 32K window size). Values
>          of CINFO above 7 are not allowed in this version of the
>          specification.  CINFO is not defined in this specification for
>          CM not equal to 8."
> 
> so 0x78 means "deflate with 32kB window size", but I don't see anything 
> guaranteeing that we might not see something else for an object that 
> cannot be compressed, for example.
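
As a rough illustration of the two fields quoted above, here is a minimal
sketch that splits the first zlib byte (CMF) into CM and CINFO. The helper
names cmf_method, cmf_cinfo and cmf_window_size are made up for this example
and are not part of zlib or git.

```c
#include <assert.h>
#include <stdint.h>

/* Low four bits of the first zlib byte (CMF): the compression method. */
static unsigned cmf_method(uint8_t cmf)
{
    return cmf & 0x0f;
}

/* High four bits of CMF: CINFO, i.e. log2(window size) - 8 when CM == 8. */
static unsigned cmf_cinfo(uint8_t cmf)
{
    return cmf >> 4;
}

/*
 * Window size in bytes implied by CMF. Only meaningful for CM == 8 and
 * CINFO <= 7, which is what the quoted specification text allows.
 */
static unsigned long cmf_window_size(uint8_t cmf)
{
    assert(cmf_method(cmf) == 8 && cmf_cinfo(cmf) <= 7);
    return 1UL << (cmf_cinfo(cmf) + 8);
}
```

For 0x78 this gives CM = 8, CINFO = 7 and a window of 32768 bytes, matching
the "deflate with 32kB window size" reading.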

Ahh. Looking at the zlib sources, I see

    /* Write the zlib header */
    if (s->status == INIT_STATE) {

        uInt header = (Z_DEFLATED + ((s->w_bits-8)<<4)) << 8;
        uInt level_flags = (s->level-1) >> 1;
     
        if (level_flags > 3) level_flags = 3;
        header |= (level_flags << 6);
        if (s->strstart != 0) header |= PRESET_DICT;
        header += 31 - (header % 31);

        s->status = BUSY_STATE;
        putShortMSB(s, header);

(which is that first 16-bit word, MSB first). So we'll always have the
Z_DEFLATED (8) there in the low four bits, but the high nybble will be
"s->w_bits-8" where w_bits comes from windowBits, and I think we can
depend on it being 15:

    "The windowBits parameter is the base two logarithm of the window size
   (the size of the history buffer).  It should be in the range 8..15 for this
   version of the library. Larger values of this parameter result in better
   compression at the expense of memory usage. The default value is 15 if
   deflateInit is used instead."

so since we use deflateInit(), we know windowBits will be 15, which makes
the high nybble 15-8 = 7.

So I guess we _can_ depend on the first byte being 0x78 for our use.
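
To check that conclusion without linking against zlib, the header
arithmetic from the excerpt above can be replayed in isolation. This is only
a sketch under assumptions: zlib_header and zlib_first_byte are hypothetical
helpers, the two constants are copied by value (Z_DEFLATED is 8 in zlib.h,
PRESET_DICT is 0x20 in deflate.c), and level is assumed to already be in the
range 1..9 as it would be in s->level for a compressing stream.

```c
#include <assert.h>

#define Z_DEFLATED  8     /* same value as in zlib.h */
#define PRESET_DICT 0x20  /* same value as in zlib's deflate.c */

/*
 * Hypothetical helper mirroring the quoted computation: returns the
 * 16-bit zlib header for the given windowBits, level and dictionary flag.
 */
static unsigned zlib_header(int w_bits, int level, int has_dict)
{
    unsigned header = (Z_DEFLATED + ((w_bits - 8) << 4)) << 8;
    unsigned level_flags = (unsigned)(level - 1) >> 1;

    if (level_flags > 3) level_flags = 3;
    header |= level_flags << 6;
    if (has_dict) header |= PRESET_DICT;

    /*
     * The low five bits are still zero here, so adding at most 31
     * cannot carry into the first byte.
     */
    header += 31 - (header % 31);
    assert(header % 31 == 0);
    return header;
}

/* First byte on the wire, since the header is written MSB first. */
static unsigned zlib_first_byte(int w_bits, int level, int has_dict)
{
    return zlib_header(w_bits, level, has_dict) >> 8;
}
```

With w_bits = 15 the first byte comes out as 0x78 for every level from 1 to
9, with or without a preset dictionary; only the second byte changes (for
example 0x789c at level 6).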

Goodie.

		Linus
