git@vger.kernel.org mailing list mirror (one of many)
From: Nicolas Pitre <nico@cam.org>
To: Dana How <danahow@gmail.com>
Cc: Junio C Hamano <junkio@cox.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v2] Custom compression levels for objects and packs
Date: Wed, 09 May 2007 11:27:28 -0400 (EDT)
Message-ID: <alpine.LFD.0.99.0705091048120.24220@xanadu.home>
In-Reply-To: <56b7f5510705090221g38ab0973x8631dacc601abb16@mail.gmail.com>

On Wed, 9 May 2007, Dana How wrote:

> OK,  I got really confused here, so I looked over the code,
> and figured out 2 causes for my confusion.
> (1) core.legacyheaders controls use_legacy_headers, which defaults to 1.
> So currently all loose objects are in legacy format and the code block
> I spoke of doesn't trigger [without a config setting].  I didn't realize
> legacy headers were still being produced (misled by the name!).
> (2) I read your "setting core.legacyheaders" as followed by TRUE,
> but you meant FALSE, which is not the default.
> 
> I also read that 1 year after 1.4.2, the default for core.legacyheaders
> is going to change to FALSE.  I think our discussion should assume this
> has happened.

<tangential comment>

Now that we encourage and actively preserve objects in a packed form
more aggressively than we did at the time core.legacyheaders was
introduced, I really wonder if this option is still worth it.  Because
the packing of loose objects has to go through the delta match loop
anyway, and since most of them should end up being deltified in most
cases, there is really little advantage in having this parallel loose
object format, as the CPU savings it might provide are rather lost in
the noise in the end.

So I suggest that we get rid of core.legacyheaders, preserve the legacy 
format as the only writable loose object format and deprecate the other 
one to keep things simpler.  Thoughts?

</tangential comment>

> So let's assume FALSE in the following.  The point of that is that 
> such a FALSE setting can't be assumed to have any special intent; it 
> will be the default.
> 
> [Everything I write here boils down to only one question,
> which I repeat at the end.]
> 
> Data gets into a pack in these ways:
> 1. Loose object copied in;
> 2. Loose object newly deltified;
> 3. Packed object to be copied;
> 4. Packed object to be newly deltified;
> 5. Packed deltified object we can't re-use;
> 6. Packed deltified object we can re-use.
> ["copied" includes recompressed.]

I think you forgot "packed undeltified objects we can reuse".

> In (2), (4), and (5), pack.compression will always be newly used.
> If pack.compression doesn't change,  this means (6)
> will be using pack.compression since it comes from (2) or (4).
> So if I "guarantee" that (1) uses pack.compression,
> (3) will as well, meaning everything in the pack will be
> at pack.compression.
> 
> Thus if pack.compression != core.loosecompression takes precedence
> over core.legacyheaders = false,  then for pack.compression constant
> we get all 6 cases at level pack.compression.  If core.legacyheaders =
> false takes precedence as you suggest,  then all undeltified objects
> (20%?) will be stuck at core.loosecompression [since I see no way
> to sensibly re-apply compression to something copied pack-to-pack].
> 
> I think this is inconsistent with what a pack.compression !=
> core.loosecompression setting is telling us.

OK I see that I missed the fact that git-repack -f (or git-pack-objects 
--no-reuse-delta) does not recompress undeltified objects.  Note this is 
a problem in the case where you change pack.compression to a different 
value or override it on the command line as well: reused undeltified 
objects won't get recompressed with the new level.  My rant on 
core.legacyheaders and its removal would address the first case.  Your 
test for a difference between loose and packed compression levels is 
flawed because the value of core.compression does not necessarily 
represent the compression level that was used for the loose objects to 
pack (core.compression might have been modified since then), but it 
only addresses the first case too.  And this is a problem even now.
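
To illustrate the distinction in zlib terms, here is a minimal Python
sketch.  It is not git's code: the payload, the two levels and the
helper names reuse() and recompress() are made up for the example.  A
deflated stream that is copied verbatim keeps whatever level it was
written with, and only an inflate followed by a deflate applies the
new level.

```python
import zlib

# Made-up object contents, deliberately repetitive so it compresses well.
payload = b"some fairly repetitive file content\n" * 200

# The bytes as they sit in the old pack, written at the old level (assumed 1).
stored = zlib.compress(payload, 1)


def reuse(stored_bytes: bytes) -> bytes:
    """Pack-to-pack copy: the deflated stream is written out untouched."""
    return stored_bytes


def recompress(stored_bytes: bytes, level: int) -> bytes:
    """Inflate the old stream, then deflate it again at the requested level."""
    return zlib.compress(zlib.decompress(stored_bytes), level)


reused = reuse(stored)          # still the level-1 stream
redone = recompress(stored, 9)  # a fresh stream written at level 9

print(len(payload), len(reused), len(redone))
```

Both streams inflate to the same contents; only the second one
reflects the level that was asked for at repack time.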

What we need instead is a --no-reuse-object that would force 
recompression of everything when you really want to enforce a specific 
compression level across the whole pack(s).
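
As a rough sketch of the behaviour discussed above (a toy model in
Python, not the pack-objects implementation), the function below
returns the level an object's bytes would end up with in the new pack.
The Obj class, its fields and the function name are hypothetical, and
the model assumes loose objects are in the legacy format so they are
always rewritten.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    packed: bool       # already lives in an existing pack
    deltified: bool    # stored as a delta in that pack
    stored_level: int  # level its current bytes were compressed with


def level_in_new_pack(obj: Obj, pack_level: int,
                      becomes_delta: bool = False,
                      no_reuse_delta: bool = False,
                      no_reuse_object: bool = False) -> int:
    """Toy model: compression level of the object once written to the new pack."""
    if no_reuse_object:
        return pack_level        # proposed flag: everything is deflated again
    if not obj.packed:
        return pack_level        # legacy loose object: always rewritten
    if becomes_delta:
        return pack_level        # newly computed delta is newly compressed
    if obj.deltified and no_reuse_delta:
        return pack_level        # old delta discarded, data written anew
    return obj.stored_level      # reused as is, delta or not


old_blob = Obj(packed=True, deltified=False, stored_level=1)
print(level_in_new_pack(old_blob, 9, no_reuse_delta=True))
print(level_in_new_pack(old_blob, 9, no_reuse_object=True))
```

The first call models the gap: a packed undeltified object is copied
with its old level even when deltas are not reused.  The second models
what --no-reuse-object is meant to do.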


Nicolas

