From: "Dana How" <danahow@gmail.com>
To: "Nicolas Pitre" <nico@cam.org>
Cc: "Junio C Hamano" <junkio@cox.net>,
"Git Mailing List" <git@vger.kernel.org>,
danahow@gmail.com
Subject: Re: [PATCH v2] Custom compression levels for objects and packs
Date: Wed, 9 May 2007 02:21:45 -0700 [thread overview]
Message-ID: <56b7f5510705090221g38ab0973x8631dacc601abb16@mail.gmail.com> (raw)
In-Reply-To: <alpine.LFD.0.99.0705082106590.24220@xanadu.home>
On 5/8/07, Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 8 May 2007, Dana How wrote:
> > I think it should be straightforward for me to re-submit this
> > based on current master.
> Since this patch is simpler it could be merged much faster, before the
> pack limit series.
Yes, I will do "custom compression" patch first.
> > > > + /* differing core & pack compression when loose object -> must
> > > recompress */
> > > I am not sure if that is worth it, as you do not know if the
> > > loose object you are looking at were compressed with the current
> > > settings.
> > You do not know for certain, that is correct. However, config
> > settings setting unequal compression levels signal that you
> > care differently about the two cases. (For me, I want the
> > compression investment to correspond to the expected lifetime of the file.)
> > Also, *if* we have the knobs we want in the config file,
> > I don't think we're going to be changing these settings all that often.
> >
> > If I didn't have this check forcing recompression in the pack,
> > then in the absence of deltification each object would enter the pack
> > by being copied (in the preceding code block) and pack.compression
> > would have little effect. I actually experienced this the very first
> > time I imported a large dataset into git (I was trying to achieve the
> > effect of this patch by changing core.compression dynamically, and
> > was a bit mystified for a while by the result).
> >
> > Thus, if core.loosecompression is set to speed up git-add, I should
> > take the time to recompress the object when packing if pack.compression
> > is different (of course the hit of not doing so will be lessened by
> > deltification
> > which forces a new compression).
>
> Right. And this also depends whether or not you have core.legacyheaders
> set to false or not.
>
> And the whole purpose for setting core.legacyheaders is exactly to allow
> for loose objects to be copied straight into the pack. This should have
> priority over mismatched compression levels IMHO.
OK, I got really confused here, so I looked over the code,
and figured out 2 causes for my confusion.
(1) core.legacyheaders controls use_legacy_headers, which defaults to 1.
So currently all loose objects are in legacy format and the code block
I spoke of doesn't trigger [without a config setting]. I didn't realize
legacy headers were still being produced (mislead by the name!).
(2) I read your "setting core.legacyheaders" as followed by TRUE,
but you meant FALSE, which is not the default.
I also read that 1 year after 1.4.2, the default for core.legacyheaders is going
to change to FALSE. I think our discussion should assume this has
happened. So let's assume FALSE in the following. The point of that
is that such a FALSE setting can't be assumed to have any special intent;
it will be the default.
[Everything I write here boils down to only one question,
which I repeat at the end.]
Data gets into a pack in these ways:
1. Loose object copied in;
2. Loose object newly deltified;
3. Packed object to be copied;
4. Packed object to be newly deltified;
5. Packed deltified object we can't re-use;
6. Packed deltified object we can re-use.
["copied" includes recompressed.]
In (2), (4), and (5), pack.compression will always be newly used.
If pack.compression doesn't change, this means (6)
will be using pack.compression since it comes from (2) or (4).
So if I "guarantee" that (1) uses pack.compression,
(3) will as well, meaning everything in the pack will be
at pack.compression.
Thus if pack.compression != core.loosecompression takes precedence
over core.legacyheaders = false, then for pack.compression constant
we get all 6 cases at level pack.compression. If core.legacyheaders =
false takes precedence as you suggest, then all undeltified objects
(20%?) will be stuck at core.loosecompression [since I see no way
to sensibly re-apply compression to something copied pack-to-pack].
I think this is inconsistent with what a pack.compression !=
core.loosecompression setting is telling us.
My arguments have 2 biases.
(a) I assume pack.compression stays constant, and if it changes,
I see little value in worrying about "forcing" all objects in a new pack,
some of which might be copied from an old pack, to be at the same
compression level.
(b) I focused first on packing to disk, where a constant pack.compression
makes sense. For packing to stdout (pack transfer), especially on
a popular, hence loaded, repository server, we may want to use
--compress=N to crank down on effort spent compressing new deltas [2+4]
(or loose objects [1]) (or unusable deltas [5]). This still lets
better compressed
objects flow through in the other cases [3+6]. So for one-use packs,
a mix of compression levels could be desirable and
is achievable with --compression=N if/when so.
> Also, when repacking, delta reuse does not recompress objects for the
> same reason, regardless of the compression level used when they were
> compressed initially. Same argument goes for delta depth.
Delta reuse doesn't need recompression. For pack.compression
constant, they will already be at the correct level. I think we agree
on behavior here for different reasons.
> So if you really want to ensure a compression level on the whole pack,
> you'll have to use -f with git-repack. Or leave core.legacyheaders
> unset.
-f would be needed if you were in the practice of changing
pack.compression, yes.
So, after covering all these cases, have I convinced you that
recompression takes precedence over legacyheaders = FALSE?
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
next prev parent reply other threads:[~2007-05-09 9:21 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-08 22:38 [PATCH v2] Custom compression levels for objects and packs Dana How
2007-05-08 23:56 ` Junio C Hamano
2007-05-09 0:16 ` Nicolas Pitre
2007-05-09 0:29 ` Dana How
2007-05-09 1:03 ` Nicolas Pitre
2007-05-09 6:46 ` Dana How
2007-05-09 7:13 ` Junio C Hamano
2007-05-09 0:25 ` Dana How
2007-05-09 1:23 ` Nicolas Pitre
2007-05-09 9:21 ` Dana How [this message]
2007-05-09 15:27 ` Nicolas Pitre
2007-05-09 16:26 ` Junio C Hamano
2007-05-09 16:42 ` Dana How
2007-05-09 16:59 ` [PATCH] make "repack -f" imply "pack-objects --no-reuse-object" Nicolas Pitre
2007-05-09 18:42 ` [PATCH] deprecate the new loose object header format Nicolas Pitre
2007-05-09 20:16 ` Dana How
2007-05-09 20:42 ` Nicolas Pitre
2007-05-09 21:00 ` Dana How
2007-05-09 5:59 ` [PATCH v2] Custom compression levels for objects and packs Junio C Hamano
2007-05-09 6:24 ` Dana How
2007-05-09 0:30 ` Petr Baudis
2007-05-09 13:56 ` Theodore Tso
2007-05-09 16:44 ` Dana How
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56b7f5510705090221g38ab0973x8631dacc601abb16@mail.gmail.com \
--to=danahow@gmail.com \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=nico@cam.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).