git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug with disabling compression and 'binary' files.
@ 2016-11-16  0:21 Douglas Cox
  2016-11-16  1:42 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Douglas Cox @ 2016-11-16  0:21 UTC (permalink / raw)
  To: git

I was doing some experiments today with large-ish (100-200MB) binary
files and was trying to determine the best configuration for Git. Here
are the steps and timings I saw:

git init Test
cp .../largemovie.mp4 .
time git add largemovie.mp4

This took 6.5s for a 200MB file.

This file compressed a little bit, but not enough to matter, so I
wanted to see how much faster the add would be with compression
disabled. So I ran:

git config core.compression = 0

I then completely reran the test above starting with a clean
repository. This time the add took only 2.08s.  I repeated these two
tests about 10 times using the same file each time to verify it wasn't
related to disk caching, etc.

At this point I decided that this was likely a good setting for this
repository, so I also created a .gitattributes file and added a few
entries often seen in suggestions for dealing with binary files:

*.mp4 binary -delta

The goal here was to disable any delta compression done during a
gc/pack and use the other settings 'binary' applies. Unfortunately
when I ran the test again (still using compression = 0), the test was
back to taking 6.5s and the file inside the .git/objects/ folder was
compressed again.

I narrowed this down to the '-text' attribute that is set when
specifying 'binary'.  For some reason this flag is cancelling out the
core.compression = 0 setting and I think this is a bug?

Unfortunately core.compression = 0 is also global. Ideally it would be
great if there was a separate 'compression' attribute that could be
specified in .gitattributes per wildcard similar to how -delta can be
used. This way we would still be able to get compression for
text/source files, while still getting the speed of skipping
compression for binary files that do not compress well.

Has there been any discussion on having an attribute similar to this?

Thanks,
-Doug

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Bug with disabling compression and 'binary' files.
  2016-11-16  0:21 Bug with disabling compression and 'binary' files Douglas Cox
@ 2016-11-16  1:42 ` Junio C Hamano
  2016-11-16  3:12   ` Douglas Cox
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2016-11-16  1:42 UTC (permalink / raw)
  To: Douglas Cox; +Cc: git

Douglas Cox <ziflin@gmail.com> writes:

> I narrowed this down to the '-text' attribute that is set when
> specifying 'binary'.  For some reason this flag is cancelling out the
> core.compression = 0 setting and I think this is a bug?
>
> Unfortunately core.compression = 0 is also global. Ideally it would be
> great if there was a separate 'compression' attribute that could be
> specified in .gitattributes per wildcard similar to how -delta can be
> used. This way we would still be able to get compression for
> text/source files, while still getting the speed of skipping
> compression for binary files that do not compress well.
>
> Has there been any discussion on having an attribute similar to this?

Nope.  

I do not offhand think of a way for '-text' attribute (or any
attribute for what matter) to interfere with compression level, but
while reading the various parts of the system that futz with the
compression level configuration, I noticed one thing.  When we do an
initial "bulk-checkin" optimization that sends all objects to a
single packfile upon "git add", the packfile creation uses its own
compression level that is not affected by any configuration or
command line option.  This may or may not be related to the symptom
you are observing (if it is, then you would see a packfile created
in objects/pack/, not in loose objects in object/??/ directories).




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Bug with disabling compression and 'binary' files.
  2016-11-16  1:42 ` Junio C Hamano
@ 2016-11-16  3:12   ` Douglas Cox
  2016-11-16  5:23     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Douglas Cox @ 2016-11-16  3:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> This may or may not be related to the symptom
> you are observing (if it is, then you would see a packfile created
> in objects/pack/, not in loose objects in object/??/ directories).

No, the file is loose (it's in .git/objects/eb in this case). This is
seen immediately after the add, though I believe it's the same way
when doing a commit on a changed file.


On Tue, Nov 15, 2016 at 8:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Douglas Cox <ziflin@gmail.com> writes:
>
>> I narrowed this down to the '-text' attribute that is set when
>> specifying 'binary'.  For some reason this flag is cancelling out the
>> core.compression = 0 setting and I think this is a bug?
>>
>> Unfortunately core.compression = 0 is also global. Ideally it would be
>> great if there was a separate 'compression' attribute that could be
>> specified in .gitattributes per wildcard similar to how -delta can be
>> used. This way we would still be able to get compression for
>> text/source files, while still getting the speed of skipping
>> compression for binary files that do not compress well.
>>
>> Has there been any discussion on having an attribute similar to this?
>
> Nope.
>
> I do not offhand think of a way for '-text' attribute (or any
> attribute for what matter) to interfere with compression level, but
> while reading the various parts of the system that futz with the
> compression level configuration, I noticed one thing.  When we do an
> initial "bulk-checkin" optimization that sends all objects to a
> single packfile upon "git add", the packfile creation uses its own
> compression level that is not affected by any configuration or
> command line option.  This may or may not be related to the symptom
> you are observing (if it is, then you would see a packfile created
> in objects/pack/, not in loose objects in object/??/ directories).
>
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Bug with disabling compression and 'binary' files.
  2016-11-16  3:12   ` Douglas Cox
@ 2016-11-16  5:23     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2016-11-16  5:23 UTC (permalink / raw)
  To: Douglas Cox; +Cc: git

Douglas Cox <ziflin@gmail.com> writes:

>> This may or may not be related to the symptom
>> you are observing (if it is, then you would see a packfile created
>> in objects/pack/, not in loose objects in object/??/ directories).
>
> No, the file is loose (it's in .git/objects/eb in this case). This is
> seen immediately after the add, though I believe it's the same way
> when doing a commit on a changed file.

Then I do not have a guess as to where the symptom you are seeing is
coming from.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-11-16  5:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-16  0:21 Bug with disabling compression and 'binary' files Douglas Cox
2016-11-16  1:42 ` Junio C Hamano
2016-11-16  3:12   ` Douglas Cox
2016-11-16  5:23     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).