git@vger.kernel.org mailing list mirror (one of many)
* Git as data archive
@ 2019-12-06 18:54 Andreas Kalz
  2019-12-07 16:54 ` Philip Oakley
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Kalz @ 2019-12-06 18:54 UTC (permalink / raw)
  To: git

Hello,
I am using Git as an archive and versioning system, including for
photos. Performance issues aside, I wanted to ask whether there are hard
limits or configurable limits (and how to configure them) on the maximum
size of a single file and of the .git directory (64-bit Windows system)?
Thanks in advance for your answer.
All the best,
Andreas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Git as data archive
  2019-12-06 18:54 Git as data archive Andreas Kalz
@ 2019-12-07 16:54 ` Philip Oakley
  2019-12-07 18:04   ` Thomas Braun
  0 siblings, 1 reply; 6+ messages in thread
From: Philip Oakley @ 2019-12-07 16:54 UTC (permalink / raw)
  To: Andreas Kalz, git

Hi Andreas,

On 06/12/2019 18:54, Andreas Kalz wrote:
> Hello,
> I am using git as archive and versioning also for photos. Apart from
> performance issues, I wanted to ask if there are hard limits and
> configurable limits (how to configure?) for maximum single file size and
> maximum .git archive size (Windows 64 Bit system)?
> Thanks in advance for your answer.
> All the best,
> Andreas

In Git the file size is currently limited by the width of `long`, rather
than `size_t`. On 64-bit Windows `long` is still only 32 bits wide, hence
on Git for Windows the file size limit is ~4GiB.
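
To make the ~4GiB figure concrete, here is a minimal shell sketch of the
arithmetic. It assumes a shell with 64-bit integer arithmetic (for
example bash or dash on a 64-bit system). The helper name
`exceeds_32bit_long` and the 5 GiB sample size are made up for
illustration and are not part of Git.

```shell
# Largest value a 32-bit unsigned long can hold: 2^32 - 1 bytes (just under 4 GiB).
limit=$(( (1 << 32) - 1 ))

# Hypothetical helper: succeeds if a size in bytes does not fit in 32 bits.
exceeds_32bit_long() {
    [ "$1" -gt "$limit" ]
}

size=$(( 5 * 1024 * 1024 * 1024 ))   # illustrative 5 GiB file
if exceeds_32bit_long "$size"; then
    echo "$size bytes does not fit in a 32-bit long"
fi
```

Any file for which the helper succeeds is a candidate for trouble on a
Git build where `long` is 32 bits wide.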

Any fix will be a big change, as it ripples through many places in the
code base and, for some, will feel 'wrong'. I did some work [1-4],
building on that of many others, which was almost there, but...

The alternative is git-lfs, which I don't personally use (see [4]).

Philip

[1] https://github.com/git-for-windows/git/pull/2179
[2] https://github.com/gitgitgadget/git/pull/115
[3] https://github.com/git-for-windows/git/issues/1063
[4] https://github.com/git-lfs/git-lfs/issues/2434


* Re: Git as data archive
  2019-12-07 16:54 ` Philip Oakley
@ 2019-12-07 18:04   ` Thomas Braun
  2019-12-08 18:44     ` Andreas Kalz
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Braun @ 2019-12-07 18:04 UTC (permalink / raw)
  To: Philip Oakley, Andreas Kalz, git

On 07.12.2019 17:54, Philip Oakley wrote:
> Hi Andreas,
> 
> On 06/12/2019 18:54, Andreas Kalz wrote:
>> Hello,
>> I am using git as archive and versioning also for photos. Apart from
>> performance issues, I wanted to ask if there are hard limits and
>> configurable limits (how to configure?) for maximum single file size and
>> maximum .git archive size (Windows 64 Bit system)?
>> Thanks in advance for your answer.
>> All the best,
>> Andreas
> 
> On Git the file size is currently limited to size of `long`, rather than
> `size_t`. Hence on Git-for Windows the size limit is 32bit ~4GiB
> 
> Any change will be a big change as it ripples through many places in the
> code base and, for some, will feel 'wrong'. I did some work [1-4] on top
> of those of many others that was almost there, but...

Adding to what Philip said: on Windows the size of exported archives
(git archive) is currently also limited to 4GB. The reason is the same
long vs size_t issue (which is not present on Linux, though).

So if you can switch to Linux or even macOS, these issues are gone.

The files in .git (I guess only the packfiles are of interest here) do
not have the long vs size_t issue. So packfiles can be larger than 4GB
on 64-bit Windows (with 64-bit Git, of course).

But depending on how large the biggest files are, it might be worth
tweaking some of the settings, so that the created packfiles are
readable on all platforms. I once created a repo on Linux which could
not be checked out on Windows, and that is a bit annoying.

So the questions are how large is each file? And what repository size do
you expect? Are we talking about 20MB files and 10GB repository? Or a
factor 100 more? And are you just adding files or are you modifying the
added files? Depending on the file sizes it might then also be
beneficial to tweak the delta compression settings and/or the big file
threshold limits.
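
The questions above (how large is each file, how large is the
repository) can be answered from the repository itself. The following is
a minimal sketch, assuming a POSIX shell with git, awk, sort and head
available. The helper names `largest_blobs` and `repo_size` are
hypothetical; the git subcommands they call (`rev-list --objects`,
`cat-file --batch-check`, `count-objects`) are standard.

```shell
# Hypothetical helper: list the N largest blobs reachable from any ref,
# one per line as "<size-in-bytes> <path>", biggest first.
largest_blobs() {
    repo=$1
    count=${2:-10}
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -n -r |
        head -n "$count"
}

# Hypothetical helper: size of the object database, in human-readable units.
repo_size() {
    git -C "$1" count-objects -v -H
}
```

Running `largest_blobs . 5` and `repo_size .` inside the work tree gives
the numbers needed to judge whether the settings are worth tuning.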

Thomas

* Re: Git as data archive
  2019-12-07 18:04   ` Thomas Braun
@ 2019-12-08 18:44     ` Andreas Kalz
  2019-12-09  1:18       ` Thomas Braun
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Kalz @ 2019-12-08 18:44 UTC (permalink / raw)
  To: Thomas Braun; +Cc: Philip Oakley, git

Hi,

Thanks to you both.

@Thomas: are you Thomas Braun who studied at FH Regensburg?

Well, currently the .git repository is 715GB and the largest file is
9.5GB. I did not get any error messages because of that, although
performance is quite poor. The biggest pack* file is 24GB. Some files
get modified, but most do not.

My question came up because I could not find any documentation about
Git's limits, only a lot of entries about GitHub and forum users
discussing old Git bugs. I also read about git-lfs and that it is not
very stable, which is why I have not used it yet.

How can the delta compression settings and/or the big file threshold
limits be modified?
Thanks in advance.

All the best,
Andreas


* Re: Git as data archive
  2019-12-08 18:44     ` Andreas Kalz
@ 2019-12-09  1:18       ` Thomas Braun
  2019-12-09 16:39         ` Andreas Kalz
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Braun @ 2019-12-09  1:18 UTC (permalink / raw)
  To: Andreas Kalz; +Cc: Philip Oakley, git

On 08.12.2019 19:44, Andreas Kalz wrote:

Hi Andreas,

> @Thomas: are you Thomas Braun who studied at FH Regensburg?

nope, sorry.

> Well, currently the .git repository is 715GB and the maximum file size
> is 9.5GB, but I did not get error messages due to that even if the
> performance is quite low. The biggest pack* file is 24GB. There are some
> files which are modified, but most are not modified.

Okay, that is kind of large. How did you add the 9.5GB file? AFAIK this
could not have been done on Windows.

Do you push that to a remote repository as well?

> My question came up as I did not find a documentation about limits of
> git, only a lot of entries about github and forum users who are
> discussing about old bugs of git. I read about git-lfs and also that it
> is not working very stable, due to that I did not use it yet.

Although I'm not using git-lfs myself, from what I know it works well.
But it does have the same limitation as stock git for windows as Philip
pointed out already.

> How can the delta compression settings and/or the big filethreshold
> limits be modified?

These are plain git config settings. Have a look at [1]. The attributes
are explained in [2-3]. Basically you can set in .gitattributes

*.bin -delta -diff

which would tell git that files with the suffix bin should not be delta
compressed and should always be treated as binary.
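
As a minimal sketch of the attribute setup, the following appends such a
line to .gitattributes and asks Git how it resolves the attributes for a
path. Attributes on a .gitattributes line are separated by whitespace.
The helper names `mark_no_delta` and `show_attrs` and the `*.bin`
pattern are illustrative assumptions; `git check-attr` is a standard
command.

```shell
# Hypothetical helper: mark a file pattern as "no delta compression, no textual diff".
mark_no_delta() {
    repo=$1
    pattern=$2
    printf '%s -delta -diff\n' "$pattern" >> "$repo/.gitattributes"
}

# Hypothetical helper: show how Git resolves the two attributes for a path.
# The path does not need to exist; only the pattern match matters.
show_attrs() {
    git -C "$1" check-attr delta diff -- "$2"
}
```

For example, `mark_no_delta . '*.bin'` followed by
`show_attrs . photo.bin` should report both attributes as unset for
that path.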

You could also play around with turning compression completely off via
core.compression or pack.compression.
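
A minimal sketch of setting these options per repository follows. The
helper name `tune_for_large_files` and the 100m threshold are
illustrative assumptions, not recommendations; the three config keys are
the ones documented in git-config.

```shell
# Hypothetical helper: apply large-file-friendly settings to one repository.
tune_for_large_files() {
    repo=$1
    # Files larger than this are stored without attempting delta compression.
    # The value 100m is only an example.
    git -C "$repo" config core.bigFileThreshold 100m
    # 0 turns zlib compression off for objects.
    git -C "$repo" config core.compression 0
    # pack.compression falls back to core.compression when unset;
    # it is set explicitly here only to make the intent visible.
    git -C "$repo" config pack.compression 0
}
```

The settings affect objects and packs written from now on; existing
packs keep their current form until they are repacked.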

Hope that helps,
Thomas

PS: If you have resources to help fixing that long-standing bug in git
for windows, there is a PR open [4] which has a WIP version. But beware
you need good C skills and better-than-average git skills, or a
Santa-Claus-style bag with monetary resources.

[1]: https://git-scm.com/docs/git-config#Documentation/git-config.txt-corebigFileThreshold
[2]: https://git-scm.com/docs/gitattributes#_code_delta_code
[3]: https://git-scm.com/docs/gitattributes#_marking_files_as_binary
[4]: https://github.com/git-for-windows/git/pull/2179

* Re: Git as data archive
  2019-12-09  1:18       ` Thomas Braun
@ 2019-12-09 16:39         ` Andreas Kalz
  0 siblings, 0 replies; 6+ messages in thread
From: Andreas Kalz @ 2019-12-09 16:39 UTC (permalink / raw)
  To: Thomas Braun; +Cc: Philip Oakley, git

Hi Thomas,
I committed it only to a local repository (git add . / git commit
-m"..."). But I never tested restoring the big files from the archive
:( and then I stumbled upon the bug description.

Now I tried it out and something bad happened:
E:\bilder_git>git checkout -- Hochzeitsmesse.mp4
error: bad object header
fatal: packed object 5c1403a85829c1c9e03bf04ac814d65bb72b617f (stored in .git/objects/pack/pack-00246783dc8e6b7365220e75563b5cecfa358e11.pack) is corrupt

During add / commit there was no problem, but now this is not a good
thing...
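
Since add and commit succeeded silently, a restore check right after
committing would have caught this earlier. The following is a minimal
sketch, assuming a POSIX shell with git and cmp, and assuming no
end-of-line conversion or clean/smudge filter applies to the file. The
helper names `verify_restore` and `check_repo` are hypothetical;
`git cat-file` and `git fsck` are standard commands.

```shell
# Hypothetical helper: succeed only if the committed blob for a path can be
# read back and is byte-identical to the file in the work tree.
verify_restore() {
    repo=$1
    path=$2
    git -C "$repo" cat-file blob "HEAD:$path" | cmp -s - "$repo/$path"
}

# Hypothetical helper: check the connectivity and validity of all objects.
check_repo() {
    git -C "$1" fsck --full
}
```

On a Git build affected by the size limit the check itself may fail when
reading the object, which is exactly the signal wanted while the
original file is still available.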

My C skills are not bad: I worked for about 10 years in embedded
software development. But my time is currently limited, as I have a
three-week-old baby :)

All the best,
Andreas



Thread overview: 6+ messages
2019-12-06 18:54 Git as data archive Andreas Kalz
2019-12-07 16:54 ` Philip Oakley
2019-12-07 18:04   ` Thomas Braun
2019-12-08 18:44     ` Andreas Kalz
2019-12-09  1:18       ` Thomas Braun
2019-12-09 16:39         ` Andreas Kalz
