git@vger.kernel.org mailing list mirror (one of many)
From: Philip Oakley <philipoakley@iee.email>
To: "Jason D. Hatton" <jason.hatton@gmail.com>,
	Jason Hatton <jhatton@globalfinishing.com>,
	Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, l.s.r@web.de
Subject: Re: [PATCH] Prevent git from rehashing 4GBi files
Date: Wed, 11 May 2022 23:24:07 +0100
Message-ID: <972cb306-04ce-133d-9d09-5da40afd675f@iee.email>
In-Reply-To: <1a56b96c-2c58-ccaf-11ae-5e8264a323b1@iee.email>

On 11/05/2022 18:47, Jason D. Hatton wrote:
>> Is there a problem that 1<<31, when on a 32bit long is MAX_NEG, 
>> rather than being MAX_POS? And the size would need to be positive to 
>> be an acceptable file size?
>> (The code is a bit of a mish-mash on the Windows LLP64 side, where 
>> long is only 32 bits).
>>
>> Philip
>> Apologies for the terseness.
>
> Philip
>
> I made a little test script and tried out several different
> things.
>
> tldr; It didn't make any difference.
>
> Files tested:
> 1, 2 and 4 GiB with and without LFS. Tested with 0, 1, 1<<30,
> and 1<<31 mung builds. I'm only listing the problems unless
> stated otherwise. The mung didn't appear to introduce any
> new issues with my limited tests.
>
> git 2.36.0.windows.1 release:
>    fails on 4GiB w/o LFS - corrupts pack file
>    git status is very slow.
>    Sometimes stores zero file instead of corruption.
>
> git 2.36.0.windows.1 custom compile w/o patches:
>    fails on 4GiB w/o LFS - stores zero file
>    git status is very slow.
>
> git 2.36.0.windows.1 with 1U<<31 mung:
>    fails on 4GiB w/o LFS - stores zero file
>
> git 2.36.0.windows.1 with 1U<<30 mung:
>    fails on 4GiB w/o LFS - stores zero file
>
> git 2.36.0.windows.1 with 1 mung:
>    fails on 4GiB w/o LFS - stores zero file
>
> git 2.36.0 Ubuntu
>    unpatched works, but has the slow status issue.
>
> The test script I used is below:

Without git-lfs (which grossly shortens the file size in the pack) I 
wasn't expecting much, given the use of 'long' in places in the code 
base for file sizes, so 2GiB and 4GiB files would likely fail on the 
Windows LLP64 parts, where long is only 32 bits.
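
A minimal sketch of the arithmetic behind that expectation, assuming a
shell whose $(( )) arithmetic is at least 64 bits wide. The helper names
to_u32 and to_s32 are hypothetical (they are not git functions); they only
model what a byte count becomes when it is squeezed into a 32-bit field.

```shell
#!/bin/sh
# Assumption: the shell evaluates $(( )) with at least 64-bit integers.

# Keep only the low 32 bits, as an unsigned 32-bit field would.
to_u32()
{
    echo $(( $1 & 0xFFFFFFFF ))
}

# Reinterpret the low 32 bits as a signed 32-bit value,
# which is what a 32-bit 'long' would hold.
to_s32()
{
    low=$(( $1 & 0xFFFFFFFF ))
    if [ "$low" -ge 2147483648 ]; then
        echo $(( low - 4294967296 ))
    else
        echo "$low"
    fi
}

for gib in 1 2 4
do
    size=$(( gib * 1024*1024*1024 ))
    echo "${gib} GiB: u32=$(to_u32 "$size") s32=$(to_s32 "$size")"
done
```

Under that assumption, 1 GiB survives both conversions, 2 GiB turns
negative as a signed 32-bit value, and 4 GiB truncates to 0 either way,
which would be consistent with the "stores zero file" results quoted above.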

I was under the impression that the core code for packs had been size_t 
hardened, but there may be some paths either in git-lfs or the actual 
file checkout that cause that failure.

There was a previous series by Matt Cooper on:
  "Allow clean/smudge filters to handle huge files in the LLP64 data model"
(https://lore.kernel.org/git/pull.1068.git.1635320952.gitgitgadget@gmail.com/t/#u)
  Merge commit f9ba6acaa9348ea7b733bf78adc2f084247a912f
'mc/clean-smudge-with-llp64'

That series had some in-code checks, and some test-suite tests, though 
the latter are classed as EXPENSIVE (i.e. not normally run), which may 
add more insight.

>
>
> #!/bin/sh
>
> GB1=$((1 * 1024*1024*1024))
> GB2=$((2 * 1024*1024*1024))
> GB4=$((4 * 1024*1024*1024))
>
> die()
> {
>    echo "$1"
>    exit 1
> }
>
> test_file()
> {
>    echo "=== TESTING $2 ==="
>    rm -rf .git .gitattributes .gitignore .gitmodules &&
>        git init &&
>        git lfs track '*.big' &&
>        truncate --size "$1" "$2" &&
>        git add "$2" &&
>        git commit -m "$2" &&
>        git fsck &&
>        mv "$2" bak &&
>        git restore "$2" &&
>        cmp "$2" bak || die "$2"
>    git status && timeout 5 git status || die "$2 git status slow"
>    rm -rf .git .gitattributes .gitignore .gitmodules "$2" bak
> }
>
> test_file "$GB1" gb1.big
> test_file "$GB2" gb2.big
> test_file "$GB4" gb4.big
> test_file "$GB1" gb1
> test_file "$GB2" gb2
> test_file "$GB4" gb4
> echo done 
--
Philip
