From: "René Scharfe" <l.s.r@web.de>
To: Philip Oakley <philipoakley@iee.email>,
Jason Hatton <jhatton@globalfinishing.com>,
Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] Prevent git from rehashing 4GBi files
Date: Sat, 7 May 2022 17:22:32 +0200
Message-ID: <0b79c442-9b2f-6de7-2424-9f13e9ef9878@web.de>
In-Reply-To: <1DFD3E42-3EF3-4420-8E01-748EF3DBE7A1@iee.email>
On 07.05.22 at 14:33, Philip Oakley wrote:
>
>
> On 7 May 2022 03:15:00 BST, Jason Hatton <jhatton@globalfinishing.com> wrote:
>
> Philip Oakley <philipoakley@iee.email> writes:
>
> This may treat a non-zero multiple of 4GiB as "not racy", but has
> anybody double checked the concern René brought up earlier that a
> 4GiB file that was added and then got rewritten to 2GiB within the
> same second would suddenly start getting treated as not racy?
>
> This is the pre-existing problem, that ~1 in 2^31 size changes might not
> get noticed. The 0 byte / 4GiB change is an identical
> issue, as is changing from 3 bytes to 4GiB+3 bytes, etc., so that's no
> worse than before (well maybe twice as 'unlikely').
>
>
> OK, it added one more case to 2^32-1 existing cases, I guess.
>
> The patch (the final version of it anyway) needs to be accompanied
> by a handful of test additions to tickle corner cases like that.
>
> They'd be protected by the EXPENSIVE prerequisite I would assume.
>
>
> Oh, absolutely. Thanks for spelling that out.
>
>
> I have been testing out the patch a bit and have good and (mostly) bad news.
>
> What works using a munge value of 1.
>
> $ git add
> $ git status
>
> Racy seems to work.
>
> $ touch .git/index 4GiB # 4GiB is now racy
> $ git status # Git will rehash the racy file
> $ git status # Git cached the file. Second status is fast.
>
> What doesn't work.
>
> $ git checkout 4GiB
> fatal: packed object is corrupt!
>
> Using a munge value of 1<<31 causes even more problems. The file hashes in the
> index for 4GiB files (git ls-files -s --debug) are set to the zero file hash.
>
> I looked up and down the code base and couldn't figure out how the munged
> value was leaking out of read-cache.c and breaking things. Most of the code
> I found tends to use stat and then convert that to a size_t, not using the
> munged unsigned int at all.
>
> Maybe someone else will have better luck. This seems over my head :(
>
> Thanks
> --
> Jason
>
>
> Is this on Git for Windows or a 64 bit Linux?
> There are still some issues on GfW for 2GiB+ files (long vs. long long int).
That would explain the zero file hash, and it makes the platform unfit for
handling big files at all at this time.
FWIW, on macOS I get this with the patch applied:
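To illustrate how a 32-bit long could lead to that symptom, here is a rough
Python sketch. It assumes that "zero file hash" means the ID of an empty blob
and that the size passes through a 32-bit unsigned type somewhere; both are
assumptions, and the helper names are made up for the example, not taken from
Git's code.

```python
import hashlib

def as_ulong32(size: int) -> int:
    # Hypothetical helper: models storing a size in a 32-bit unsigned long,
    # as on platforms where long is 32 bits wide.
    return size & 0xFFFFFFFF

def git_blob_sha1(data: bytes) -> str:
    # Git hashes a blob as "blob <length>\0" followed by the content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

four_gib = 4 * 1024**3

# A size of exactly 4GiB does not fit in 32 bits and wraps to 0.
print(as_ulong32(four_gib))   # 0

# Hashing zero bytes of content gives the empty-blob ID, the same ID that
# a genuinely empty file gets.
print(git_blob_sha1(b""))     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If the length is truncated to 0 before hashing, the result would be
indistinguishable from the hash of an empty file.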
$ git init --quiet /tmp/a
$ cd /tmp/a
$ : >size-0
$ dd if=/dev/zero bs=1 oseek=4294967295 count=1 of=size-4294967296
1+0 records in
1+0 records out
1 bytes transferred in 0.000365 secs (2740 bytes/sec)
$ dd if=/dev/zero bs=1 oseek=4294967296 count=1 of=size-4294967297
1+0 records in
1+0 records out
1 bytes transferred in 0.000293 secs (3413 bytes/sec)
$ dd if=/dev/zero bs=1 oseek=6442450943 count=1 of=size-6442450944
1+0 records in
1+0 records out
1 bytes transferred in 0.000266 secs (3759 bytes/sec)
$ git add size-*
$ git commit -m initial
[master (root-commit) d9c2a0a] initial
4 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 size-0
create mode 100644 size-4294967296
create mode 100644 size-4294967297
create mode 100644 size-6442450944
$ time git checkout size-*
Updated 0 paths from the index
git checkout size-* 0.01s user 0.01s system 65% cpu 0.020 total
$ git ls-files -s --debug | grep size
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 size-0
size: 0 flags: 0
100644 451971a31ea5a207a10b391df2d5949910133565 0 size-4294967296
size: 2147483648 flags: 0
100644 3eb7feb1413c757f0d8181deb28d1dab03d64846 0 size-4294967297
size: 1 flags: 0
100644 741285bddfa7863072c238f34e27144c2501832d 0 size-6442450944
size: 2147483648 flags: 0
So checkout skips all of the files and their cached sizes have the
expected values.
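The cached sizes in the ls-files output are consistent with truncating the
file size to 32 bits and replacing a zero result for a non-empty file with the
munge value. A minimal Python sketch of that arithmetic, assuming a munge value
of 1<<31 (the name cached_size is hypothetical and this is not the patch's
code):

```python
MASK32 = 0xFFFFFFFF   # the index stores the size in 32 bits
MUNGE = 1 << 31       # assumed munge value

def cached_size(st_size: int, munge: int = MUNGE) -> int:
    # Truncate the real size to 32 bits.
    truncated = st_size & MASK32
    # A non-empty file whose truncated size is 0 would look like an empty
    # (racy) entry, so substitute the munge value.
    if truncated == 0 and st_size != 0:
        return munge
    return truncated

for size in (0, 4294967296, 4294967297, 6442450944):
    print(size, cached_size(size))
# 0 0
# 4294967296 2147483648
# 4294967297 1
# 6442450944 2147483648
```

Note that size-6442450944 reaches 2147483648 by plain truncation, so it shares
a cached size with the munged 4GiB file. That is the same kind of collision as
3 bytes vs. 4GiB+3 bytes mentioned earlier in the thread.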
René