git@vger.kernel.org list mirror (unofficial, one of many)
From: Philip Oakley <philipoakley@iee.email>
To: Jason Hatton <jhatton@globalfinishing.com>,
	Junio C Hamano <gitster@pobox.com>
Cc: "René Scharfe" <l.s.r@web.de>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] Prevent git from rehashing 4GBi files
Date: Tue, 10 May 2022 23:45:14 +0100
Message-ID: <1a56b96c-2c58-ccaf-11ae-5e8264a323b1@iee.email>
In-Reply-To: <CY4PR16MB1655F83010A128D4ED67C7EDAFC49@CY4PR16MB1655.namprd16.prod.outlook.com>

On 07/05/2022 03:15, Jason Hatton wrote:
>> Philip Oakley <philipoakley@iee.email> writes:
>>
>>>> This may treat non-zero multiple of 4GiB as "not racy", but has
>>>> anybody double checked the concern René brought up earlier that a
>>>> 4GiB file that was added and then got rewritten to 2GiB within the
>>>> same second would suddenly start getting treated as not racy?
>>> This is the pre-existing problem, that ~1 in 2^31 size changes might not
>>> get noticed for size change. The 0 byte / 4GiB change is an identical
>>> issue, as is changing from 3 bytes to 4GiB+3 bytes, etc., so that's no
>>> worse than before (well maybe twice as 'unlikely').
>> OK, it added one more case to 2^32-1 existing cases, I guess.
>>
>>>> The patch (the final version of it anyway) needs to be accompanied
>>>> by a handful of test additions to tickle corner cases like that.
>>> They'd be protected by the EXPENSIVE prerequisite I would assume.
>> Oh, absolutely.  Thanks for spelling that out.
> I have been testing out the patch a bit and have good and (mostly) bad news.
>
> What works using a munge value of 1.
>
> $ git add
> $ git status
>
> Racy seems to work.
>
> $ touch .git/index 4GiB # 4GiB is now racy
> $ git status # Git will rehash the racy file
> $ git status # Git cached the file. Second status is fast.
>
> What doesn't work.
>
> $ git checkout 4GiB
> $ fatal: packed object is corrupt!
>
> Using a munge value of 1<<31 causes even more problems. The file hash in the
> index for 4GiB files (git ls-files -s --debug) is set to the zero file hash.
>
> I looked up and down the code base and couldn't figure out how the munged
> value was leaking out of read-cache.c and breaking things. Most of the code
> I found tends to use stat and then convert that to a size_t, not using the
> munged unsigned int at all.
>
> Maybe someone else will have better luck. This seems over my head :(
>
> Thanks
> --
> Jason
>
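For anyone following along, here is a minimal sketch of the size munging
being tested above, as I understand it. It is not the patch itself: the
names truncate_size() and munge_size() are hypothetical, and it assumes
the index keeps only the low 32 bits of the file size and that a stored
size of zero is what makes an entry look racy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 32-bit size as stored with no munging at all. */
uint32_t truncate_size(uint64_t st_size)
{
	return (uint32_t)st_size;	/* keeps only the low 32 bits */
}

/*
 * Hypothetical helper: if a non-empty file truncates to 0 (an exact
 * multiple of 4GiB), store `munge` instead so the entry no longer
 * looks like a zero-size one.
 */
uint32_t munge_size(uint64_t st_size, uint32_t munge)
{
	uint32_t sd_size = (uint32_t)st_size;

	assert(munge != 0);	/* a zero munge value would change nothing */
	if (sd_size == 0 && st_size != 0)
		return munge;
	return sd_size;
}
```

With a munge value of 1, munge_size(4294967296, 1) gives 1 where
truncate_size() would give 0. A 3 byte file and a 4GiB+3 byte file still
both store 3, which is the pre-existing collision mentioned earlier in
the thread.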
Could the problem be that 1<<31, on a 32-bit long, is MAX_NEG (the most
negative value) rather than MAX_POS? The size would need to be positive
to be an acceptable file size.
(The code is a bit of a mish-mash on the Windows LLP64 side, where long
is only 32 bits.)
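
To illustrate the question, a small sketch, not taken from the Git
sources: as_signed32() and is_valid_size32() are hypothetical names. It
reads the bit pattern 1<<31 as a signed 32-bit value, which is what a
32-bit long would hold. The shift is done on an unsigned constant
because, as far as I know, shifting a signed 32-bit 1 left by 31 is
itself undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a signed value. */
int32_t as_signed32(uint32_t bits)
{
	int32_t value;

	assert(sizeof(value) == sizeof(bits));
	/* int32_t is two's complement with no padding, so this is well defined */
	memcpy(&value, &bits, sizeof(value));
	return value;
}

/* Hypothetical check: would this pattern pass as a non-negative size? */
int is_valid_size32(uint32_t bits)
{
	return as_signed32(bits) >= 0;
}
```

Here as_signed32(UINT32_C(1) << 31) is INT32_MIN, so is_valid_size32()
rejects it, while a munge value of 1 stays positive. Whether the munged
value ever reaches a signed long in the real code is exactly what I am
unsure about.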

Philip
Apologies for the terseness.
