git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Jason Hatton <jhatton@globalfinishing.com>
Cc: "Philip Oakley" <philipoakley@iee.email>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	"René Scharfe" <l.s.r@web.de>
Subject: Re: [PATCH] Prevent git from rehashing 4GBi files
Date: Fri, 06 May 2022 11:32:11 -0700
Message-ID: <xmqqbkwaldg4.fsf@gitster.g>
In-Reply-To: <CY4PR16MB1655EE6CC2218AEA35B451E3AFC59@CY4PR16MB1655.namprd16.prod.outlook.com> (Jason Hatton's message of "Fri, 6 May 2022 17:08:16 +0000")

Jason Hatton <jhatton@globalfinishing.com> writes:

>>Philip Oakley <philipoakley@iee.email> writes:
>>
>>> This "Munge" above isn't telling the reader 'why'/'what' is going on.
>>> The comment should in some way highlight that a zero size result is
>>> special, and that we have the roll over issue when the stored in 32 bits
>>> - the double duty of racy vs changed in the stat data heuristic.
>>> Synonyms of 'munge' ?
>
> mangle?
> hash?
>
>>>
>>>
>>>> + */
>>>> +unsigned int munge_st_size(off_t st_size) {
>>>> +    unsigned int sd_size = st_size;
>>>> +
>>>> +    if(!sd_size && st_size)
>>
>>Style.
>
> Something like 1<<31?

Sorry, missing SP between "if" and "(" was what stood out like a
sore thumb.

>>
>>>> +        return 0x80000000;

The .sd_size member is merely defined as "unsigned int" and so is
the return value from this helper.  They have no idea how big an
integer they are dealing with.  It is limited to 32-bit explicitly
only because create_from_disk() uses get_be32() on ondisk->size to
get the value to be assigned to the member.

So I agree with writing it as a 31-bit shift for ease of reading, but
while we are at it, a comment to indicate where that size comes from
would perhaps help the readers?

		return 1U<<31; /* ondisk_cache_entry.size */

I dunno.
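
Pulling the review comments so far together, the helper might end up
looking roughly like the sketch below.  This is an illustration only,
not the final patch: it takes uint64_t rather than off_t so that it
also builds where off_t is 32 bits wide, and it uses uint32_t for the
truncated value to make the on-disk width explicit.  Both choices are
assumptions that are not in the posted code.

```c
#include <stdint.h>

/*
 * Sketch only: map a file size onto the 32-bit size field of the index.
 * A non-zero size whose low 32 bits are all zero (a multiple of 4GiB)
 * would otherwise be recorded as 0, and a zero size is special to the
 * stat data heuristic, so a non-zero value is substituted.
 */
unsigned int munge_st_size(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size; /* keep the low 32 bits */

	if (!sd_size && st_size)
		return 1U << 31; /* ondisk_cache_entry.size */
	else
		return sd_size;
}
```

With this sketch, a size of 0 stays 0, a 4GiB file is recorded as
0x80000000, and any size that is not a multiple of 4GiB is recorded
as its low 32 bits.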

>>>> +    else
>>>> +        return sd_size;
>>>> +}
>>
>>This may treat non-zero multiple of 4GiB as "not racy", but has
>>anybody double checked the concern René brought up earlier that a
>>4GiB file that was added and then got rewritten to 2GiB within the
>>same second would suddenly start getting treated as not racy?
>>
>>The patch (the final version of it anyway) needs to be accompanied
>>by a handful of test additions to tickle corner cases like that.
>>
>>Thanks, all, for working on this.
>
> A file size that changes by exactly 2GiB is a concern. This is an issue for
> files that are exactly a multiple of 4GiB. However, all files that are
> changed by a multiple of 4GiB are vulnerable.

So if you have a 4GiB file, "git add" it, then rewrite it with
different contents to make it an 8GiB file within the same second,
would Git mistakenly think that there is no change, because the racy
git protection is gone with this change?  I think that was one of
the concerns (there may have been others I am forgetting).
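
To make the size collisions discussed in this thread concrete, here is
a small sketch.  It assumes the helper behaves as in the quoted patch
(with uint64_t standing in for off_t), and sizes_look_equal() is a
hypothetical name introduced only for this illustration.  It models
nothing but the comparison of the recorded size field and ignores
every other piece of stat data that Git looks at.

```c
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/* Same logic as the helper in the quoted patch (sketch). */
static unsigned int munge_st_size(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size; /* keep the low 32 bits */

	if (!sd_size && st_size)
		return 1U << 31; /* ondisk_cache_entry.size */
	else
		return sd_size;
}

/*
 * Hypothetical helper: would the size recorded in the index for
 * indexed_size match what is computed for current_size?
 */
int sizes_look_equal(uint64_t indexed_size, uint64_t current_size)
{
	return munge_st_size(indexed_size) == munge_st_size(current_size);
}
```

Under this model, sizes_look_equal(4 * GiB, 8 * GiB) is true because
both sizes are non-zero multiples of 4GiB, and so is
sizes_look_equal(4 * GiB, 2 * GiB), because 2GiB is exactly the value
1U << 31 that is substituted for 4GiB.  A change from 4GiB to 3GiB, by
contrast, is visible in the size field.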

