git@vger.kernel.org mailing list mirror (one of many)
From: Jason Hatton <jhatton@globalfinishing.com>
To: Junio C Hamano <gitster@pobox.com>,
	Philip Oakley <philipoakley@iee.email>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Cc: "René Scharfe" <l.s.r@web.de>
Subject: Re: [PATCH] Prevent git from rehashing 4GBi files
Date: Fri, 6 May 2022 17:08:16 +0000
Message-ID: <CY4PR16MB1655EE6CC2218AEA35B451E3AFC59@CY4PR16MB1655.namprd16.prod.outlook.com>

>Philip Oakley <philipoakley@iee.email> writes:
>
>> This "Munge" above isn't telling the reader 'why'/'what' is going on.
>> The comment should in some way highlight that a zero size result is
>> special, and that we have the roll over issue when the stored in 32 bits
>> - the double duty of racy vs changed in the stat data heuristic.
>> Synonyms of 'munge' ?

mangle?
hash?

>>
>>
>>> + */
>>> +unsigned int munge_st_size(off_t st_size) {
>>> +    unsigned int sd_size = st_size;
>>> +
>>> +    if(!sd_size && st_size)
>
>Style.

Something like 1<<31?

>
>>> +        return 0x80000000;
>>> +    else
>>> +        return sd_size;
>>> +}
>
>This may treat non-zero multiple of 4GiB as "not racy", but has
>anybody double checked the concern René brought up earlier that a
>4GiB file that was added and then got rewritten to 2GiB within the
>same second would suddenly start getting treated as not racy?
>
>The patch (the final version of it anyway) needs to be accompanied
>by a handful of test additions to tickle corner cases like that.
>
>Thanks, all, for working on this.

A file rewritten from an exact multiple of 4GiB to exactly 2GiB is a concern,
because the patch records both sizes as 0x80000000. That only affects files
whose size is an exact multiple of 4GiB. However, every file whose size changes
by a multiple of 4GiB is already vulnerable: 4GiB + 42 and 8GiB + 42 appear the
same to the current version of git. I'm sure the true fix involves updating
the index file format with 64-bit file sizes and an explicit racy flag. I'm
hopeful the rehashing issue for 4GiB files can be mitigated until then.
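
To make the collision concrete, here is a minimal sketch of what a 32-bit
size field keeps of a 64-bit file size. The names stored_size() and GIB are
hypothetical and only for illustration; they are not taken from git's source.

```c
#include <stdint.h>

/* Hypothetical constant for illustration: one GiB in bytes. */
#define GIB (UINT64_C(1) << 30)

/*
 * Hypothetical helper, not git's code: model a 32-bit size field by
 * keeping only the low 32 bits of a 64-bit file size.
 */
static uint32_t stored_size(uint64_t st_size)
{
	return (uint32_t)st_size;
}
```

With this model, stored_size(4 * GIB + 42) and stored_size(8 * GIB + 42) both
give 42, and stored_size(4 * GIB) gives 0, which is the value that also means
"racy" in the stat data heuristic.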

I have a question about the coding style. Torsten indicated that there should
be an explicit type cast. The original code did not use an explicit type cast,
so I'm unsure what is going on. One of you experts may have to make the final
patch. I hope my proof of concept gets the idea across.
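
For reference, a sketch of how the same proof of concept might look with an
explicit cast and a space after "if". This is my guess at what the review
comments are asking for, not a final patch. It uses uint64_t and uint32_t in
place of off_t and unsigned int so that it compiles on its own; that
substitution is an assumption.

```c
#include <stdint.h>

/*
 * Sketch only: map a 64-bit file size to the 32-bit value kept in the
 * index. Zero stays reserved for "empty or racy"; a non-zero size whose
 * low 32 bits are all zero is stored as 0x80000000 instead.
 */
static uint32_t munge_st_size(uint64_t st_size)
{
	/* Explicit cast: the truncation to 32 bits is intentional. */
	uint32_t sd_size = (uint32_t)st_size;

	if (!sd_size && st_size)
		return UINT32_C(0x80000000);
	return sd_size;
}
```

Note that a file of exactly 2GiB and a file of exactly 4GiB both come out as
0x80000000 here, which is the corner case a test would need to cover.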

Thanks
--
Jason
