From: "Jason D. Hatton" <jason.hatton@gmail.com>
To: jhatton@globalfinishing.com
Cc: git@vger.kernel.org, gitster@pobox.com, l.s.r@web.de
Subject: [PATCH] Prevent git from rehashing 4GBi files
Date: Sat, 7 May 2022 13:58:14 -0500
Message-ID: <20220507185813.1403802-1-jhatton@globalfinishing.com>
In-Reply-To: <philipoakley@iee.email>

Git's index (cache) stores file sizes as uint32_t. Any file whose size
is a nonzero multiple of 2^32 bytes therefore gets a cached file size of
zero. Zero is a special value used by the racily-clean logic, so git
rehashes every such file each time git status or git commit is run.

This patch mitigates the problem by making files whose size is a nonzero
multiple of 2^32 appear to have a size of 1U<<31 instead of zero.

Signed-off-by: Jason D. Hatton <jhatton@globalfinishing.com>
---
 read-cache.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 4df97e185e..d80c80ef90 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,6 +163,22 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
 }
 
+/*
+ * stat_data only stores file sizes as an unsigned int. Any file that is an
+ * exact multiple of 4GiB will get a cached file size of zero. Unfortunately,
+ * this is a special flag used by the racy update logic. Substitute a new file
+ * size if a non-zero sized file would be cached as zero. 1U<<31 is used
+ * as the substitute because it is the furthest away from 0 and 4GiB.
+ */
+static inline unsigned int munge_st_size(off_t st_size) {
+	unsigned int sd_size = (unsigned int) st_size;
+
+	if (!sd_size && st_size)
+		return 1U<<31;
+	else
+		return sd_size;
+}
+
 void fill_stat_data(struct stat_data *sd, struct stat *st)
 {
 	sd->sd_ctime.sec = (unsigned int)st->st_ctime;
@@ -173,7 +189,7 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
 	sd->sd_ino = st->st_ino;
 	sd->sd_uid = st->st_uid;
 	sd->sd_gid = st->st_gid;
-	sd->sd_size = st->st_size;
+	sd->sd_size = munge_st_size(st->st_size);
 }
 
 int match_stat_data(const struct stat_data *sd, struct stat *st)
@@ -212,7 +228,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 		changed |= INODE_CHANGED;
 #endif
 
-	if (sd->sd_size != (unsigned int) st->st_size)
+	if (sd->sd_size != munge_st_size(st->st_size))
 		changed |= DATA_CHANGED;
 	return changed;
 
-- 
2.36.0.3
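
For readers who want to try the arithmetic outside of git, here is a small
standalone sketch of the truncation and the substitution described above. It
is not part of the patch. The names munge_size(), size_changed() and
check_examples() are made up for illustration, and fixed-width types replace
off_t and unsigned int so the behaviour does not depend on the platform.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in for the patch's munge_st_size(). The name and the
 * fixed-width types are assumptions of this sketch; the patch itself uses
 * off_t and unsigned int.
 */
static uint32_t munge_size(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size;	/* keeps only the low 32 bits */

	if (!sd_size && st_size)
		return UINT32_C(1) << 31;	/* nonzero multiple of 2^32 */
	return sd_size;
}

/* Mirrors the size comparison in match_stat_data(): nonzero means "changed". */
static int size_changed(uint32_t cached, uint64_t st_size)
{
	return cached != munge_size(st_size);
}

/* Hypothetical self-check; call it from main() to exercise the cases. */
static void check_examples(void)
{
	const uint64_t four_gib = UINT64_C(1) << 32;

	/* Plain truncation makes a 4GiB file look empty. */
	assert((uint32_t)four_gib == 0);

	/* With the substitution, multiples of 4GiB map to 1U<<31. */
	assert(munge_size(four_gib) == UINT32_C(0x80000000));
	assert(munge_size(3 * four_gib) == UINT32_C(0x80000000));

	/* A truly empty file and all other sizes are left alone. */
	assert(munge_size(0) == 0);
	assert(munge_size(four_gib + 1) == 1);

	/* Cached and freshly computed sizes agree, so no spurious change. */
	assert(!size_changed(munge_size(four_gib), four_gib));
}
```

The same helper is applied on both the store side and the compare side,
which is why fill_stat_data() and match_stat_data() are changed together in
the patch. Sizes that differ by an exact multiple of 4GiB still compare
equal, as they did before the patch, so the size check remains only one of
several signals.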