From: "Jason D. Hatton" <jason.hatton@gmail.com>
To: jhatton@globalfinishing.com
Cc: git@vger.kernel.org, gitster@pobox.com, l.s.r@web.de
Subject: [PATCH] Prevent git from rehashing 4GiB files
Date: Sat, 7 May 2022 13:58:14 -0500
Message-ID: <20220507185813.1403802-1-jhatton@globalfinishing.com>
In-Reply-To: <philipoakley@iee.email>

The Git index caches file sizes in a uint32_t. As a result, any file
whose size is an exact nonzero multiple of 2^32 bytes (4 GiB) gets a
cached file size of zero. Zero is a special value used by the racily
clean logic, so git rehashes every such file each time git status or
git commit is run.

This patch mitigates the problem by making all files whose size is a
nonzero multiple of 2^32 appear to have a size of 1U<<31 instead of
zero.

Signed-off-by: Jason D. Hatton <jhatton@globalfinishing.com>
---
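
To illustrate the truncation described above, here is a minimal
standalone sketch. It is not part of the patch; truncate_size() is a
hypothetical helper that only mimics storing a 64-bit size in a 32-bit
field, and uint64_t stands in for off_t as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in git): mimics what happens when a 64-bit
 * file size is assigned to a 32-bit cache field.
 */
static uint32_t truncate_size(uint64_t st_size)
{
	/* The conversion keeps only the low 32 bits of the size. */
	return (uint32_t)st_size;
}

/*
 * Returns 1 when a non-empty file would be cached with size zero,
 * which is the case the racy logic cannot tell apart from its flag.
 */
static int caches_as_zero(uint64_t st_size)
{
	return st_size != 0 && truncate_size(st_size) == 0;
}
```

With these definitions, caches_as_zero() is true for sizes such as
4 GiB and 8 GiB, and false for an empty file or a file of 4 GiB + 1
bytes.
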
read-cache.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index 4df97e185e..d80c80ef90 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,6 +163,22 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
 }
 
+/*
+ * stat_data only stores file sizes as an unsigned int. Any file that is an
+ * exact multiple of 4GiB will get a cached file size of zero. Unfortunately,
+ * this is a special flag used by the racy update logic. Substitute a new file
+ * size if a non-zero sized file would be cached as zero. 1U<<31 is used
+ * as the substitute because it is the furthest away from 0 and 4GiB.
+ */
+static inline unsigned int munge_st_size(off_t st_size) {
+	unsigned int sd_size = (unsigned int) st_size;
+
+	if (!sd_size && st_size)
+		return 1U<<31;
+	else
+		return sd_size;
+}
+
 void fill_stat_data(struct stat_data *sd, struct stat *st)
 {
 	sd->sd_ctime.sec = (unsigned int)st->st_ctime;
@@ -173,7 +189,7 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
 	sd->sd_ino = st->st_ino;
 	sd->sd_uid = st->st_uid;
 	sd->sd_gid = st->st_gid;
-	sd->sd_size = st->st_size;
+	sd->sd_size = munge_st_size(st->st_size);
 }
 
 int match_stat_data(const struct stat_data *sd, struct stat *st)
@@ -212,7 +228,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 		changed |= INODE_CHANGED;
 #endif
 
-	if (sd->sd_size != (unsigned int) st->st_size)
+	if (sd->sd_size != munge_st_size(st->st_size))
 		changed |= DATA_CHANGED;
 
 	return changed;
--
2.36.0.3
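
For readers who want to try the substitution outside of git, the
following is a minimal standalone sketch of the same idea. It is an
illustration under stated assumptions, not the code in read-cache.c:
munge_size(), size_changed() and SUBSTITUTE_SIZE are hypothetical
names, and uint64_t replaces off_t so the example does not depend on
the platform's off_t width.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed substitute value, matching the 1U<<31 used in the patch. */
#define SUBSTITUTE_SIZE (UINT32_C(1) << 31)

/*
 * Hypothetical stand-in for munge_st_size(): truncate to 32 bits, but
 * never let a non-empty file be recorded with a size of zero.
 */
static uint32_t munge_size(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size;

	if (!sd_size && st_size)
		return SUBSTITUTE_SIZE;
	return sd_size;
}

/*
 * Hypothetical stand-in for the size comparison in match_stat_data():
 * the on-disk size is munged the same way before it is compared with
 * the cached value, so storing and matching stay consistent.
 */
static int size_changed(uint32_t cached, uint64_t st_size)
{
	return cached != munge_size(st_size);
}
```

Because both the store and the compare go through the same function,
a 4 GiB file that has not changed still matches its cached size. One
consequence of any 32-bit field is that distinct sizes can share a
cached value; in this sketch a 2 GiB file and a 4 GiB file both map to
1U<<31, so the size check alone cannot tell them apart.
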