git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/5] Allow clean/smudge filters to handle huge files in the LLP64 data model
@ 2021-10-27  7:49 Johannes Schindelin via GitGitGadget
From: Johannes Schindelin via GitGitGadget @ 2021-10-27  7:49 UTC
  To: git; +Cc: Johannes Schindelin

This patch series came in via the Git for Windows fork
[https://github.com/git-for-windows/git/pull/3487], and I intend to merge it
before v2.34.0-rc0, therefore I appreciate every careful review you gentle
people can spare.

The x86_64 variant of Windows uses the LLP64 data model, where the long data
type is 32-bit. This is very different from the LP64 data model used e.g. by
x86_64 Linux, where unsigned long is 64-bit.

Most notably, this means that sizeof(unsigned long) != sizeof(size_t) in
general.
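
To make the difference concrete, here is a minimal sketch (the helper
names are made up for illustration and are not part of this series) that
reports whether unsigned long can represent every size_t value on the
platform being compiled for. Under LP64 the first helper would return 1;
under LLP64 it would return 0.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative helper (not from this series): returns 1 if every size_t
 * value fits into an unsigned long on the platform being compiled for.
 */
static int ulong_can_hold_size_t(void)
{
	return ULONG_MAX >= SIZE_MAX;
}

/* Width of unsigned long in bits: 32 under LLP64, 64 under LP64. */
static unsigned int ulong_bits(void)
{
	return (unsigned int)(sizeof(unsigned long) * CHAR_BIT);
}
```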

However, since Git was born in the Linux ecosystem, where that inequality
does not hold true, it is understandable that unsigned long is used in many
code locations where size_t should have been used. As a consequence, quite a
few things are broken, e.g. on Windows, when file contents are 4GB or
larger.

Using Git LFS [https://git-lfs.github.io/] to work around such issues is one
such broken scenario: you cannot git checkout, say, 5GB files. Huge files
are truncated to the file size modulo 4GB (a 5GB file, for example, is
truncated to 1GB).
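
The arithmetic behind that truncation can be sketched as follows. This is
only a simulation under the assumption that a 64-bit size gets squeezed
through a 32-bit unsigned integer; the function names are hypothetical
and do not appear in Git's code base.

```c
#include <stdint.h>

/*
 * Simulates storing a 64-bit size in a 32-bit unsigned integer, as
 * happens when a size_t is assigned to an unsigned long under LLP64.
 * The conversion keeps only the value modulo 2^32 (i.e. modulo 4GB).
 */
static uint64_t size_after_32bit_truncation(uint64_t size)
{
	return (uint32_t)size;
}

/* Returns 1 if the size survives the 32-bit round trip unchanged. */
static int size_fits_in_32_bits(uint64_t size)
{
	return size <= UINT32_MAX;
}
```

With 5GB taken as 5 * 2^30 bytes, size_after_32bit_truncation() yields
1 * 2^30 bytes, and an exact multiple of 4GB comes out as zero.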

This patch series primarily fixes the Git LFS scenario, by allowing clean
filters to accept 5GB files, and by allowing smudge filters to produce 5GB
files.

This patch series barely scratches the surface of the much larger project to
teach Git to use size_t instead of unsigned long in all the appropriate
places.

Side note: The fix for the clean filter included in this series does not
actually affect Git LFS! The reason is that Git LFS marks its filter as
required, and therefore Git streams the file contents to Git LFS via a file
descriptor (which is unaffected by LLP64). A "clean" filter that is not
marked as required, however, lets Git take the code path that is fixed by
this patch series.

Johannes Schindelin (1):
  git-compat-util: introduce more size_t helpers

Matt Cooper (4):
  t1051: introduce a smudge filter test for extremely large files
  odb: teach read_blob_entry to use size_t
  odb: guard against data loss checking out a huge file
  clean/smudge: allow clean filters to process extremely large files

 convert.c                   |  2 +-
 delta.h                     |  6 +++---
 entry.c                     |  8 +++++---
 entry.h                     |  2 +-
 git-compat-util.h           | 25 +++++++++++++++++++++++++
 object-file.c               |  6 +++---
 packfile.c                  |  6 +++---
 parallel-checkout.c         |  2 +-
 t/t1051-large-conversion.sh | 22 ++++++++++++++++++++++
 9 files changed, 64 insertions(+), 15 deletions(-)


base-commit: ebf3c04b262aa27fbb97f8a0156c2347fecafafb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1068%2Fdscho%2Fhuge-file-smudge-clean-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1068/dscho/huge-file-smudge-clean-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1068
-- 
gitgitgadget

Thread overview: 78+ messages (newest: 2021-11-04 17:27 UTC)
2021-10-27  7:49 [PATCH 0/5] Allow clean/smudge filters to handle huge files in the LLP64 data model Johannes Schindelin via GitGitGadget
2021-10-27  7:49 ` [PATCH 1/5] t1051: introduce a smudge filter test for extremely large files Matt Cooper via GitGitGadget
2021-10-28  7:15   ` Carlo Arenas
2021-10-28  8:54     ` [PATCH] helper/test-genzeros: allow more than 2G zeros in Windows Carlo Marcelo Arenas Belón
2021-10-28 20:32       ` Johannes Schindelin
2021-10-27  7:49 ` [PATCH 2/5] odb: teach read_blob_entry to use size_t Matt Cooper via GitGitGadget
2021-10-27  7:49 ` [PATCH 3/5] git-compat-util: introduce more size_t helpers Johannes Schindelin via GitGitGadget
2021-10-27  7:49 ` [PATCH 4/5] odb: guard against data loss checking out a huge file Matt Cooper via GitGitGadget
2021-10-27  7:49 ` [PATCH 5/5] clean/smudge: allow clean filters to process extremely large files Matt Cooper via GitGitGadget
2021-10-28 20:50 ` [PATCH v2 0/7] Allow clean/smudge filters to handle huge files in the LLP64 data model Johannes Schindelin via GitGitGadget
2021-10-28 20:50   ` [PATCH v2 1/7] test-genzeros: allow more than 2G zeros in Windows Carlo Marcelo Arenas Belón via GitGitGadget
2021-10-28 20:50   ` [PATCH v2 2/7] test-tool genzeros: generate large amounts of data more efficiently Johannes Schindelin via GitGitGadget
2021-10-28 22:55     ` Junio C Hamano
2021-10-28 20:50   ` [PATCH v2 3/7] t1051: introduce a smudge filter test for extremely large files Matt Cooper via GitGitGadget
2021-10-28 20:50   ` [PATCH v2 4/7] odb: teach read_blob_entry to use size_t Matt Cooper via GitGitGadget
2021-10-28 22:14     ` Carlo Arenas
2021-10-28 22:21       ` Johannes Schindelin
2021-10-28 20:50   ` [PATCH v2 5/7] git-compat-util: introduce more size_t helpers Johannes Schindelin via GitGitGadget
2021-10-28 23:05     ` Junio C Hamano
2021-10-28 20:50   ` [PATCH v2 6/7] odb: guard against data loss checking out a huge file Matt Cooper via GitGitGadget
2021-10-28 20:50   ` [PATCH v2 7/7] clean/smudge: allow clean filters to process extremely large files Matt Cooper via GitGitGadget
2021-10-28 22:32   ` [PATCH v2 0/7] Allow clean/smudge filters to handle huge files in the LLP64 data model brian m. carlson
2021-10-28 23:07     ` Junio C Hamano
2021-10-29 13:59   ` [PATCH v3 0/8] " Johannes Schindelin via GitGitGadget
2021-10-29 13:59     ` [PATCH v3 1/8] test-genzeros: allow more than 2G zeros in Windows Carlo Marcelo Arenas Belón via GitGitGadget
2021-10-29 13:59     ` [PATCH v3 2/8] test-tool genzeros: generate large amounts of data more efficiently Johannes Schindelin via GitGitGadget
2021-10-29 22:50       ` Junio C Hamano
2021-10-29 13:59     ` [PATCH v3 3/8] test-lib: add prerequisite for 64-bit platforms Carlo Marcelo Arenas Belón via GitGitGadget
2021-10-29 22:52       ` Junio C Hamano
2021-11-02 14:35         ` Johannes Schindelin
2021-10-29 13:59     ` [PATCH v3 4/8] t1051: introduce a smudge filter test for extremely large files Matt Cooper via GitGitGadget
2021-10-29 23:00       ` Junio C Hamano
2021-10-29 23:21         ` Junio C Hamano
2021-11-02 14:56           ` Johannes Schindelin
2021-11-02 14:57         ` Johannes Schindelin
2021-10-29 13:59     ` [PATCH v3 5/8] odb: teach read_blob_entry to use size_t Matt Cooper via GitGitGadget
2021-10-29 23:17       ` Junio C Hamano
2021-11-02 15:10         ` Johannes Schindelin
2021-10-29 13:59     ` [PATCH v3 6/8] git-compat-util: introduce more size_t helpers Johannes Schindelin via GitGitGadget
2021-10-29 23:10       ` Junio C Hamano
2021-10-29 13:59     ` [PATCH v3 7/8] odb: guard against data loss checking out a huge file Matt Cooper via GitGitGadget
2021-10-29 23:13       ` Junio C Hamano
2021-10-29 13:59     ` [PATCH v3 8/8] clean/smudge: allow clean filters to process extremely large files Matt Cooper via GitGitGadget
2021-10-29 23:17       ` Junio C Hamano
2021-11-02 14:59         ` Johannes Schindelin
2021-10-29 18:34     ` [PATCH v3 0/8] Allow clean/smudge filters to handle huge files in the LLP64 data model Junio C Hamano
     [not found]       ` <nycvar.QRO.7.76.6.2110292239170.56@tvgsbejvaqbjf.bet>
2021-10-29 21:12         ` Johannes Schindelin
2021-10-29 23:25           ` Junio C Hamano
2021-10-30 15:16           ` Philip Oakley
2021-10-30 17:35             ` Torsten Bögershausen
2021-10-30 19:29               ` Philip Oakley
2021-11-02 14:41       ` Johannes Schindelin
2021-11-02 15:46     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2021-11-02 15:46       ` [PATCH v4 1/8] test-genzeros: allow more than 2G zeros in Windows Carlo Marcelo Arenas Belón via GitGitGadget
2021-11-02 15:46       ` [PATCH v4 2/8] test-tool genzeros: generate large amounts of data more efficiently Johannes Schindelin via GitGitGadget
2021-11-02 15:46       ` [PATCH v4 3/8] test-lib: add prerequisite for 64-bit platforms Carlo Marcelo Arenas Belón via GitGitGadget
2021-11-02 15:46       ` [PATCH v4 4/8] t1051: introduce a smudge filter test for extremely large files Matt Cooper via GitGitGadget
2021-11-02 15:46       ` [PATCH v4 5/8] odb: teach read_blob_entry to use size_t Matt Cooper via GitGitGadget
2021-11-02 20:40         ` Torsten Bögershausen
2021-11-04  0:09           ` Johannes Schindelin
2021-11-04 12:24             ` Philip Oakley
2021-11-02 15:46       ` [PATCH v4 6/8] git-compat-util: introduce more size_t helpers Johannes Schindelin via GitGitGadget
2021-11-02 15:46       ` [PATCH v4 7/8] odb: guard against data loss checking out a huge file Matt Cooper via GitGitGadget
2021-11-02 15:46       ` [PATCH v4 8/8] clean/smudge: allow clean filters to process extremely large files Matt Cooper via GitGitGadget
2021-11-02 20:47         ` Torsten Bögershausen
2021-11-04  0:11           ` Johannes Schindelin
2021-11-04  8:33             ` Torsten Bögershausen
2021-11-04 17:26         ` Junio C Hamano
2021-11-02 21:46       ` [PATCH v4 0/8] Allow clean/smudge filters to handle huge files in the LLP64 data model Torsten Bögershausen
2021-11-03  6:31         ` Johannes Sixt
2021-10-28 20:56 ` [PATCH 0/3] " Carlo Marcelo Arenas Belón
2021-10-28 20:56   ` [PATCH 1/3] test-lib: add prerequisite for 64-bit platforms Carlo Marcelo Arenas Belón
2021-10-28 21:45     ` Johannes Schindelin
2021-10-28 22:09       ` Carlo Arenas
2021-10-28 22:38         ` Junio C Hamano
2021-11-02 15:20           ` Johannes Schindelin
2021-10-28 20:56   ` [PATCH 2/3] fixup! t1051: introduce a smudge filter test for extremely large files Carlo Marcelo Arenas Belón
2021-10-28 20:56   ` [PATCH 3/3] fixup! clean/smudge: allow clean filters to process " Carlo Marcelo Arenas Belón
