git@vger.kernel.org mailing list mirror (one of many)
* Large object issue (Windows)
@ 2019-03-05  0:04 Patrick Hogg
  2019-03-05  3:35 ` brian m. carlson
From: Patrick Hogg @ 2019-03-05  0:04 UTC
  To: git-for-windows, Git Mailing List, Johannes Schindelin

Hi all,

While investigating the last issue I reported (and fixed), I was trying
to come up with a good test case for repos with large objects. In the
process I found an issue on Windows with objects of 4 GiB or larger:

git init test
cd test
echo "*.exe binary" > .gitattributes
truncate -s 4g nullbytes.exe
git stage .
git commit -m "Test"
# This will break, complaining that the object is corrupt.
git fsck --full
# This will also break, complaining that the object is corrupt.
#git gc

I did some investigation and I think that this is a porting issue.
unpack_object_header_buffer in packfile.c uses an unsigned long for the
size. On Linux this will be 64 bits (at least on the Linux systems I've
tried) but on Windows it's 32 bits. The code then decides that the
object header is bad and bombs. However, if I move the repo to a Linux
machine it can handle the data just fine. (And ironically git generated
the object header when storing the object!)

Is there any reason not to switch the unsigned longs in
unpack_object_header_buffer (and its callers, wherever that may lead)
to uint64_t? (Or any potential pitfalls in doing so that I would need
to look out for?)
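
To make the failure mode concrete, here is a minimal sketch of the
variable-length size header, loosely modelled on what
unpack_object_header_buffer does. It is not git's actual code: the
names encode_header and decode_header are hypothetical, and the
`bits` parameter is an assumption used to model the width of
unsigned long (32 on Windows, 64 on typical Linux systems).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write a pack-style header for `type` and `size`.
 * First byte: continuation bit, 3 type bits, low 4 size bits.
 * Following bytes: continuation bit plus 7 more size bits each.
 * Returns the number of bytes written (at most 10 for a 64-bit size).
 */
static size_t encode_header(unsigned type, uint64_t size, unsigned char *out)
{
    size_t n = 0;
    unsigned char c;

    assert(out != NULL);
    c = (unsigned char)(((type & 0x7) << 4) | (size & 0x0f));
    size >>= 4;
    while (size) {
        out[n++] = c | 0x80;
        c = (unsigned char)(size & 0x7f);
        size >>= 7;
    }
    out[n++] = c;
    return n;
}

/*
 * Hypothetical helper: decode the size as if the accumulator were
 * `bits` wide. Returns the number of bytes consumed, or 0 when the
 * header is truncated or the size needs more than `bits` bits.
 */
static size_t decode_header(const unsigned char *buf, size_t len,
                            unsigned bits, uint64_t *sizep)
{
    size_t used = 0;
    unsigned shift = 4;
    unsigned char c;
    uint64_t size;

    assert(buf != NULL && sizep != NULL);
    if (!len)
        return 0;
    c = buf[used++];
    size = c & 0x0f;
    while (c & 0x80) {
        /* Another byte follows, but the accumulator is already full. */
        if (used >= len || shift >= bits)
            return 0;
        c = buf[used++];
        size += (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    *sizep = size;
    return used;
}
```

With these helpers, a 4 GiB size encodes to six bytes. Decoding with
bits set to 32 gives up at the sixth byte, while bits set to 64
recovers the full size, which matches the Windows/Linux difference
described above.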

Thanks,
-Patrick


* Re: Large object issue (Windows)
  2019-03-05  0:04 Large object issue (Windows) Patrick Hogg
@ 2019-03-05  3:35 ` brian m. carlson
  2019-03-07 17:29   ` Philip Oakley
From: brian m. carlson @ 2019-03-05  3:35 UTC
  To: Patrick Hogg; +Cc: git-for-windows, Git Mailing List, Johannes Schindelin

On Mon, Mar 04, 2019 at 07:04:02PM -0500, Patrick Hogg wrote:
> Hi all,
> 
> While investigating the last issue I reported (and fixed) I was trying
> to come up with a good test case for repos with large objects. In the
> process I found an issue on Windows with objects at least 4g large:
> 
> git init test
> cd test
> echo "*.exe binary" > .gitattributes
> truncate -s 4g nullbytes.exe
> git stage .
> git commit -m "Test"
> # This will break, complaining that the object is corrupt.
> git fsck --full
> # This will also break, complaining that the object is corrupt.
> #git gc
> 
> I did some investigation and I think that this is a porting issue.
> unpack_object_header_buffer in packfile.c uses an unsigned long for the
> size. On Linux this will be 64 bits (at least on the Linux systems I've
> tried) but on Windows it's 32 bits. The code then decides that the
> object header is bad and bombs. However, if I move the repo to a Linux
> machine it can handle the data just fine. (And ironically git generated
> the object header when storing the object!)
> 
> Is there any reason not to switch the unsigned longs in
> unpack_object_header_buffer (and its callers, wherever that may lead)
> to uint64_t? (Or any potential pitfalls in doing so that I would need
> to look out for?)

It's known that there are several problems with this, affecting various
parts of the code. Patches to fix this are of course welcome.

I think we've chosen to specify size_t for values which are stored
entirely in memory, since a buffer can't be larger than this size, and
off_t for sizes which refer to files or object sizes. The latter will be
64-bit on 32-bit systems when compiled with _FILE_OFFSET_BITS set to 64,
while the former will be 32-bit.
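
As a rough illustration of that split, the sketch below narrows an
off_t (file or object size) to a size_t (in-memory size) only when
the value fits. The helper name off_to_size is hypothetical and this
is an assumption about how such a check could look, not a copy of
anything in git.

```c
/* Request a 64-bit off_t on 32-bit systems that honour this macro. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper: convert a file/object size to an in-memory
 * size. Returns 0 on success and -1 when `len` is negative or does
 * not fit in size_t (possible when off_t is 64-bit but size_t is
 * 32-bit).
 */
static int off_to_size(off_t len, size_t *out)
{
    assert(out != NULL);
    if (len < 0 || (uintmax_t)len > (uintmax_t)SIZE_MAX)
        return -1;
    *out = (size_t)len;
    return 0;
}
```

On a system where both types are 64-bit the upper-bound check can
never fail; it only matters where off_t is wider than size_t.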
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: Large object issue (Windows)
  2019-03-05  3:35 ` brian m. carlson
@ 2019-03-07 17:29   ` Philip Oakley
From: Philip Oakley @ 2019-03-07 17:29 UTC
  To: brian m. carlson, Patrick Hogg, git-for-windows, Git Mailing List,
	Johannes Schindelin

On 05/03/2019 03:35, brian m. carlson wrote:
> On Mon, Mar 04, 2019 at 07:04:02PM -0500, Patrick Hogg wrote:
>> Hi all,
>>
>> While investigating the last issue I reported (and fixed) I was trying
>> to come up with a good test case for repos with large objects. In the
>> process I found an issue on Windows with objects at least 4g large:
>>
>> git init test
>> cd test
>> echo "*.exe binary" > .gitattributes
>> truncate -s 4g nullbytes.exe
>> git stage .
>> git commit -m "Test"
>> # This will break, complaining that the object is corrupt.
>> git fsck --full
>> # This will also break, complaining that the object is corrupt.
>> #git gc
>>
>> I did some investigation and I think that this is a porting issue.
>> unpack_object_header_buffer in packfile.c uses an unsigned long for the
>> size. On Linux this will be 64 bits (at least on the Linux systems I've
>> tried) but on Windows it's 32 bits. The code then decides that the
>> object header is bad and bombs. However, if I move the repo to a Linux
>> machine it can handle the data just fine. (And ironically git generated
>> the object header when storing the object!)
>>
>> Is there any reason not to switch the unsigned longs in
>> unpack_object_header_buffer (and its callers, wherever that may lead)
>> to uint64_t? (Or any potential pitfalls in doing so that I would need
>> to look out for?)
> It's known that there are several problems with this, affecting various
> parts of the code. Patches to fix this are of course welcome.
>
> I think we've chosen to specify size_t for values which are stored
> entirely in memory, since a buffer can't be larger than this size, and
> off_t for sizes which refer to files or object sizes. The latter will be
> 64-bit on 32-bit systems when compiled with _FILE_OFFSET_BITS set to 64,
> while the former will be 32-bit.

Hi Patrick,

There is also a Git for Windows issue at
https://github.com/git-for-windows/git/issues/1063 and an earlier thread
on this list at
https://public-inbox.org/git/994568940.109648.1548957557643@ox.hosteurope.de/

Part of the issue is that zlib on Windows 'sort of' fails to handle
more than 4 GB (see their FAQ 32): the length value is only a 'long',
which is only 32 bits there. In fact zlib copes fine with the data but
reports a length modulo that limit.

Finding all the places that should be widened to size_t (pointer-sized)
or ptrdiff_t, rather than coerced down to the 32-bit long on Windows, is
part of the struggle.
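
A small sketch of the "length modulo that limit" effect, without
calling zlib itself: the struct and function names are hypothetical,
and uint32_t stands in for a 32-bit 'long' running total. As I
understand it, zlib's total_in/total_out counters are of this kind on
Windows, so a caller who needs the true length has to keep a wider
total alongside.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of running totals for a chunked stream. */
struct byte_totals {
    uint32_t total32; /* models a 32-bit 'long' counter; wraps at 2^32 */
    uint64_t total64; /* caller-maintained counter that does not wrap here */
};

/* Account for one processed chunk in both counters. */
static void account_chunk(struct byte_totals *t, uint32_t chunk_len)
{
    assert(t != NULL);
    t->total32 += chunk_len; /* unsigned arithmetic: silently modulo 2^32 */
    t->total64 += chunk_len;
}
```

Feeding five 1 GiB chunks through account_chunk leaves total32 at
1 GiB (5 GiB modulo 4 GiB) while total64 holds the real 5 GiB.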

Philip

