git@vger.kernel.org mailing list mirror (one of many)
* test suite: why does git add large_file create a pack, rather than an object?
@ 2019-03-30 14:13 Philip Oakley
  2019-04-01 10:47 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Philip Oakley @ 2019-03-30 14:13 UTC (permalink / raw)
  To: Git List; +Cc: Torsten Bögershausen, Thomas Braun, Johannes Schindelin

I'm looking at the Git-for-Windows (GfW) >4Gb large file problem,
following Torsten and Thomas's work (and others). [1,2,3, etc.]

I've added a few more changes to my branch [2] to get the zlib to 
properly count past its 32bit limit when wrapped in the git_inflate 
(etc) wrapper code. [4]

At the moment I'm using an extended _test_ case that starts by adding a 
~5.1Gb file and then using verify-pack, which aborts with an error.

         dd if=/dev/zero of=file bs=1M count=5100 &&
         git config core.compression 0 &&
         git config core.looseCompression 0 &&
         git add file &&
         git verify-pack -s .git/objects/pack/*.pack &&
         git fsck --verbose --strict --full &&
         ...

If, however, I simply execute the commands from the GfW bash, the added
file is stored as a blob object, rather than a pack.

I'm at a loss to understand the reason for the change in behaviour 
[store file as pack, vs store as object] between running the code as a 
test script and at the terminal. What am I missing?
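
One way to check that the two runs are being compared like for like is
to count the loose objects and packs in each object store. A minimal
sketch follows; the helper name object_store_summary is made up for
illustration, and it only inspects the directory layout rather than
asking git:

```shell
# Hypothetical helper (not part of git): summarise how objects are
# stored under an objects directory, e.g. .git/objects.
object_store_summary () {
	objdir=${1:-.git/objects}
	# Loose objects live in two-hex-digit fan-out directories.
	loose=$(find "$objdir" -type f -path "$objdir/[0-9a-f][0-9a-f]/*" 2>/dev/null | wc -l)
	# Packs live under objects/pack as *.pack files.
	packs=$(find "$objdir/pack" -type f -name '*.pack' 2>/dev/null | wc -l)
	# $((...)) strips any padding that wc may add.
	echo "loose=$((loose)) packs=$((packs))"
}
# Usage: object_store_summary .git/objects
```

"git count-objects -v" reports similar figures (its "count" and "packs"
lines) from git's own point of view.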

I have 'good' output from the test script on the WSL (and have 
identified the packs' specific byte differences), but my gdb experience 
is limited so executing that test while within the test script meant I 
couldn't start debugging there. Hence the direct execution from the 
terminal that raised the issue.

-- 

Philip
[1] <20190131203842.633ztr4yckn7kl2d@tb-raspi4>
[2] https://github.com/gitgitgadget/git/pull/115#issuecomment-4753008375
[3] https://github.com/git-for-windows/git/issues/1848
[4] https://github.com/PhilipOakley/git/tree/size_t2



* Re: test suite: why does git add large_file create a pack, rather than an object?
  2019-03-30 14:13 test suite: why does git add large_file create a pack, rather than an object? Philip Oakley
@ 2019-04-01 10:47 ` Junio C Hamano
  2019-04-01 15:09   ` Philip Oakley
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2019-04-01 10:47 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Git List, Torsten Bögershausen, Thomas Braun,
	Johannes Schindelin

Philip Oakley <philipoakley@iee.org> writes:

> At the moment I'm using an extended _test_ case that starts by adding
> a ~5.1Gb file and then using verify-pack, which aborts with an error.
>
>         dd if=/dev/zero of=file bs=1M count=5100 &&
>         git config core.compression 0 &&
>         git config core.looseCompression 0 &&
>         git add file &&
>         git verify-pack -s .git/objects/pack/*.pack &&
>         git fsck --verbose --strict --full &&
>         ...
>
> If, however, I simply execute the commands from the GfW bash, the added
> file is stored as a blob object, rather than a pack.
>
> I'm at a loss to understand the reason for the change in behaviour
> [store file as pack, vs store as object] between running the code as a
> test script and at the terminal. What am I missing?

To which test are you adding the above piece?  Perhaps one of those
that configures core.bigFileThreshold?


* Re: test suite: why does git add large_file create a pack, rather than an object?
  2019-04-01 10:47 ` Junio C Hamano
@ 2019-04-01 15:09   ` Philip Oakley
  2019-04-02 10:35     ` Duy Nguyen
  0 siblings, 1 reply; 4+ messages in thread
From: Philip Oakley @ 2019-04-01 15:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git List, Torsten Bögershausen, Thomas Braun,
	Johannes Schindelin

hi Junio,
On 01/04/2019 11:47, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.org> writes:
>
>> At the moment I'm using an extended _test_ case that starts by adding
>> a ~5.1Gb file and then using verify-pack, which aborts with an error.
>>
>>          dd if=/dev/zero of=file bs=1M count=5100 &&
>>          git config core.compression 0 &&
>>          git config core.looseCompression 0 &&
>>          git add file &&
>>          git verify-pack -s .git/objects/pack/*.pack &&
>>          git fsck --verbose --strict --full &&
>>          ...
>>
>> If, however, I simply execute the commands from the GfW bash, the added
>> file is stored as a blob object, rather than a pack.
>>
>> I'm at a loss to understand the reason for the change in behaviour
>> [store file as pack, vs store as object] between running the code as a
>> test script and at the terminal. What am I missing?
> To which test are you adding the above piece?  Perhaps one of those
> that configures core.bigFileThreshold?
The test script (t-large-files-on-windows.sh: [1] below) was specific to
this debugging.

I didn't set core.bigfilethreshold - Is that done (or unset) by the test 
setup at all?

It does prompt me to check that all the bigfilethreshold checks are 
actually size_t, rather than a simple 'long'/uInt which would only be 
32bits on Windows and potentially a downcast comparison, resulting in 
mistaken bigfile actions because of the modulo 2^32 action.
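
To put numbers on the modulo 2^32 concern, here is a small sketch. It
assumes a shell with 64-bit arithmetic and uses the documented 512 MiB
default of core.bigFileThreshold; the helper name truncate_to_32bit is
made up for illustration:

```shell
# Hypothetical helper: the value a size would have after being
# truncated to an unsigned 32-bit integer.
# Assumes the shell does its arithmetic in 64 bits.
truncate_to_32bit () {
	echo $(( $1 % 4294967296 ))
}

size=$((5100 * 1024 * 1024))       # the file written by dd above
threshold=$((512 * 1024 * 1024))   # documented default of core.bigFileThreshold
truncated=$(truncate_to_32bit "$size")
echo "size=$size truncated=$truncated threshold=$threshold"
# prints: size=5347737600 truncated=1052770304 threshold=536870912
```

For this 5100 MiB file the truncated size (1004 MiB) would still be
above the default threshold, so a 32-bit truncation alone would not
change the big-file decision here; a file only slightly larger than
4 GiB would be affected.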

So when I run the test script [1] on Windows I get my error from 
verify-pack, and the trash directory contains a single pack file.
I tried doing the commands singly on a fresh repo, but that time found 
that the add/verify produced a blob object (rather than a pack with one 
object), so it got me wondering if I was testing like for like.

When I tried using gdb at the add stage, with a break point, I got a 
back trace [2], and when run to completion it had the loose object, so I 
was confused. (my fixup code is at [3])

Does test_expect_success clear all the local/user config variables to
sandbox the tests?

Philip


[1] https://github.com/PhilipOakley/git/blob/size_t2/t/t-large-files-on-windows.sh


    #!/bin/sh

    test_description='test large file handling on windows'
    . ./test-lib.sh

    test_expect_success SIZE_T_IS_64BIT 'blah blubb' '

    	test-tool zlib-compile-flags >zlibFlags.txt &&
    	dd if=/dev/zero of=file bs=1M count=5100 &&
    	git config core.compression 0 &&
    	git config core.looseCompression 0 &&
    	gdb git  &&
    	git verify-pack -s .git/objects/pack/*.pack &&
    	git fsck --verbose --strict --full &&
    	git commit -m msg file &&
    	git verify-pack -s .git/objects/pack/*.pack &&
    	git log --stat &&
    	git fsck --verbose --strict --full &&
    	git repack -a -f &&
    	git verify-pack -s .git/objects/pack/*.pack &&
    	git verify-pack -v .git/objects/pack/*.pack &&
    	git gc
    '

    test_done


[2] Thread 1 hit Breakpoint 1, git_deflate (strm=0x138dac0, flush=0) at zlib.c:235
235                     zlib_pre_call(strm);
(gdb) bt
#0  git_deflate (strm=0x138dac0, flush=0) at zlib.c:235
#1  0x00000000005f4d11 in write_loose_object (oid=0x7ff4fda90070,
     hdr=0x138f560 "blob 5347737600", hdrlen=16, buf=0x7fff0000,
     len=5347737600, mtime=0) at sha1-file.c:1707
#2  0x00000000005f50db in write_object_file (buf=0x7fff0000, len=5347737600,
     type=0x6e3e9c <__ac_HASH_UPPER+20> "blob", oid=0x7ff4fda90070)
     at sha1-file.c:1779
#3  0x00000000005f5696 in index_mem (istate=0x78a3c0 <the_index>,
     oid=0x7ff4fda90070, buf=0x7fff0000, size=5347737600, type=OBJ_BLOB,
     path=0x13b6c2c "file", flags=1) at sha1-file.c:1901
#4  0x00000000005f5aca in index_core (istate=0x78a3c0 <the_index>,
     oid=0x7ff4fda90070, fd=4, size=5347737600, type=OBJ_BLOB,
     path=0x13b6c2c "file", flags=1) at sha1-file.c:1975
#5  0x00000000005f5c5d in index_fd (istate=0x78a3c0 <the_index>,
     oid=0x7ff4fda90070, fd=4, st=0x138f860, type=OBJ_BLOB,
     path=0x13b6c2c "file", flags=1) at sha1-file.c:2019
#6  0x00000000005f5d9a in index_path (istate=0x78a3c0 <the_index>,
     oid=0x7ff4fda90070, path=0x13b6c2c "file", st=0x138f860, flags=1)
     at sha1-file.c:2040
#7  0x000000000059e818 in add_to_index (istate=0x78a3c0 <the_index>,
     path=0x13b6c2c "file", st=0x138f860, flags=0) at read-cache.c:763
#8  0x000000000059ea04 in add_file_to_index (istate=0x78a3c0 <the_index>,
     path=0x13b6c2c "file", flags=0) at read-cache.c:796
#9  0x0000000000404b73 in add_files (dir=0x138f9c0, flags=0)
     at builtin/add.c:378
#10 0x0000000000405286 in cmd_add (argc=1, argv=0x1390658, prefix=0x0)
     at builtin/add.c:534
#11 0x0000000000402e86 in run_builtin (p=0x68a040 <commands>, argc=2,
     argv=0x1390650) at git.c:422
#12 0x000000000040326a in handle_builtin (argc=2, argv=0x1390650) at git.c:654
#13 0x00000000004034ac in run_argv (argcp=0x138fd90, argv=0x138fd38)
     at git.c:708
#14 0x00000000004038ca in cmd_main (argc=2, argv=0x1390650) at git.c:830
#15 0x00000000004c2c5f in main (argc=3, argv=0x1390648) at common-main.c:45
(gdb)

[3] https://github.com/PhilipOakley/git/tree/size_t2



* Re: test suite: why does git add large_file create a pack, rather than an object?
  2019-04-01 15:09   ` Philip Oakley
@ 2019-04-02 10:35     ` Duy Nguyen
  0 siblings, 0 replies; 4+ messages in thread
From: Duy Nguyen @ 2019-04-02 10:35 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Junio C Hamano, Git List, Torsten Bögershausen, Thomas Braun,
	Johannes Schindelin

On Mon, Apr 1, 2019 at 10:10 PM Philip Oakley <philipoakley@iee.org> wrote:
>
> hi Junio,
> On 01/04/2019 11:47, Junio C Hamano wrote:
> > Philip Oakley <philipoakley@iee.org> writes:
> >
> >> At the moment I'm using an extended _test_ case that starts by adding
> >> a ~5.1Gb file and then using verify-pack, which aborts with an error.
> >>
> >>          dd if=/dev/zero of=file bs=1M count=5100 &&
> >>          git config core.compression 0 &&
> >>          git config core.looseCompression 0 &&
> >>          git add file &&
> >>          git verify-pack -s .git/objects/pack/*.pack &&
> >>          git fsck --verbose --strict --full &&
> >>          ...
> >>
> >> If, however, I simply execute the commands from the GfW bash, the added
> >> file is stored as a blob object, rather than a pack.
> >>
> >> I'm at a loss to understand the reason for the change in behaviour
> >> [store file as pack, vs store as object] between running the code as a
> >> test script and at the terminal. What am I missing?
> > To which test are you adding the above piece?  Perhaps one of those
> > that configures core.bigFileThreshold?
> The test script (t-large-files-on-windows.sh: [1] below) was specific to
> this debugging.
>
> I didn't set core.bigfilethreshold - Is that done (or unset) by the test
> setup at all?
>
> It does prompt me to check that all the bigfilethreshold checks are
> actually size_t, rather than a simple 'long'/uInt which would only be
> 32bits on Windows and potentially a downcast comparison, resulting in
> mistaken bigfile actions because of the modulo 2^32 action.
>
> So when I run the test script [1] on Windows I get my error from
> verify-pack, and the trash directory contains a single pack file.
> I tried doing the commands singly on a fresh repo, but that time found
> that the add/verify produced a blob object (rather than a pack with one
> object), so it got me wondering if I was testing like for like.
>
> When I tried using gdb at the add stage, with a break point, I got a
> back trace [2], and when run to completion it had the loose object, so I
> was confused. (my fixup code is at [3])

Streaming a blob directly to a pack is done by index_stream(). I
suggest you force a crash when that function is called (from your test
script), then examine it with gdb for more info. You should be able to
see what its caller is (in case it's not index_fd()); then perhaps you
could add a bunch of printfs to show all the conditions that lead (or do
not lead) to that function?
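
A rough model of those conditions, offered as an assumption from
reading index_fd() in sha1-file.c of that period rather than anything
confirmed in this thread: a regular-file blob appears to be streamed to
a pack only when it is larger than the big-file threshold and no
content conversion applies to its path. The function name and its
arguments are made up for illustration:

```shell
# Hypothetical model, not git code: predict where "git add" would
# store a regular-file blob.
#   $1 = file size in bytes
#   $2 = big-file threshold in bytes
#   $3 = "yes" if a content conversion (e.g. CRLF handling or a
#        clean filter) applies to the path, anything else otherwise
predict_blob_storage () {
	if [ "$3" = yes ] || [ "$1" -le "$2" ]
	then
		echo loose   # in-core path, written as a loose object
	else
		echo pack    # streamed path, written straight into a pack
	fi
}
```

Under this model, "predict_blob_storage 5347737600 536870912 no" gives
"pack", while the same size with a conversion in effect gives "loose",
which would match the two behaviours reported above.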

There are some would_convert_ calls in index_fd(). Maybe some other
config keys are affecting this.
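
To see whether the config differs between the two environments, the
output of "git config --list" from each can be compared. A minimal
sketch; the helper name config_only_in and the dump file names are made
up for illustration:

```shell
# Hypothetical helper: print the lines of config dump $1 that do not
# appear anywhere in config dump $2 (whole-line, fixed-string match).
config_only_in () {
	grep -Fxv -f "$2" "$1"
	return 0   # an empty result is not an error here
}
# Usage, after running "git config --list >terminal.cfg" at the
# terminal and "git config --list >test.cfg" inside the test:
#   config_only_in terminal.cfg test.cfg
#   config_only_in test.cfg terminal.cfg
```

As far as I know, test-lib.sh sets GIT_CONFIG_NOSYSTEM=1 and points
HOME at the trash directory, so system and user settings (for example
core.autocrlf, if it is set there) would show up only in the terminal
dump.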

PS. I also don't know what index_stream_convert_blob() does. I'm not
sure whether it's really streaming to a blob or streaming from somewhere
to a converter. You might want to check that too.
-- 
Duy

