git@vger.kernel.org mailing list mirror (one of many)
* Does "git push" open a pack for read before closing it?
From: git-mailinglist @ 2018-12-21 12:46 UTC
  To: git

[Major ignorance alert]

I'm writing software to implement a FUSE mount for a decentralised file
system and during testing with git I see some strange behaviour which
I'd like to investigate. It might be a bug in my code, or even the FUSE
lib I'm using, or it might be intended behaviour by git.

So one thing I'd like to do is check if this is expected in git.

SYSTEM
OS: Ubuntu 18.10
git version 2.19.1
Decentralised storage mounted at ~/SAFE

What I'm doing
I'm testing my FUSE implementation for SAFE Network while exploring the
use of git with decentralised storage, so not necessarily in a sensible
arrangement (comments on that also welcome).

I have a folder at ~/SAFE/_public/tests/data1/ and want to create a bare
repo there to use as a remote from my local drive for an existing git
repo at ~/src/safe/sjs.git

Anyway, I do the following sequence of commands which are all fine up
until the last one which eventually fails:

  cd ~/SAFE/_public/tests/data1
  git init --bare blah
  cd ~/src/safe/sjs.git
  git remote remove origin
  git remote add origin ~/SAFE/_public/tests/data1/blah
  git push origin master

Here's the output from the last command above:

Enumerating objects: 373, done.
Counting objects: 100% (373/373), done.
Delta compression using up to 8 threads
Compressing objects: 100% (371/371), done.
Writing objects: 100% (373/373), 187.96 KiB | 33.00 KiB/s, done.
Total 373 (delta 254), reused 0 (delta 0)
remote: fatal: unable to open /home/mrh/SAFE/_public/tests/data1/blah/./objects/incoming-73lbb6/pack/tmp_pack_pL28kQ: Remote I/O error
error: remote unpack failed: index-pack abnormal exit
To /home/mrh/SAFE/_public/tests/data1/blah
 ! [remote rejected] master -> master (unpacker error)
error: failed to push some refs to '/home/mrh/SAFE/_public/tests/data1/blah'

Inspecting the logs from my FUSE implementation I see that there's a
problem related to this file on the mounted storage:

 /_public/tests/data1/blah/objects/incoming-73lbb6/pack/tmp_pack_pL28kQ

Prior to the error the file is written to multiple times by git - all
good (about 200kB in all). Then, before the file is closed I see an
attempt to open it for read, which fails. The failure is because I don't
support read on a file that is open for write yet, and I'm not sure if
that is sensible or what git might be expecting to do given the file has
not even been flushed to disk at this point.

So I'd like to know if this is expected behaviour by git (or where to
look to find out), and if it is expected, then what might git expect to
do if the file were opened successfully?

N.B. After the failure, the file is closed and then deleted!

Also note that it is possible the behaviour I'm seeing is not really git
but another issue, such as a bug in the sync/async aspect of my code.

Thanks

Mark
-- 
Secure Access For Everyone:
- SAFE Network
- First Autonomous Decentralised Internet
https://safenetwork.tech



* Re: Does "git push" open a pack for read before closing it?
From: brian m. carlson @ 2018-12-22 23:12 UTC
  To: git-mailinglist; +Cc: git

On Fri, Dec 21, 2018 at 12:46:35PM +0000, git-mailinglist@happybeing.com wrote:
> Here's the output from the last command above:
> 
> Enumerating objects: 373, done.
> Counting objects: 100% (373/373), done.
> Delta compression using up to 8 threads
> Compressing objects: 100% (371/371), done.
> Writing objects: 100% (373/373), 187.96 KiB | 33.00 KiB/s, done.
> Total 373 (delta 254), reused 0 (delta 0)
> remote: fatal: unable to open /home/mrh/SAFE/_public/tests/data1/blah/./objects/incoming-73lbb6/pack/tmp_pack_pL28kQ: Remote I/O error
> error: remote unpack failed: index-pack abnormal exit
> To /home/mrh/SAFE/_public/tests/data1/blah
>  ! [remote rejected] master -> master (unpacker error)
> error: failed to push some refs to '/home/mrh/SAFE/_public/tests/data1/blah'
> 
> Inspecting the logs from my FUSE implementation I see that there's a
> problem related to this file on the mounted storage:
> 
>  /_public/tests/data1/blah/objects/incoming-73lbb6/pack/tmp_pack_pL28kQ
> 
> Prior to the error the file is written to multiple times by git - all
> good (about 200kB in all). Then, before the file is closed I see an
> attempt to open it for read, which fails. The failure is because I don't
> support read on a file that is open for write yet, and I'm not sure if
> that is sensible or what git might be expecting to do given the file has
> not even been flushed to disk at this point.

What I expect is happening is that Git receives the objects and writes
them to a temporary file (which you see in "objects/incoming") and then
they're passed to either git unpack-objects or git index-pack, which
then attempts to read it.

> So I'd like to know if this is expected behaviour by git (or where to
> look to find out), and if it is expected, then what might git expect to
> do if the file were opened successfully?

This behavior is expected. POSIX says that a read that can be proved to
have occurred after a write must contain the new data, so it's possible
that a separate process may choose to read the file and index it,
knowing that the index process was started after all the writes.

This is definitely an important invariant to preserve if your FUSE
file system is going to be used on a Unix system. In other words,
consistency (in the CAP sense) is required.
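
A file system can be checked for this property without involving git. The following is a minimal sketch, not git's actual code: it writes a file, keeps the write descriptor open, and then starts a separate process that reads the file back. The function name and the file name `tmp_pack_demo` are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys
import tempfile


def child_sees_unclosed_writes(directory, payload=b"PACK" + b"\0" * 8):
    """Return True if a separate process reads back data that was written
    through a descriptor which is still open for write."""
    path = os.path.join(directory, "tmp_pack_demo")  # hypothetical file name
    wfd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # Write the whole payload, allowing for short writes.
        view = memoryview(payload)
        while view:
            view = view[os.write(wfd, view):]

        # The reader starts only after every write has returned, so its
        # read is provably later than the writes. wfd is still open here.
        reader = "import sys; sys.stdout.buffer.write(open(sys.argv[1], 'rb').read())"
        result = subprocess.run(
            [sys.executable, "-c", reader, path],
            stdout=subprocess.PIPE,
            check=True,
        )
        return result.stdout == payload
    finally:
        os.close(wfd)
        os.unlink(path)


if __name__ == "__main__":
    # Pass a directory on the mount under test as the first argument;
    # without one, a local temporary directory is used.
    if len(sys.argv) > 1:
        print(child_sees_unclosed_writes(sys.argv[1]))
    else:
        with tempfile.TemporaryDirectory() as tmp:
            print(child_sees_unclosed_writes(tmp))
```

On a file system that preserves the invariant this prints True. An exception from the child process or a False result points at the read-while-open-for-write path.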

> N.B. After the failure, the file is closed and then deleted!

Right, if this had succeeded, we would have renamed it into place (or
unpacked it and deleted it), but since it failed, we clean up after
ourselves so as not to leave large temporary files around.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: Does "git push" open a pack for read before closing it?
From: git-mailinglist @ 2019-01-07 15:56 UTC
  To: git; +Cc: brian m. carlson

On 22/12/2018 23:12, brian m. carlson wrote:
Thanks Brian, you helped me make some progress. I'm stuck again trying
to understand git's behaviour though, and wondering if there are better
ways for me to see into git (source, debug output, etc.) than posting here.

As a reminder, I'm doing the following to create a bare repository on my
FUSE mounted decentralised storage:

  cd ~/SAFE/_public/tests/data1
  git init --bare blah
  cd ~/src/safe/sjs.git
  git remote remove origin
  git remote add origin ~/SAFE/_public/tests/data1/blah
  git push origin master

The bugs are in my implementation of FUSE on the SAFE storage.

I get additional output from git using the following (but it doesn't
help me):
 set -x; GIT_TRACE=2 GIT_CURL_VERBOSE=2 GIT_TRACE_PERFORMANCE=2 \
 GIT_TRACE_PACK_ACCESS=2 GIT_TRACE_PACKET=2 GIT_TRACE_PACKFILE=2 \
 GIT_TRACE_SETUP=2 GIT_TRACE_SHALLOW=2 git push origin master -v -v \
 2>&1 |tee ~/git-trace.log; set +x

Anyway, to add a little to your observations...

> What I expect is happening is that Git receives the objects and writes
> them to a temporary file (which you see in "objects/incoming") and then
> they're passed to either git unpack-objects or git index-pack, which
> then attempts to read it.
The git console output seems to confirm it is 'git index-pack' that
encounters the error, which is currently:

  Enumerating objects: 373, done.
  Counting objects: 100% (373/373), done.
  Delta compression using up to 8 threads
  Compressing objects: 100% (371/371), done.
  Writing objects: 100% (373/373), 192.43 KiB | 54.00 KiB/s, done.
  Total 373 (delta 255), reused 0 (delta 0)
  remote: fatal: premature end of pack file, 36 bytes missing
  remote: fatal: premature end of pack file, 65 bytes missing
  error: remote unpack failed: index-pack abnormal exit
  To /home/mrh/SAFE/_public/tests/data1/blah
   ! [remote rejected] master -> master (unpacker error)
  error: failed to push some refs to '/home/mrh/SAFE/_public/tests/data/blah'

So I conclude I'm either not writing the file properly, or not reading
it back properly. I can continue looking into that of course, but
looking at the file requests I'm curious about what git is doing and how
to learn more about it as it looks odd.

I have quite a few questions, but will focus on just the point at which
it bails out. In summary, what I see is:

- The pack file is created and written with multiple calls, ending up
about 200k long.

- While still open for write, it is opened *four* more times for read, so
git has five handles active on it: one write and four read.

- At this point I see the following FUSE read operation:

  read('/_public/tests/data1/blah/objects/incoming-quFPHB/pack/tmp_pack_E4ea92', 58, buf, 4096, 16384)

  58 is the file handle, 4096 the length of buf, and 16384 the position

- Presumably this is where git encounters a problem because it then
closes everything and cleans up the incoming directory.
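
The access pattern in the list above can be reproduced without git, which may help separate a write bug from a read bug. This is a minimal sketch under stated assumptions: it assumes the reader uses positioned reads (`pread`) on independent read-only descriptors, which has not been confirmed against git's source, and the function name, file name and fill pattern are hypothetical. The defaults mirror the numbers from the log (about 200k written, four read handles, 4096 bytes at position 16384).

```python
import os
import sys
import tempfile


def positioned_reads_match(directory, size=200 * 1024, readers=4,
                           length=4096, offset=16384):
    """Write `size` bytes, keep the write descriptor open, open `readers`
    read-only descriptors and compare one positioned read from each
    against the bytes that were written."""
    path = os.path.join(directory, "tmp_pack_demo")  # hypothetical file name
    payload = bytes(i % 251 for i in range(size))    # non-repeating per 4k block
    expected = payload[offset:offset + length]

    wfd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    rfds = []
    try:
        # Write the whole payload, allowing for short writes.
        view = memoryview(payload)
        while view:
            view = view[os.write(wfd, view):]

        # Open the read handles while the write handle is still active.
        for _ in range(readers):
            rfds.append(os.open(path, os.O_RDONLY))

        # Each handle reads `length` bytes at `offset`, like the FUSE
        # read(path, fh, buf, 4096, 16384) call seen in the log.
        return all(os.pread(fd, length, offset) == expected for fd in rfds)
    finally:
        for fd in rfds:
            os.close(fd)
        os.close(wfd)
        os.unlink(path)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(positioned_reads_match(sys.argv[1]))
    else:
        with tempfile.TemporaryDirectory() as tmp:
            print(positioned_reads_match(tmp))
```

A False result here would be consistent with the "premature end of pack file" errors, since a short or empty read at a valid offset looks like a truncated pack to the reader.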

It seems odd to me that it is starting to read the pack file at position
16384 rather than at 0 (or at 12 after the header). I can surmise it
might open it four times to speed access, but would expect to see it
read the beginning of the file (or at position 12) before trying to
interpret the content and bailing out.
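
For reference, the 12-byte header mentioned above is, per git's pack format documentation, a 4-byte "PACK" signature followed by two big-endian 32-bit integers: the version number and the object count. The sketch below only decodes those 12 bytes; the function name is hypothetical and this is not git's own parser.

```python
import struct

# 4-byte signature, 4-byte version, 4-byte object count, network byte order.
PACK_HEADER = struct.Struct(">4sII")


def parse_pack_header(first_bytes):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    if len(first_bytes) < PACK_HEADER.size:
        raise ValueError("need at least 12 bytes of pack data")
    signature, version, count = PACK_HEADER.unpack_from(first_bytes)
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    return version, count
```

Reading the first 12 bytes of the temporary pack back through the mount and passing them to this function is a quick way to check that the start of the file survived the write.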

So I'm wondering what git is doing there. Any comments on this, or a
pointer to the relevant git code so I can look myself would be great.

Thanks,

Mark
