git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Daniel Barkalow <barkalow@iabervon.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Jason McMullan <jason.mcmullan@timesys.com>,
	Linus Torvalds <torvalds@osdl.org>,
	git@vger.kernel.org
Subject: Re: [PATCH] pull: gracefully recover from delta retrieval failure.
Date: Sun, 5 Jun 2005 16:02:57 -0400 (EDT)	[thread overview]
Message-ID: <Pine.LNX.4.21.0506051523280.30848-100000@iabervon.org> (raw)
In-Reply-To: <7vekbg3de7.fsf@assigned-by-dhcp.cox.net>

On Sun, 5 Jun 2005, Junio C Hamano wrote:

> >>>>> "JM" == Jason McMullan <jason.mcmullan@timesys.com> writes:
> 
> JM> Sorry about being a pest, but this worries me. Please assuage my fears.
> 
> Earlier I said I suspected that the original code mishandled
> recovery from a botched tree/commit dependent transfer, but that
> was not the case.  The last test in the new test script I added
> in the patch you are responding to covers that case.

It does the O(history) method for correctness at the expense of
efficiency; my hope is that a bit of caching can fix the efficiency issue
as well. So the question is not really "not safe" as "slow". Of course, it
takes a while for this to become an issue, given the relationship of
remote access latency to local access bandwidth. That is, you need a
really big history and to be getting very little new data before you'll
complain.

> JM> (Or, if you'd like, I can rework pull.c to use the
> JM>  verification-before-store technique I used in my git-daemon patch, so
> JM>  all the *-pull mechanisms will be 'safe')
> 
> I would appreciate the offer.  I, however, would have to warn
> you that the "problem" lies in the way the current pull
> structure devides responsibility between the pull.c and transfer
> backends.  The pull.c implements the dependency logic, and
> transfer backends are to populate the database while being
> oblivious of that logic.  From the purist point of view (I am
> sympathetic to your "place only the verified objects in the
> database" principle), I am not entirely happy with that
> division, but at the same time I understand why it is done that
> way and even like it from practical standpoint.

At one point I'd written a patch that split out the tmpfile usage of
write_sha1_file(), made the filenames predictable, and used it for
everything that writes those files. It had an "open" part and a
"close" part (where the close also moved the file into place). This would
give the code better atomicity and protect against races between reading
and validation. On the other hand, there's no reason to use an anonymous
temp file; just <filename>.partial or similar (with the proper open
flags) would be sufficient and easier to clean or commit. Note that we
want to support /tmp and the object directory being on different
filesystems, also. (And all the open and place logic is nicely wrapped up
in sha1_file.c)

Aside from the question of whether we want to insist that the object
database only includes objects such that everything reachable is also
present, we certainly want to only include objects which we have
completely fetched, which are generically well-formed, and which have the
advertized hash, and having there never be an unvalidated file at the
filename would be good.

By this reasoning, a file should only be renamed after all of the delta
requirements are satisfied, but before tree and commit requirements are
satisfied. We certainly aren't going to have much use for files whose
contents we cannot get. This means that we'd like to have multiple
unplaced files, but we don't need to read the contents of an unplaced
file.

	-Daniel
*This .sig left intentionally blank*


  reply	other threads:[~2005-06-05 20:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-06-05  6:11 [PATCH] pull: gracefully recover from delta retrieval failure Junio C Hamano
2005-06-05 16:38 ` Jason McMullan
2005-06-05 17:24   ` Daniel Barkalow
2005-06-05 17:46   ` Junio C Hamano
2005-06-05 20:02     ` Daniel Barkalow [this message]
2005-06-06 13:50       ` Database consistency after a successful pull McMullan, Jason
2005-06-06 16:21         ` Daniel Barkalow
2005-06-06 18:30           ` McMullan, Jason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.21.0506051523280.30848-100000@iabervon.org \
    --to=barkalow@iabervon.org \
    --cc=git@vger.kernel.org \
    --cc=jason.mcmullan@timesys.com \
    --cc=junkio@cox.net \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).