git@vger.kernel.org list mirror (unofficial, one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: [PATCH 3/4] peel_ref: check object type before loading
Date: Thu, 04 Oct 2012 13:41:40 -0700
Message-ID: <7vd30yx94r.fsf@alter.siamese.dyndns.org>
In-Reply-To: <20121004194150.GA13955@sigill.intra.peff.net> (Jeff King's message of "Thu, 4 Oct 2012 15:41:50 -0400")

Jeff King <peff@peff.net> writes:

> [1] One thing I've been toying with is "external alternates": dumping
>     your large objects in some relatively slow data store (e.g., a
>     RESTful HTTP service). You could cache and cheaply query a list of
>     "sha1 / size / type" for each object from the store, but getting the
>     actual objects would be much more expensive. But again, it would
>     depend on whether you would actually have such a store directly
>     accessible by a ref.

Yeah, that actually has been another thing we were discussing
locally, without coming to something concrete enough to present to
the list.

The basic idea is to mark such paths with attributes, and use a
variant of the smudge/clean filter that is _not_ a filter (as we do
not want the interface to this external helper to be "we feed the
whole big blob to you").  Instead, these smudgex/cleanx helpers
work on a pathname.

 - Your in-tree objects store a blob that records a description of
   the large thing.  Call such a blob a surrogate.  "clone", "fetch"
   and "push" all deal only with surrogates so your in-history data
   will stay small.

 - When checking out, the attributes mechanism kicks in and runs the
   "not filter" variant of smudge with the data in the surrogate.

   The surrogate records where and how to get the real thing, and
   how to validate that what you got is correct.  A hand-wavy example
   may look like this:

   	get: download http://cdn.example.com/67def20
        sha1sum: f84667def209e4a84e37e8488a08e9eca3f208c1

   to tell you to download a single URL by whatever means is suitable
   for your platform (perhaps curl or wget), and verify the result
   by running sha1sum.  Or it may involve

	get: git-fetch git://git.example.com/images.git/ master
        object: 85a094f22f02c54c740448f6716da608a5e89a80

   to tell you to "git fetch" from the given git-reachable resource
   into some place and grab the object via "git cat-file", possibly
   streaming it out.  The details do not matter at this point in the
   design process.
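   To make the two surrogate examples above a bit more concrete, here
   is a minimal Python sketch of how a helper might read one.  It
   assumes a surrogate is plain "key: value" text, one field per line,
   split at the first colon (so URLs survive), and that the field names
   are exactly "get", "sha1sum" and "object" as in the hand-wavy
   examples; parse_surrogate, plan_fetch and verify_sha1 are
   hypothetical names, not anything that exists in git.

```python
import hashlib


def parse_surrogate(text: str) -> dict:
    """Parse a hypothetical 'key: value' surrogate into a dict.

    One field per line, split at the FIRST colon so that URLs such as
    http://... stay intact in the value.  Blank lines are ignored.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed surrogate line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


def plan_fetch(fields: dict) -> tuple:
    """Split the 'get' field into (method, [arguments])."""
    method, *args = fields["get"].split()
    return method, args


def verify_sha1(data: bytes, fields: dict) -> bool:
    """Check fetched bytes against the surrogate's 'sha1sum' field."""
    return hashlib.sha1(data).hexdigest() == fields["sha1sum"]
```

   The helper would then dispatch on the method ("download" vs.
   "git-fetch") and only trust the result once verification passes.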

   The smudgex helper is responsible for caching previously fetched
   large contents and maintaining the association between each
   surrogate blob and its real data.  That way, once the real thing
   has been downloaded, if the contents of the path need to change to
   something else (e.g. the user checks out a different branch) and
   then change back again (e.g. the user returns to the original
   branch), it does not have to be downloaded again.
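   A minimal sketch of that caching responsibility, assuming an
   in-memory dict stands in for the helper's on-disk cache and that
   "fetch" is some caller-supplied downloader (both hypothetical).  The
   cache key is the git blob id of the surrogate text, computed the
   way git hashes blobs (SHA-1 over "blob <size>\0" plus the content),
   so the same surrogate on two branches maps to one cached download.

```python
import hashlib
from typing import Callable, Dict


class SmudgexCache:
    """Hypothetical surrogate -> real-contents cache for a smudgex helper."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch                  # assumed downloader: surrogate text -> bytes
        self._store: Dict[str, bytes] = {}   # stand-in for on-disk cache storage

    @staticmethod
    def key(surrogate: str) -> str:
        """Git blob id of the surrogate text: sha1("blob <len>\\0" + data)."""
        data = surrogate.encode("utf-8")
        return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

    def contents(self, surrogate: str) -> bytes:
        """Return the real data, fetching only the first time it is needed."""
        k = self.key(surrogate)
        if k not in self._store:
            self._store[k] = self._fetch(surrogate)
        return self._store[k]
```

   Switching from branch A to B and back to A calls fetch once per
   distinct surrogate, not once per checkout.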

 - When checking if the working tree is clean relative to the index,
   the smudgex/cleanx helper will be consulted.  It will be given
   the surrogate data in the index and the path in the working tree.
   We may want to allow the helper implementation to give a read-only
   hardlink directly into the helper's cache storage, so that it can
   consult its database of surrogate-to-real mappings and perform
   this verification cheaply by inode comparison, or something.
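   One way the inode comparison might look, as a sketch only: it
   assumes the helper materializes the working-tree file as a hardlink
   to its cached copy and makes that inode read-only, so "clean" just
   means "still the same (device, inode) pair".  materialize and
   is_clean are hypothetical names.  Note that editors which write a
   new file and rename it over the old one break the link and are
   detected; anything that writes in place would instead modify the
   cache itself, which is why the read-only mode matters.

```python
import os


def materialize(cache_path: str, worktree_path: str) -> None:
    """Hypothetical checkout: hardlink the cached file into the worktree."""
    if os.path.lexists(worktree_path):
        os.remove(worktree_path)
    os.link(cache_path, worktree_path)
    # chmod acts on the shared inode, so both names become read-only.
    os.chmod(cache_path, 0o444)


def is_clean(cache_path: str, worktree_path: str) -> bool:
    """Cheap cleanliness check: same device and inode means untouched."""
    try:
        w = os.stat(worktree_path)
    except FileNotFoundError:
        return False
    c = os.stat(cache_path)
    return (w.st_dev, w.st_ino) == (c.st_dev, c.st_ino)
```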

 - When running "git add" on modified large content prepared in the
   working tree, the cleanx helper is called to prepare a new
   surrogate, and that is what is registered in the index.  The helper
   is also responsible for storing the new large content away and
   arranging for it to be retrievable when others see and use this
   surrogate.
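   A minimal cleanx sketch under loud assumptions: store_dir stands in
   for wherever the helper publishes large objects, and base_url is a
   hypothetical location from which others could later download them.
   The file is hashed in chunks so the big blob is never held in memory
   whole, copied into the store under its SHA-1, and a surrogate in the
   "get:/sha1sum:" shape from the example above is returned for the
   index.

```python
import hashlib
import os
import shutil


def cleanx(worktree_path: str, store_dir: str, base_url: str) -> str:
    """Hypothetical cleanx: stash the large file, return a new surrogate."""
    h = hashlib.sha1()
    with open(worktree_path, "rb") as f:
        # Stream in 1 MiB chunks rather than reading the whole file.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()

    # Content-addressed store: identical contents are stored only once.
    os.makedirs(store_dir, exist_ok=True)
    dest = os.path.join(store_dir, digest)
    if not os.path.exists(dest):
        shutil.copyfile(worktree_path, dest)

    return f"get: download {base_url}/{digest}\nsha1sum: {digest}\n"
```

   A real helper would upload to the shared store instead of copying
   locally; the local copy only illustrates "store it away, then
   describe how to get it back".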

The initial scope of supporting something like that in core-git
would be to add the necessary infrastructure to arrange for such
smudgex and cleanx helpers to be called when a path is marked as a
surrogate in the attribute system, and to supply a sample helper.
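As a rough illustration of that dispatch, here is a Python sketch
that assumes a hypothetical "surrogate" attribute in .gitattributes.
It deliberately simplifies git's real matching rules: only slash-free
patterns are supported and they are matched against the basename,
later lines win, and "-surrogate" unsets the attribute.  The smudgex
and write_file callables are placeholders for the helper and for
git's normal checkout path.

```python
import fnmatch
import posixpath


def is_surrogate(path: str, gitattributes: str) -> bool:
    """Does 'path' carry the hypothetical 'surrogate' attribute?"""
    name = posixpath.basename(path)
    marked = False
    for line in gitattributes.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        for attr in attrs:               # later matching lines override
            if attr == "surrogate":
                marked = True
            elif attr == "-surrogate":
                marked = False
    return marked


def checkout(path, blob: bytes, gitattributes, smudgex, write_file):
    """Route marked paths to smudgex (surrogate + pathname), others normally."""
    if is_surrogate(path, gitattributes):
        smudgex(blob.decode("utf-8"), path)
    else:
        write_file(path, blob)
```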
