From: Mark Thomas <markbt@efaref.net>
To: git@vger.kernel.org
Cc: Mark Thomas <markbt@efaref.net>
Subject: [RFC 0/4] Shallow clones with on-demand fetch
Date: Sat, 4 Mar 2017 19:18:57 +0000
Message-ID: <20170304191901.9622-1-markbt@efaref.net>
Hello everyone,
This is an RFC for an enhancement to shallow repositories to make them
behave more like full clones.
I was inspired a bit by Microsoft's announcement of their Git VFS. I
saw that people have talked in the past about making git fetch objects
from remotes as they are needed, and decided to give it a try.
The patch series adds a "--on-demand" option to git clone, which, when
used in conjunction with the existing shallow clone operations, clones
the full history of the repository's commits, but only the files that
would be included in the shallow clone.
When a file that is missing is required, git requests the file on-demand
from the remote, via a new 'upload-file' service.
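As a rough illustration of that fallback (not the code from the patches), the lookup can be pictured as: check the packs, then the loose objects, and only then ask the remote. The function name `read_object`, the dictionaries standing in for local storage, and the `fetch_remote` callback are all hypothetical names used only for this sketch.

```python
from typing import Callable, Dict, Optional


def read_object(
    oid: str,
    packed: Dict[str, bytes],
    loose: Dict[str, bytes],
    fetch_remote: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Return the object's content, asking the remote only as a last resort.

    `packed` and `loose` are plain dicts standing in for the local
    object store; real git looks in pack files and loose object files.
    """
    if oid in packed:
        return packed[oid]
    if oid in loose:
        return loose[oid]
    # Not available locally: request it on demand from the remote.
    return fetch_remote(oid)


def demo_remote(oid: str) -> Optional[bytes]:
    """Hypothetical stand-in for a request to the 'upload-file' service."""
    return b"content of " + oid.encode("ascii")


if __name__ == "__main__":
    print(read_object("abc123", {}, {}, demo_remote))
```

The remote is only contacted when both local lookups miss, so a repository that already has every file behaves exactly like a normal clone in this model.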
Public git servers are unlikely to want to enable this, due to the
additional load it may cause, but within an organization's own network, it
will allow full access to the repository history without needing a full
initial clone.
The patch set is in four parts:
1:
    Adds the "upload-file" command, which starts a new protocol
    conversation with the client allowing it to request file info and
    file contents. The connection is kept open so that the client
    can make as many requests as it likes. The client terminates the
    connection by sending a packet containing "end".
2:
    Adds the ability for file info and content to be requested from
    the remote if the file cannot be found in any pack, or loose in
    the repository. Currently this only looks at the default remote,
    but the intention is this would be configurable.
3:
    Adds the "on-demand" capability to "upload-pack". When a client
    requests this capability, "upload-pack" includes in the pack
    all commits, even those that would normally be dropped by the
    shallow clone.
4:
    Adds the "--on-demand" option to clone, to request an on-demand
    shallow clone.
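To make the conversation in part 1 more concrete, here is a minimal sketch of how a client could frame its requests, assuming "upload-file" uses git's usual pkt-line framing (four hex digits giving the total length, followed by the payload). Only the closing "end" packet comes from the description above; the request strings `info <id>` and `get <id>`, the helper names, and the absence of a trailing newline are assumptions made for illustration.

```python
from typing import List, Optional

MAX_PKT_LEN = 65520  # largest pkt-line git allows, including the 4-byte header


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as one pkt-line: 4 hex digits of total length, then data."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for a single pkt-line")
    return f"{length:04x}".encode("ascii") + payload


def read_pkt_lines(data: bytes) -> List[Optional[bytes]]:
    """Split a byte stream into payloads; a flush-pkt ("0000") becomes None."""
    out: List[Optional[bytes]] = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            out.append(None)
            pos += 4
            continue
        if length < 4:
            raise ValueError("invalid pkt-line length")
        out.append(data[pos + 4:pos + length])
        pos += length
    return out


def build_session(requests: List[bytes]) -> bytes:
    """Concatenate any number of requests, then terminate with an "end" packet."""
    return b"".join(pkt_line(r) for r in requests) + pkt_line(b"end")


if __name__ == "__main__":
    # Hypothetical request verbs; the real wire format is defined by the patch.
    session = build_session([b"info 1234abcd", b"get 1234abcd"])
    print(session)
    print(read_pkt_lines(session))
```

Because the connection stays open, the client can append as many request packets as it likes before the final "end".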
This is a proof-of-concept, so it is in no way complete. It contains a
few hacks to make it work, but these can be ironed out with a bit more
work. What I have so far is sufficient to try out the idea. I'd like
to get people's opinions on it before I spend any more time working on
it. I'm also not very familiar with the git codebase, so some help
would be appreciated.
As an example, the Linux repository currently stands at 2.0GB of packed
data. A "git clone --shallow-since=2016-01-01 --on-demand" is only
561MB, and yet remains fully functional. A git blame on the Makefile,
for example, shows all changes to the file, right back to Linus's
original commit in 2005.
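For a sense of scale, the arithmetic behind those two figures can be worked out as below. This assumes 1 GB = 1000 MB; with 1024 MB per GB the ratio is slightly lower.

```python
# Figures quoted above; the GB-to-MB conversion factor is an assumption.
full_clone_mb = 2.0 * 1000
on_demand_mb = 561.0

ratio = on_demand_mb / full_clone_mb
saved_mb = full_clone_mb - on_demand_mb

print(f"on-demand clone is about {ratio:.0%} of the full clone")
print(f"initial transfer avoided: about {saved_mb:.0f} MB")
```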
Still to do:
- Fix up the hacks and make everything work correctly.
- Make fetching of further updates work correctly.
- Store the retrieved files in an LRU cache, possibly with the option
  of storing them in the main repo data, too.
- Add a gc/enshallow operation to make the repo shallower by forgetting
  old files, or moving them to the LRU cache.
- Add configurable remote to fetch from.
- Documentation.
- Much more.
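The LRU cache item in the list above is not implemented in this series. Purely as an illustration of the idea, a size-bounded cache of fetched file contents could look like the following; the class name `ObjectLRU` and the byte-budget policy are hypothetical and not taken from the patches.

```python
from collections import OrderedDict
from typing import Optional


class ObjectLRU:
    """Keep recently used objects until a byte budget is exceeded."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, oid: str) -> Optional[bytes]:
        data = self._items.get(oid)
        if data is not None:
            # Mark as most recently used.
            self._items.move_to_end(oid)
        return data

    def put(self, oid: str, data: bytes) -> None:
        if oid in self._items:
            self.used -= len(self._items.pop(oid))
        self._items[oid] = data
        self.used += len(data)
        # Evict least recently used entries until back under budget.
        while self.used > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)


if __name__ == "__main__":
    cache = ObjectLRU(max_bytes=10)
    cache.put("a", b"aaaa")
    cache.put("b", b"bbbb")
    cache.get("a")           # "a" is now more recent than "b"
    cache.put("c", b"cccc")  # exceeds the budget, so "b" is evicted
    print(cache.get("b"), cache.get("a"), cache.used)
```

An evicted file would simply be requested from the remote again the next time it is needed.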
Please let me know what you think, and if an experienced git developer
would like to help out with finishing this, that would be even better.
Mark Thomas (4):
  upload-file: Add upload-file command
  on-demand: Fetch missing files from remote
  upload-pack: Send all commits if client requests on-demand
  clone: Request on-demand shallow clones

 .gitignore             |   1 +
 Makefile               |   3 +
 builtin/clone.c        |   7 +-
 builtin/pack-objects.c |  26 ++++++-
 cache-tree.c           |   2 +-
 cache.h                |   3 +-
 daemon.c               |   6 ++
 fetch-pack.c           |   3 +
 fetch-pack.h           |   1 +
 list-objects.c         |  12 ++--
 list-objects.h         |  13 +++-
 object.h               |   1 +
 on_demand.c            | 183 +++++++++++++++++++++++++++++++++++++++++++++++++
 on_demand.h            |  12 ++++
 sha1_file.c            |   8 ++-
 shallow.c              |   2 +-
 transport.c            |   3 +
 transport.h            |   4 ++
 upload-file.c          |  87 +++++++++++++++++++++++
 upload-pack.c          |   8 ++-
 20 files changed, 370 insertions(+), 15 deletions(-)
 create mode 100644 on_demand.c
 create mode 100644 on_demand.h
 create mode 100644 upload-file.c
--
2.7.4