From: Jeff Hostetler <git@jeffhostetler.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, ethomson@edwardthomson.com,
jonathantanmy@google.com, jrnieder@gmail.com,
jeffhost@microsoft.com
Subject: [PATCH v2 00/19] WIP object filtering for partial clone
Date: Thu, 13 Jul 2017 17:34:40 +0000
Message-ID: <20170713173459.3559-1-git@jeffhostetler.com>
From: Jeff Hostetler <jeffhost@microsoft.com>
This WIP is a follow up to my earlier patch series to teach
pack-objects to omit large blobs from packfiles. [1]
Like the previous version, this version builds on a suggestion from
Peff [2] to use the traverse_commit_list() machinery to allow custom
object filtering via a filter callback. This keeps the filtering
logic in list-objects.c and list-objects-filters.c and minimizes the
changes to individual commands, such as pack-objects.
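The callback approach can be illustrated with a small, self-contained
sketch. It does not use git's real traverse_commit_list() signature.
The toy_* types and functions below are hypothetical. They only show
the shape of the idea: the traversal asks a filter about each object
and routes it to either a "show" callback or an "omit" callback.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object model; these types are illustrative, not git's own. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	const char *hex_oid;   /* object id in hex */
	unsigned long size;    /* object size */
	const char *path;      /* path in the tree, NULL for commits */
};

/* What a filter tells the traversal to do with one object. */
enum toy_decision { TOY_SHOW, TOY_OMIT };

typedef enum toy_decision (*toy_filter_fn)(const struct toy_object *obj,
					   void *filter_data);
typedef void (*toy_visit_fn)(const struct toy_object *obj, void *data);

/*
 * Visit objects in traversal order, asking the filter about each one.
 * Shown objects go to show(); omitted ones go to omit() (if given), so the
 * caller can keep a manifest of what was left out. A NULL filter shows all.
 */
static void toy_traverse(const struct toy_object *objs, size_t nr,
			 toy_filter_fn filter, void *filter_data,
			 toy_visit_fn show, toy_visit_fn omit, void *data)
{
	assert(objs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		const struct toy_object *obj = &objs[i];
		enum toy_decision d = filter ? filter(obj, filter_data) : TOY_SHOW;

		if (d == TOY_SHOW)
			show(obj, data);
		else if (omit)
			omit(obj, data);
	}
}

/* A filter in the spirit of omit-all-blobs: drop every blob. */
static enum toy_decision toy_omit_all_blobs(const struct toy_object *obj,
					    void *filter_data)
{
	(void)filter_data;
	return obj->type == TOY_BLOB ? TOY_OMIT : TOY_SHOW;
}

/* Simple callbacks that count what was shown and omitted. */
struct toy_counts { int shown, omitted; };

static void toy_count_shown(const struct toy_object *obj, void *data)
{
	(void)obj;
	((struct toy_counts *)data)->shown++;
}

static void toy_count_omitted(const struct toy_object *obj, void *data)
{
	(void)obj;
	((struct toy_counts *)data)->omitted++;
}
```

Consider a walk over one commit, one tree and one blob. With
toy_omit_all_blobs as the filter, it shows two objects and omits one.
Passing a NULL filter shows all three. Because the commands only
supply a filter and callbacks, the per-command changes stay small.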
This version adds the same filtering capability to rev-list, so that
filtering can be demonstrated without building a packfile. Filtered
blobs are printed with a leading "~" (along with their sizes).
$ ./git rev-list --objects HEAD~1..HEAD
74f806c70507317b8bdbcf3b08459c7c83906bee
818617707aac81ae4620239182b514f65638e37e
d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
306c16551e548ace12c709a332bfea22adcc395f builtin/fetch.c
$ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest HEAD~1..HEAD
74f806c70507317b8bdbcf3b08459c7c83906bee
818617707aac81ae4620239182b514f65638e37e
d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
~306c16551e548ace12c709a332bfea22adcc395f 40732
$ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest --quiet HEAD~1..HEAD
~306c16551e548ace12c709a332bfea22adcc395f 40732
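The manifest lines above have a simple shape. The following is a
minimal sketch of formatting one, assuming the size is printed as a
plain decimal number. format_omitted_line() and should_print() are
hypothetical helpers. should_print() only models the behaviour that
is visible in the transcripts: with --quiet, only the "~" lines
remain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "~<hex-oid> <size>" into buf. Returns the line length, or -1 if
 * buf is too small.
 */
static int format_omitted_line(char *buf, size_t len,
			       const char *hex_oid, unsigned long size)
{
	int n;

	assert(buf != NULL && hex_oid != NULL);
	n = snprintf(buf, len, "~%s %lu", hex_oid, size);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/*
 * Hypothetical model of the flags in the transcripts: omitted objects are
 * listed only with --filter-print-manifest, and --quiet suppresses the
 * normal (non-omitted) lines.
 */
static int should_print(int is_omitted, int print_manifest, int quiet)
{
	if (is_omitted)
		return print_manifest;
	return !quiet;
}
```

For the blob omitted above, format_omitted_line() produces
"~306c16551e548ace12c709a332bfea22adcc395f 40732".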
This version contains 3 filters:
1. filter-omit-all-blobs to exclude all blobs (trees and commits only).
2. filter-omit-large-blobs=<n>[kmg] to exclude blobs larger than <n>
(but always including ".git*" special files).
3. filter-use-sparse=<blob-ish> to exclude blobs not needed by the
corresponding sparse-checkout.
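Filter 2 has to turn "<n>[kmg]" into a byte count and recognize
".git*" files. The following is a sketch under stated assumptions:

- k, m and g are taken as 1024-based multipliers.
- A ".git*" file is one whose last path component starts with ".git".
- "Larger than" means strictly greater.

The function names are hypothetical, not the ones in
list-objects-filters.c.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "<n>[kmg]" into a byte count, assuming 1024-based units.
 * Returns 0 on success, -1 on malformed input or overflow.
 */
static int parse_size_limit(const char *arg, unsigned long long *out)
{
	unsigned long long n, factor = 1;
	char *end;

	assert(out != NULL);
	if (!arg || !isdigit((unsigned char)*arg))
		return -1;
	errno = 0;
	n = strtoull(arg, &end, 10);
	if (errno == ERANGE)
		return -1;
	switch (tolower((unsigned char)*end)) {
	case '\0':
		break;
	case 'k':
		factor = 1024ULL;
		end++;
		break;
	case 'm':
		factor = 1024ULL * 1024;
		end++;
		break;
	case 'g':
		factor = 1024ULL * 1024 * 1024;
		end++;
		break;
	default:
		return -1;
	}
	if (*end != '\0' || n > ULLONG_MAX / factor)
		return -1;  /* trailing junk (e.g. "10kb") or overflow */
	*out = n * factor;
	return 0;
}

/* True if the last path component starts with ".git" (e.g. .gitattributes). */
static int is_dot_git_file(const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;
	return strncmp(base, ".git", 4) == 0;
}

/* Nonzero if a blob should be omitted under an omit-large-blobs limit. */
static int omit_large_blob(const char *path, unsigned long long size,
			   unsigned long long limit)
{
	if (path && is_dot_git_file(path))
		return 0;   /* ".git*" special files are always kept */
	return size > limit;
}
```

For example, a limit of "1k" parses to 1024. Under that limit,
builtin/fetch.c (size 40732) would be omitted, while a .gitattributes
file of any size would be kept.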
Sparse-checkout filtering is currently limited to filtering unneeded blobs.
A later enhancement could also filter unneeded tree objects.
This version updates clone, fetch, fetch-pack, and upload-pack commands
to pass the additional object-filter parameters.
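Whatever keyword the protocol uses, the filter request still has to
travel in git's pkt-line framing. A minimal sketch of that framing
follows. The request text "filter-omit-all-blobs" is only a
hypothetical example, since the exact wording is defined by the
protocol documentation in patch 13, not here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Largest pkt-line git allows, header included (LARGE_PACKET_MAX). */
#define PKT_MAX 65520

/*
 * Encode payload as a pkt-line: 4 lowercase hex digits giving the total
 * length (header included), then the payload itself. Writes a NUL-terminated
 * string into out; returns the pkt-line length, or -1 if it does not fit.
 */
static int pkt_line_encode(char *out, size_t outlen, const char *payload)
{
	size_t total;

	assert(out != NULL && payload != NULL);
	total = strlen(payload) + 4;
	if (total > PKT_MAX || total + 1 > outlen)
		return -1;
	snprintf(out, outlen, "%04x%s", (unsigned)total, payload);
	return (int)total;
}

/* A flush-pkt ("0000") ends a section of the request. */
static const char flush_pkt[] = "0000";
```

For example, pkt_line_encode(buf, sizeof(buf),
"filter-omit-all-blobs\n") produces "001afilter-omit-all-blobs\n". The
22-byte payload plus the 4-byte header gives 26 bytes, which is 0x1a.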
As a (possibly) temporary measure, some commands have been updated to
relax errors about missing blobs during consistency checks. Maintaining
information about missing blobs is currently being discussed in [3].
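One way to relax the check without hiding real corruption is to
tolerate a missing object only when it is known to have been filtered
out. The sketch below assumes that a sorted list of omitted object ids
is available. How to maintain such a list is exactly the open
question in [3]. check_reference() and its result codes are
hypothetical, loosely in the spirit of the filter_allow_omitted option
in patch 16.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for checking one referenced object. */
enum check_result { CHECK_OK, CHECK_MISSING_ALLOWED, CHECK_MISSING_ERROR };

/* Compare two elements of an array of hex oid strings. */
static int cmp_oid(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * 'omitted' is a sorted array of hex oids the filter deliberately left out.
 * A missing object is tolerated only if the caller opted in and the object
 * is listed there; any other missing object is still an error.
 */
static enum check_result check_reference(const char *oid, int present,
					 const char **omitted, size_t nr_omitted,
					 int allow_omitted)
{
	assert(oid != NULL);
	if (present)
		return CHECK_OK;
	if (allow_omitted && nr_omitted &&
	    bsearch(&oid, omitted, nr_omitted, sizeof(*omitted), cmp_oid))
		return CHECK_MISSING_ALLOWED;
	return CHECK_MISSING_ERROR;
}
```

The opt-in flag keeps the strict behaviour as the default, so only
commands that know they received a filtered pack relax the check.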
TODO
1. Integrate with a patch series like [4] to dynamically fetch a
missing blob from the server on demand in read_object.
2. Resolve the missing-blob consistency check issue.
3. Store filter options from clone in config or .git/info and default
to them in subsequent fetches.
4. Update fsck, gc, and assorted other commands.
5. Testing.
[1] https://public-inbox.org/git/20170622203615.34135-1-git@jeffhostetler.com/
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
[3] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
[4] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/
Jeff Hostetler (19):
dir: refactor add_excludes()
oidset2: create oidset subclass with object length and pathname
list-objects: filter objects in traverse_commit_list
list-objects-filters: add omit-all-blobs filter
list-objects-filters: add omit-large-blobs filter
list-objects-filters: add use-sparse-checkout filter
object-filter: common declarations for object filtering
rev-list: add object filtering support
rev-list: add filtering help text
t6112: rev-list object filtering test
pack-objects: add object filtering support
pack-objects: add filtering help text
upload-pack: add filter-objects to protocol documentation
upload-pack: add object filtering
fetch-pack: add object filtering support
connected: add filter_allow_omitted option to API
clone: add filter arguments
index-pack: relax consistency checks for omitted objects
fetch: add object filtering to fetch
Documentation/git-pack-objects.txt | 14 +
Documentation/git-rev-list.txt | 7 +-
Documentation/rev-list-options.txt | 26 ++
Documentation/technical/pack-protocol.txt | 16 +
Documentation/technical/protocol-capabilities.txt | 7 +
Makefile | 3 +
builtin/clone.c | 28 ++
builtin/fetch-pack.c | 3 +
builtin/fetch.c | 27 +-
builtin/index-pack.c | 15 +
builtin/pack-objects.c | 33 +-
builtin/rev-list.c | 58 +++-
connected.c | 3 +
connected.h | 6 +
dir.c | 53 +++-
dir.h | 4 +
fetch-pack.c | 28 ++
fetch-pack.h | 2 +
list-objects-filters.c | 361 ++++++++++++++++++++++
list-objects-filters.h | 45 +++
list-objects.c | 66 +++-
list-objects.h | 30 ++
object-filter.c | 201 ++++++++++++
object-filter.h | 145 +++++++++
oidset2.c | 101 ++++++
oidset2.h | 56 ++++
t/t6112-rev-list-filters-objects.sh | 37 +++
transport.c | 27 ++
transport.h | 8 +
upload-pack.c | 39 ++-
30 files changed, 1425 insertions(+), 24 deletions(-)
create mode 100644 list-objects-filters.c
create mode 100644 list-objects-filters.h
create mode 100644 object-filter.c
create mode 100644 object-filter.h
create mode 100644 oidset2.c
create mode 100644 oidset2.h
create mode 100644 t/t6112-rev-list-filters-objects.sh
--
2.9.3