git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 00/13] RFC object filtering for parital clone
From: Jeff Hostetler @ 2017-09-22 20:26 UTC
  To: git; +Cc: gitster, peff, jonathantanmy, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>


This patch series contains WIP code demonstrating object (blob) filtering
in rev-list and pack-objects. Both commands use a common filtering API in
list-objects and traverse_commit_list, so they can perform the same kinds
of filter operations. This API is also intended to serve as the basis of
partial clone and partial fetch.

This draft contains filters to:
() omit all blobs
() omit blobs larger than some size
() omit blobs using a sparse-checkout specification

In addition to specifying the filter criteria, the rev-list command
was updated to include options to:
() print a list of the omitted objects (due to the current filtering
   criteria)
() print a list of missing objects (probably from a prior partial
   clone/fetch).

This latter print option can be used with or without new filter
criteria, so it can drive a command that bulk-prefetches objects
before a checkout.

For example, if blobs were omitted during the clone or a fetch, the
client can do:

   git rev-list --quiet --objects --filter-print-missing NEWBRANCH

and get a list of just the objects that are required to checkout
NEWBRANCH.

Or if a sparse-checkout is in effect, the client can specify the
same criteria to look for just the missing blobs needed to do the
sparse-checkout:

   git rev-list --quiet --objects --filter-print-missing \
       --filter-use-path=.git/info/sparse-checkout NEWBRANCH

It does not matter why a blob is missing, that is, which filter
criteria were used during the clone or fetch.  All that matters
is that the blob is missing and is now needed.

These commands output a list of missing blobs that can be fed
into a bulk fetch object request.  The goal here is to minimize
the need for dynamic object fetch mechanisms currently being
discussed.  (We cannot eliminate the need for dynamic fetching,
but we can use this to precompute/prefetch in bulk.)

Pack-objects was updated to allow the server to build incomplete
packfiles without unwanted blobs.

This is the first step toward supporting partial clone and partial
fetch. I've omitted from this patch series the corresponding changes
to fetch-pack, upload-pack, index-pack, verify-pack, fsck, gc, and the
git protocol. I can make these available if there is interest; I left
them out of this RFC so as not to distract from the basic filtering ideas.

It also does not address the promisor/promised ideas currently
being discussed [2,3].  These should be considered independently.

The code in this patch series can be seen here [1].

[1] https://github.com/jeffhostetler/git/pull/3
[2] https://public-inbox.org/git/xmqq8thbqlqf.fsf@gitster.mtv.corp.google.com/t/
[3] https://github.com/jonathantanmy/git/commits/partialclone2


Jeff Hostetler (13):
  dir: refactor add_excludes()
  oidset2: create oidset subclass with object length and pathname
  list-objects: filter objects in traverse_commit_list
  list-objects-filter-all: add filter to omit all blobs
  list-objects-filter-large: add large blob filter to list-objects
  list-objects-filter-sparse: add sparse-checkout based filter
  object-filter: common declarations for object filtering
  list-objects: add traverse_commit_list_filtered method
  rev-list: add object filtering support
  rev-list: add filtering help text
  t6112: rev-list object filtering test
  pack-objects: add object filtering support
  pack-objects: add filtering help text

 Documentation/git-pack-objects.txt  |  17 +++
 Documentation/git-rev-list.txt      |   9 +-
 Documentation/rev-list-options.txt  |  32 +++++
 Makefile                            |   5 +
 builtin/pack-objects.c              |  24 +++-
 builtin/rev-list.c                  |  73 +++++++++-
 dir.c                               |  53 ++++++-
 dir.h                               |   4 +
 list-objects-filter-all.c           |  85 ++++++++++++
 list-objects-filter-all.h           |  18 +++
 list-objects-filter-large.c         | 108 +++++++++++++++
 list-objects-filter-large.h         |  18 +++
 list-objects-filter-sparse.c        | 221 +++++++++++++++++++++++++++++
 list-objects-filter-sparse.h        |  30 ++++
 list-objects.c                      | 100 +++++++++++---
 list-objects.h                      |  41 ++++++
 object-filter.c                     | 269 ++++++++++++++++++++++++++++++++++++
 object-filter.h                     | 173 +++++++++++++++++++++++
 oidset2.c                           | 104 ++++++++++++++
 oidset2.h                           |  58 ++++++++
 t/t6112-rev-list-filters-objects.sh | 237 +++++++++++++++++++++++++++++++
 21 files changed, 1657 insertions(+), 22 deletions(-)
 create mode 100644 list-objects-filter-all.c
 create mode 100644 list-objects-filter-all.h
 create mode 100644 list-objects-filter-large.c
 create mode 100644 list-objects-filter-large.h
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h
 create mode 100755 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3


* [PATCH 00/13] WIP Partial clone part 1: object filtering
From: Jeff Hostetler @ 2017-10-24 18:53 UTC
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

I've been working with Jonathan Tan to combine our partial clone
proposals.  This patch series represents a first step in that effort
and introduces an object filtering mechanism to select unwanted
objects.

[1] traverse_commit_list and list-objects are extended to allow
    various filters.
[2] rev-list is extended to expose filtering.  This allows testing
    of the filtering options, and it can be used later to predict
    missing objects before commands like checkout or merge.
[3] pack-objects is extended to use filtering parameters and build
    packfiles that omit unwanted objects.

This patch series lays the groundwork for subsequent parts, which
will extend clone, fetch, fetch-pack, upload-pack, fsck, etc.


Jeff Hostetler (13):
  dir: allow exclusions from blob in addition to file
  list-objects-filter-map: extend oidmap to collect omitted objects
  list-objects: filter objects in traverse_commit_list
  list-objects-filter-blobs-none: add filter to omit all blobs
  list-objects-filter-blobs-limit: add large blob filtering
  list-objects-filter-sparse: add sparse filter
  list-objects-filter-options: common argument parsing
  list-objects: add traverse_commit_list_filtered method
  extension.partialclone: introduce partial clone extension
  rev-list: add list-objects filtering support
  t6112: rev-list object filtering test
  pack-objects: add list-objects filtering
  t5317: pack-objects object filtering test

 Documentation/git-pack-objects.txt             |   8 +-
 Documentation/git-rev-list.txt                 |   5 +-
 Documentation/rev-list-options.txt             |  30 ++
 Documentation/technical/repository-version.txt |  22 ++
 Makefile                                       |   6 +
 builtin/pack-objects.c                         |  18 +-
 builtin/rev-list.c                             |  84 +++++-
 cache.h                                        |   4 +
 config.h                                       |   3 +
 dir.c                                          |  51 +++-
 dir.h                                          |   3 +
 environment.c                                  |   2 +
 list-objects-filter-blobs-limit.c              | 146 ++++++++++
 list-objects-filter-blobs-limit.h              |  18 ++
 list-objects-filter-blobs-none.c               |  83 ++++++
 list-objects-filter-blobs-none.h               |  18 ++
 list-objects-filter-map.c                      |  63 ++++
 list-objects-filter-map.h                      |  26 ++
 list-objects-filter-options.c                  | 101 +++++++
 list-objects-filter-options.h                  |  50 ++++
 list-objects-filter-sparse.c                   | 241 ++++++++++++++++
 list-objects-filter-sparse.h                   |  30 ++
 list-objects.c                                 | 111 +++++--
 list-objects.h                                 |  43 ++-
 partial-clone-utils.c                          |  99 +++++++
 partial-clone-utils.h                          |  34 +++
 setup.c                                        |  15 +
 t/t5317-pack-objects-filter-objects.sh         | 384 +++++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh            | 223 ++++++++++++++
 29 files changed, 1897 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filter-blobs-limit.c
 create mode 100644 list-objects-filter-blobs-limit.h
 create mode 100644 list-objects-filter-blobs-none.c
 create mode 100644 list-objects-filter-blobs-none.h
 create mode 100644 list-objects-filter-map.c
 create mode 100644 list-objects-filter-map.h
 create mode 100644 list-objects-filter-options.c
 create mode 100644 list-objects-filter-options.h
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h
 create mode 100644 partial-clone-utils.c
 create mode 100644 partial-clone-utils.h
 create mode 100755 t/t5317-pack-objects-filter-objects.sh
 create mode 100755 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3




Thread overview: 19+ messages
2017-09-22 20:26 [PATCH 00/13] RFC object filtering for parital clone Jeff Hostetler
2017-09-22 20:26 ` [PATCH 01/13] dir: refactor add_excludes() Jeff Hostetler
2017-09-22 20:26 ` [PATCH 02/13] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
2017-09-22 20:42   ` Brandon Williams
2017-09-26 22:20   ` Jonathan Tan
2017-09-27 14:47     ` Jeff Hostetler
2017-09-22 20:26 ` [PATCH 03/13] list-objects: filter objects in traverse_commit_list Jeff Hostetler
2017-09-26 22:31   ` Jonathan Tan
2017-09-27 17:04     ` Jeff Hostetler
2017-09-27 18:00       ` Jonathan Tan
2017-09-27 19:09         ` Jeff Hostetler
2017-09-27 20:49           ` Jonathan Tan
2017-09-22 20:26 ` [PATCH 04/13] list-objects-filter-all: add filter to omit all blobs Jeff Hostetler
2017-09-23  0:39 ` [PATCH 00/13] RFC object filtering for parital clone Jonathan Tan
2017-09-26 14:55   ` Jeff Hostetler
2017-09-26 19:23     ` Jeff Hostetler
  -- strict thread matches above, loose matches on Subject: below --
2017-10-24 18:53 [PATCH 00/13] WIP Partial clone part 1: object filtering Jeff Hostetler
2017-10-24 18:53 ` [PATCH 03/13] list-objects: filter objects in traverse_commit_list Jeff Hostetler
2017-10-25  4:05   ` Jonathan Tan
2017-10-25 19:25     ` Jeff Hostetler

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
