From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
Jonathan Nieder <jrnieder@gmail.com>,
Jeff Hostetler <git@jeffhostetler.com>,
Ben Peart <peartben@gmail.com>,
Philip Oakley <philipoakley@iee.org>
Subject: Re: [RFC PATCH v2 2/4] promised-object, fsck: introduce promised objects
Date: Thu, 20 Jul 2017 11:07:29 -0700 [thread overview]
Message-ID: <CAGZ79kZ45Z3R9FbsKQwotXh_Ek=StUdSTHj4er6HMhx79VbV4g@mail.gmail.com> (raw)
In-Reply-To: <3420d9ae9ef86b78af1abe721891233e3f5865a2.1500508695.git.jonathantanmy@google.com>
On Wed, Jul 19, 2017 at 5:21 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> Currently, Git does not support repos with very large numbers of objects
> or repos that wish to minimize manipulation of certain blobs (for
> example, because they are very large) very well, even if the user
> operates mostly on part of the repo, because Git is designed on the
> assumption that every referenced object is available somewhere in the
> repo storage.
>
> As a first step to reducing this problem, introduce the concept of
> promised objects. Each Git repo can contain a list of promised objects
> and their sizes (if blobs) at $GIT_DIR/objects/promised. This patch
> contains functions to query them; functions for creating and modifying
> that file will be introduced in later patches.
>
> A repository that is missing an object but has that object promised is not
> considered to be in error, so also teach fsck this. As part of doing
> this, object.{h,c} has been modified to generate "struct object" based
> on only the information available to promised objects, without requiring
> the object itself.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Documentation/technical/repository-version.txt | 6 ++
> Makefile | 1 +
> builtin/fsck.c | 18 +++-
> cache.h | 2 +
> environment.c | 1 +
> fsck.c | 6 +-
> object.c | 19 ++++
> object.h | 19 ++++
> promised-object.c | 130 +++++++++++++++++++++++++
> promised-object.h | 22 +++++
> setup.c | 7 +-
> t/t3907-promised-object.sh | 41 ++++++++
> t/test-lib-functions.sh | 6 ++
> 13 files changed, 273 insertions(+), 5 deletions(-)
> create mode 100644 promised-object.c
> create mode 100644 promised-object.h
> create mode 100755 t/t3907-promised-object.sh
>
> diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
> index 00ad37986..f8b82c1c7 100644
> --- a/Documentation/technical/repository-version.txt
> +++ b/Documentation/technical/repository-version.txt
> @@ -86,3 +86,9 @@ for testing format-1 compatibility.
> When the config key `extensions.preciousObjects` is set to `true`,
> objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
> `git repack -d`).
> +
> +`promisedObjects`
> +~~~~~~~~~~~~~~~~~
> +
> +(Explain this - basically a string containing a command to be run
> +whenever a missing object needs to be fetched.)
> diff --git a/Makefile b/Makefile
> index 9c9c42f8f..c1446d5ef 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -828,6 +828,7 @@ LIB_OBJS += preload-index.o
> LIB_OBJS += pretty.o
> LIB_OBJS += prio-queue.o
> LIB_OBJS += progress.o
> +LIB_OBJS += promised-object.o
> LIB_OBJS += prompt.o
> LIB_OBJS += quote.o
> LIB_OBJS += reachable.o
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 462b8643b..49e21f361 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -15,6 +15,7 @@
> #include "progress.h"
> #include "streaming.h"
> #include "decorate.h"
> +#include "promised-object.h"
>
> #define REACHABLE 0x0001
> #define SEEN 0x0002
> @@ -44,6 +45,7 @@ static int name_objects;
> #define ERROR_REACHABLE 02
> #define ERROR_PACK 04
> #define ERROR_REFS 010
> +#define ERROR_PROMISED_OBJECT 011
>
> static const char *describe_object(struct object *obj)
> {
> @@ -436,7 +438,7 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
> {
> struct object *obj;
>
> - obj = parse_object(oid);
> + obj = parse_or_promise_object(oid);
> if (!obj) {
> error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
> errors_found |= ERROR_REACHABLE;
> @@ -592,7 +594,7 @@ static int fsck_cache_tree(struct cache_tree *it)
> fprintf(stderr, "Checking cache tree\n");
>
> if (0 <= it->entry_count) {
> - struct object *obj = parse_object(&it->oid);
> + struct object *obj = parse_or_promise_object(&it->oid);
> if (!obj) {
> error("%s: invalid sha1 pointer in cache-tree",
> oid_to_hex(&it->oid));
> @@ -635,6 +637,12 @@ static int mark_packed_for_connectivity(const struct object_id *oid,
> return 0;
> }
>
> +static int mark_have_promised_object(const struct object_id *oid, void *data)
> +{
> + mark_object_for_connectivity(oid);
> + return 0;
> +}
> +
> static char const * const fsck_usage[] = {
> N_("git fsck [<options>] [<object>...]"),
> NULL
> @@ -690,6 +698,11 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>
> git_config(fsck_config, NULL);
>
> + if (fsck_promised_objects()) {
> + error("Errors found in promised object list");
> + errors_found |= ERROR_PROMISED_OBJECT;
> + }
This got me thinking: It is an error if we do not have an object
and also do not promise it, but what about the other case:
having and object and promising it, too?
That is not harmful to the operation, except that the promise
machinery may be slower due to its size. (Should that be a soft
warning then? Do we have warnings in fsck?)
> * The object type is stored in 3 bits.
> */
We may want to remove this comment while we're here as it
sounds stale despite being technically correct.
1974632c66 (Remove TYPE_* constant macros and use
object_type enums consistently., 2006-07-11)
> struct object {
> + /*
> + * Set if this object is parsed. If set, "type" is populated and this
> + * object can be casted to "struct commit" or an equivalent.
> + */
> unsigned parsed : 1;
> + /*
> + * Set if this object is not in the repo but is promised. If set,
> + * "type" is populated, but this object cannot be casted to "struct
> + * commit" or an equivalent.
> + */
> + unsigned promised : 1;
Would it make sense to have a bit field instead:
#define STATE_BITS 2
#define STATE_PARSED (1<<0)
#define STATE_PROMISED (1<<1)
unsigned state : STATE_BITS
This would be similar to the types and flags?
> +test_expect_success 'fsck fails on missing objects' '
> + test_create_repo repo &&
> +
> + test_commit -C repo 1 &&
> + test_commit -C repo 2 &&
> + test_commit -C repo 3 &&
> + git -C repo tag -a annotated_tag -m "annotated tag" &&
> + C=$(git -C repo rev-parse 1) &&
> + T=$(git -C repo rev-parse 2^{tree}) &&
> + B=$(git hash-object repo/3.t) &&
> + AT=$(git -C repo rev-parse annotated_tag) &&
> +
> + # missing commit, tree, blob, and tag
> + rm repo/.git/objects/$(echo $C | cut -c1-2)/$(echo $C | cut -c3-40) &&
> + rm repo/.git/objects/$(echo $T | cut -c1-2)/$(echo $T | cut -c3-40) &&
> + rm repo/.git/objects/$(echo $B | cut -c1-2)/$(echo $B | cut -c3-40) &&
> + rm repo/.git/objects/$(echo $AT | cut -c1-2)/$(echo $AT | cut -c3-40) &&
This is a pretty cool test as it promises all sorts of objects
from different parts of the graph.
next prev parent reply other threads:[~2017-07-20 18:07 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-11 19:48 [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Jonathan Tan
2017-07-11 19:48 ` [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs Jonathan Tan
2017-07-11 22:02 ` Stefan Beller
2017-07-19 23:37 ` Jonathan Tan
2017-07-12 17:29 ` Jeff Hostetler
2017-07-12 19:28 ` Jonathan Nieder
2017-07-13 14:48 ` Jeff Hostetler
2017-07-13 15:05 ` Jeff Hostetler
2017-07-13 19:39 ` Jonathan Tan
2017-07-14 20:03 ` Jeff Hostetler
2017-07-14 21:30 ` Jonathan Nieder
2017-07-11 19:48 ` [RFC PATCH 2/3] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-11 22:06 ` Stefan Beller
2017-07-19 23:56 ` Jonathan Tan
2017-07-20 0:06 ` Stefan Beller
2017-07-11 19:48 ` [RFC PATCH 3/3] sha1_file: add promised blob hook support Jonathan Tan
2017-07-11 22:38 ` Stefan Beller
2017-07-12 17:40 ` Ben Peart
2017-07-12 20:38 ` Jonathan Nieder
2017-07-16 15:23 ` [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Philip Oakley
2017-07-17 17:43 ` Ben Peart
2017-07-25 20:48 ` Philip Oakley
2017-07-17 18:03 ` Jonathan Nieder
2017-07-29 12:51 ` Philip Oakley
2017-07-20 0:21 ` [RFC PATCH v2 0/4] Partial clone: promised objects (not only blobs) Jonathan Tan
2017-07-20 0:21 ` [RFC PATCH v2 1/4] object: remove "used" field from struct object Jonathan Tan
2017-07-20 0:36 ` Stefan Beller
2017-07-20 0:55 ` Jonathan Tan
2017-07-20 17:44 ` Ben Peart
2017-07-20 21:20 ` Junio C Hamano
2017-07-20 0:21 ` [RFC PATCH v2 2/4] promised-object, fsck: introduce promised objects Jonathan Tan
2017-07-20 18:07 ` Stefan Beller [this message]
2017-07-20 19:17 ` Jonathan Tan
2017-07-20 19:58 ` Ben Peart
2017-07-20 21:13 ` Jonathan Tan
2017-07-21 16:24 ` Ben Peart
2017-07-21 20:33 ` Jonathan Tan
2017-07-25 15:10 ` Ben Peart
2017-07-29 13:26 ` Philip Oakley
2017-07-20 0:21 ` [RFC PATCH v2 3/4] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-20 0:21 ` [RFC PATCH v2 4/4] sha1_file: support promised object hook Jonathan Tan
2017-07-20 18:23 ` Stefan Beller
2017-07-20 20:58 ` Ben Peart
2017-07-20 21:18 ` Jonathan Tan
2017-07-21 16:27 ` Ben Peart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGZ79kZ45Z3R9FbsKQwotXh_Ek=StUdSTHj4er6HMhx79VbV4g@mail.gmail.com' \
--to=sbeller@google.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
--cc=peartben@gmail.com \
--cc=philipoakley@iee.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).