git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Jeff Hostetler <git@jeffhostetler.com>,
	Ben Peart <peartben@gmail.com>,
	Philip Oakley <philipoakley@iee.org>
Subject: Re: [RFC PATCH v2 2/4] promised-object, fsck: introduce promised objects
Date: Thu, 20 Jul 2017 11:07:29 -0700	[thread overview]
Message-ID: <CAGZ79kZ45Z3R9FbsKQwotXh_Ek=StUdSTHj4er6HMhx79VbV4g@mail.gmail.com> (raw)
In-Reply-To: <3420d9ae9ef86b78af1abe721891233e3f5865a2.1500508695.git.jonathantanmy@google.com>

On Wed, Jul 19, 2017 at 5:21 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> Currently, Git does not support repos with very large numbers of objects
> or repos that wish to minimize manipulation of certain blobs (for
> example, because they are very large) very well, even if the user
> operates mostly on part of the repo, because Git is designed on the
> assumption that every referenced object is available somewhere in the
> repo storage.
>
> As a first step to reducing this problem, introduce the concept of
> promised objects. Each Git repo can contain a list of promised objects
> and their sizes (if blobs) at $GIT_DIR/objects/promised. This patch
> contains functions to query them; functions for creating and modifying
> that file will be introduced in later patches.
>
> A repository that is missing an object but has that object promised is not
> considered to be in error, so also teach fsck this. As part of doing
> this, object.{h,c} has been modified to generate "struct object" based
> on only the information available to promised objects, without requiring
> the object itself.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/technical/repository-version.txt |   6 ++
>  Makefile                                       |   1 +
>  builtin/fsck.c                                 |  18 +++-
>  cache.h                                        |   2 +
>  environment.c                                  |   1 +
>  fsck.c                                         |   6 +-
>  object.c                                       |  19 ++++
>  object.h                                       |  19 ++++
>  promised-object.c                              | 130 +++++++++++++++++++++++++
>  promised-object.h                              |  22 +++++
>  setup.c                                        |   7 +-
>  t/t3907-promised-object.sh                     |  41 ++++++++
>  t/test-lib-functions.sh                        |   6 ++
>  13 files changed, 273 insertions(+), 5 deletions(-)
>  create mode 100644 promised-object.c
>  create mode 100644 promised-object.h
>  create mode 100755 t/t3907-promised-object.sh
>
> diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
> index 00ad37986..f8b82c1c7 100644
> --- a/Documentation/technical/repository-version.txt
> +++ b/Documentation/technical/repository-version.txt
> @@ -86,3 +86,9 @@ for testing format-1 compatibility.
>  When the config key `extensions.preciousObjects` is set to `true`,
>  objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
>  `git repack -d`).
> +
> +`promisedObjects`
> +~~~~~~~~~~~~~~~~~
> +
> +(Explain this - basically a string containing a command to be run
> +whenever a missing object needs to be fetched.)
> diff --git a/Makefile b/Makefile
> index 9c9c42f8f..c1446d5ef 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -828,6 +828,7 @@ LIB_OBJS += preload-index.o
>  LIB_OBJS += pretty.o
>  LIB_OBJS += prio-queue.o
>  LIB_OBJS += progress.o
> +LIB_OBJS += promised-object.o
>  LIB_OBJS += prompt.o
>  LIB_OBJS += quote.o
>  LIB_OBJS += reachable.o
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 462b8643b..49e21f361 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -15,6 +15,7 @@
>  #include "progress.h"
>  #include "streaming.h"
>  #include "decorate.h"
> +#include "promised-object.h"
>
>  #define REACHABLE 0x0001
>  #define SEEN      0x0002
> @@ -44,6 +45,7 @@ static int name_objects;
>  #define ERROR_REACHABLE 02
>  #define ERROR_PACK 04
>  #define ERROR_REFS 010
> +#define ERROR_PROMISED_OBJECT 011
>
>  static const char *describe_object(struct object *obj)
>  {
> @@ -436,7 +438,7 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
>  {
>         struct object *obj;
>
> -       obj = parse_object(oid);
> +       obj = parse_or_promise_object(oid);
>         if (!obj) {
>                 error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
>                 errors_found |= ERROR_REACHABLE;
> @@ -592,7 +594,7 @@ static int fsck_cache_tree(struct cache_tree *it)
>                 fprintf(stderr, "Checking cache tree\n");
>
>         if (0 <= it->entry_count) {
> -               struct object *obj = parse_object(&it->oid);
> +               struct object *obj = parse_or_promise_object(&it->oid);
>                 if (!obj) {
>                         error("%s: invalid sha1 pointer in cache-tree",
>                               oid_to_hex(&it->oid));
> @@ -635,6 +637,12 @@ static int mark_packed_for_connectivity(const struct object_id *oid,
>         return 0;
>  }
>
> +static int mark_have_promised_object(const struct object_id *oid, void *data)
> +{
> +       mark_object_for_connectivity(oid);
> +       return 0;
> +}
> +
>  static char const * const fsck_usage[] = {
>         N_("git fsck [<options>] [<object>...]"),
>         NULL
> @@ -690,6 +698,11 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>
>         git_config(fsck_config, NULL);
>
> +       if (fsck_promised_objects()) {
> +               error("Errors found in promised object list");
> +               errors_found |= ERROR_PROMISED_OBJECT;
> +       }

This got me thinking: It is an error if we do not have an object
and also do not promise it, but what about the other case:
having and object and promising it, too?
That is not harmful to the operation, except that the promise
machinery may be slower due to its size. (Should that be a soft
warning then? Do we have warnings in fsck?)

>   * The object type is stored in 3 bits.
>   */

We may want to remove this comment while we're here as it
sounds stale despite being technically correct.
1974632c66 (Remove TYPE_* constant macros and use
object_type enums consistently., 2006-07-11)

>  struct object {
> +       /*
> +        * Set if this object is parsed. If set, "type" is populated and this
> +        * object can be casted to "struct commit" or an equivalent.
> +        */
>         unsigned parsed : 1;
> +       /*
> +        * Set if this object is not in the repo but is promised. If set,
> +        * "type" is populated, but this object cannot be casted to "struct
> +        * commit" or an equivalent.
> +        */
> +       unsigned promised : 1;

Would it make sense to have a bit field instead:

#define STATE_BITS 2
#define STATE_PARSED (1<<0)
#define STATE_PROMISED (1<<1)

    unsigned state : STATE_BITS

This would be similar to the types and flags?


> +test_expect_success 'fsck fails on missing objects' '
> +       test_create_repo repo &&
> +
> +       test_commit -C repo 1 &&
> +       test_commit -C repo 2 &&
> +       test_commit -C repo 3 &&
> +       git -C repo tag -a annotated_tag -m "annotated tag" &&
> +       C=$(git -C repo rev-parse 1) &&
> +       T=$(git -C repo rev-parse 2^{tree}) &&
> +       B=$(git hash-object repo/3.t) &&
> +       AT=$(git -C repo rev-parse annotated_tag) &&
> +
> +       # missing commit, tree, blob, and tag
> +       rm repo/.git/objects/$(echo $C | cut -c1-2)/$(echo $C | cut -c3-40) &&
> +       rm repo/.git/objects/$(echo $T | cut -c1-2)/$(echo $T | cut -c3-40) &&
> +       rm repo/.git/objects/$(echo $B | cut -c1-2)/$(echo $B | cut -c3-40) &&
> +       rm repo/.git/objects/$(echo $AT | cut -c1-2)/$(echo $AT | cut -c3-40) &&

This is a pretty cool test as it promises all sorts of objects
from different parts of the graph.

  reply	other threads:[~2017-07-20 18:07 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-11 19:48 [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Jonathan Tan
2017-07-11 19:48 ` [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs Jonathan Tan
2017-07-11 22:02   ` Stefan Beller
2017-07-19 23:37     ` Jonathan Tan
2017-07-12 17:29   ` Jeff Hostetler
2017-07-12 19:28     ` Jonathan Nieder
2017-07-13 14:48       ` Jeff Hostetler
2017-07-13 15:05         ` Jeff Hostetler
2017-07-13 19:39     ` Jonathan Tan
2017-07-14 20:03       ` Jeff Hostetler
2017-07-14 21:30         ` Jonathan Nieder
2017-07-11 19:48 ` [RFC PATCH 2/3] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-11 22:06   ` Stefan Beller
2017-07-19 23:56     ` Jonathan Tan
2017-07-20  0:06       ` Stefan Beller
2017-07-11 19:48 ` [RFC PATCH 3/3] sha1_file: add promised blob hook support Jonathan Tan
2017-07-11 22:38   ` Stefan Beller
2017-07-12 17:40   ` Ben Peart
2017-07-12 20:38     ` Jonathan Nieder
2017-07-16 15:23 ` [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Philip Oakley
2017-07-17 17:43   ` Ben Peart
2017-07-25 20:48     ` Philip Oakley
2017-07-17 18:03   ` Jonathan Nieder
2017-07-29 12:51     ` Philip Oakley
2017-07-20  0:21 ` [RFC PATCH v2 0/4] Partial clone: promised objects (not only blobs) Jonathan Tan
2017-07-20  0:21 ` [RFC PATCH v2 1/4] object: remove "used" field from struct object Jonathan Tan
2017-07-20  0:36   ` Stefan Beller
2017-07-20  0:55     ` Jonathan Tan
2017-07-20 17:44       ` Ben Peart
2017-07-20 21:20   ` Junio C Hamano
2017-07-20  0:21 ` [RFC PATCH v2 2/4] promised-object, fsck: introduce promised objects Jonathan Tan
2017-07-20 18:07   ` Stefan Beller [this message]
2017-07-20 19:17     ` Jonathan Tan
2017-07-20 19:58   ` Ben Peart
2017-07-20 21:13     ` Jonathan Tan
2017-07-21 16:24       ` Ben Peart
2017-07-21 20:33         ` Jonathan Tan
2017-07-25 15:10           ` Ben Peart
2017-07-29 13:26             ` Philip Oakley
2017-07-20  0:21 ` [RFC PATCH v2 3/4] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-20  0:21 ` [RFC PATCH v2 4/4] sha1_file: support promised object hook Jonathan Tan
2017-07-20 18:23   ` Stefan Beller
2017-07-20 20:58     ` Ben Peart
2017-07-20 21:18       ` Jonathan Tan
2017-07-21 16:27         ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZ45Z3R9FbsKQwotXh_Ek=StUdSTHj4er6HMhx79VbV4g@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peartben@gmail.com \
    --cc=philipoakley@iee.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).