git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"René Scharfe" <l.s.r@web.de>,
	"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 6/7] fsck: use oidset for skiplist
Date: Mon, 27 Aug 2018 22:15:00 +0200	[thread overview]
Message-ID: <87lg8refcr.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20180827194323.17055-7-avarab@gmail.com>


On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:

> From: René Scharfe <l.s.r@web.de>
>
> Object IDs to skip are stored in a shared static oid_array.  Lookups do
> a binary search on the sorted array.  The code checks if the object IDs
> are already in the correct order while loading and skips sorting in that
> case.  Lookups are done before reporting a (non-fatal) corruption and
> before checking .gitmodules files.
>
> Simplify the code by using an oidset instead.  Memory usage is a bit
> higher, but we don't need to worry about any sort order anymore.  Embed
> the oidset into struct fsck_options to make its ownership clear (no
> hidden sharing) and avoid unnecessary pointer indirection.
>
> Performance on repositories with a low number of reported issues and
> .gitmodules files (i.e. the usual case) won't be affected much.  The
> oidset should be a bit quicker with higher numbers of bad objects in
> the skipList.

I didn't change this commit message at all, but FWIW this still has me
somewhat confused. What is the interaction with .gitmodules being
described here? You also mentioned it in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
(but I don't get that either)

Does that just mean that when cloning with --recursive with
transfer.fsckObjects=true we'll re-read the file for each "clone"
invocation, both for the main project and everything listed in
.gitmodules?

If so, I think something like this commit message would be clearer:

    fsck: use oidset instead of oid_array for skipList

    Change the implementation of the skipList feature to use oidset
    instead of oid_array to store SHA-1s for later lookup.

    This list is parsed once on startup by fsck, fetch-pack or
    receive-pack depending on the *.skipList config in use, so for fetch
    it's currently re-parsed for each submodule fetch.

    Memory usage is a bit higher, but we don't need to keep track of the
    sort order anymore. Embed the oidset into struct fsck_options to
    make its ownership clear (no hidden sharing) and avoid unnecessary
    pointer indirection.

Then let's just attach the test program you wrote in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
and note the results and let them speak for themselves.

I can do all that if that seems fine to you. I also notice that the test
case only sets up a case where all the items on the skip list are bad
commits in the repo, it would be interesting to test with a few needles
in a large haystack (I can modify it to do that...).

> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  Documentation/config.txt | 11 ++++++-----
>  fsck.c                   | 23 ++---------------------
>  fsck.h                   |  8 +++++---
>  3 files changed, 13 insertions(+), 29 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a8dfafa61d..3d0556e85d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1729,11 +1729,12 @@ all three of them they must all set to the same values.
>  +
>  Older versions of Git (before 2.20) documented that the object names
>  list should be sorted. This was never a requirement, the object names
> -can appear in any order, but when reading the list we track whether
> -the list is sorted for the purposes of an internal binary search
> -implementation, which can save itself some work with an already sorted
> -list.  Unless you have a humongous list there's no reason to go out of
> -your way to pre-sort the list.
> +could appear in any order, but when reading the list we tracked whether
> +the list was sorted for the purposes of an internal binary search
> +implementation, which could save itself some work with an already sorted
> +list. Unless you had a humongous list there was no reason to go out of
> +your way to pre-sort the list. After Git version 2.20 a hash implementation
> +is used instead, so there's now no reason to pre-sort the list.
>
>  gc.aggressiveDepth::
>  	The depth parameter used in the delta compression
> diff --git a/fsck.c b/fsck.c
> index 972a26b9ba..4c643f1d40 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -10,7 +10,6 @@
>  #include "fsck.h"
>  #include "refs.h"
>  #include "utf8.h"
> -#include "sha1-array.h"
>  #include "decorate.h"
>  #include "oidset.h"
>  #include "packfile.h"
> @@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
>
>  static void init_skiplist(struct fsck_options *options, const char *path)
>  {
> -	static struct oid_array skiplist = OID_ARRAY_INIT;
> -	int sorted;
>  	FILE *fp;
>  	struct strbuf sb = STRBUF_INIT;
>  	struct object_id oid;
>
> -	if (options->skiplist)
> -		sorted = options->skiplist->sorted;
> -	else {
> -		sorted = 1;
> -		options->skiplist = &skiplist;
> -	}
> -
>  	fp = fopen(path, "r");
>  	if (!fp)
>  		die("Could not open skip list: %s", path);
> @@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
>  		const char *p;
>  		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
>  			die("Invalid SHA-1: %s", sb.buf);
> -		oid_array_append(&skiplist, &oid);
> -		if (sorted && skiplist.nr > 1 &&
> -				oidcmp(&skiplist.oid[skiplist.nr - 2],
> -				       &oid) > 0)
> -			sorted = 0;
> +		oidset_insert(&options->skiplist, &oid);
>  	}
>  	if (ferror(fp))
>  		die_errno("Could not read '%s'", path);
>  	fclose(fp);
>  	strbuf_release(&sb);
> -
> -	if (sorted)
> -		skiplist.sorted = 1;
>  }
>
>  static int parse_msg_type(const char *str)
> @@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
>
>  static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
>  {
> -	if (opts && opts->skiplist && obj)
> -		return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
> -	return 0;
> +	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
>  }
>
>  __attribute__((format (printf, 4, 5)))
> diff --git a/fsck.h b/fsck.h
> index 0c7e8c9428..b95595ae5f 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -1,6 +1,8 @@
>  #ifndef GIT_FSCK_H
>  #define GIT_FSCK_H
>
> +#include "oidset.h"
> +
>  #define FSCK_ERROR 1
>  #define FSCK_WARN 2
>  #define FSCK_IGNORE 3
> @@ -35,12 +37,12 @@ struct fsck_options {
>  	fsck_error error_func;
>  	unsigned strict:1;
>  	int *msg_type;
> -	struct oid_array *skiplist;
> +	struct oidset skiplist;
>  	struct decoration *object_names;
>  };
>
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
>
>  /* descend in all linked child objects
>   * the return value is:

  reply	other threads:[~2018-08-27 20:15 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason [this message]
2018-08-28  9:52     ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:59       ` René Scharfe
2018-09-03 14:49         ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:29     ` [PATCH v3 6/7] fsck: use oidset for skiplist René Scharfe
2018-08-27 19:43 ` [PATCH v3 7/7] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lg8refcr.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).