From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"René Scharfe" <l.s.r@web.de>,
"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>,
"Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 6/7] fsck: use oidset for skiplist
Date: Mon, 27 Aug 2018 22:15:00 +0200 [thread overview]
Message-ID: <87lg8refcr.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20180827194323.17055-7-avarab@gmail.com>
On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:
> From: René Scharfe <l.s.r@web.de>
>
> Object IDs to skip are stored in a shared static oid_array. Lookups do
> a binary search on the sorted array. The code checks if the object IDs
> are already in the correct order while loading and skips sorting in that
> case. Lookups are done before reporting a (non-fatal) corruption and
> before checking .gitmodules files.
>
> Simplify the code by using an oidset instead. Memory usage is a bit
> higher, but we don't need to worry about any sort order anymore. Embed
> the oidset into struct fsck_options to make its ownership clear (no
> hidden sharing) and avoid unnecessary pointer indirection.
>
> Performance on repositories with a low number of reported issues and
> .gitmodules files (i.e. the usual case) won't be affected much. The
> oidset should be a bit quicker with higher numbers of bad objects in
> the skipList.
I didn't change this commit message at all, but FWIW this still has me
somewhat confused. What is the interaction with .gitmodules being
described here? You also mentioned it in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
(but I don't get that either)
Does that just mean that when cloning with --recursive with
transfer.fsckObjects=true we'll re-read the file for each "clone"
invocation, both for the main project and everything listed in
.gitmodules?
If so, I think something like this commit message would be clearer:
fsck: use oidset instead of oid_array for skipList
Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.
This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use, so for fetch
it's currently re-parsed for each submodule fetch.
Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to
make its ownership clear (no hidden sharing) and avoid unnecessary
pointer indirection.
Then let's just attach the test program you wrote in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
and note the results and let them speak for themselves.
I can do all that if that seems fine to you. I also notice that the test
case only sets up a case where all the items on the skip list are bad
commits in the repo, it would be interesting to test with a few needles
in a large haystack (I can modify it to do that...).
> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> Documentation/config.txt | 11 ++++++-----
> fsck.c | 23 ++---------------------
> fsck.h | 8 +++++---
> 3 files changed, 13 insertions(+), 29 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a8dfafa61d..3d0556e85d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1729,11 +1729,12 @@ all three of them they must all set to the same values.
> +
> Older versions of Git (before 2.20) documented that the object names
> list should be sorted. This was never a requirement, the object names
> -can appear in any order, but when reading the list we track whether
> -the list is sorted for the purposes of an internal binary search
> -implementation, which can save itself some work with an already sorted
> -list. Unless you have a humongous list there's no reason to go out of
> -your way to pre-sort the list.
> +could appear in any order, but when reading the list we tracked whether
> +the list was sorted for the purposes of an internal binary search
> +implementation, which could save itself some work with an already sorted
> +list. Unless you had a humongous list there was no reason to go out of
> +your way to pre-sort the list. After Git version 2.20 a hash implementation
> +is used instead, so there's now no reason to pre-sort the list.
>
> gc.aggressiveDepth::
> The depth parameter used in the delta compression
> diff --git a/fsck.c b/fsck.c
> index 972a26b9ba..4c643f1d40 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -10,7 +10,6 @@
> #include "fsck.h"
> #include "refs.h"
> #include "utf8.h"
> -#include "sha1-array.h"
> #include "decorate.h"
> #include "oidset.h"
> #include "packfile.h"
> @@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
>
> static void init_skiplist(struct fsck_options *options, const char *path)
> {
> - static struct oid_array skiplist = OID_ARRAY_INIT;
> - int sorted;
> FILE *fp;
> struct strbuf sb = STRBUF_INIT;
> struct object_id oid;
>
> - if (options->skiplist)
> - sorted = options->skiplist->sorted;
> - else {
> - sorted = 1;
> - options->skiplist = &skiplist;
> - }
> -
> fp = fopen(path, "r");
> if (!fp)
> die("Could not open skip list: %s", path);
> @@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
> const char *p;
> if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
> die("Invalid SHA-1: %s", sb.buf);
> - oid_array_append(&skiplist, &oid);
> - if (sorted && skiplist.nr > 1 &&
> - oidcmp(&skiplist.oid[skiplist.nr - 2],
> - &oid) > 0)
> - sorted = 0;
> + oidset_insert(&options->skiplist, &oid);
> }
> if (ferror(fp))
> die_errno("Could not read '%s'", path);
> fclose(fp);
> strbuf_release(&sb);
> -
> - if (sorted)
> - skiplist.sorted = 1;
> }
>
> static int parse_msg_type(const char *str)
> @@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
>
> static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
> {
> - if (opts && opts->skiplist && obj)
> - return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
> - return 0;
> + return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
> }
>
> __attribute__((format (printf, 4, 5)))
> diff --git a/fsck.h b/fsck.h
> index 0c7e8c9428..b95595ae5f 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -1,6 +1,8 @@
> #ifndef GIT_FSCK_H
> #define GIT_FSCK_H
>
> +#include "oidset.h"
> +
> #define FSCK_ERROR 1
> #define FSCK_WARN 2
> #define FSCK_IGNORE 3
> @@ -35,12 +37,12 @@ struct fsck_options {
> fsck_error error_func;
> unsigned strict:1;
> int *msg_type;
> - struct oid_array *skiplist;
> + struct oidset skiplist;
> struct decoration *object_names;
> };
>
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
>
> /* descend in all linked child objects
> * the return value is:
next prev parent reply other threads:[~2018-08-27 20:15 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
2018-08-27 20:15 ` Ævar Arnfjörð Bjarmason [this message]
2018-08-28 9:52 ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:59 ` René Scharfe
2018-09-03 14:49 ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:29 ` [PATCH v3 6/7] fsck: use oidset for skiplist René Scharfe
2018-08-27 19:43 ` [PATCH v3 7/7] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87lg8refcr.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=l.s.r@web.de \
--cc=peff@peff.net \
--cc=ramsay@ramsayjones.plus.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).