git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>,
	Ramsay Jones <ramsay@ramsayjones.plus.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 2/2] fsck: use oidset for skiplist
Date: Mon, 27 Aug 2018 09:37:11 +0200	[thread overview]
Message-ID: <87r2ikdzvc.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <ae61a845-16fa-e2d6-935d-ce5eb1e33e5a@web.de>


On Sat, Aug 25 2018, René Scharfe wrote:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 2fa65b7516..80ab570579 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1715,7 +1715,7 @@ doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
>  will only cause git to warn.
>
>  fsck.skipList::
> -	The path to a sorted list of object names (i.e. one SHA-1 per
> +	The path to a list of object names (i.e. one SHA-1 per
>  	line) that are known to be broken in a non-fatal way and should
>  	be ignored. This feature is useful when an established project
>  	should be accepted despite early commits containing errors that

I was going to say that since this is a file format we're likely to
support across versions we should make a note that "up to version
so-and-so this needed to be sorted, but that's longer the case. So
e.g. someone wouldn't test this on 2.20 (or read the online docs) and
then deploy this for older clients..

But...

> -	if (options->skiplist)
> -		sorted = options->skiplist->sorted;
> -	else {
> -		sorted = 1;
> -		options->skiplist = &skiplist;
> -	}
> -
>  	fp = fopen(path, "r");
>  	if (!fp)
>  		die("Could not open skip list: %s", path);
> @@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
>  		const char *p;
>  		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
>  			die("Invalid SHA-1: %s", sb.buf);
> -		oid_array_append(&skiplist, &oid);
> -		if (sorted && skiplist.nr > 1 &&
> -				oidcmp(&skiplist.oid[skiplist.nr - 2],
> -				       &oid) > 0)
> -			sorted = 0;
> +		oidset_insert(&options->skiplist, &oid);

...reading this implementation, where we always called
oid_array_append(), but then just kept track of whether the set was
sorted...

>  	}
>  	if (ferror(fp))
>  		die_errno("Could not read '%s'", path);
>  	fclose(fp);
>  	strbuf_release(&sb);
> -
> -	if (sorted)
> -		skiplist.sorted = 1;

...and here where we assigned to the .sorted member of the oid_array...

>  static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
>  {
> -	if (opts && opts->skiplist && obj)
> -		return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
> -	return 0;
> +	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
>  }

....and here where we'd always do the lookup if the skiplist was
initialized, *not* just if it's sorted, and how the sha1-array.c code
has looked ever since cd94c6f91 ("fsck: git receive-pack: support
excluding objects from fsck'ing", 2015-06-22) when this was first added:

    $ git show cd94c6f91:sha1-array.c|grep -A5 sha1_array_lookup
    int sha1_array_lookup(struct sha1_array *array, const unsigned char *sha1)
    {
            if (!array->sorted)
                    sha1_array_sort(array);
            return sha1_pos(sha1, array->sha1, array->nr, sha1_access);
    }

So I think it makes sense to make this series a three-part, where in the
first part we only change these docs to say s/sorted //, and modify the
tests I added in 65a836fa6b ("fsck: add stress tests for fsck.skipList",
2018-07-27) to assert that an unsorted & sorted list of SHA-1s works
just as well.

Then following up with your [12]/2, where the internal implementation is
changed, but we make it clear that it's *just* the internal
implementation. I.e. from a UI perspective the list never had to be
pre-sorted, we'd just spend some work sorting it on the first lookup if
it wasn't sorted already.

Now without some very careful reading it's not clear what "we don't need
to worry about any sort order anymore" in the commit message means,
i.e. what it really means is "for the purposes of the internal
implementation, and as an opt-in user-side optimization ...".

I.e. an alternate version of this whole patch series could also be:

    diff --git a/Documentation/config.txt b/Documentation/config.txt
    index 1c4236498..930807e43 100644
    --- a/Documentation/config.txt
    +++ b/Documentation/config.txt
    @@ -1709,5 +1709,5 @@ will only cause git to warn.

     fsck.skipList::
    -       The path to a sorted list of object names (i.e. one SHA-1 per
    +       The path to a list of object names (i.e. one SHA-1 per
            line) that are known to be broken in a non-fatal way and should
            be ignored. This feature is useful when an established project
    diff --git a/fsck.c b/fsck.c
    index a0cee0be5..9d4e938ad 100644
    --- a/fsck.c
    +++ b/fsck.c
    @@ -184,14 +184,10 @@ static void init_skiplist(struct fsck_options *options, const char *path)
     {
            static struct oid_array skiplist = OID_ARRAY_INIT;
    -       int sorted, fd;
    +       int fd;
            char buffer[GIT_MAX_HEXSZ + 1];
            struct object_id oid;

    -       if (options->skiplist)
    -               sorted = options->skiplist->sorted;
    -       else {
    -               sorted = 1;
    +       if (!options->skiplist)
                    options->skiplist = &skiplist;
    -       }

            fd = open(path, O_RDONLY);
    @@ -208,13 +204,6 @@ static void init_skiplist(struct fsck_options *options, const char *path)
                            die("Invalid SHA-1: %s", buffer);
                    oid_array_append(&skiplist, &oid);
    -               if (sorted && skiplist.nr > 1 &&
    -                               oidcmp(&skiplist.oid[skiplist.nr - 2],
    -                                      &oid) > 0)
    -                       sorted = 0;
            }
            close(fd);
    -
    -       if (sorted)
    -               skiplist.sorted = 1;
     }

Now, I like yours much better. I'm just saying that currently the
patch/commit message combo is confusing about *what* it's
doing. I.e. let's not mix up the correction of docs that were always
wrong with a non-change in the user facing implementation, and some
internal optimization all in one patch.

  reply	other threads:[~2018-08-27  7:37 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-11 15:39 [PATCH 1/2] fsck: use strbuf_getline() to read skiplist file René Scharfe
2018-08-11 15:47 ` [PATCH 2/2] fsck: use oidset for skiplist René Scharfe
2018-08-11 16:54   ` Ævar Arnfjörð Bjarmason
2018-08-25 18:49     ` René Scharfe
2018-08-11 17:02   ` Jeff King
2018-08-11 17:23     ` Jeff King
2018-08-11 20:59       ` René Scharfe
2018-08-13 17:15         ` René Scharfe
2018-08-14  1:58           ` Jeff King
2018-08-14  2:03             ` Jeff King
2018-08-26 11:37             ` René Scharfe
2018-08-27 23:03               ` Jeff King
2018-10-01 19:15                 ` René Scharfe
2018-10-01 20:26                   ` Jeff King
2018-10-02 19:05                     ` René Scharfe
2018-10-02 19:19                       ` Jeff King
2018-08-13 17:15       ` René Scharfe
2018-08-14  2:01         ` Jeff King
2018-08-11 20:48   ` Ramsay Jones
2018-08-25 18:49     ` René Scharfe
2018-08-13 18:43   ` Junio C Hamano
2018-08-13 20:26     ` René Scharfe
2018-08-13 21:07       ` Junio C Hamano
2018-08-13 23:09         ` René Scharfe
2018-08-11 16:48 ` [PATCH 1/2] fsck: use strbuf_getline() to read skiplist file Jeff King
2018-08-11 21:00   ` René Scharfe
2018-08-25 18:50 ` [PATCH v2 " René Scharfe
2018-08-27 23:00   ` Jeff King
2018-08-25 18:50 ` [PATCH v2 2/2] fsck: use oidset for skiplist René Scharfe
2018-08-27  7:37   ` Ævar Arnfjörð Bjarmason [this message]
2018-08-27 15:23     ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r2ikdzvc.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).