git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/9] fsck_tree(): fix shadowed variable
Date: Tue, 04 May 2021 12:10:16 +0200	[thread overview]
Message-ID: <87fsz2wupo.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YJBZYt8ivlTDHpRM@coredump.intra.peff.net>


On Mon, May 03 2021, Jeff King wrote:

> On Mon, May 03, 2021 at 01:15:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
>> > fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
>> > fsck_tree(), and we pass it to the report() function when we find
>> [...]
>> 
>> Have you considered just passing down the "obj" instead of the "oid"?
>> It's a bit of a bigger change, but seems to be worth it in this case as
>> the below diff (on top of master) shows. We spend quite a bit of effort
>> peeling an "obj" just to pass oid/type further down the stack.
>
> That would be undoing the point of the referenced commit above. More
> discussion in this thread:
>
>   https://lore.kernel.org/git/20191018044103.GA17625@sigill.intra.peff.net/
>
> but the gist is:
>
>   - there were bugs caused by looking at the object structs; not having
>     it make sure we can't introduce similar bugs
>
>   - using oids and buffers makes it possible for code to avoid having
>     object structs at all. index-pack could take advantage of this (to
>     reduce memory usage), but nobody has yet written that patch.

Ah, I'd managed to read this series + done some previous fsck hacking
without noticing that recent-ish change.

FWIW I'd been meaning to slowly mutate some of the fsck code somewhat in
the opposite direction. I.e. we have N object parsers now between
{blob,commit,tree,tag,fsck,tree-walk}.c and elsewhere. Ideally we could
share all/most of the logic and piggy-back in fsck on the "main" parser
in some more exhaustive mode.

Unfortunately most of our fsck tests are rather non-exhaustive, so I see
e.g. in reverting some of that on our current test suite e.g. a test in
directory.t6102-rev-list-unexpected-objects was changed from:

    error in commit 536c6159e861ef1ac7967a68324a8e7614632970: missingParent: parent objects missing

To:

    error in commit 536c6159e861ef1ac7967a68324a8e7614632970: broken links

So by using the "main" parser in this case we lost the ability to
selectively ignore that error, looks like nobody cared (and the OID
could still be added to the skiplist).

Another thing that got worse (but we arguably never supported well) is
fsck_commit() and expecting to validate at least a single commit
object. Before we'd notice missing parents AFAICT, after we don't at
that entry point, but will of course if entering via fsck_walk().

I don't think any current caller cares, but when I rewrote
builtin/mktag.c recently there was a suggestion to do that "fsck the new
object" for the low-level commit/tree/blob APIs too. Before (and again,
AFACT) that would work to a level of 1, after you'll need to instrument
some limited walk.

Anyway. This inspired me to try re-arranging the "struct object" to be a
superset of "struct object_id", i.e. to make it start with the oid
instead of "parsed" etc. It means you can do this:

	static inline struct object_id *object_to_oid(const struct object *obj)
	{
	        return (struct object_id *)obj;
	}

And change common cases like oid_to_hex(&obj->oid) to
oid_to_hex(object_to_oid(obj)), i.e. a wrapper for a pure cast. Just
that seems to speed up fsck by 1-3%, but I'm not 100% certain yet. The
cachegrind numbers look unambiguously better. This is with gcc and -O3.

More generally we have various other structures that wrap an "oid" in
one way or the other, where the common access pattern is just grabbing
the OID out of it, but it's not the first entry. E.g. "name_entry",
"directory" etc. By re-arranging it like that we could also have a
"object_id_type" which is a 2-member touple of "oid/type", so by having
that "oid/type" first in "struct object" you can cast "struct object" to
"struct object_id_type" for fsck.c use, and further cast a "struct
object_id_type" to "struct object_id" for anything that needs that.

Anyway, right now I don't have anything conclusive. Just thought I'd
poke you/others with the above in case it's an appraoch that's been
tried/considered before.

  reply	other threads:[~2021-05-04 12:26 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
2021-05-01 15:41 ` [PATCH 1/9] t7415: remove out-dated comment about translation Jeff King
2021-05-03  9:46   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:29     ` Jeff King
2021-05-01 15:41 ` [PATCH 2/9] fsck_tree(): fix shadowed variable Jeff King
2021-05-03 11:15   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:13     ` Jeff King
2021-05-04 10:10       ` Ævar Arnfjörð Bjarmason [this message]
2021-05-01 15:41 ` [PATCH 3/9] fsck_tree(): wrap some long lines Jeff King
2021-05-03 11:22   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:23     ` Jeff King
2021-05-01 15:42 ` [PATCH 4/9] t7415: rename to expand scope Jeff King
2021-05-01 15:42 ` [PATCH 5/9] t7450: test verify_path() handling of gitmodules Jeff King
2021-05-01 18:55   ` Eric Sunshine
2021-05-01 19:03     ` Eric Sunshine
2021-05-03 19:39       ` Jeff King
2021-05-03 10:12   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:32     ` Jeff King
2021-05-01 15:42 ` [PATCH 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
2021-05-01 15:42 ` [PATCH 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
2021-05-01 15:43 ` [PATCH 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
2021-05-01 15:43 ` [PATCH 9/9] docs: document symlink restrictions for dot-files Jeff King
2021-05-01 19:16   ` Eric Sunshine
2021-05-03 20:33     ` Jeff King
2021-05-03  5:36 ` [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Junio C Hamano
2021-05-03 20:42 ` [PATCH v2 " Jeff King
2021-05-03 20:43   ` [PATCH v2 1/9] t7415: remove out-dated comment about translation Jeff King
2021-05-03 20:43   ` [PATCH v2 2/9] fsck_tree(): fix shadowed variable Jeff King
2021-05-03 20:43   ` [PATCH v2 3/9] fsck_tree(): wrap some long lines Jeff King
2021-05-03 20:43   ` [PATCH v2 4/9] t7415: rename to expand scope Jeff King
2021-05-03 20:43   ` [PATCH v2 5/9] t7450: test verify_path() handling of gitmodules Jeff King
2021-05-03 20:43   ` [PATCH v2 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
2021-05-03 20:43   ` [PATCH v2 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
2021-05-03 20:43   ` [PATCH v2 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
2021-05-03 20:43   ` [PATCH v2 9/9] docs: document symlink restrictions for dot-files Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fsz2wupo.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --subject='Re: [PATCH 2/9] fsck_tree(): fix shadowed variable' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git