From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Typesafer git hash patch
Date: Tue, 28 Feb 2017 12:25:20 -0800
Message-ID: <CA+55aFzUhWinWqK30GBc1BKy-v6QtDdO2BLUODkiqg9XoKLrwA@mail.gmail.com>
In-Reply-To: <xmqqvarujdmv.fsf@gitster.mtv.corp.google.com>
On Tue, Feb 28, 2017 at 11:53 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>> Having the hashes be more encapsulated does seem to make things better
>> in many ways. What I did was to also just unify the notion of "hash_t"
>> and "struct object_id", so the two are entirely interchangeable.
>
> Sorry, but at this point in your description, you completely lost
> me. I thought "struct object_id" was what you call "hash_t" in the
> above.
So what happened was that I started out just encapsulating

	unsigned char sha1[20];

as a

	hash_t hash;

and that made sense in a lot of situations. I always thought that code that used

	struct object_id oid;

is just too ugly to live, so I'm not actually all that big of a fan of
the oid approach.
But the two approaches really are pretty much equivalent logically,
even if they don't look the same.
So I wanted to unify things: "One type to bring them all and in the
darkness bind them".
So I just basically made this:

	typedef struct object_id {
		unsigned char hash[GIT_HASH_SIZE];
	} hash_t;

to create one single data structure that doesn't make my eyes bleed.
That "struct object_id" still exists, but I don't generally have to
look at it when doing the conversion, and any current users "just
work".
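
For illustration, a minimal sketch of what that unification buys. It assumes a 20-byte hash (GIT_HASH_SIZE is defined locally here), and the two helpers, hash_is_null() and oid_is_null_legacy(), are hypothetical names, not code from the actual patch. Because the typedef names the very same struct, a function declared with "hash_t *" accepts a "struct object_id *" with no cast:

```c
#include <string.h>

/* Assumption for this sketch: a SHA-1 sized (20-byte) hash. */
#define GIT_HASH_SIZE 20

/* One type, two names: "struct object_id" and "hash_t" are the same type. */
typedef struct object_id {
	unsigned char hash[GIT_HASH_SIZE];
} hash_t;

/* Hypothetical helper written against the new spelling... */
static int hash_is_null(const hash_t *h)
{
	static const hash_t null_hash; /* static, so zero-initialized */
	return !memcmp(h->hash, null_hash.hash, GIT_HASH_SIZE);
}

/* ...and a hypothetical old-style caller that still says "struct object_id". */
static int oid_is_null_legacy(const struct object_id *oid)
{
	return hash_is_null(oid); /* no cast needed: it is the same type */
}
```

Existing code that declares "struct object_id" keeps compiling unchanged, while new code can be written purely in terms of hash_t.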
>> turns into
>>
>> + const hash_t *mb = &result->item->object.oid;
>> + if (!hashcmp(mb, current_bad_oid)) {
>
> Hmph. I somehow thought the longer term direction for the above code
> would be to turn it into
>
> 	if (!oidcmp(&result->item->object.oid, &current_bad_oid))
Well, you can actually do it with my patch, since I left "oidcmp()"
alone and it's just an alias for "hashcmp()" in my tree.
Except I think "oid" is an odious name, and really confusing and not
at all descriptive.
Using a three-letter acronym when we have a four-letter actual word to
say it feels stupid and wrong to me.
So what my conversion does is basically say that the name is *hash*.
So instead of using "oidcmp", you use "hashcmp":

	if (!hashcmp(&result->item->object.oid, &current_bad_oid))

and functions take a "hash_t *" argument rather than a "struct
object_id *" argument, and when there was any kind of confusion and
mixing of use, I converted to "hash_t".
Both oid and "unsigned char *" users got converted.
In other words, what I was aiming for was getting rid - entirely - of
the "two different types", and I disliked both "oid" and "unsigned
char []", so neither replaces the other.
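
A minimal sketch of the comparison and copy helpers under that scheme. The signatures are assumptions made for illustration: they take "hash_t *" as described above, which is not the signature git's own hashcmp() had at the time (that one took raw "unsigned char *" buffers). oidcmp() is kept as a plain macro alias so existing callers keep compiling:

```c
#include <string.h>

/* Assumption for this sketch: SHA-1 sized hashes. */
#define GIT_HASH_SIZE 20

typedef struct object_id {
	unsigned char hash[GIT_HASH_SIZE];
} hash_t;

/* Sketch: compare two hashes; returns <0, 0 or >0 like memcmp(). */
static inline int hashcmp(const hash_t *a, const hash_t *b)
{
	return memcmp(a->hash, b->hash, GIT_HASH_SIZE);
}

/* Sketch: copy one hash into another. */
static inline void hashcpy(hash_t *dst, const hash_t *src)
{
	memcpy(dst->hash, src->hash, GIT_HASH_SIZE);
}

/* The old name survives only as an alias for the new one. */
#define oidcmp(a, b) hashcmp((a), (b))
```

With this, both "hashcmp(&a, &b)" and "oidcmp(&a, &b)" compile against either spelling of the type.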
> Having said all that, I do not offhand see a huge benefit of the
> current layout that has one layer between the hash (i.e. oid.hash)
> and the object name (i.e. oid) over "there is no need for oid.hash;
> oid is just a hash", which you seem to be doing.
Yes exactly.
>> And as part of the type safety, I do think I may have found a bug:
>>
>> show_one_mergetag():
>>
>> strbuf_addf(&verify_message, "tag %s names a non-parent %s\n",
>> tag->tag, tag->tagged->oid.hash);
>>
>> note how it prints out the "non-parent %s", but that's a SHA1 hash
>> that hasn't been converted to hex. Hmm?
>
> Yup. That needs fixing, obviously.
I suspect nobody has ever hit that case - I tried to google for "names
a non-parent" and "tag" and "git" and the only thing that I found was
hits to git source.
So I was actually fairly impressed that the only thing I found was one
totally insignificant bug in a printout.
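
The problem is that "%s" is handed the 20 raw bytes of the hash rather than its 40-character hex form, so the message would contain binary garbage, and because the raw bytes are not NUL-terminated the format could read past them. A minimal sketch of the fix, assuming a hypothetical hash_to_hex_r() helper and plain snprintf() standing in for git's strbuf API: convert to hex first, then format.

```c
#include <stdio.h>

/* Assumption for this sketch: SHA-1 sized hashes. */
#define GIT_HASH_SIZE 20
#define GIT_HEX_SIZE (2 * GIT_HASH_SIZE)

typedef struct object_id {
	unsigned char hash[GIT_HASH_SIZE];
} hash_t;

/* Hypothetical helper: write 40 lowercase hex digits plus a NUL into buf. */
static char *hash_to_hex_r(char *buf, const hash_t *h)
{
	static const char hex[] = "0123456789abcdef";
	for (int i = 0; i < GIT_HASH_SIZE; i++) {
		buf[2 * i] = hex[h->hash[i] >> 4];
		buf[2 * i + 1] = hex[h->hash[i] & 0xf];
	}
	buf[GIT_HEX_SIZE] = '\0';
	return buf;
}

/*
 * Buggy version (for contrast): passing tagged->hash straight to "%s"
 * prints raw bytes. Fixed version: give "%s" the hex string instead.
 */
static int format_non_parent_msg(char *out, size_t outsz,
				 const char *tag_name, const hash_t *tagged)
{
	char hex[GIT_HEX_SIZE + 1];
	return snprintf(out, outsz, "tag %s names a non-parent %s\n",
			tag_name, hash_to_hex_r(hex, tagged));
}
```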
I did find a lot of cases where we really do mix a buffer of memory
("unsigned char *") with the hash. Not surprisingly, most of them
were in pack-file handling and in the tree parsing.
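
As a concrete illustration of a raw buffer turning into a hash: a git tree object stores each entry as "<mode> <name>", a NUL byte, and then the 20 raw bytes of the entry's object hash. The sketch below uses a made-up tree_entry_hash() helper that returns a pointer to the hash inside the buffer. The cast is only reasonable because hash_t is nothing but an array of unsigned char, so in practice it has no alignment or padding requirements beyond a single byte.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: SHA-1 sized hashes. */
#define GIT_HASH_SIZE 20

typedef struct object_id {
	unsigned char hash[GIT_HASH_SIZE];
} hash_t;

/*
 * Hypothetical helper: parse one raw tree entry
 * "<mode> <name>\0<20 raw hash bytes>" starting at buf.
 * Sets *name to the NUL-terminated entry name and returns a pointer to
 * the hash inside the buffer, or NULL if the entry is malformed/truncated.
 */
static const hash_t *tree_entry_hash(const unsigned char *buf, size_t len,
				     const char **name)
{
	const unsigned char *nul = memchr(buf, '\0', len);
	const unsigned char *sp = memchr(buf, ' ', len);

	if (!nul || !sp || sp > nul)
		return NULL;	/* need "<mode> <name>" before the NUL */
	if ((size_t)(nul + 1 - buf) + GIT_HASH_SIZE > len)
		return NULL;	/* not enough bytes left for the hash */
	*name = (const char *)(sp + 1);
	return (const hash_t *)(nul + 1);	/* buffer -> hash: explicit cast */
}
```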
And some things do the reverse, and really walk a hash name byte by
byte. Something like "find_pack_entry_one()" really does walk the bytes
of the hash.
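
Pack-index lookups are the kind of code that has to look inside the hash: the pack index begins with a 256-entry fan-out table indexed by the first byte of the hash. A minimal sketch of that idea, with hypothetical build_fanout() and fanout_range() helpers working on an in-memory sorted array rather than on a real .idx file:

```c
#include <stdint.h>

/* Assumption for this sketch: SHA-1 sized hashes. */
#define GIT_HASH_SIZE 20

typedef struct object_id {
	unsigned char hash[GIT_HASH_SIZE];
} hash_t;

/*
 * Hypothetical helper: for a sorted array of n hashes, set fanout[b] to
 * the number of entries whose first byte is <= b.
 */
static void build_fanout(uint32_t fanout[256], const hash_t *sorted, uint32_t n)
{
	uint32_t i = 0;
	for (int b = 0; b < 256; b++) {
		while (i < n && sorted[i].hash[0] == b)	/* reach into the bytes */
			i++;
		fanout[b] = i;
	}
}

/* Hypothetical helper: the entries sharing key's first byte are [*lo, *hi). */
static void fanout_range(const uint32_t fanout[256], const hash_t *key,
			 uint32_t *lo, uint32_t *hi)
{
	unsigned b = key->hash[0];	/* hash -> raw byte */
	*lo = b ? fanout[b - 1] : 0;
	*hi = fanout[b];
}
```

A caller would then binary-search only the entries in [lo, hi) with a full hash comparison.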
With the conversion in place, those painful things are a bit more
obvious. So there are a couple of places where I just did a hard
conversion from an "unsigned char *" to a hash_t, but they are now
obvious casts and there are only 17 of them:
[torvalds@i7 git]$ git grep '(hash_t \*)'
builtin/index-pack.c: hashcpy(ref_hash, (hash_t *) fill(20));
builtin/pack-redundant.c: hash_t *h1 = (hash_t *)(p1_base + p1_off);
builtin/pack-redundant.c: hash_t *h2 = (hash_t *)(p2_base + p2_off);
builtin/pack-redundant.c: hash_t *h1 = (hash_t *)(p1_base + p1_off);
builtin/pack-redundant.c: hash_t *h2 = (hash_t *)(p2_base + p2_off);
builtin/pack-redundant.c: hash_t *h = (hash_t *)(base + off);
dir.c: hashcpy(&ud->exclude_sha1, (hash_t *)rd->data);
fast-import.c: hashcpy(&e->versions[0].hash, (hash_t *)c);
fast-import.c: hashcpy(&e->versions[1].hash, (hash_t *)c);
match-trees.c: hashcpy((hash_t *)rewrite_here, rewrite_with);
sha1-lookup.c: lo, mi, hi, sha1_to_hex((hash_t *)key));
sha1_file.c: return (hash_t *)(base + idx * GIT_SHA1_RAWSZ);
sha1_file.c: return (hash_t *)base;
sha1_file.c: return (hash_t *) (index + 24 * n + 4);
sha1_file.c: return (hash_t *) (index + 20 * n);
sha1_file.c: int cmp = hashcmp((hash_t *)(index + mi * stride), (hash_t *)sha1);
split-index.c: hashcpy(&si->base_sha1, (hash_t *)data);
and there are basically an equal number of cases where I do the
reverse (by doing hash->hash to get the byte array data of the hash).
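
Several of the sha1_file.c casts follow one pattern: a sorted table of fixed-size records in raw memory, searched by viewing a slice of that memory as a hash. A minimal sketch of the pattern, with a hypothetical lookup_raw_table() and a hashcmp() that takes hash_t pointers (both assumptions for this sketch, not the patch's code), keeps the raw-memory-to-hash conversion to a single explicit cast:

```c
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: SHA-1 sized hashes. */
#define GIT_HASH_SIZE 20

typedef struct object_id {
	unsigned char hash[GIT_HASH_SIZE];
} hash_t;

static int hashcmp(const hash_t *a, const hash_t *b)
{
	return memcmp(a->hash, b->hash, GIT_HASH_SIZE);
}

/*
 * Hypothetical helper: binary search for key in a sorted table of nr
 * records, each 'stride' bytes long, with the hash at the start of each
 * record. Returns the record index, or -1 if the key is not present.
 */
static long lookup_raw_table(const unsigned char *index, size_t stride,
			     size_t nr, const hash_t *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;
		/* the one place raw memory becomes a hash_t */
		int cmp = hashcmp((const hash_t *)(index + mi * stride), key);

		if (!cmp)
			return (long)mi;
		if (cmp < 0)
			lo = mi + 1;
		else
			hi = mi;
	}
	return -1;
}
```

For records of 24 bytes with the hash at offset 4 (the "index + 24 * n + 4" case in the listing), a caller would pass index + 4 and a stride of 24.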
So the patch doesn't *fix* anything, but it does, I think, make it
easier to see the problems.
And the *bulk* of the code doesn't look inside the hashes at all.
Linus