git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Typesafer git hash patch
Date: Mon, 27 Feb 2017 22:59:15 -0800
Message-ID: <CA+55aFxYs1zp2c-UPe8EfshNNOxRVxZ2H+ipsnG489NBsE+DLQ@mail.gmail.com> (raw)

So because of the whole SHA1 discussion, I started looking at what it
would involve to turn

    unsigned char *sha1

style arguments (and various structure members) in the git source into

   typedef struct { ... } hash_t;

   hash_t *hash;

The answer is that it's pretty painful - more so than I expected (I
looked at it _looong_ ago and decided it was painful - and it has
become more painful as git has grown).

But once I got started I just continued through the slog.

Having the hashes be more encapsulated does seem to make things better
in many ways. What I did was to also just unify the notion of "hash_t"
and "struct object_id", so the two are entirely interchangeable. That
actually can clean up some code, because right now we have duplicate
interfaces for some things that take an oid pointer and others take a
"const unsigned char *sha1", and that duplication essentially can go
away. I didn't do any of that, I tried to keep it as as brute-force
stupid conversion.

I saw that somebody is actually looking at doing this "well" - by
doing it in many smaller steps. I tried. I gave up. The "unsigned char
*sha1" model is so ingrained that doing it incrementally just seemed
like a lot of pain. But it might be the right approach.

It might particularly be the right approach considering the size of the patch:

 216 files changed, 4174 insertions(+), 4080 deletions(-)

which is pretty nasty. The good news is that my patch passes all the
tests, and while it's big it's mostly very very mindless, and a lot of
it looks like cleanups, and the lines are generally shorter, eg

-               const unsigned char *mb = result->item->object.oid.hash;
-               if (!hashcmp(mb, current_bad_oid->hash)) {

turns into

+               const hash_t *mb = &result->item->object.oid;
+               if (!hashcmp(mb, current_bad_oid)) {

but I ended up also renaming a lot of common "sha1" as "hash", which
adds to the noise in that patch.

NOTE! It doesn't actually _fix_ the SHA1-centricity in any way, but it
makes it a bit more obvious where the bigger problems are. Not that
anybody would be surprised by what they are, but as part of writing
the patch it did kind of pinpoint most of them, and about 30 of those
new lines are added

 /* FIXME! Hardcoded hash sizes */
 /* FIXME! Lots of fixed-size hashes */
 /* FIXME! Fixed 20-byte hash usage */

with the rest of the added lines being a few helper functions and
splitting cache.h up a bit.

And as part of the type safety, I do think I may have found a bug:

show_one_mergetag():

                strbuf_addf(&verify_message, "tag %s names a non-parent %s\n",
                                    tag->tag, tag->tagged->oid.hash);

note how it prints out the "non-parent %s", but that's a SHA1 hash
that hasn't been converted to hex. Hmm?

Anyway, is anybody interested?  I warn you - it really is one big
patch on top of 2.12.

                   Linus

PS. Just for fun, I tried to look what it does when you then merge pu
or next.. You do get a fair number of merge conflicts because there's
just a lot of changes all around, but they look manageable. So merging
something like that would be painful, but it appears to not entirely
break other development.

             reply index

Thread overview: 8+ messages in thread (expand / mbox.gz / Atom feed / [top])
2017-02-28  6:59 Linus Torvalds [this message]
     [not found] ` <xmqqvarujdmv.fsf@gitster.mtv.corp.google.com>
2017-02-28 20:19   ` brian m. carlson
2017-02-28 20:38     ` Linus Torvalds
2017-02-28 20:25   ` Linus Torvalds
2017-02-28 20:45     ` brian m. carlson
2017-02-28 20:26 ` Jeff King
2017-02-28 20:33   ` brian m. carlson
2017-02-28 20:37     ` Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply to all the recipients using the --to, --cc,
  and --in-reply-to switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFxYs1zp2c-UPe8EfshNNOxRVxZ2H+ipsnG489NBsE+DLQ@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox