On Mon, Nov 05, 2018 at 12:39:14PM +0100, Ævar Arnfjörð Bjarmason wrote: > On Sun, Nov 04 2018, brian m. carlson wrote: > > + { > > + "sha256", > > + /* "s256", big-endian */ > > The existing entry/comment for sha1 is: > > "sha1", > /* "sha1", big-endian */ > > So why the sha256/s256 difference in the code/comment? Wondering if I'm > missing something and we're using "s256" for something. Ah, good question. The comment refers to the format_id field which follows this comment. The value is the big-endian representation of "s256" as a 32-bit value. I picked that over "sha2" to avoid confusion in case we add SHA-512 in the future, since that's also an SHA-2 algorithm. Config files and command-line interfaces will use "sha1" or "sha256", and binary formats will use those 32-bit values ("sha1" or "s256"). > > const char *empty_tree_oid_hex(void) > > diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c > > [...] > > I had a question before about whether we see ourselves perma-forking > this implementation based off libtomcrypt, as I recall you said yes. Yes. > Still, I think it would be better to introduce this in at least two-four > commits where the upstream code is added as-is, then trimmed down to > size, then adapted to our coding style, and finally we add our own > utility functions. At this point, the only code that's actually used from libtomcrypt is the block transform. The upstream code is split over multiple files in multiple directories and won't compile in our codebase without many files and a lot of work, so I don't feel good about either including code that doesn't compile or including large numbers of files that don't meet our coding standards (and that may still not compile because of platform issues). > It'll make it easier to forward-port any future upstream changes. I don't foresee many, if any, changes to this code. It either implements the specification or it doesn't, and it's objectively easy to determine which. There's not even an argument to port performance improvements, since almost everyone will be using a crypto library to provide this code because libraries perform so dramatically better. I've tried to not make the code perform worse than it did originally, but that's it. Furthermore, the modified code carries a relatively small amount of resemblance to the original, so if we did port changes forward, we'd probably have conflicts. It seems like you really want to include the upstream code as a separate commit and I understand where you're coming from with wanting to have this split out into logical commits, but due to the specific nature of this code, I see a lot of downsides and not a lot of upsides. > > + perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \ > > + test-tool sha256 >actual && > > + grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual && > > + perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \ > > + test-tool sha256 >actual && > > I've been wanting to make use depend on perl >= 5.10 (previous noises > about that on-list), but for now we claim to support >=5.8, which > doesn't have the -E switch. Good point. I'll fix that. After having written a lot of one-liners, I always write -E, and this was originally a one-liner. > But most importantly you aren't even using -E features here, and this > isn't very idoimatic Perl. Instead do, respectively: > > perl -e 'print q{aaaaaaaaaa} x 100000' > perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 100000" I considered the more idiomatic version originally, but the latter could allocate a decent amount of memory in one chunk, and I wanted to avoid that. I think what I'd like to do, actually, is turn on autoflush and use a postfix for, which would be more idiomatic and could potentially provide better testing of the chunking code. I'll add a comment to that effect. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204