On Mon, Nov 05, 2018 at 12:39:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
> On Sun, Nov 04 2018, brian m. carlson wrote:
> > +	{
> > +		"sha256",
> > +		/* "s256", big-endian */
> 
> The existing entry/comment for sha1 is:
> 
> 		"sha1",
> 		/* "sha1", big-endian */
> 
> So why the sha256/s256 difference in the code/comment? Wondering if I'm
> missing something and we're using "s256" for something.

Ah, good question.  The comment refers to the format_id field which
follows this comment.  The value is the big-endian representation of
"s256" as a 32-bit value.  I picked that over "sha2" to avoid confusion
in case we add SHA-512 in the future, since that's also an SHA-2
algorithm.

Config files and command-line interfaces will use "sha1" or "sha256",
and binary formats will use those 32-bit values ("sha1" or "s256").

> >  const char *empty_tree_oid_hex(void)
> > diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
> > [...]
> 
> I had a question before about whether we see ourselves perma-forking
> this implementation based off libtomcrypt, as I recall you said yes.

Yes.

> Still, I think it would be better to introduce this in at least two-four
> commits where the upstream code is added as-is, then trimmed down to
> size, then adapted to our coding style, and finally we add our own
> utility functions.

At this point, the only code that's actually used from libtomcrypt is
the block transform.  The upstream code is split over multiple files in
multiple directories and won't compile in our codebase without many
files and a lot of work, so I don't feel good about either including
code that doesn't compile or including large numbers of files that don't
meet our coding standards (and that may still not compile because of
platform issues).

> It'll make it easier to forward-port any future upstream changes.

I don't foresee many, if any, changes to this code.  It either
implements the specification or it doesn't, and it's objectively easy to
determine which.  There's not even an argument to port performance
improvements, since almost everyone will be using a crypto library to
provide this code because libraries perform so dramatically better.
I've tried to not make the code perform worse than it did originally,
but that's it.

Furthermore, the modified code carries a relatively small amount of
resemblance to the original, so if we did port changes forward, we'd
probably have conflicts.

It seems like you really want to include the upstream code as a separate
commit and I understand where you're coming from with wanting to have
this split out into logical commits, but due to the specific nature of
this code, I see a lot of downsides and not a lot of upsides.

> > +	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
> > +		test-tool sha256 >actual &&
> > +	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
> > +	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
> > +		test-tool sha256 >actual &&
> 
> I've been wanting to make use depend on perl >= 5.10 (previous noises
> about that on-list), but for now we claim to support >=5.8, which
> doesn't have the -E switch.

Good point.  I'll fix that.  After having written a lot of one-liners,
I always write -E, and this was originally a one-liner.

> But most importantly you aren't even using -E features here, and this
> isn't very idoimatic Perl. Instead do, respectively:
> 
>     perl -e 'print q{aaaaaaaaaa} x 100000'
>     perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 100000"

I considered the more idiomatic version originally, but the latter could
allocate a decent amount of memory in one chunk, and I wanted to avoid
that.  I think what I'd like to do, actually, is turn on autoflush and
use a postfix for, which would be more idiomatic and could potentially
provide better testing of the chunking code.  I'll add a comment to that
effect.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204