From: Jonathan Nieder <jrnieder@gmail.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: "Stefan Beller" <sbeller@google.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Phillip Wood" <phillip.wood@dunelm.org.uk>,
"git@vger.kernel.org" <git@vger.kernel.org>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: pushing for a new hash, was Re: [PATCH 2/3] rebase: Add tests for console output
Date: Fri, 2 Jun 2017 10:54:55 -0700 [thread overview]
Message-ID: <20170602175455.GA30988@aiede.mtv.corp.google.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1.1706021442190.171564@virtualbox>
Hi Dscho,
Johannes Schindelin wrote:
> On Thu, 1 Jun 2017, Stefan Beller wrote:
>> We had a discussion off list how much of the test suite is in bad shape,
>> and "$ git grep ^index" points out a lot of places as well.
>
> Maybe we should call out a specific month (or even a longer period) during
> which we try to push toward that new hash function, and focus more on
> those tasks (and on critical bug fixes, if any) than anything else.
Thanks for offering. ;-)
Here's a rough list of some useful tasks, in no particular order:
1. bc/object-id: This patch series continues, eliminating assumptions
about the size of object ids by encapsulating them in a struct.
One straightforward way to find code that still needs to be
converted is to grep for "sha" --- often the conversion patches
change function and variable names to refer to oid_ where they used
to use sha1_, making the stragglers easier to spot.
2. Hard-coded object ids in tests: As Stefan hinted, many tests beyond
t00* make assumptions about the exact values of object ids. That's
bad for maintainability for other reasons beyond the hash function
transition, too.
It should be possible to suss them out by patching git's sha1
routine to use the ones-complement of sha1 (~sha1) instead and
seeing which tests fail.
3. Repository format extension to use a different hash function: we
want git to be able to work with two hash functions: sha1 and
something else. For interoperability and simplity, it is useful
for a single git binary to support both hash functions.
That means a repository needs to be able to specify what hash
function is used for the objects in that repository. This can be
configured by setting '[core] repositoryformatversion=1' (to avoid
confusing old versions of git) and
'[extensions] experimentalNewHashFunction = true'.
Documentation/technical/repository-version.txt has more details.
We can start experimenting with this using e.g. the ~sha1 function
described at (2), or the 160-bit hash of the patch author's choice
(e.g. truncated blake2bp-256).
4. When choosing a hash function, people may argue about performance.
It would be useful for run some benchmarks for git (running
the test suite, t/perf tests, etc) using a variety of hash
functions as input to such a discussion.
5. Longer hash: Even once all object id references in git use struct
object_id (see (1)), we need to tackle other assumptions about
object id size in git and its tests.
It should be possible to suss them out by replacing git's sha1
routine with a 40-byte hash: sha1 with each byte repeated (sha1+sha1)
and seeing what fails.
6. Repository format extension for longer hash: As in (3), we could
add a repository format extension to experiment with using the
sha1+sha1 function.
7. Avoiding wasted memory from unused hash functions: struct object_id
has definition 'unsigned char hash[GIT_MAX_RAWSZ]', where
GIT_MAX_RAWSZ is the size of the largest supported hash function.
When operating on a repository that only uses sha1, this wastes
memory.
Avoid that by making object identifiers variable-sized. That is,
something like
struct object_id {
union {
unsigned char hash20[20];
unsigned char hash32[32];
} *hash;
}
or
struct object_id {
unsigned char *hash;
}
The hard part is that allocation and destruction have to be
explicit instead of happening automatically when an object_id is an
automatic variable.
8. Implement
http://public-inbox.org/git/20170307001709.GC26789@aiede.mtv.corp.google.com/
:) I'd like to send a breakdown of that too, but that probably
should happen in a separate message.
9. We can use help from security experts in all of this. Fuzzing,
analysis of how we use cryptography, security review of other parts
of the design, and information to help choose a hash function are
all appreciated.
> I also wonder how we can attract (back) cryptographic talent to help us
> avoid repeating mistakes when picking a new hash algorithm.
>
> So far, the only undisputable expert opinion I read was from the Keccak
> team, and I did not have the impression that their opinion had any impact
> on the discussion. Needless to say: I think it should. Cryptography is
> hard. We proved it ;-)
Do you have some ideas in mind here?
How did you get the impression that their opinion had no impact? We
have been getting feedback about the choice of hash function both on
and off list from a variety of people, some indisputably security
experts. Sometimes the best one can do is to just listen.
For what it's worth my personal opinion is currently leaning toward
blake2bp-256 as choice of hash function, not SHA2 or SHA3. But we
still have time to learn more and make a better decision.
Thanks and hope that helps,
Jonathan
next prev parent reply other threads:[~2017-06-02 17:55 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-31 10:42 [PATCH 0/3] Add regression tests for recent rebase -i fixes Phillip Wood
2017-05-31 10:42 ` [PATCH 1/3] rebase -i: Add test for reflog message Phillip Wood
2017-06-01 2:00 ` Junio C Hamano
2017-05-31 10:42 ` [PATCH 2/3] rebase: Add tests for console output Phillip Wood
2017-05-31 19:02 ` Phillip Wood
2017-06-01 1:59 ` Junio C Hamano
2017-06-01 12:56 ` Johannes Schindelin
2017-06-01 23:40 ` Junio C Hamano
2017-06-01 23:47 ` Stefan Beller
2017-06-02 12:47 ` pushing for a new hash, was " Johannes Schindelin
2017-06-02 17:54 ` Jonathan Nieder [this message]
2017-06-02 18:05 ` Jonathan Nieder
2017-06-02 20:29 ` Ævar Arnfjörð Bjarmason
2017-06-15 10:38 ` Johannes Schindelin
2017-06-03 0:36 ` Junio C Hamano
2017-06-06 22:22 ` Johannes Schindelin
2017-06-06 22:45 ` Jonathan Nieder
2017-06-07 1:09 ` Junio C Hamano
2017-06-07 2:18 ` [PATCH] t4005: modernize style and drop hard coded sha1 Stefan Beller
2017-06-07 17:39 ` Brandon Williams
2017-06-06 22:45 ` pushing for a new hash, was Re: [PATCH 2/3] rebase: Add tests for console output Stefan Beller
2017-06-06 22:52 ` Jonathan Nieder
2017-06-07 0:34 ` Samuel Lijin
2017-06-07 14:47 ` Johannes Schindelin
2017-06-07 16:53 ` Stefan Beller
2017-06-07 10:47 ` Phillip Wood
2017-06-09 16:39 ` Junio C Hamano
2017-06-14 10:18 ` Phillip Wood
2017-06-14 12:51 ` Johannes Schindelin
2017-05-31 10:42 ` [PATCH 3/3] rebase: Add tests for console output with conflicting stash Phillip Wood
2017-06-14 10:24 ` [PATCH v2 0/3] Add regression tests for rectent rebase -i fixes Phillip Wood
2017-06-14 10:24 ` [PATCH v2 1/3] rebase -i: Add test for reflog message Phillip Wood
2017-06-14 10:24 ` [PATCH v2 2/3] rebase: Add regression tests for console output Phillip Wood
2017-06-14 10:24 ` [PATCH v2 3/3] rebase: Add more " Phillip Wood
2017-06-14 20:35 ` [PATCH v2 0/3] Add regression tests for rectent rebase -i fixes Johannes Schindelin
2017-06-15 23:05 ` Junio C Hamano
2017-06-15 23:23 ` Junio C Hamano
2017-06-15 23:29 ` Junio C Hamano
2017-06-16 13:49 ` Johannes Schindelin
2017-06-16 18:43 ` Johannes Sixt
2017-06-16 21:05 ` Junio C Hamano
2017-06-19 19:45 ` Johannes Sixt
2017-06-19 20:02 ` Junio C Hamano
2017-06-19 9:49 ` Phillip Wood
2017-06-19 15:45 ` Junio C Hamano
2017-06-19 9:52 ` Phillip Wood
2017-06-19 17:56 ` [PATCH v3 0/4] Add regression tests for recent " Phillip Wood
2017-06-19 17:56 ` [PATCH v3 1/4] sequencer: print autostash messages to stderr Phillip Wood
2017-06-19 17:56 ` [PATCH v3 2/4] rebase -i: Add test for reflog message Phillip Wood
2017-06-19 17:56 ` [PATCH v3 3/4] rebase: Add regression tests for console output Phillip Wood
2017-06-19 17:56 ` [PATCH v3 4/4] rebase: Add more " Phillip Wood
2017-06-23 4:17 ` [PATCH v3 0/4] Add regression tests for recent rebase -i fixes Junio C Hamano
2017-06-23 5:07 ` Junio C Hamano
2017-06-23 9:53 ` Phillip Wood
2017-06-23 17:03 ` Junio C Hamano
2017-06-23 18:53 ` Junio C Hamano
2017-06-26 9:17 ` Phillip Wood
2017-06-23 19:01 ` Junio C Hamano
2017-06-26 9:23 ` Phillip Wood
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170602175455.GA30988@aiede.mtv.corp.google.com \
--to=jrnieder@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=phillip.wood@dunelm.org.uk \
--cc=sbeller@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).