From: "brian m. carlson" <firstname.lastname@example.org>
To: email@example.com
Subject: State of NewHash work, future directions, and discussion
Date: Sat, 9 Jun 2018 20:56:28 +0000
Message-ID: <20180609205628.GB38834@genre.crustytoothpaste.net>

Since there have been a lot of questions recently about the state of the
NewHash work, I thought I'd send out a summary.

== Status

I have patches to make the entire codebase work, including passing all
tests, when Git is converted to use a 256-bit hash algorithm.
Obviously, such a Git is incompatible with the current version, but it
means that we've fixed essentially all of the hard-coded 20 and 40
constants (and therefore Git doesn't segfault).

I'm working on getting a 256-bit Git to work with SHA-1 being the
default. Currently, this involves doing things like writing transport
code, since in order to clone a repository, you need to be able to set
up the hash algorithm correctly. I know that this was a non-goal in the
transition plan, but since the testsuite doesn't pass without it, it's
become necessary.

Some of these patches will be making their way to the list soon.
They're hanging out in the normal places in the object-id-part14 branch
(which may be rebased).

== Future Design

The work I've done necessarily involves porting everything to use
the_hash_algo. Essentially, when the piece I'm currently working on is
complete, we'll have a transition stage 4 implementation (all NewHash).
Stages 2 and 3 will be implemented next.

My vision of how data is stored is that the .git directory is, except
for pack indices and the loose object lookup table, entirely in one
format. It will be all SHA-1 or all NewHash. This algorithm will be
stored in the_hash_algo.

I plan on introducing an array of hash algorithms into struct repository
(and wrapper macros) which stores, in order, the output hash, and if
used, the additional input hash.

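A rough sketch of what such an array and its wrapper macros might look
like, in C. Every name prefixed with sketch_ is hypothetical, as are the
slot names, the 32-byte placeholder entry for the not-yet-chosen NewHash,
and the assumption that the output hash is always accepted on input.
None of this is taken from the actual patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Git's struct git_hash_algo. */
struct sketch_hash_algo {
	const char *name;  /* human-readable name */
	size_t rawsz;      /* size of the binary hash, in bytes */
	size_t hexsz;      /* size of the hex form, in characters */
};

static const struct sketch_hash_algo sketch_sha1 = { "sha1", 20, 40 };
/* Placeholder only: some 256-bit algorithm, not a specific choice. */
static const struct sketch_hash_algo sketch_newhash = { "newhash", 32, 64 };

/* Slot order follows the description: output hash first, then input. */
enum sketch_hash_slot {
	SKETCH_HASH_OUTPUT = 0,
	SKETCH_HASH_INPUT = 1,
	SKETCH_HASH_SLOTS
};

/* Hypothetical stand-in for the relevant part of struct repository. */
struct sketch_repository {
	/* The input slot is NULL when no additional input hash is used. */
	const struct sketch_hash_algo *hash_algos[SKETCH_HASH_SLOTS];
};

/* Wrapper macros, so callers never index the array directly. */
#define sketch_output_algo(r) ((r)->hash_algos[SKETCH_HASH_OUTPUT])
#define sketch_input_algo(r)  ((r)->hash_algos[SKETCH_HASH_INPUT])

/*
 * Returns 1 if a hex object ID of length len is acceptable on input.
 * Assumption: the output algorithm is always accepted, and the extra
 * input algorithm is accepted only when its slot is populated.
 */
static int sketch_accepts_hex_len(const struct sketch_repository *r,
				  size_t len)
{
	assert(sketch_output_algo(r) != NULL);
	if (len == sketch_output_algo(r)->hexsz)
		return 1;
	return sketch_input_algo(r) != NULL &&
	       len == sketch_input_algo(r)->hexsz;
}
```

With only the output slot set to the SHA-1 entry, a 40-character hex ID
is accepted and a 64-character one is not; populating the input slot
with the placeholder entry makes both lengths acceptable.
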
Functions like get_oid_hex and parse_oid_hex will acquire an internal
version, which knows about parsing things (like refs) in the internal
format, and one which knows about parsing in the UI formats. Similarly,
oid_to_hex will have an internal version that handles data in the .git
directory, and an external version that produces data in the output
format. Translation will take place at the outer edges of the program.

The transition plan anticipates a stage 1 where we accept only SHA-1 on
input and produce only SHA-1 on output, but store in NewHash. As I've
worked with our tests, I've realized such an implementation is not
entirely possible. We have various tools that expect to accept invalid
object IDs, and obviously there's no way to have those continue to work.
We'd have to either reject invalid data in such a case or combine stages
1 and 2.

== Compatibility with this Work

If you're working on new features and you'd like to implement the best
possible compatibility with this work, here are some recommendations:

* Assume everything in the .git directory but pack indices and the loose
  object index will be in the same algorithm and that that algorithm is
  the_hash_algo.
* For the moment, use the_hash_algo to look up the size of all
  hash-related constants. Use GIT_MAX_* for allocations.
* If you are writing a new data format, add a version number.
* If you need to serialize an algorithm identifier into your data
  format, use the format_id field of struct git_hash_algo. It's
  designed specifically for that purpose.
* You can safely assume that the_hash_algo will be suitably initialized
  to the correct algorithm for your repository.
* Keep using the object ID functions and struct object_id.
* Try not to use mmap'd structs for reading and writing formats on disk,
  since these are hard to make hash size agnostic.

== Discussion about an Actual NewHash

Since I'll be writing new code, I'll be writing tests for this code.
However, writing tests for creating and initializing repositories
requires that I be able to test that objects are being serialized
correctly, and therefore requires that I actually know what the hash
algorithm is going to be. I also can't submit code for multi-hash packs
when we officially only support one hash algorithm.

I know that we have long tried to avoid discussing the specific
algorithm to use, in part because the last discussion generated more
heat than light, and settled on referring to it as NewHash for the time
being. However, I think it's time to pick this topic back up, since I
can't really continue work in this direction without us picking a
NewHash.

If people are interested, I've done some analysis on availability of
implementations, performance, and other attributes described in the
transition plan and can send that to the list.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204