git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: State of NewHash work, future directions, and discussion
Date: Mon, 11 Jun 2018 20:09:47 +0200
Message-ID: <CACsJy8CJrFCUnVMes=3_gQKNTiyHsKkawWNQ1aB_GCvOh1rKcw@mail.gmail.com> (raw)
In-Reply-To: <20180609205628.GB38834@genre.crustytoothpaste.net>

On Sat, Jun 9, 2018 at 10:57 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Since there's been a lot of questions recently about the state of the
> NewHash work, I thought I'd send out a summary.
>
> == Status
>
> I have patches to make the entire codebase work, including passing all
> tests, when Git is converted to use a 256-bit hash algorithm.
> Obviously, such a Git is incompatible with the current version, but it
> means that we've fixed essentially all of the hard-coded 20 and 40
> constants (and therefore Git doesn't segfault).

This is so cool!

> == Future Design
>
> The work I've done necessarily involves porting everything to use
> the_hash_algo.  Essentially, when the piece I'm currently working on is
> complete, we'll have a transition stage 4 implementation (all NewHash).
> Stage 2 and 3 will be implemented next.
>
> My vision of how data is stored is that the .git directory is, except
> for pack indices and the loose object lookup table, entirely in one
> format.  It will be all SHA-1 or all NewHash.  This algorithm will be
> stored in the_hash_algo.
>
> I plan on introducing an array of hash algorithms into struct repository
> (and wrapper macros) which stores, in order, the output hash, and if
> used, the additional input hash.

I'm actually thinking that putting the_hash_algo inside struct
repository is a mistake. We have code that's supposed to work without
a repo and it shows this does not really make sense to forcefully use
a partially-valid repo. Keeping the_hash_algo a separate variable
sounds more elegant.

> If people are interested, I've done some analysis on availability of
> implementations, performance, and other attributes described in the
> transition plan and can send that to the list.

I quickly skimmed through that document. I have two more concerns that
are less about any specific hash algorithm:

- how does larger hash size affects git (I guess you covered cpu
aspect, but what about cache-friendliness, disk usage, memory
consumption)

- how does all the function redirection (from abstracting away SHA-1)
affects git performance. E.g. hashcmp could be optimized and inlined
by the compiler. Now it still probably can optimize the memcmp(,,20),
but we stack another indirect function call on top. I guess I might be
just paranoid and this is not a big deal after all.
-- 
Duy

  parent reply index

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-09 20:56 brian m. carlson
2018-06-09 21:26 ` Ævar Arnfjörð Bjarmason
2018-06-09 22:49 ` Hash algorithm analysis brian m. carlson
2018-06-11 19:29   ` Jonathan Nieder
2018-06-11 20:20     ` Linus Torvalds
2018-06-11 23:27       ` Ævar Arnfjörð Bjarmason
2018-06-12  0:11         ` David Lang
2018-06-12  0:45         ` Linus Torvalds
2018-06-11 22:35     ` brian m. carlson
2018-06-12 16:21       ` Gilles Van Assche
2018-06-13 23:58         ` brian m. carlson
2018-06-15 10:33           ` Gilles Van Assche
2018-07-20 21:52     ` brian m. carlson
2018-07-21  0:31       ` Jonathan Nieder
2018-07-21 19:52       ` Ævar Arnfjörð Bjarmason
2018-07-21 20:25         ` brian m. carlson
2018-07-21 22:38       ` Johannes Schindelin
2018-07-21 23:09         ` Linus Torvalds
2018-07-21 23:59         ` brian m. carlson
2018-07-22  9:34           ` Eric Deplagne
2018-07-22 14:21             ` brian m. carlson
2018-07-22 14:55               ` Eric Deplagne
2018-07-26 10:05                 ` Johannes Schindelin
2018-07-22 15:23           ` Joan Daemen
2018-07-22 18:54             ` Adam Langley
2018-07-26 10:31             ` Johannes Schindelin
2018-07-23 12:40           ` demerphq
2018-07-23 12:48             ` Sitaram Chamarty
2018-07-23 12:55               ` demerphq
2018-07-23 18:23               ` Linus Torvalds
2018-07-23 17:57             ` Stefan Beller
2018-07-23 18:35             ` Jonathan Nieder
2018-07-24 19:01       ` Edward Thomson
2018-07-24 20:31         ` Linus Torvalds
2018-07-24 20:49           ` Jonathan Nieder
2018-07-24 21:13           ` Junio C Hamano
2018-07-24 22:10             ` brian m. carlson
2018-07-30  9:06               ` Johannes Schindelin
2018-07-30 20:01                 ` Dan Shumow
2018-08-03  2:57                   ` Jonathan Nieder
2018-09-18 15:18                   ` Joan Daemen
2018-09-18 15:32                     ` Jonathan Nieder
2018-09-18 16:50                     ` Linus Torvalds
2018-07-25  8:30             ` [PATCH 0/2] document that NewHash is now SHA-256 Ævar Arnfjörð Bjarmason
2018-07-25  8:30             ` [PATCH 1/2] doc hash-function-transition: note the lack of a changelog Ævar Arnfjörð Bjarmason
2018-07-25  8:30             ` [PATCH 2/2] doc hash-function-transition: pick SHA-256 as NewHash Ævar Arnfjörð Bjarmason
2018-07-25 16:45               ` Junio C Hamano
2018-07-25 17:25                 ` Jonathan Nieder
2018-07-25 21:32                   ` Junio C Hamano
2018-07-26 13:41                     ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2018-08-03  7:20                       ` Jonathan Nieder
2018-08-03 16:40                         ` Junio C Hamano
2018-08-03 17:01                           ` Linus Torvalds
2018-08-03 16:42                         ` Linus Torvalds
2018-08-03 17:43                         ` Ævar Arnfjörð Bjarmason
2018-08-04  8:52                           ` Jonathan Nieder
2018-08-03 17:45                         ` brian m. carlson
2018-07-25 22:56                 ` [PATCH " brian m. carlson
2018-06-11 21:19   ` Hash algorithm analysis Ævar Arnfjörð Bjarmason
2018-06-21  8:20     ` Johannes Schindelin
2018-06-21 22:39     ` brian m. carlson
2018-06-11 18:09 ` Duy Nguyen [this message]
2018-06-12  1:28   ` State of NewHash work, future directions, and discussion brian m. carlson
2018-06-11 19:01 ` Jonathan Nieder
2018-06-12  2:28   ` brian m. carlson
2018-06-12  2:42     ` Jonathan Nieder

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8CJrFCUnVMes=3_gQKNTiyHsKkawWNQ1aB_GCvOh1rKcw@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox