git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 03/15] cache: add an algo member to struct object_id
Date: Sun, 11 Apr 2021 21:37:21 +0000	[thread overview]
Message-ID: <YHNsEVMs2lRmycOx@camp.crustytoothpaste.net> (raw)
In-Reply-To: <87mtu5f31u.fsf@evledraar.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3956 bytes --]

On 2021-04-11 at 11:55:57, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Apr 10 2021, brian m. carlson wrote:
> 
> > Now that we're working with multiple hash algorithms in the same repo,
> > it's best if we label each object ID with its algorithm so we can
> > determine how to format a given object ID. Add a member called algo to
> > struct object_id.
> >
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >  hash.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hash.h b/hash.h
> > index 3fb0c3d400..dafdcb3335 100644
> > --- a/hash.h
> > +++ b/hash.h
> > @@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
> >  
> >  struct object_id {
> >  	unsigned char hash[GIT_MAX_RAWSZ];
> > +	int algo;
> 
> Curiosity since I'm not as familiar as you with the multi-hash support
> by far:
> 
> So struct object_id is GIT_MAX_RAWSZ, not two types of structs for
> GIT_SHA1_RAWSZ and GIT_SHA256_RAWSZ. That pre-dates this series because
> we'd like to not deal with two types of objects everywhere for SHA-1 and
> SHA-256. Makes sense.
> 
> Before this series we'd memcmp them up to their actual length, but the
> last GIT_MAX_RAWSZ-GIT_SHA1_RAWSZ would be uninitialized
> 
> Now we pad them out, so the last 96 bits of every SHA1 are 0000...;
> Couldn't we also tell which hash an object is by memcmp-ing those last N
> bits and see if they're all zero'd?

That makes a lot of assumptions about the security of the hash
algorithm that I don't want to make here.  If anyone can ever find a
SHA-256 hash with trailing 96 bits zero, then they can confuse it with a
SHA-1 hash.  That means that our security level goes from 128 bits to 96
bits.  It's also a nonstandard construction.

More importantly, it results in the null OID being treated as a SHA-1
OID.  Because we do print the null OID in some cases, we're going to
break a lot of output formats if we print all the rest of the OIDs with
64 characters and then the null OID with 40.  That's to say nothing of
the problems in binary formats.

The reason we pad these objects is because our hashmaps are broken if we
don't.  I don't remember all the gory details, but it was obvious to me
that if they weren't consistently equal, things were going to be broken.
That's the only reason, not theoretical purity.

> Feels a bit hackish, and we'd need to reconsider that method if we'd
> ever support other same-length hashes.

My hope is that we don't need to do this, but we do have SHA-3 to serve
as a backup for SHA-2.  If quantum computers don't progress
substantially, SHA-3-256 is definitely a viable candidate for
replacement if anything ever happens to SHA-256.

> But OTOH having these objects all padded out in memory to the same
> length, but having to carry around a "what hash algo" is it yields the
> arguably weird hack of having a per-hash NULL_OID, which has never been
> an actual object of any hash type, but just a pseudo-object.

Unfortunately, as I mentioned above, we need to have two null OIDs to
handle printing things out.  It's inconvenient, I agree.

> I abandoned it as insany sillyness after playing with it for about a
> day, but it did reveal that much of the hash code now can assume
> internal length == formatting length, which is why I'm 3 paragraphs into
> this digression, i.e. maybe some of the code structure also makes having
> a NULL_OID always be 256-bits when we want to format it as 160/256
> painful...

We'll always format based on the algorithm in the OID.  That's the
simplest way to make things work because unfortunately we may end up
with both types of OIDs in the same code paths (as we're converting one
to the other) and otherwise our printing functions need a lot of special
handling and even more variants than they have today.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

  reply	other threads:[~2021-04-11 21:38 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
2021-04-15  8:55   ` Denton Liu
2021-04-15 23:03     ` brian m. carlson
2021-04-16 15:04   ` Ævar Arnfjörð Bjarmason
2021-04-16 18:55     ` Junio C Hamano
2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
2021-04-11  8:52   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:07     ` brian m. carlson
2021-04-16 15:21   ` Ævar Arnfjörð Bjarmason
2021-04-16 17:27   ` Ævar Arnfjörð Bjarmason
2021-04-10 15:21 ` [PATCH 03/15] cache: add an algo member to struct object_id brian m. carlson
2021-04-11 11:55   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:37     ` brian m. carlson [this message]
2021-04-13 12:12   ` Derrick Stolee
2021-04-14  1:08     ` brian m. carlson
2021-04-15  8:47       ` Ævar Arnfjörð Bjarmason
2021-04-15 23:51         ` brian m. carlson
2021-04-10 15:21 ` [PATCH 04/15] Always use oidread to read into " brian m. carlson
2021-04-10 15:21 ` [PATCH 05/15] hash: add a function to finalize object IDs brian m. carlson
2021-04-10 15:21 ` [PATCH 06/15] Use the final_oid_fn to finalize hashing of " brian m. carlson
2021-04-10 15:21 ` [PATCH 07/15] builtin/pack-redundant: avoid casting buffers to struct object_id brian m. carlson
2021-04-10 15:21 ` [PATCH 08/15] cache: compare the entire buffer for " brian m. carlson
2021-04-11  8:17   ` Chris Torek
2021-04-11 11:36   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:05     ` brian m. carlson
2021-04-10 15:21 ` [PATCH 09/15] hash: set and copy algo field in " brian m. carlson
2021-04-11 11:57   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:48     ` brian m. carlson
2021-04-11 22:12       ` Ævar Arnfjörð Bjarmason
2021-04-11 23:52         ` brian m. carlson
2021-04-12 11:02           ` [PATCH 0/2] C99: harder dependency on variadic macros Ævar Arnfjörð Bjarmason
2021-04-12 11:02             ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
2021-04-13  7:57               ` Jeff King
2021-04-13 21:07                 ` Junio C Hamano
2021-04-14  5:21                   ` Jeff King
2021-04-14  6:12                     ` Ævar Arnfjörð Bjarmason
2021-04-14  7:31                       ` Jeff King
2021-05-21  2:06               ` Jonathan Nieder
2021-04-12 11:02             ` [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2021-04-12 17:58               ` Junio C Hamano
2021-04-13  8:00                 ` Jeff King
2021-05-21  2:50               ` Jonathan Nieder
2021-04-12 12:14             ` [PATCH 0/2] C99: harder dependency on variadic macros Bagas Sanjaya
2021-04-12 12:41               ` Ævar Arnfjörð Bjarmason
2021-04-12 22:57                 ` brian m. carlson
2021-04-12 23:19                   ` Junio C Hamano
2022-01-28 11:11             ` [PATCH v2 0/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-01-28 11:11               ` [PATCH v2 1/2] git-compat-util.h: clarify GCC v.s. C99-specific in comment Ævar Arnfjörð Bjarmason
2022-01-28 11:11               ` [PATCH v2 2/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-01-28 22:40                 ` Junio C Hamano
2022-02-19 10:41               ` [PATCH v3 0/3] C99: remove dead " Ævar Arnfjörð Bjarmason
2022-02-19 10:41                 ` [PATCH v3 1/3] git-compat-util.h: clarify GCC v.s. C99-specific in comment Ævar Arnfjörð Bjarmason
2022-02-19 10:41                 ` [PATCH v3 2/3] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-02-19 10:41                 ` [PATCH v3 3/3] trace.h: remove never-used TRACE_CONTEXT Ævar Arnfjörð Bjarmason
2022-02-20 12:02                   ` Junio C Hamano
2022-02-20 12:38                     ` Ævar Arnfjörð Bjarmason
2022-02-20 20:12                       ` Junio C Hamano
2022-02-21 16:05                 ` [PATCH v4 0/2] C99: remove dead !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-02-21 16:05                   ` [PATCH v4 1/2] git-compat-util.h: clarify GCC v.s. C99-specific in comment Ævar Arnfjörð Bjarmason
2022-02-21 16:05                   ` [PATCH v4 2/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2021-04-12 10:53         ` [PATCH 09/15] hash: set and copy algo field in struct object_id Junio C Hamano
2021-04-12 11:13           ` Ævar Arnfjörð Bjarmason
2021-04-10 15:21 ` [PATCH 10/15] hash: provide per-algorithm null OIDs brian m. carlson
2021-04-11 14:03   ` Junio C Hamano
2021-04-11 21:51     ` brian m. carlson
2021-04-10 15:21 ` [PATCH 11/15] builtin/show-index: set the algorithm for object IDs brian m. carlson
2021-04-10 15:21 ` [PATCH 12/15] commit-graph: don't store file hashes as struct object_id brian m. carlson
2021-04-10 15:21 ` [PATCH 13/15] builtin/pack-objects: avoid using struct object_id for pack hash brian m. carlson
2021-04-10 15:21 ` [PATCH 14/15] hex: default to the_hash_algo on zero algorithm value brian m. carlson
2021-04-10 15:21 ` [PATCH 15/15] hex: print objects using the hash algorithm member brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YHNsEVMs2lRmycOx@camp.crustytoothpaste.net \
    --to=sandals@crustytoothpaste.net \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).