From: Chris Torek <chris.torek@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Git List <git@vger.kernel.org>, Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 08/15] cache: compare the entire buffer for struct object_id
Date: Sun, 11 Apr 2021 01:17:10 -0700 [thread overview]
Message-ID: <CAPx1GvfiG9AzYJi2PqhEW2d1n1ghn4Jnna4QR0cJSgNXvaEXQg@mail.gmail.com> (raw)
In-Reply-To: <20210410152140.3525040-9-sandals@crustytoothpaste.net>
Just an observation here: comparing 256 bytes every time
would seem to have one nice bonus side effect and one
potentially bad, but vanishingly unlikely, side effect: a 160
byte null hash will now compare equal to a 256 byte null
hash (good), but a 160 byte hash extended to 256 bytes
will compare equal to a 256 byte hash that just happens to
end in 96 bytes of zero (bad, but I would guess, will never
actually happen).
Chris
On Sat, Apr 10, 2021 at 8:23 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Currently, when we compare two object IDs, we have to take a branch to
> determine what the hash size is supposed to be. The compiler can
> optimize well for a single length, but has trouble when there are two
> possible lengths.
>
> There is, however, an alternative: we can ensure that we always compare
> the full length of the hash buffer, but in turn we must zero the
> remainder of the buffer when using SHA-1; otherwise, we'll end up with
> incompatible junk at the end of otherwise equivalent object IDs that
> will prevent them from matching. This is an acceptable tradeoff,
> because we generally read an object ID in once, but then compare it
> against others multiple times.
>
> This latter approach also has some benefits as well: since we will have
> annotated every location in which we load an object ID into an instance
> of struct object_id, if we want to set the hash algorithm for the object
> ID, we can do so at the same time.
>
> Adopt this latter approach, since it provides us greater flexibility and
> lets us read and store object IDs for multiple algorithms at once.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> hash.h | 13 ++++++++++---
> hex.c | 9 ++++++---
> notes.c | 3 +++
> object-file.c | 1 +
> 4 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/hash.h b/hash.h
> index c8f03d8aee..04eba5c56b 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -205,7 +205,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>
> static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
> {
> - return hashcmp(oid1->hash, oid2->hash);
> + return memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
> }
>
> static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
> @@ -221,7 +221,7 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
>
> static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
> {
> - return hasheq(oid1->hash, oid2->hash);
> + return !memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
> }
>
> static inline int is_null_oid(const struct object_id *oid)
> @@ -258,7 +258,9 @@ static inline void oidclr(struct object_id *oid)
>
> static inline void oidread(struct object_id *oid, const unsigned char *hash)
> {
> - memcpy(oid->hash, hash, the_hash_algo->rawsz);
> + size_t rawsz = the_hash_algo->rawsz;
> + memcpy(oid->hash, hash, rawsz);
> + memset(oid->hash + rawsz, 0, GIT_MAX_RAWSZ - rawsz);
> }
>
> static inline int is_empty_blob_sha1(const unsigned char *sha1)
> @@ -281,6 +283,11 @@ static inline int is_empty_tree_oid(const struct object_id *oid)
> return oideq(oid, the_hash_algo->empty_tree);
> }
>
> +static inline void oid_pad_buffer(struct object_id *oid, const struct git_hash_algo *algop)
> +{
> + memset(oid->hash + algop->rawsz, 0, GIT_MAX_RAWSZ - algop->rawsz);
> +}
> +
> const char *empty_tree_oid_hex(void);
> const char *empty_blob_oid_hex(void);
>
> diff --git a/hex.c b/hex.c
> index da51e64929..5fa3e71cb9 100644
> --- a/hex.c
> +++ b/hex.c
> @@ -69,7 +69,10 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
> int get_oid_hex_algop(const char *hex, struct object_id *oid,
> const struct git_hash_algo *algop)
> {
> - return get_hash_hex_algop(hex, oid->hash, algop);
> + int ret = get_hash_hex_algop(hex, oid->hash, algop);
> + if (!ret)
> + oid_pad_buffer(oid, algop);
> + return ret;
> }
>
> /*
> @@ -80,7 +83,7 @@ int get_oid_hex_any(const char *hex, struct object_id *oid)
> {
> int i;
> for (i = GIT_HASH_NALGOS - 1; i > 0; i--) {
> - if (!get_hash_hex_algop(hex, oid->hash, &hash_algos[i]))
> + if (!get_oid_hex_algop(hex, oid, &hash_algos[i]))
> return i;
> }
> return GIT_HASH_UNKNOWN;
> @@ -95,7 +98,7 @@ int parse_oid_hex_algop(const char *hex, struct object_id *oid,
> const char **end,
> const struct git_hash_algo *algop)
> {
> - int ret = get_hash_hex_algop(hex, oid->hash, algop);
> + int ret = get_oid_hex_algop(hex, oid, algop);
> if (!ret)
> *end = hex + algop->hexsz;
> return ret;
> diff --git a/notes.c b/notes.c
> index a44b25858f..1dfe9e2b9f 100644
> --- a/notes.c
> +++ b/notes.c
> @@ -455,6 +455,8 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
> CALLOC_ARRAY(l, 1);
> oidcpy(&l->key_oid, &object_oid);
> oidcpy(&l->val_oid, &entry.oid);
> + oid_pad_buffer(&l->key_oid, the_hash_algo);
> + oid_pad_buffer(&l->val_oid, the_hash_algo);
> if (note_tree_insert(t, node, n, l, type,
> combine_notes_concatenate))
> die("Failed to load %s %s into notes tree "
> @@ -484,6 +486,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
> strbuf_addch(&non_note_path, '/');
> }
> strbuf_addstr(&non_note_path, entry.path);
> + oid_pad_buffer(&entry.oid, the_hash_algo);
> add_non_note(t, strbuf_detach(&non_note_path, NULL),
> entry.mode, entry.oid.hash);
> }
> diff --git a/object-file.c b/object-file.c
> index 3f43c376e7..8e338247cc 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2352,6 +2352,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
> if (namelen == the_hash_algo->hexsz - 2 &&
> !hex_to_bytes(oid.hash + 1, de->d_name,
> the_hash_algo->rawsz - 1)) {
> + oid_pad_buffer(&oid, the_hash_algo);
> if (obj_cb) {
> r = obj_cb(&oid, path->buf, data);
> if (r)
next prev parent reply other threads:[~2021-04-11 8:17 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
2021-04-15 8:55 ` Denton Liu
2021-04-15 23:03 ` brian m. carlson
2021-04-16 15:04 ` Ævar Arnfjörð Bjarmason
2021-04-16 18:55 ` Junio C Hamano
2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
2021-04-11 8:52 ` Ævar Arnfjörð Bjarmason
2021-04-11 21:07 ` brian m. carlson
2021-04-16 15:21 ` Ævar Arnfjörð Bjarmason
2021-04-16 17:27 ` Ævar Arnfjörð Bjarmason
2021-04-10 15:21 ` [PATCH 03/15] cache: add an algo member to struct object_id brian m. carlson
2021-04-11 11:55 ` Ævar Arnfjörð Bjarmason
2021-04-11 21:37 ` brian m. carlson
2021-04-13 12:12 ` Derrick Stolee
2021-04-14 1:08 ` brian m. carlson
2021-04-15 8:47 ` Ævar Arnfjörð Bjarmason
2021-04-15 23:51 ` brian m. carlson
2021-04-10 15:21 ` [PATCH 04/15] Always use oidread to read into " brian m. carlson
2021-04-10 15:21 ` [PATCH 05/15] hash: add a function to finalize object IDs brian m. carlson
2021-04-10 15:21 ` [PATCH 06/15] Use the final_oid_fn to finalize hashing of " brian m. carlson
2021-04-10 15:21 ` [PATCH 07/15] builtin/pack-redundant: avoid casting buffers to struct object_id brian m. carlson
2021-04-10 15:21 ` [PATCH 08/15] cache: compare the entire buffer for " brian m. carlson
2021-04-11 8:17 ` Chris Torek [this message]
2021-04-11 11:36 ` Ævar Arnfjörð Bjarmason
2021-04-11 21:05 ` brian m. carlson
2021-04-10 15:21 ` [PATCH 09/15] hash: set and copy algo field in " brian m. carlson
2021-04-11 11:57 ` Ævar Arnfjörð Bjarmason
2021-04-11 21:48 ` brian m. carlson
2021-04-11 22:12 ` Ævar Arnfjörð Bjarmason
2021-04-11 23:52 ` brian m. carlson
2021-04-12 11:02 ` [PATCH 0/2] C99: harder dependency on variadic macros Ævar Arnfjörð Bjarmason
2021-04-12 11:02 ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
2021-04-13 7:57 ` Jeff King
2021-04-13 21:07 ` Junio C Hamano
2021-04-14 5:21 ` Jeff King
2021-04-14 6:12 ` Ævar Arnfjörð Bjarmason
2021-04-14 7:31 ` Jeff King
2021-05-21 2:06 ` Jonathan Nieder
2021-04-12 11:02 ` [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2021-04-12 17:58 ` Junio C Hamano
2021-04-13 8:00 ` Jeff King
2021-05-21 2:50 ` Jonathan Nieder
2021-04-12 12:14 ` [PATCH 0/2] C99: harder dependency on variadic macros Bagas Sanjaya
2021-04-12 12:41 ` Ævar Arnfjörð Bjarmason
2021-04-12 22:57 ` brian m. carlson
2021-04-12 23:19 ` Junio C Hamano
2022-01-28 11:11 ` [PATCH v2 0/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-01-28 11:11 ` [PATCH v2 1/2] git-compat-util.h: clarify GCC v.s. C99-specific in comment Ævar Arnfjörð Bjarmason
2022-01-28 11:11 ` [PATCH v2 2/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-01-28 22:40 ` Junio C Hamano
2022-02-19 10:41 ` [PATCH v3 0/3] C99: remove dead " Ævar Arnfjörð Bjarmason
2022-02-19 10:41 ` [PATCH v3 1/3] git-compat-util.h: clarify GCC v.s. C99-specific in comment Ævar Arnfjörð Bjarmason
2022-02-19 10:41 ` [PATCH v3 2/3] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-02-19 10:41 ` [PATCH v3 3/3] trace.h: remove never-used TRACE_CONTEXT Ævar Arnfjörð Bjarmason
2022-02-20 12:02 ` Junio C Hamano
2022-02-20 12:38 ` Ævar Arnfjörð Bjarmason
2022-02-20 20:12 ` Junio C Hamano
2022-02-21 16:05 ` [PATCH v4 0/2] C99: remove dead !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2022-02-21 16:05 ` [PATCH v4 1/2] git-compat-util.h: clarify GCC v.s. C99-specific in comment Ævar Arnfjörð Bjarmason
2022-02-21 16:05 ` [PATCH v4 2/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2021-04-12 10:53 ` [PATCH 09/15] hash: set and copy algo field in struct object_id Junio C Hamano
2021-04-12 11:13 ` Ævar Arnfjörð Bjarmason
2021-04-10 15:21 ` [PATCH 10/15] hash: provide per-algorithm null OIDs brian m. carlson
2021-04-11 14:03 ` Junio C Hamano
2021-04-11 21:51 ` brian m. carlson
2021-04-10 15:21 ` [PATCH 11/15] builtin/show-index: set the algorithm for object IDs brian m. carlson
2021-04-10 15:21 ` [PATCH 12/15] commit-graph: don't store file hashes as struct object_id brian m. carlson
2021-04-10 15:21 ` [PATCH 13/15] builtin/pack-objects: avoid using struct object_id for pack hash brian m. carlson
2021-04-10 15:21 ` [PATCH 14/15] hex: default to the_hash_algo on zero algorithm value brian m. carlson
2021-04-10 15:21 ` [PATCH 15/15] hex: print objects using the hash algorithm member brian m. carlson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAPx1GvfiG9AzYJi2PqhEW2d1n1ghn4Jnna4QR0cJSgNXvaEXQg@mail.gmail.com \
--to=chris.torek@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=sandals@crustytoothpaste.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).