git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Andrzej Hunt <andrzej@ahunt.org>
To: Eric Wong <e@80x24.org>, git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>, "René Scharfe" <l.s.r@web.de>
Subject: Re: [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache
Date: Fri, 6 Aug 2021 17:31:19 +0200	[thread overview]
Message-ID: <3cbec773-cd99-cf9f-a713-45ef8e6746c3@ahunt.org> (raw)
In-Reply-To: <20210629205305.7100-6-e@80x24.org>



On 29/06/2021 22:53, Eric Wong wrote:
> [...snip...]
> diff --git a/oidtree.c b/oidtree.c
> new file mode 100644
> index 0000000000..c1188d8f48
> --- /dev/null
> +++ b/oidtree.c
> @@ -0,0 +1,94 @@
> +/*
> + * A wrapper around cbtree which stores oids
> + * May be used to replace oid-array for prefix (abbreviation) matches
> + */
> +#include "oidtree.h"
> +#include "alloc.h"
> +#include "hash.h"
> +
> +struct oidtree_node {
> +	/* n.k[] is used to store "struct object_id" */
> +	struct cb_node n;
> +};
> +
> [... snip ...]
> +
> +void oidtree_insert(struct oidtree *ot, const struct object_id *oid)
> +{
> +	struct oidtree_node *on;
> +
> +	if (!ot->mempool)
> +		ot->mempool = allocate_alloc_state();
> +	if (!oid->algo)
> +		BUG("oidtree_insert requires oid->algo");
> +
> +	on = alloc_from_state(ot->mempool, sizeof(*on) + sizeof(*oid));
> +	oidcpy_with_padding((struct object_id *)on->n.k, oid);

I think this object_id cast introduced undefined behaviour - here's my 
layperson's interepretation of what's going on (full UBSAN output is 
pasted below):

cb_node.k is a uint8_t[], and hence can be 1-byte aligned (on my 
machine: offsetof(struct cb_node, k) == 21). We're casting its pointer 
to "struct object_id *", and later try to access object_id.hash within 
oidcpy_with_padding. My compiler assumes that an object_id pointer needs 
to be 4-byte aligned, and reading from a misaligned pointer means we hit 
undefined behaviour. (I think the 4-byte alignment requirement comes 
from the fact that object_id's largest member is an int?)

I'm not sure what an elegant and idiomatic fix might be - IIUC it's hard 
to guarantee misaligned access can't happen with a flex array that's 
being used for arbitrary data (you would presumably have to declare it 
as an array of whatever the largest supported type is, so that you can 
guarantee correct alignment even when cbtree is used with that type) - 
which might imply that k needs to be declared as a void pointer? That in 
turn would make cbtree.c harder to read.

Anyhow, here's the UBSAN output from t0000 running against next:

hash.h:277:14: runtime error: member access within misaligned address 
0x7fcb31c4103d for type 'struct object_id', which requires 4 byte alignment
0x7fcb31c4103d: note: pointer points here
  5a 5a 5a 5a 5a 5a 5a  5a 5a 5a 5a 5a 5a 5a 5a  5a 5a 5a 5a 5a 5a 5a 5a 
  5a 5a 5a 5a 5a 5a 5a 5a  5a
              ^
     #0 0xc76d9d in oidcpy_with_padding hash.h:277:14
     #1 0xc768f5 in oidtree_insert oidtree.c:44:2
     #2 0xc418e3 in append_loose_object object-file.c:2398:2
     #3 0xc3fbdc in for_each_file_in_obj_subdir object-file.c:2316:9
     #4 0xc41785 in odb_loose_cache object-file.c:2424:2
     #5 0xc50336 in find_short_object_filename object-name.c:103:16
     #6 0xc50e04 in repo_find_unique_abbrev_r object-name.c:712:2
     #7 0xc519a9 in repo_find_unique_abbrev object-name.c:727:2
     #8 0x9b6ce2 in diff_abbrev_oid diff.c:4208:10
     #9 0x9f13d0 in fill_metainfo diff.c:4286:8
     #10 0x9f02d6 in run_diff_cmd diff.c:4322:3
     #11 0x9efbef in run_diff diff.c:4422:3
     #12 0x9c9ac9 in diff_flush_patch diff.c:5765:2
     #13 0x9c9e74 in diff_flush_patch_all_file_pairs diff.c:6246:4
     #14 0x9be33e in diff_flush diff.c:6387:3
     #15 0xb8864e in log_tree_diff_flush log-tree.c:895:2
     #16 0xb8987b in log_tree_diff log-tree.c:933:4
     #17 0xb88c9a in log_tree_commit log-tree.c:988:10
     #18 0x5b4257 in cmd_log_walk log.c:426:8
     #19 0x5b6224 in cmd_show log.c:698:10
     #20 0x42ec48 in run_builtin git.c:461:11
     #21 0x4295e0 in handle_builtin git.c:714:3
     #22 0x42d043 in run_argv git.c:781:4
     #23 0x428cc2 in cmd_main git.c:912:19
     #24 0x7791ce in main common-main.c:52:11
     #25 0x7fcb30aab349 in __libc_start_main (/lib64/libc.so.6+0x24349)
     #26 0x4074a9 in _start start.S:120

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.h:277:14 in


> +
> +	/*
> +	 * n.b. we shouldn't get duplicates, here, but we'll have
> +	 * a small leak that won't be freed until oidtree_destroy
> +	 */
> +	cb_insert(&ot->t, &on->n, sizeof(*oid));
> +}
> +


ATB,

Andrzej

  parent reply	other threads:[~2021-08-06 15:31 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24  0:58 [PATCH] speed up alt_odb_usable() with many alternates Eric Wong
2021-06-27  2:47 ` [PATCH 0/5] optimizations for many odb alternates Eric Wong
2021-06-27  2:47   ` [PATCH 1/5] speed up alt_odb_usable() with many alternates Eric Wong
2021-06-27  2:47   ` [PATCH 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-06-27  2:47   ` [PATCH 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-06-27 10:23     ` René Scharfe
2021-06-28 23:09       ` Eric Wong
2021-06-27  2:47   ` [PATCH 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-06-27  2:47   ` [PATCH 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-06-29 14:42     ` Junio C Hamano
2021-06-29 20:17       ` Eric Wong
2021-06-29 20:53   ` [PATCH v2 0/5] optimizations for many alternates Eric Wong
2021-07-07 23:10     ` [PATCH v3 " Eric Wong
2021-07-07 23:10     ` [PATCH v3 1/5] speed up alt_odb_usable() with " Eric Wong
2021-07-08  0:20       ` Junio C Hamano
2021-07-08  1:14         ` Eric Wong
2021-07-08  4:30           ` Junio C Hamano
2021-07-07 23:10     ` [PATCH v3 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-07-08  4:57       ` Junio C Hamano
2021-07-07 23:10     ` [PATCH v3 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-07-07 23:10     ` [PATCH v3 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-07-07 23:10     ` [PATCH v3 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-06-29 20:53   ` [PATCH v2 1/5] speed up alt_odb_usable() with many alternates Eric Wong
2021-07-03 10:05     ` René Scharfe
2021-07-04  9:02       ` René Scharfe
2021-07-06 23:01       ` Eric Wong
2021-06-29 20:53   ` [PATCH v2 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-06-29 20:53   ` [PATCH v2 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-06-29 20:53   ` [PATCH v2 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-06-29 20:53   ` [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-07-04  9:02     ` René Scharfe
2021-07-06 23:21       ` Eric Wong
2021-07-04  9:32     ` Ævar Arnfjörð Bjarmason
2021-07-07 23:12       ` Eric Wong
2021-08-06 15:31     ` Andrzej Hunt [this message]
2021-08-06 17:53       ` René Scharfe
2021-08-07 22:49         ` Eric Wong
2021-08-09  1:35           ` Carlo Arenas
2021-08-09  1:38             ` [PATCH/RFC 0/3] pedantic errors in next Carlo Marcelo Arenas Belón
2021-08-09  1:38               ` [PATCH/RFC 1/3] oidtree: avoid nested struct oidtree_node Carlo Marcelo Arenas Belón
2021-08-09  1:38               ` [PATCH/RFC 2/3] object-store: avoid extra ';' from KHASH_INIT Carlo Marcelo Arenas Belón
2021-08-09 15:53                 ` Junio C Hamano
2021-08-09  1:38               ` [PATCH/RFC 3/3] ci: run a pedantic build as part of the GitHub workflow Carlo Marcelo Arenas Belón
2021-08-09 10:50                 ` Bagas Sanjaya
2021-08-09 22:03                   ` Carlo Arenas
2021-08-09 14:56                 ` Phillip Wood
2021-08-09 22:48                   ` Carlo Arenas
2021-08-10 15:24                     ` Phillip Wood
2021-08-10 18:25                       ` Junio C Hamano
2021-08-30 11:36                   ` Ævar Arnfjörð Bjarmason
2021-08-31 20:28                     ` Carlo Arenas
2021-08-31 20:51                       ` Ævar Arnfjörð Bjarmason
2021-08-31 23:54                         ` Carlo Arenas
2021-09-01  1:52                           ` Jeff King
2021-09-01 17:55                             ` Junio C Hamano
2021-08-30 11:40                 ` Ævar Arnfjörð Bjarmason
2021-09-01  9:19                 ` [RFC PATCH v2 0/4] developer: support pedantic Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 1/4] developer: retire USE_PARENS_AROUND_GETTEXT_N support Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 2/4] developer: enable pedantic by default Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 3/4] developer: add an alternative script for detecting broken N_() Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 4/4] developer: move detect-compiler out of the main directory Carlo Marcelo Arenas Belón
2021-09-01 10:10                   ` [RFC PATCH v2 0/4] developer: support pedantic Jeff King
2021-09-01 11:25                     ` [PATCH] gettext: remove optional non-standard parens in N_() definition Ævar Arnfjörð Bjarmason
2021-09-01 17:31                       ` Eric Sunshine
2021-09-02  9:13                       ` Jeff King
2021-09-02 19:19                       ` Junio C Hamano
2021-09-01 11:27                     ` [RFC PATCH v2 0/4] developer: support pedantic Ævar Arnfjörð Bjarmason
2021-09-01 18:03                       ` Carlo Arenas
2021-09-03 17:02                   ` [PATCH v3 0/3] support pedantic in developer mode Carlo Marcelo Arenas Belón
2021-09-03 17:02                     ` [PATCH v3 1/3] gettext: remove optional non-standard parens in N_() definition Carlo Marcelo Arenas Belón
2021-09-10 15:39                       ` Ævar Arnfjörð Bjarmason
2021-09-03 17:02                     ` [PATCH v3 2/3] win32: allow building with pedantic mode enabled Carlo Marcelo Arenas Belón
2021-09-03 18:47                       ` René Scharfe
2021-09-03 20:13                         ` Carlo Marcelo Arenas Belón
2021-09-03 20:32                           ` Junio C Hamano
2021-09-03 20:38                           ` René Scharfe
2021-09-04  9:37                             ` René Scharfe
2021-09-04 14:42                               ` Carlo Arenas
2021-09-27 23:04                       ` Jonathan Tan
2021-09-28  0:30                         ` Carlo Arenas
2021-09-28 16:50                           ` Jonathan Tan
2021-09-28 17:37                           ` Junio C Hamano
2021-09-28 20:16                             ` Jonathan Tan
2021-09-29  1:00                               ` Carlo Arenas
2021-09-29 15:55                                 ` Junio C Hamano
2021-09-03 17:02                     ` [PATCH v3 3/3] developer: enable pedantic by default Carlo Marcelo Arenas Belón
2021-09-05  7:54                     ` [PATCH v3 0/3] support pedantic in developer mode Ævar Arnfjörð Bjarmason
2021-08-09 16:44               ` [PATCH/RFC 0/3] pedantic errors in next Junio C Hamano
2021-08-09 20:10                 ` Eric Wong
2021-08-10  6:16                 ` Carlo Marcelo Arenas Belón
2021-08-10 19:30                   ` René Scharfe
2021-08-10 23:49                     ` Carlo Arenas
2021-08-11  0:57                       ` Carlo Arenas
2021-08-11 14:57                       ` René Scharfe
2021-08-11 17:20                         ` Junio C Hamano
2021-08-10 18:59             ` [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache René Scharfe
2021-08-10 19:40           ` René Scharfe
2021-08-14 20:00       ` [PATCH] oidtree: avoid unaligned access to crit-bit tree René Scharfe
2021-08-16 19:11         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3cbec773-cd99-cf9f-a713-45ef8e6746c3@ahunt.org \
    --to=andrzej@ahunt.org \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).