git@vger.kernel.org mailing list mirror (one of many)
From: "Andrzej Hunt via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Andrzej Hunt <andrzej@ahunt.org>
Subject: [PATCH 0/3] Fix uninitialised reads found with MSAN
Date: Thu, 10 Jun 2021 16:48:29 +0000
Message-ID: <pull.1033.git.git.1623343712.gitgitgadget@gmail.com>

This series fixes a small number of issues found when running git's
test-suite with MSAN (MemorySanitizer: a clang sanitizer that tries to
detect reads from uninitialised memory [2]). To summarise: I think there's
one real bug, one theoretical bug where compilers nevertheless produce
working code, and one false positive that we can easily suppress.
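
As an illustration of the general kind of read MSAN flags (a minimal sketch, not the code touched by this series): a fixed-size buffer of which only a prefix is ever written, compared once with memcmp() over the whole struct and once over only the bytes in use. The names demo_id, demo_id_set, demo_id_eq_whole and demo_id_eq, and both size constants, are hypothetical and are not git's definitions.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_HASH 32   /* room for the largest supported hash (assumed) */
#define DEMO_USED_HASH 20  /* bytes actually written for the active hash (assumed) */

struct demo_id {
	unsigned char hash[DEMO_MAX_HASH];
};

/* Writes only the first DEMO_USED_HASH bytes; the tail is left untouched. */
static void demo_id_set(struct demo_id *id, const unsigned char *src)
{
	memcpy(id->hash, src, DEMO_USED_HASH);
}

/* Risky: also reads the tail, which may never have been written. */
static int demo_id_eq_whole(const struct demo_id *a, const struct demo_id *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* Safer: compares only the bytes that are known to be in use. */
static int demo_id_eq(const struct demo_id *a, const struct demo_id *b)
{
	return memcmp(a->hash, b->hash, DEMO_USED_HASH) == 0;
}
```

If the tail bytes are never initialised, the whole-struct comparison reads them and its result depends on whatever happens to be there; the prefix-only comparison does not touch them.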

Getting the test suite to run under MSAN is a bit trickier than simply
adding SANITIZERS=memory; I've detailed the reasons and the process I'm
using below. Unfortunately this series is also not sufficient to make the
whole test suite pass when building with MSAN:

 * t0005-sigchain and t7006-pager fail with an infinite loop inside MSAN's
   signal handling interceptors. I think this is a bad interaction between
   git's signal handling and MSAN's interceptors, and I suspect it's not
   indicative of a bug in git itself - but I haven't investigated in detail
   yet.
 * t3206-range-diff, t4013-diff-various, t4018-diff-funcname all fail due to
   a change in diff output. I can reproduce this issue when running with
   TSAN (but not ASAN or UBSAN), which suggests a bug or difference in
   behaviour in code shared between MSAN and TSAN - similarly, I haven't
   investigated in all that much detail yet.

(These issues were seen when running with clang-11; the next step is to
test with clang built from main.)

As to the tricky part: MSAN tries to detect reads from uninitialised memory
at runtime. However you need to ensure that all code performing
initialisation is built with the right instrumentation (i.e.
-fsanitize=memory). So you'll immediately run into issues if you link
against libraries provided by your system (with the exception of libc, as
MSAN provides some default interceptors for most of libc). In theory you
should rebuild all dependencies with -fsanitize=memory, although I
discovered that it's sufficient to recompile only zlib + link git against
that copy of zlib (which is not a very tricky thing to do). Doing this will
uncover one intentional read from uninitialised memory inside zlib itself.
This can be worked around with an annotation in zlib (which I'm trying to
submit upstream at [1]), but it's also possible to define an override list
at compile time (I've detailed this in my recipe below).
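
To make the instrumentation constraint more concrete, here is a minimal sketch of two source-level workarounds, assuming clang's MSAN interface: calling __msan_unpoison() after an uninstrumented routine has filled a buffer, and marking a function with the no_sanitize("memory") attribute. This is not necessarily what [1] does. All demo_* and DEMO_* names are hypothetical, and demo_external_fill only stands in for a library routine (in this single-file sketch it would of course be instrumented too).

```c
#include <stddef.h>

/* Detect MSAN; compilers without __has_feature simply skip this. */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define DEMO_HAVE_MSAN 1
#  endif
#endif

#ifdef DEMO_HAVE_MSAN
#  define DEMO_NO_MSAN __attribute__((no_sanitize("memory")))
#  define demo_unpoison(p, n) __msan_unpoison((p), (n))
#else
#  define DEMO_NO_MSAN
#  define demo_unpoison(p, n) ((void)(p), (void)(n))
#endif

/* Stand-in for a routine from a library built without -fsanitize=memory. */
static void demo_external_fill(unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = (unsigned char)(i & 0xff);
}

/* Wrapper: after the call, tell MSAN (if present) that the bytes are defined. */
static void demo_fill(unsigned char *buf, size_t len)
{
	demo_external_fill(buf, len);
	demo_unpoison(buf, len);
}

/* Function-level opt-out: under MSAN, reads in here are not reported. */
DEMO_NO_MSAN
static unsigned demo_sum(const unsigned char *buf, size_t len)
{
	unsigned total = 0;
	size_t i;

	for (i = 0; i < len; i++)
		total += buf[i];
	return total;
}
```

Without MSAN both macros expand to nothing, so the same source builds with any compiler. An ignorelist entry such as "fun:slide_hash" has a similar function-level effect without touching the source.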

My recipe for running git tests against MSAN:

 1. Grab zlib sources from zlib.net or github.com/madler/zlib; I used zlib
    1.2.11 (which is also what most systems seem to ship).

 2. Create a sanitizer special case list (named e.g. ignorelist.txt)
    containing "fun:slide_hash" (this is only needed as long as zlib doesn't
    contain [1]).

 3. Build zlib, installing it into SOME_PREFIX (I happened to use clang, but
    that might not be necessary):

        CC=clang-11 \
        CFLAGS="-fsanitize=memory -fno-sanitize-recover=memory -fsanitize-ignorelist=YOUR_IGNORELIST_FROM_STEP_2" \
        ./configure && make install prefix=$SOME_PREFIX

 4. Build git and run the tests (again, I'm using clang, but gcc might be OK
    too):

        make ZLIB_PATH=$SOME_PREFIX CC=clang-11 SANITIZERS=memory test

If you're actively trying to understand and fix issues, I also recommend
adding -fsanitize-memory-track-origins (which points you directly to where
the uninitialised memory comes from). See also the further docs at [2].
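
Why origin tracking helps can be sketched as follows (hypothetical names throughout, not code from git): MSAN tolerates copying uninitialised values and only reports once such a value affects execution, e.g. in a branch, so the report can surface far away from the place where initialisation was forgotten.

```c
struct demo_opts {
	int verbose;
	int depth;
};

/* Buggy initialiser: sets 'verbose' but never writes 'depth'. */
static void demo_opts_init_partial(struct demo_opts *o)
{
	o->verbose = 0;
}

/* Fixed initialiser: every member gets a defined value. */
static void demo_opts_init(struct demo_opts *o)
{
	o->verbose = 0;
	o->depth = 1;
}

/* Copying does not trigger a report; any uninitialised state travels along. */
static struct demo_opts demo_copy(struct demo_opts in)
{
	return in;
}

/* Branching on the value is where MSAN would report an uninitialised 'depth'. */
static int demo_is_deep(const struct demo_opts *o)
{
	if (o->depth > 3)
		return 1;
	return 0;
}
```

With the partial initialiser, a plain MSAN report would point at the branch in demo_is_deep(); with origin tracking enabled it should additionally name where the struct's memory was allocated, which is usually much closer to the actual mistake.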

ATB,

Andrzej

[1] https://github.com/madler/zlib/pull/561

[2] https://clang.llvm.org/docs/MemorySanitizer.html

Andrzej Hunt (3):
  bulk-checkin: make buffer reuse more obvious and safer
  split-index: use oideq instead of memcmp to compare object_id's
  builtin/checkout--worker: memset struct to avoid MSAN complaints

 builtin/checkout--worker.c | 11 +++++++++++
 bulk-checkin.c             |  3 +--
 split-index.c              |  3 ++-
 3 files changed, 14 insertions(+), 3 deletions(-)


base-commit: 62a8d224e6203d9d3d2d1d63a01cf5647ec312c9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1033%2Fahunt%2Fmsan-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1033/ahunt/msan-v1
Pull-Request: https://github.com/git/git/pull/1033
-- 
gitgitgadget
