[PATCH 0/2] Generate temporary files using a CSPRNG

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH 0/2] Generate temporary files using a CSPRNG
@ 2021-11-16  3:35 brian m. carlson
  2021-11-16  3:35 ` [PATCH 1/2] wrapper: add a helper to generate numbers from " brian m. carlson
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: brian m. carlson @ 2021-11-16  3:35 UTC
  To: git

Currently, when we generate a temporary file name, we use the seconds,
microseconds, and the PID to generate a unique value.  The resulting
value, while changing frequently, is actually predictable and on some
systems, it may be possible to cause a DoS by creating all potential
temporary files when the temporary file is being created in TMPDIR.

The solution to this is to use the system CSPRNG to generate the
temporary file name.  This is the approach taken by FreeBSD, NetBSD, and
OpenBSD, and glibc also recently switched to this approach from an
approach that resembled ours in many ways.

Even if this is not practically exploitable on many systems, it seems
prudent to be at least as careful about temporary file generation as
libc is.

This issue was mentioned on the security list and it was decided that
this was not sensitive enough to warrant a coordinated disclosure, a
sentiment with which I agree.  This is difficult to exploit on most
systems, but I think it's still worth fixing.

This series introduces two commits.  The first implements a generic
function which calls the system CSPRNG.  A reasonably exhaustive attempt
is made to pick from the options with a preference for performance.  The
second changes our temporary file code to use the CSPRNG.

I have added a test helper that can emit bytes from the CSPRNG, as well
as a self-test mode.  The former is not used, but I anticipated it could
find utility in the testsuite, and it was useful for testing by hand, so
I included it.

The careful reader will notice that the sole additional test is added to
t0000.  That's because temporary file generation is fundamental to how
Git operates and if it fails, the entire testsuite is broken.  Thus, a
simple test to verify that it's working seems prudent as part of t0000.
I was also unable to find a better place to put it, but am open to
suggestions if folks have ideas.

This passes our CI, including on Windows, and I have manually verified
the correctness of the other four branches on Linux (the HAVE_ARC4RANDOM
branch requiring a small patch which is not necessary on systems which
have it in libc and which is therefore not included here).

I am of course interested in hearing from anyone who lacks one of the
CSPRNG interfaces we have here.  Looking at the Go standard library,
/dev/urandom should be available on at least AIX, Darwin (macOS),
DragonflyBSD, FreeBSD, Linux, NetBSD, OpenBSD, and Solaris, and I
believe it is available on most other Unix systems as well.
RtlGenRandom is available on Windows back to XP, which we no longer
support.  The bizarre header contortion on Windows comes from Mozilla,
but is widely used in other codebases with no substantial changes.

For those who are interested, I computed the probability of spurious
failure for the self-test mode like so:

  256 * (255/256)^65536

This Ruby one-liner estimates the probability at approximately 10^-108:

  ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'

If I have made an error in the calculation, please do feel free to point
it out.
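
The same bound can be cross-checked without bignum arithmetic.  The
sketch below is illustrative only and not part of this series; the
function names are made up.  It evaluates 256 * (255/256)^65536 in plain
C by tracking a decimal mantissa and exponent separately.  Note that
this expression is a union bound over the 256 byte values, so it is an
upper limit on the failure probability, not the exact figure.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Evaluate values * ((values - 1) / values)^samples without underflow by
 * keeping the result as mant * 10^exp10 with 1 <= mant < 10.
 * Hypothetical helper for illustration; not part of this series.
 */
static void selftest_failure_bound(int values, long samples,
				   double *mant, int *exp10)
{
	double m = values;
	double ratio = (double)(values - 1) / values;
	int e = 0;
	long i;

	assert(values > 1 && samples >= 0);

	/* Normalize the leading factor into [1, 10). */
	while (m >= 10.0) {
		m /= 10.0;
		e++;
	}
	/* Multiply in one factor per sampled byte, renormalizing as we go. */
	for (i = 0; i < samples; i++) {
		m *= ratio;
		if (m < 1.0) {
			m *= 10.0;
			e--;
		}
	}
	*mant = m;
	*exp10 = e;
}

/* Print the bound for the self-test: 256 byte values, 64 KiB of samples. */
static void print_selftest_bound(void)
{
	double mant;
	int exp10;

	selftest_failure_bound(256, 64L * 1024, &mant, &exp10);
	printf("bound is about %.2fe%d\n", mant, exp10);
}
```

Evaluated this way the bound comes out at roughly 10^-109, which is
within an order of magnitude of the estimate above and comfortably
below 10^-100.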

brian m. carlson (2):
  wrapper: add a helper to generate numbers from a CSPRNG
  wrapper: use a CSPRNG to generate random file names

 Makefile                            | 25 ++++++++++
 compat/winansi.c                    |  6 +++
 config.mak.uname                    |  9 ++++
 contrib/buildsystems/CMakeLists.txt |  2 +-
 git-compat-util.h                   | 16 +++++++
 t/helper/test-csprng.c              | 63 +++++++++++++++++++++++++
 t/helper/test-tool.c                |  1 +
 t/helper/test-tool.h                |  1 +
 t/t0000-basic.sh                    |  4 ++
 wrapper.c                           | 71 ++++++++++++++++++++++++-----
 10 files changed, 186 insertions(+), 12 deletions(-)
 create mode 100644 t/helper/test-csprng.c

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16  3:35 [PATCH 0/2] Generate temporary files using a CSPRNG brian m. carlson
@ 2021-11-16  3:35 ` brian m. carlson
  2021-11-16 15:31   ` Jeff King
  2021-11-17  7:39   ` Junio C Hamano
  2021-11-16  3:35 ` [PATCH 2/2] wrapper: use a CSPRNG to generate random file names brian m. carlson
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 37+ messages in thread
From: brian m. carlson @ 2021-11-16  3:35 UTC
  To: git

There are many situations in which having access to a cryptographically
secure pseudorandom number generator (CSPRNG) is helpful.  In the
future, we'll encounter one of these when dealing with temporary files.
To make this possible, let's add a function which reads from a system
CSPRNG and returns some bytes.

Because this is a security-sensitive interface, we take some
precautions.  We either succeed by filling the buffer completely as we
requested, or we fail.  We don't return partial data because the caller
will almost never find that to be a useful behavior.

The order of options is also important here.  On systems with
arc4random, which is most of the BSDs, we use that, since, except on
MirBSD, it uses ChaCha20, which is extremely fast, and sits entirely in
userspace, avoiding a system call.  We then prefer getrandom over
getentropy, because the former has been available longer on Linux, and
finally, if none of those are available, we use /dev/urandom, because
most Unix-like operating systems provide that API.  We prefer options
that don't involve device files when possible because those work in some
restricted environments where device files may not be available.

macOS appears to have arc4random but not the arc4random_buf function we
want to use, so we let it use the fallback of /dev/urandom.  Set the
configuration variables appropriately for Linux and the other BSDs.  We
specifically only consider versions which receive publicly available
security support; for example, getrandom(2) and getentropy(3) are
available only as of FreeBSD 12, which is the oldest version with current
security support.  For the same reason, we don't specify getrandom(2) on
Linux, because CentOS 7 doesn't support it in glibc (although its kernel
does) and we don't want to resort to making syscalls.

Finally, add a self-test option here to make sure that our buffer
handling is correct and we aren't truncating data.  We simply read 64
KiB and then make sure we've seen each byte.  The probability of this
test failing spuriously is less than 10^-100.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                            | 25 ++++++++++++
 compat/winansi.c                    |  6 +++
 config.mak.uname                    |  9 +++++
 contrib/buildsystems/CMakeLists.txt |  2 +-
 git-compat-util.h                   | 16 ++++++++
 t/helper/test-csprng.c              | 63 +++++++++++++++++++++++++++++
 t/helper/test-tool.c                |  1 +
 t/helper/test-tool.h                |  1 +
 t/t0000-basic.sh                    |  4 ++
 wrapper.c                           | 56 +++++++++++++++++++++++++
 10 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-csprng.c

diff --git a/Makefile b/Makefile
index 12be39ac49..1d17021f59 100644
--- a/Makefile
+++ b/Makefile
@@ -234,6 +234,14 @@ all::
 # Define NO_TRUSTABLE_FILEMODE if your filesystem may claim to support
 # the executable mode bit, but doesn't really do so.
 #
+# Define HAVE_ARC4RANDOM if your system has arc4random and arc4random_buf.
+#
+# Define HAVE_GETRANDOM if your system has getrandom.
+#
+# Define HAVE_GETENTROPY if your system has getentropy.
+#
+# Define HAVE_RTLGENRANDOM if your system has RtlGenRandom (Windows only).
+#
 # Define NEEDS_MODE_TRANSLATION if your OS strays from the typical file type
 # bits in mode values (e.g. z/OS defines I_SFMT to 0xFF000000 as opposed to the
 # usual 0xF000).
@@ -694,6 +702,7 @@ TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
+TEST_BUILTINS_OBJS += test-csprng.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
@@ -1900,6 +1909,22 @@ ifdef HAVE_GETDELIM
 	BASIC_CFLAGS += -DHAVE_GETDELIM
 endif
 
+ifdef HAVE_ARC4RANDOM
+	BASIC_CFLAGS += -DHAVE_ARC4RANDOM
+endif
+
+ifdef HAVE_GETRANDOM
+	BASIC_CFLAGS += -DHAVE_GETRANDOM
+endif
+
+ifdef HAVE_GETENTROPY
+	BASIC_CFLAGS += -DHAVE_GETENTROPY
+endif
+
+ifdef HAVE_RTLGENRANDOM
+	BASIC_CFLAGS += -DHAVE_RTLGENRANDOM
+endif
+
 ifneq ($(PROCFS_EXECUTABLE_PATH),)
 	procfs_executable_path_SQ = $(subst ','\'',$(PROCFS_EXECUTABLE_PATH))
 	BASIC_CFLAGS += '-DPROCFS_EXECUTABLE_PATH="$(procfs_executable_path_SQ)"'
diff --git a/compat/winansi.c b/compat/winansi.c
index c27b20a79d..0e5a9cc82e 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -3,6 +3,12 @@
  */
 
 #undef NOGDI
+
+/*
+ * Including the appropriate header file for RtlGenRandom causes MSVC to see a
+ * redefinition of types in an incompatible way when including headers below.
+ */
+#undef HAVE_RTLGENRANDOM
 #include "../git-compat-util.h"
 #include <wingdi.h>
 #include <winreg.h>
diff --git a/config.mak.uname b/config.mak.uname
index 3236a4918a..5030d3c70b 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -257,6 +257,9 @@ ifeq ($(uname_S),FreeBSD)
 	HAVE_PATHS_H = YesPlease
 	HAVE_BSD_SYSCTL = YesPlease
 	HAVE_BSD_KERN_PROC_SYSCTL = YesPlease
+	HAVE_ARC4RANDOM = YesPlease
+	HAVE_GETRANDOM = YesPlease
+	HAVE_GETENTROPY = YesPlease
 	PAGER_ENV = LESS=FRX LV=-c MORE=FRX
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	FILENO_IS_A_MACRO = UnfortunatelyYes
@@ -271,6 +274,8 @@ ifeq ($(uname_S),OpenBSD)
 	HAVE_PATHS_H = YesPlease
 	HAVE_BSD_SYSCTL = YesPlease
 	HAVE_BSD_KERN_PROC_SYSCTL = YesPlease
+	HAVE_ARC4RANDOM = YesPlease
+	HAVE_GETENTROPY = YesPlease
 	PROCFS_EXECUTABLE_PATH = /proc/curproc/file
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	FILENO_IS_A_MACRO = UnfortunatelyYes
@@ -282,6 +287,7 @@ ifeq ($(uname_S),MirBSD)
 	NEEDS_LIBICONV = YesPlease
 	HAVE_PATHS_H = YesPlease
 	HAVE_BSD_SYSCTL = YesPlease
+	HAVE_ARC4RANDOM = YesPlease
 endif
 ifeq ($(uname_S),NetBSD)
 	ifeq ($(shell expr "$(uname_R)" : '[01]\.'),2)
@@ -293,6 +299,7 @@ ifeq ($(uname_S),NetBSD)
 	HAVE_PATHS_H = YesPlease
 	HAVE_BSD_SYSCTL = YesPlease
 	HAVE_BSD_KERN_PROC_SYSCTL = YesPlease
+	HAVE_ARC4RANDOM = YesPlease
 	PROCFS_EXECUTABLE_PATH = /proc/curproc/exe
 endif
 ifeq ($(uname_S),AIX)
@@ -422,6 +429,7 @@ ifeq ($(uname_S),Windows)
 	NO_STRTOUMAX = YesPlease
 	NO_MKDTEMP = YesPlease
 	NO_INTTYPES_H = YesPlease
+	HAVE_RTLGENRANDOM = YesPlease
 	# VS2015 with UCRT claims that snprintf and friends are C99 compliant,
 	# so we don't need this:
 	#
@@ -624,6 +632,7 @@ ifeq ($(uname_S),MINGW)
 	NO_POSIX_GOODIES = UnfortunatelyYes
 	DEFAULT_HELP_FORMAT = html
 	HAVE_PLATFORM_PROCINFO = YesPlease
+	HAVE_RTLGENRANDOM = YesPlease
 	BASIC_LDFLAGS += -municode
 	COMPAT_CFLAGS += -DNOGDI -Icompat -Icompat/win32
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index fd1399c440..134e00bde3 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -260,7 +260,7 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				_CONSOLE DETECT_MSYS_TTY STRIP_EXTENSION=".exe"  NO_SYMLINK_HEAD UNRELIABLE_FSTAT
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
-				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
+				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET HAVE_RTLGENRANDOM)
 	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
diff --git a/git-compat-util.h b/git-compat-util.h
index d70ce14286..f2cff656e7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -165,6 +165,12 @@
 #endif
 #include <windows.h>
 #define GIT_WINDOWS_NATIVE
+#ifdef HAVE_RTLGENRANDOM
+/* This is required to get access to RtlGenRandom. */
+#define SystemFunction036 NTAPI SystemFunction036
+#include <NTSecAPI.h>
+#undef SystemFunction036
+#endif
 #endif
 
 #include <unistd.h>
@@ -235,6 +241,9 @@
 #else
 #include <stdint.h>
 #endif
+#ifdef HAVE_GETRANDOM
+#include <sys/random.h>
+#endif
 #ifdef NO_INTPTR_T
 /*
  * On I16LP32, ILP32 and LP64 "long" is the safe bet, however
@@ -1381,4 +1390,11 @@ static inline void *container_of_or_null_offset(void *ptr, size_t offset)
 
 void sleep_millisec(int millisec);
 
+/*
+ * Generate len bytes from the system cryptographically secure PRNG.
+ * Returns 0 on success and -1 on error, setting errno.  The inability to
+ * satisfy the full request is an error.
+ */
+int csprng_bytes(void *buf, size_t len);
+
 #endif
diff --git a/t/helper/test-csprng.c b/t/helper/test-csprng.c
new file mode 100644
index 0000000000..196c14e44f
--- /dev/null
+++ b/t/helper/test-csprng.c
@@ -0,0 +1,63 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+
+/*
+ * Check that we read each byte value at least once when reading 64 KiB from the
+ * CSPRNG.  This is not to test the quality of the CSPRNG, but to test our
+ * buffer handling of it.
+ *
+ * The probability of this failing by chance is less than 10^-100.
+ */
+static int selftest(void)
+{
+	int buckets[256] = { 0 };
+	unsigned char buf[1024];
+	unsigned long count = 64 * 1024;
+	int i;
+
+	while (count) {
+		if (csprng_bytes(buf, sizeof(buf)) < 0) {
+			perror("failed to read");
+			return 3;
+		}
+		for (i = 0; i < sizeof(buf); i++)
+			buckets[buf[i]]++;
+		count -= sizeof(buf);
+	}
+	for (i = 0; i < ARRAY_SIZE(buckets); i++)
+		if (!buckets[i]) {
+			fprintf(stderr, "failed to find any bytes with value %02x\n", i);
+			return 4;
+		}
+	return 0;
+}
+
+int cmd__csprng(int argc, const char **argv)
+{
+	unsigned long count;
+	unsigned char buf[1024];
+
+	if (argc > 2) {
+		fprintf(stderr, "usage: %s [--selftest | <size>]\n", argv[0]);
+		return 2;
+	}
+
+	if (argc == 2 && !strcmp(argv[1], "--selftest")) {
+		return selftest();
+	}
+
+	count = (argc == 2) ? strtoul(argv[1], NULL, 0) : -1L;
+
+	while (count) {
+		unsigned long chunk = count < sizeof(buf) ? count : sizeof(buf);
+		if (csprng_bytes(buf, chunk) < 0) {
+			perror("failed to read");
+			return 5;
+		}
+		if (fwrite(buf, chunk, 1, stdout) != 1)
+			return 1;
+		count -= chunk;
+	}
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 3ce5585e53..fc0fb86c1b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
+	{ "csprng", cmd__csprng },
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 9f0f522850..077d9bfcca 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -10,6 +10,7 @@ int cmd__bloom(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
+int cmd__csprng(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index b007f0efef..9647ec9629 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -1131,4 +1131,8 @@ test_expect_success 'test_must_fail rejects a non-git command with env' '
 	grep -F "test_must_fail: only '"'"'git'"'"' is allowed" err
 '
 
+test_expect_success 'CSPRNG handling functions correctly' '
+	test-tool csprng --selftest
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 36e12119d7..0046f32e46 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -702,3 +702,59 @@ int open_nofollow(const char *path, int flags)
 	return open(path, flags);
 #endif
 }
+
+int csprng_bytes(void *buf, size_t len)
+{
+#if defined(HAVE_ARC4RANDOM)
+	arc4random_buf(buf, len);
+	return 0;
+#elif defined(HAVE_GETRANDOM)
+	ssize_t res;
+	char *p = buf;
+	while (len) {
+		res = getrandom(p, len, 0);
+		if (res < 0)
+			return -1;
+		len -= res;
+		p += res;
+	}
+	return 0;
+#elif defined(HAVE_GETENTROPY)
+	int res;
+	char *p = buf;
+	while (len) {
+		/* getentropy has a maximum size of 256 bytes. */
+		size_t chunk = len < 256 ? len : 256;
+		res = getentropy(p, chunk);
+		if (res < 0)
+			return -1;
+		len -= chunk;
+		p += chunk;
+	}
+	return 0;
+#elif defined(HAVE_RTLGENRANDOM)
+	if (!RtlGenRandom(buf, len))
+		return -1;
+	return 0;
+#else
+	ssize_t res;
+	char *p = buf;
+	int fd, err;
+	fd = open("/dev/urandom", O_RDONLY);
+	if (fd < 0)
+		return -1;
+	while (len) {
+		res = xread(fd, p, len);
+		if (res < 0) {
+			err = errno;
+			close(fd);
+			errno = err;
+			return -1;
+		}
+		len -= res;
+		p += res;
+	}
+	close(fd);
+	return 0;
+#endif
+}

* [PATCH 2/2] wrapper: use a CSPRNG to generate random file names
  2021-11-16  3:35 [PATCH 0/2] Generate temporary files using a CSPRNG brian m. carlson
  2021-11-16  3:35 ` [PATCH 1/2] wrapper: add a helper to generate numbers from " brian m. carlson
@ 2021-11-16  3:35 ` brian m. carlson
  2021-11-16 15:36   ` Jeff King
  2021-11-16 15:44 ` [PATCH 0/2] Generate temporary files using a CSPRNG Jeff King
  2021-11-16 20:35 ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2021-11-16  3:35 UTC
  To: git

The current way we generate random file names is by taking the seconds
and microseconds, plus the PID, and mixing them together, then encoding
them.  If this fails, we increment the value by 7777, and try again up
to TMP_MAX times.
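
To make the predictability concrete, the sketch below shows how a 64-bit
value is turned into the six characters that replace XXXXXX.  It is a
simplified stand-in for the loop in git_mkstemps_mode(), not a copy of
it: the helper names are hypothetical and the order of the 62-character
alphabet is an assumption.  The point is that the name is a pure
function of the value, so anyone who can guess the time and PID can
compute the same candidates, including the retries at value + 7777.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed alphabet: 62 characters, one base-62 digit per position. */
static const char letters[] =
	"abcdefghijklmnopqrstuvwxyz"
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	"0123456789";

#define NUM_X 6

/*
 * Write the NUM_X-character suffix derived from value into out, which
 * must have room for NUM_X + 1 bytes.  Hypothetical helper.
 */
static void encode_tmp_suffix(uint64_t value, char *out)
{
	const uint64_t num_letters = sizeof(letters) - 1;
	int i;

	assert(out != NULL);
	for (i = 0; i < NUM_X; i++) {
		out[i] = letters[value % num_letters];
		value /= num_letters;
	}
	out[NUM_X] = '\0';
}

/* Return 1 if two values yield the same suffix, 0 otherwise. */
static int same_suffix(uint64_t a, uint64_t b)
{
	char sa[NUM_X + 1], sb[NUM_X + 1];

	encode_tmp_suffix(a, sa);
	encode_tmp_suffix(b, sb);
	return !strcmp(sa, sb);
}
```

Only the low six base-62 digits of the value matter, so two values that
differ by a multiple of 62^6 produce the same name.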

Unfortunately, this is not the best idea from a security perspective.
If we're writing into TMPDIR, an attacker can guess these values easily
and prevent us from creating any temporary files at all by creating them
all first.  POSIX only requires TMP_MAX to be 25, so this is achievable
in some contexts, even if unlikely to occur in practice.

Fortunately, we can simply solve this by using the system
cryptographically secure pseudorandom number generator (CSPRNG) to
generate a random 64-bit value, and use that as before.  Note that there
is still a small bias here, but because a six-character sequence chosen
out of 62 characters provides about 36 bits of entropy, the bias here is
less than 2^-28, which is acceptable, especially considering we'll retry
several times.
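
The bias figure can be checked with integer arithmetic alone.  The
sketch below is illustrative only and the helper names are hypothetical.
It computes 62^6 and the minimum number of 64-bit values that map to
each suffix.  Some suffixes receive one more preimage than others, so
the relative excess is at most 1/q, where q = floor(2^64 / 62^6).

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct suffixes: base^digits, for inputs that fit in 64 bits. */
static uint64_t suffix_space(unsigned base, unsigned digits)
{
	uint64_t n = 1;

	assert(base > 1);
	while (digits--)
		n *= base;
	return n;
}

/*
 * floor(2^64 / n), computed without 128-bit arithmetic.  UINT64_MAX is
 * 2^64 - 1, so the quotient is one larger only when the remainder of
 * UINT64_MAX / n is n - 1.
 */
static uint64_t min_preimages(uint64_t n)
{
	uint64_t q;

	assert(n > 1);
	q = UINT64_MAX / n;
	if (UINT64_MAX % n == n - 1)
		q++;
	return q;
}

/* Return 1 if the relative bias 1/q is below 2^-bits, i.e. q > 2^bits. */
static int bias_below(uint64_t n, unsigned bits)
{
	assert(bits < 64);
	return min_preimages(n) > ((uint64_t)1 << bits);
}
```

With base 62 and six digits this gives 56,800,235,584 suffixes (a little
under 2^36) and q a little above 2^28, which matches the figures quoted
above.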

Note that many libcs also use a CSPRNG when generating temporary file
names.  glibc recently changed from an approach similar to
ours to using a CSPRNG, and FreeBSD and OpenBSD also use a CSPRNG in
this case.  Even if the likelihood of an attack is low, we should still
be at least as responsible in creating temporary files as libc is.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 wrapper.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index 0046f32e46..0cdb5b18ff 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -463,8 +463,6 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 	static const int num_letters = ARRAY_SIZE(letters) - 1;
 	static const char x_pattern[] = "XXXXXX";
 	static const int num_x = ARRAY_SIZE(x_pattern) - 1;
-	uint64_t value;
-	struct timeval tv;
 	char *filename_template;
 	size_t len;
 	int fd, count;
@@ -485,12 +483,13 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 	 * Replace pattern's XXXXXX characters with randomness.
 	 * Try TMP_MAX different filenames.
 	 */
-	gettimeofday(&tv, NULL);
-	value = ((uint64_t)tv.tv_usec << 16) ^ tv.tv_sec ^ getpid();
 	filename_template = &pattern[len - num_x - suffix_len];
 	for (count = 0; count < TMP_MAX; ++count) {
-		uint64_t v = value;
 		int i;
+		uint64_t v;
+		if (csprng_bytes(&v, sizeof(v)) < 0)
+			return -1;
+
 		/* Fill in the random bits. */
 		for (i = 0; i < num_x; i++) {
 			filename_template[i] = letters[v % num_letters];
@@ -506,12 +505,6 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 		 */
 		if (errno != EEXIST)
 			break;
-		/*
-		 * This is a random value.  It is only necessary that
-		 * the next TMP_MAX values generated by adding 7777 to
-		 * VALUE are different with (module 2^32).
-		 */
-		value += 7777;
 	}
 	/* We return the null string if we can't find a unique file name.  */
 	pattern[0] = '\0';

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16  3:35 ` [PATCH 1/2] wrapper: add a helper to generate numbers from " brian m. carlson
@ 2021-11-16 15:31   ` Jeff King
  2021-11-16 16:01     ` rsbecker
  2021-11-17  7:39   ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff King @ 2021-11-16 15:31 UTC
  To: brian m. carlson; +Cc: git

On Tue, Nov 16, 2021 at 03:35:41AM +0000, brian m. carlson wrote:

> The order of options is also important here.  On systems with
> arc4random, which is most of the BSDs, we use that, since, except on
> MirBSD, it uses ChaCha20, which is extremely fast, and sits entirely in
> userspace, avoiding a system call.  We then prefer getrandom over
> getentropy, because the former has been available longer on Linux, and
> finally, if none of those are available, we use /dev/urandom, because
> most Unix-like operating systems provide that API.  We prefer options
> that don't involve device files when possible because those work in some
> restricted environments where device files may not be available.

I wonder if we'll need a low-quality fallback for older systems which
don't even have /dev/urandom. Because it's going to be used in such a
core part of the system (tempfiles), this basically becomes a hard
requirement for using Git at all.

I can't say I'm excited in general to be introducing a dependency like
this, just because of the portability headaches. But it may be the least
bad thing (especially if we can fall back to the existing behavior).
One alternative would be to build on top of the system mkstemp(), which
makes it libc's problem. I'm not sure if we'd run into problems there,
though.

> diff --git a/Makefile b/Makefile
> index 12be39ac49..1d17021f59 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -234,6 +234,14 @@ all::
>  # Define NO_TRUSTABLE_FILEMODE if your filesystem may claim to support
>  # the executable mode bit, but doesn't really do so.
>  #
> +# Define HAVE_ARC4RANDOM if your system has arc4random and arc4random_buf.
> +#
> +# Define HAVE_GETRANDOM if your system has getrandom.
> +#
> +# Define HAVE_GETENTROPY if your system has getentropy.
> +#
> +# Define HAVE_RTLGENRANDOM if your system has RtlGenRandom (Windows only).

It seems like these will be mutually exclusive (and indeed, the #ifdef
in the code ends up defining a particular precedence). Would we be
better off exposing that to the user with a single CSPRNG_METHOD set to
arc4random, getrandom, getentropy, etc?

> diff --git a/config.mak.uname b/config.mak.uname
> index 3236a4918a..5030d3c70b 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -257,6 +257,9 @@ ifeq ($(uname_S),FreeBSD)
>  	HAVE_PATHS_H = YesPlease
>  	HAVE_BSD_SYSCTL = YesPlease
>  	HAVE_BSD_KERN_PROC_SYSCTL = YesPlease
> +	HAVE_ARC4RANDOM = YesPlease
> +	HAVE_GETRANDOM = YesPlease
> +	HAVE_GETENTROPY = YesPlease

So here we claim to support a whole bunch of methods, but in practice,
we only use arc4random, because these are all in an #elif chain:

> +int csprng_bytes(void *buf, size_t len)
> +{
> +#if defined(HAVE_ARC4RANDOM)
> +	arc4random_buf(buf, len);
> +	return 0;
> +#elif defined(HAVE_GETRANDOM)

though we still respect the others in other places, like including
headers that we don't end up using:

> +#ifdef HAVE_GETRANDOM
> +#include <sys/random.h>
> +#endif

If csprng_bytes() could fall back between methods based on runtime
errors, it would make sense to me to allow support for multiple methods
to be declared. But without that, it just seems to invite confusion (and
I am not sure runtime fallbacks are really worth the trouble).

> +int csprng_bytes(void *buf, size_t len)
> +{
> +#if defined(HAVE_ARC4RANDOM)
> +	arc4random_buf(buf, len);
> +	return 0;

OK, presumably this one can't return an error, which is nice.

> +#elif defined(HAVE_GETRANDOM)
> +
> +	ssize_t res;
> +	char *p = buf;
> +	while (len) {
> +		res = getrandom(p, len, 0);
> +		if (res < 0)
> +			return -1;
> +		len -= res;
> +		p += res;
> +	}
> +	return 0;

Do we ever have to worry about a "0" return from getrandom()? I'd expect
it to block rather than return 0, but what I'm wondering is if we could
ever be in a situation where we fail to make progress and loop
infinitely.

The manpage says that reads up to 256 bytes will always return the full
output and never be interrupted. So for the caller you add in patch 2,
we wouldn't need this loop. However, since csprng_bytes() is generic,
being defensive makes sense. But in that case, do we need to handle
EINTR when it returns -1?
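
A defensive loop along those lines might look like the sketch below.  It
retries on EINTR and treats a zero-length return as an error instead of
spinning.  It is written against a caller-supplied callback so that it
stays independent of any particular interface; fill_from_source,
source_fn and the two demo sources are made-up names, and using EIO for
the no-progress case is just one possible choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback: returns bytes produced, or -1 with errno set. */
typedef ssize_t (*source_fn)(void *buf, size_t len, void *ctx);

/*
 * Fill buf with exactly len bytes from src.  Returns 0 on success and
 * -1 on error with errno set.  EINTR is retried; a return of 0 is
 * reported as EIO so that a source which stops producing data cannot
 * cause an infinite loop.
 */
static int fill_from_source(source_fn src, void *ctx, void *buf, size_t len)
{
	char *p = buf;

	assert(src != NULL);
	while (len) {
		ssize_t res = src(p, len, ctx);
		if (res < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (!res) {
			errno = EIO;
			return -1;
		}
		len -= res;
		p += res;
	}
	return 0;
}

/* Demo source: interrupted on the first call, then at most 3 bytes per call. */
struct demo_state { int calls; };

static ssize_t demo_source(void *buf, size_t len, void *ctx)
{
	struct demo_state *st = ctx;
	size_t n = len < 3 ? len : 3;
	size_t i;

	if (st->calls++ == 0) {
		errno = EINTR;
		return -1;
	}
	for (i = 0; i < n; i++)
		((unsigned char *)buf)[i] = 0xAB;
	return n;
}

/* Demo source that never makes progress. */
static ssize_t stuck_source(void *buf, size_t len, void *ctx)
{
	(void)buf;
	(void)len;
	(void)ctx;
	return 0;
}
```

The demo sources only exist to exercise the two interesting paths: a
short, interrupted producer still fills the buffer, and a producer that
returns 0 yields an error instead of a hang.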

> +#elif defined(HAVE_GETENTROPY)
> +	int res;
> +	char *p = buf;
> +	while (len) {
> +		/* getentropy has a maximum size of 256 bytes. */
> +		size_t chunk = len < 256 ? len : 256;
> +		res = getentropy(p, chunk);
> +		if (res < 0)
> +			return -1;
> +		len -= chunk;
> +		p += chunk;
> +	}
> +	return 0;

Heh, I see that getentropy() punted on all of those questions above by
just insisting you ask for 256 bytes at a time. Cute solution. ;)

> +#elif defined(HAVE_RTLGENRANDOM)
> +	if (!RtlGenRandom(buf, len))
> +		return -1;
> +	return 0;

I have no comment on this one. :)

> +#else
> +	ssize_t res;
> +	char *p = buf;
> +	int fd, err;
> +	fd = open("/dev/urandom", O_RDONLY);
> +	if (fd < 0)
> +		return -1;
> +	while (len) {
> +		res = xread(fd, p, len);
> +		if (res < 0) {
> +			err = errno;
> +			close(fd);
> +			errno = err;
> +			return -1;
> +		}
> +		len -= res;
> +		p += res;
> +	}
> +	close(fd);
> +	return 0;
> +#endif
> +}

This loop is basically read_in_full(), except that it doesn't treat a
"0" return as an EOF. I'm not sure if that's intentional (because we
want to keep trying on a 0 return, though I'd expect the read to block
in such a case), or if it would be an improvement (because it would
prevent us from infinite looping if /dev/urandom wanted to signal EOF).

-Peff

* Re: [PATCH 2/2] wrapper: use a CSPRNG to generate random file names
  2021-11-16  3:35 ` [PATCH 2/2] wrapper: use a CSPRNG to generate random file names brian m. carlson
@ 2021-11-16 15:36   ` Jeff King
  2021-11-16 18:28     ` Taylor Blau
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2021-11-16 15:36 UTC
  To: brian m. carlson; +Cc: git

On Tue, Nov 16, 2021 at 03:35:42AM +0000, brian m. carlson wrote:

> The current way we generate random file names is by taking the seconds
> and microseconds, plus the PID, and mixing them together, then encoding
> them.  If this fails, we increment the value by 7777, and try again up
> to TMP_MAX times.
> 
> Unfortunately, this is not the best idea from a security perspective.
> If we're writing into TMPDIR, an attacker can guess these values easily
> and prevent us from creating any temporary files at all by creating them
> all first.  POSIX only requires TMP_MAX to be 25, so this is achievable
> in some contexts, even if unlikely to occur in practice.

I think we unconditionally define TMP_MAX as 16384. I don't think that
changes the fundamental issue that somebody could race us and win,
though.

> @@ -485,12 +483,13 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>  	 * Replace pattern's XXXXXX characters with randomness.
>  	 * Try TMP_MAX different filenames.
>  	 */
> -	gettimeofday(&tv, NULL);
> -	value = ((uint64_t)tv.tv_usec << 16) ^ tv.tv_sec ^ getpid();
>  	filename_template = &pattern[len - num_x - suffix_len];
>  	for (count = 0; count < TMP_MAX; ++count) {
> -		uint64_t v = value;
>  		int i;
> +		uint64_t v;
> +		if (csprng_bytes(&v, sizeof(v)) < 0)
> +			return -1;

If csprng_bytes() fails, the resulting errno is likely to be confusing.
E.g., if /dev/urandom doesn't exist we'd get ENOENT. But the caller is
likely to say something like:

  error: unable to create temporary file: no such file or directory

which is misleading. It's probably worth doing:

  return error_errno("unable to get random bytes for temporary file");

or similar here. That's verbose on top of the error that the caller will
give, but this is something we don't expect to fail in practice.

I actually wonder if we should simply die() in such a case. That's not
very friendly from a libification standpoint, but we really can't
progress on much without being able to generate random bytes.

-Peff

* Re: [PATCH 0/2] Generate temporary files using a CSPRNG
  2021-11-16  3:35 [PATCH 0/2] Generate temporary files using a CSPRNG brian m. carlson
  2021-11-16  3:35 ` [PATCH 1/2] wrapper: add a helper to generate numbers from " brian m. carlson
  2021-11-16  3:35 ` [PATCH 2/2] wrapper: use a CSPRNG to generate random file names brian m. carlson
@ 2021-11-16 15:44 ` Jeff King
  2021-11-16 22:17   ` brian m. carlson
  2021-11-16 20:35 ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2021-11-16 15:44 UTC
  To: brian m. carlson; +Cc: git

On Tue, Nov 16, 2021 at 03:35:40AM +0000, brian m. carlson wrote:

> For those who are interested, I computed the probability of spurious
> failure for the self-test mode like so:
> 
>   256 * (255/256)^65536
> 
> This Ruby one-liner estimates the probability at approximately 10^-108:
> 
>   ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'
> 
> If I have made an error in the calculation, please do feel free to point
> it out.

Yes, I think your math is correct there.

A more interesting question is whether generating 64k of PRNG bytes per
test run is going to be a problem for system entropy pools. For that
matter, I guess the use of it for tempfiles will produce a similar
burden, since we run so many commands. My understanding is that modern
systems will just produce infinite output for /dev/urandom, etc, but I
wonder if there are any systems left where that is not true (because
they have a misguided notion that they need to stir in more "real"
entropy bits).

-Peff

* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16 15:31   ` Jeff King
@ 2021-11-16 16:01     ` rsbecker
  2021-11-16 18:22       ` Taylor Blau
  2021-11-16 22:41       ` brian m. carlson
  0 siblings, 2 replies; 37+ messages in thread
From: rsbecker @ 2021-11-16 16:01 UTC
  To: 'Jeff King', 'brian m. carlson'; +Cc: git

On November 16, 2021 10:31 AM, Jeff King wrote:
> On Tue, Nov 16, 2021 at 03:35:41AM +0000, brian m. carlson wrote:
> 
> > The order of options is also important here.  On systems with
> > arc4random, which is most of the BSDs, we use that, since, except on
> > MirBSD, it uses ChaCha20, which is extremely fast, and sits entirely
> > in userspace, avoiding a system call.  We then prefer getrandom over
> > getentropy, because the former has been available longer on Linux, and
> > finally, if none of those are available, we use /dev/urandom, because
> > most Unix-like operating systems provide that API.  We prefer options
> > that don't involve device files when possible because those work in
> > some restricted environments where device files may not be available.
> 
> I wonder if we'll need a low-quality fallback for older systems which don't
> even have /dev/urandom. Because it's going to be used in such a core part of
> the system (tempfiles), this basically becomes a hard requirement for using
> Git at all.
> 
> I can't say I'm excited in general to be introducing a dependency like this, just
> because of the portability headaches. But it may be the least bad thing
> (especially if we can fall back to the existing behavior).
> One alternative would be to build on top of the system mkstemp(), which
> makes it libc's problem. I'm not sure if we'd run into problems there, though.

None of /dev/urandom, /dev/random, or mkstemp are available on some platforms, including NonStop. This is not a good dependency to add. One variant PRNGD is used in ia64 OpenSSL, while the CPU random generator in hardware is used on x86. I cannot get behind this at all. Libc is also not used in or available to our port. I am very worried about this direction.

-Randall


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16 16:01     ` rsbecker
@ 2021-11-16 18:22       ` Taylor Blau
  2021-11-16 19:58         ` rsbecker
  2021-11-16 22:41       ` brian m. carlson
  1 sibling, 1 reply; 37+ messages in thread
From: Taylor Blau @ 2021-11-16 18:22 UTC (permalink / raw)
  To: rsbecker; +Cc: 'Jeff King', 'brian m. carlson', git

On Tue, Nov 16, 2021 at 11:01:20AM -0500, rsbecker@nexbridge.com wrote:
> On November 16, 2021 10:31 AM, Jeff King wrote:
> > On Tue, Nov 16, 2021 at 03:35:41AM +0000, brian m. carlson wrote:
> >
> > > The order of options is also important here.  On systems with
> > > arc4random, which is most of the BSDs, we use that, since, except on
> > > MirBSD, it uses ChaCha20, which is extremely fast, and sits entirely
> > > in userspace, avoiding a system call.  We then prefer getrandom over
> > > getentropy, because the former has been available longer on Linux, and
> > > finally, if none of those are available, we use /dev/urandom, because
> > > most Unix-like operating systems provide that API.  We prefer options
> > > that don't involve device files when possible because those work in
> > > some restricted environments where device files may not be available.
> >
> > I wonder if we'll need a low-quality fallback for older systems which don't
> > even have /dev/urandom. Because it's going to be used in such a core part of
> > the system (tempfiles), this basically becomes a hard requirement for using
> > Git at all.
> >
> > I can't say I'm excited in general to be introducing a dependency like this, just
> > because of the portability headaches. But it may be the least bad thing
> > (especially if we can fall back to the existing behavior).
> > One alternative would be to build on top of the system mkstemp(), which
> > makes it libc's problem. I'm not sure if we'd run into problems there, though.
>
> None of /dev/urandom, /dev/random, or mkstemp are available on some
> platforms, including NonStop. This is not a good dependency to add.
> One variant PRNGD is used in ia64 OpenSSL, while the CPU random
> generator in hardware is used on x86. I cannot get behind this at all.
> Libc is also not used in or available to our port. I am very worried
> about this direction.

I share Peff's lack of enthusiasm about the dependency situation. But
making Git depend on having /dev/urandom available is simply not
feasible, as you point out.

I wonder if the suitable fallback should be the existing behavior of
git_mkstemps_mode()? That leaves us in a somewhat disappointing
situation of not having fully resolved the DoS attack on all platforms.
But it makes our dependency situation less complicated, and leaves
things no worse off than they were before on platforms like NonStop.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] wrapper: use a CSPRNG to generate random file names
  2021-11-16 15:36   ` Jeff King
@ 2021-11-16 18:28     ` Taylor Blau
  2021-11-16 18:57       ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Taylor Blau @ 2021-11-16 18:28 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git

On Tue, Nov 16, 2021 at 10:36:51AM -0500, Jeff King wrote:
> On Tue, Nov 16, 2021 at 03:35:42AM +0000, brian m. carlson wrote:
>
> > The current way we generate random file names is by taking the seconds
> > and microseconds, plus the PID, and mixing them together, then encoding
> > them.  If this fails, we increment the value by 7777, and try again up
> > to TMP_MAX times.
> >
> > Unfortunately, this is not the best idea from a security perspective.
> > If we're writing into TMPDIR, an attacker can guess these values easily
> > and prevent us from creating any temporary files at all by creating them
> > all first.  POSIX only requires TMP_MAX to be 25, so this is achievable
> > in some contexts, even if unlikely to occur in practice.
>
> I think we unconditionally define TMP_MAX as 16384. I don't think that
> changes the fundamental issue that somebody could race us and win,
> though.

Yes, we do. Right above the declaration of this function (and so hidden
from the context) we do:

    #undef TMP_MAX
    #define TMP_MAX 16384

I don't think that the value of TMP_MAX makes this substantially less
likely, so I agree that the fundamental issue is the same.
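
To make the predictability concrete, here is a minimal sketch of how
such a name is derived. The mixing expression comes from the removed
lines in the hunk quoted in this message; the alphabet, the
six-character width, and the names legacy_seed() and legacy_fill() are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 62-character alphabet used to encode the value. */
static const char letters[] =
	"abcdefghijklmnopqrstuvwxyz"
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	"0123456789";

/* Mix microseconds, seconds, and PID as the removed lines of the patch do. */
uint64_t legacy_seed(uint64_t usec, uint64_t sec, uint64_t pid)
{
	return (usec << 16) ^ sec ^ pid;
}

/* Write six base-62 characters derived from value (no NUL terminator). */
void legacy_fill(char out[6], uint64_t value)
{
	const uint64_t num_letters = sizeof(letters) - 1;
	int i;

	assert(out != NULL);
	for (i = 0; i < 6; i++) {
		out[i] = letters[value % num_letters];
		value /= num_letters;
	}
}
```

Anyone who can estimate the time and the PID can enumerate the same
candidates, including the retries that add 7777 to the value.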

> > @@ -485,12 +483,13 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
> >  	 * Replace pattern's XXXXXX characters with randomness.
> >  	 * Try TMP_MAX different filenames.
> >  	 */
> > -	gettimeofday(&tv, NULL);
> > -	value = ((uint64_t)tv.tv_usec << 16) ^ tv.tv_sec ^ getpid();
> >  	filename_template = &pattern[len - num_x - suffix_len];
> >  	for (count = 0; count < TMP_MAX; ++count) {
> > -		uint64_t v = value;
> >  		int i;
> > +		uint64_t v;
> > +		if (csprng_bytes(&v, sizeof(v)) < 0)
> > +			return -1;
>
> If csprng_bytes() fail, the resulting errno is likely to be confusing.
> E.g., if /dev/urandom doesn't exist we'd get ENOENT. But the caller is
> likely to say something like:
>
>   error: unable to create temporary file: no such file or directory
>
> which is misleading. It's probably worth doing:
>
>   return error_errno("unable to get random bytes for temporary file");
>
> or similar here. That's verbose on top of the error that the caller will
> give, but this is something we don't expect to fail in practice.
>
> I actually wonder if we should simply die() in such a case. That's not
> very friendly from a libification stand-point, but we really can't
> progress on much without being able to generate random bytes.

Alternatively, we could fall back to the existing code paths. This is
somewhat connected to my suggestion to Randall earlier in the thread.
But I would rather see that fallback done at compile-time for platforms
that don't give us an easy-to-use CSPRNG, and avoid masking legitimate
errors caused from trying to use a CSPRNG that should exist.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] wrapper: use a CSPRNG to generate random file names
  2021-11-16 18:28     ` Taylor Blau
@ 2021-11-16 18:57       ` Junio C Hamano
  2021-11-16 19:21         ` Jeff King
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2021-11-16 18:57 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, brian m. carlson, git

Taylor Blau <me@ttaylorr.com> writes:

>> I actually wonder if we should simply die() in such a case. That's not
>> very friendly from a libification stand-point, but we really can't
>> progress on much without being able to generate random bytes.
>
> Alternatively, we could fall back to the existing code paths. This is
> somewhat connected to my suggestion to Randall earlier in the thread.
> But I would rather see that fallback done at compile-time for platforms
> that don't give us an easy-to-use CSPRNG, and avoid masking legitimate
> errors caused from trying to use a CSPRNG that should exist.

Yeah, I do not think we are doing this because the current code is
completely broken and everybody needs to move to CSPRNG that makes
it absolutely safe---rather this is still just making it safer than
the current code, when system support is available.  So a fallback
to the current code would be a good (and easy) thing to have, I
would think.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] wrapper: use a CSPRNG to generate random file names
  2021-11-16 18:57       ` Junio C Hamano
@ 2021-11-16 19:21         ` Jeff King
  2021-11-16 19:33           ` Taylor Blau
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2021-11-16 19:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, brian m. carlson, git

On Tue, Nov 16, 2021 at 10:57:28AM -0800, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> >> I actually wonder if we should simply die() in such a case. That's not
> >> very friendly from a libification stand-point, but we really can't
> >> progress on much without being able to generate random bytes.
> >
> > Alternatively, we could fall back to the existing code paths. This is
> > somewhat connected to my suggestion to Randall earlier in the thread.
> > But I would rather see that fallback done at compile-time for platforms
> > that don't give us an easy-to-use CSPRNG, and avoid masking legitimate
> > errors caused from trying to use a CSPRNG that should exist.
> 
> Yeah, I do not think we are doing this because the current code is
> completely broken and everybody needs to move to CSPRNG that makes
> it absolutely safe---rather this is still just making it safer than
> the current code, when system support is available.  So a fallback
> to the current code would be a good (and easy) thing to have, I
> would think.

One challenge for any fallback is that there are security implications.
In particular:

  - the fallback probably needs to be specific to the mktemp code; we
    don't have any callers yet of csprng_bytes(), but anybody using it
    for, say, actual cryptography would be very unhappy if it quietly
    fell back to insecure bytes.

    (I don't have any plans to use it and we don't do very much actual
    crypto ourselves, but one place that _could_ use it is the
    generation of the push-cert nonce seed).

  - I'm not sure if we should fall back for runtime errors or not. E.g.,
    if we try to open /dev/urandom and it isn't there, is it OK to fall
    back to the older, less-secure tempfile method? That's convenient in
    some sense; Git continues to work inside a chroot for which you
    haven't set up /dev/urandom. But it may also be surprising, and
    erring on the side of doing the less secure thing is probably a bad
    idea.

    So the mktemp code probably needs to be aware of the difference
    between "we have no CSPRNG source" and "we were compiled with
    support for a source, but it didn't work".
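
A minimal sketch of the second point, assuming a /dev/urandom-style
device and a hypothetical helper name read_random_device(): the reader
reports failure through its return value and errno and never
substitutes weaker bytes, so the tempfile code alone decides what to do
next.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len bytes read from the device at
 * path (for example "/dev/urandom"). Returns 0 on success, or -1 with
 * errno set on failure. It never falls back to weaker bytes.
 */
int read_random_device(const char *path, void *buf, size_t len)
{
	unsigned char *p = buf;
	int fd;

	assert(path != NULL && buf != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1; /* e.g. ENOENT inside a chroot without the device */

	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n < 0) {
			int saved;
			if (errno == EINTR)
				continue; /* interrupted: retry the read */
			saved = errno;
			close(fd);
			errno = saved; /* keep the read error for the caller */
			return -1;
		}
		if (n == 0) {
			close(fd);
			errno = EIO; /* unexpected end of file */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	close(fd);
	return 0;
}
```

A caller that sees -1 here knows a source was expected but did not
work, which is a different situation from a build that has no source
at all.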

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] wrapper: use a CSPRNG to generate random file names
  2021-11-16 19:21         ` Jeff King
@ 2021-11-16 19:33           ` Taylor Blau
  0 siblings, 0 replies; 37+ messages in thread
From: Taylor Blau @ 2021-11-16 19:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Taylor Blau, brian m. carlson, git

On Tue, Nov 16, 2021 at 02:21:22PM -0500, Jeff King wrote:
> On Tue, Nov 16, 2021 at 10:57:28AM -0800, Junio C Hamano wrote:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > >> I actually wonder if we should simply die() in such a case. That's not
> > >> very friendly from a libification stand-point, but we really can't
> > >> progress on much without being able to generate random bytes.
> > >
> > > Alternatively, we could fall back to the existing code paths. This is
> > > somewhat connected to my suggestion to Randall earlier in the thread.
> > > But I would rather see that fallback done at compile-time for platforms
> > > that don't give us an easy-to-use CSPRNG, and avoid masking legitimate
> > > errors caused from trying to use a CSPRNG that should exist.
> >
> > Yeah, I do not think we are doing this because the current code is
> > completely broken and everybody needs to move to CSPRNG that makes
> > it absolutely safe---rather this is still just making it safer than
> > the current code, when system support is available.  So a fallback
> > to the current code would be a good (and easy) thing to have, I
> > would think.
>
> One challenge for any fallback is that there are security implications.
> In particular:
>
>   - the fallback probably needs to be specific to the mktemp code; we
>     don't have any callers yet of csprng_bytes(), but anybody using it
>     for, say, actual cryptography would be very unhappy if it quietly
>     fell back to insecure bytes.
>
>     (I don't have any plans to use it and we don't do very much actual
>     crypto ourselves, but one place that _could_ use it is the
>     generation of the push-cert nonce seed).
>
>   - I'm not sure if we should fall back for runtime errors or not. E.g.,
>     if we try to open /dev/urandom and it isn't there, is it OK to fall
>     back to the older, less-secure tempfile method? That's convenient in
>     some sense; Git continues to work inside a chroot for which you
>     haven't set up /dev/urandom. But it may also be surprising, and
>     erring on the side of doing the less secure thing is probably a bad
>     idea.
>
>     So the mktemp code probably needs to be aware of the difference
>     between "we have no CSPRNG source" and "we were compiled with
>     support for a source, but it didn't work".

My opinion is that we should probably not fall back for runtime errors
where we do have a CSPRNG and any errors trying to use it are
legitimate.

I would probably have csprng_bytes() itself only be compiled where we
know we have a CSPRNG. And then I think our implementation of
git_mkstemps_mode() would depend on whether csprng_bytes() was compiled
or not. If it was, then any errors returned by it are propagated to the
caller (or we call die()). If not, then we use the existing, insecure
implementation.

And I think that basically addresses both of your points, namely that
the fallback is specific to the mktemp code, and provides one opinion on
the matter of runtime errors.
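
Roughly, that arrangement could look like the sketch below. HAVE_CSPRNG
and tempfile_random_value() are hypothetical names, not something from
the series; only csprng_bytes() and the time-and-PID mix come from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

/* HAVE_CSPRNG is a hypothetical build-time macro, not one from the patch. */
#ifdef HAVE_CSPRNG
int csprng_bytes(void *buf, size_t len); /* compiled only when available */
#endif

/*
 * Hypothetical helper: produce the 64-bit value behind one candidate
 * file name. Returns 0 on success, -1 on a CSPRNG runtime failure.
 */
int tempfile_random_value(uint64_t *out)
{
#ifdef HAVE_CSPRNG
	assert(out != NULL);
	/* A source was compiled in: propagate its errors, never downgrade. */
	return csprng_bytes(out, sizeof(*out)) < 0 ? -1 : 0;
#else
	struct timeval tv;

	assert(out != NULL);
	/* No source at build time: keep the existing time-and-PID mix. */
	gettimeofday(&tv, NULL);
	*out = ((uint64_t)tv.tv_usec << 16) ^ (uint64_t)tv.tv_sec ^
	       (uint64_t)getpid();
	return 0;
#endif
}
```

The insecure branch lives only in the tempfile helper, so any other
future caller of csprng_bytes() would never see degraded bytes.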

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16 18:22       ` Taylor Blau
@ 2021-11-16 19:58         ` rsbecker
  0 siblings, 0 replies; 37+ messages in thread
From: rsbecker @ 2021-11-16 19:58 UTC (permalink / raw)
  To: 'Taylor Blau'
  Cc: 'Jeff King', 'brian m. carlson', git

On November 16, 2021 1:23 PM, Taylor Blau wrote:
> On Tue, Nov 16, 2021 at 11:01:20AM -0500, rsbecker@nexbridge.com wrote:
> > On November 16, 2021 10:31 AM, Jeff King wrote:
> > > On Tue, Nov 16, 2021 at 03:35:41AM +0000, brian m. carlson wrote:
> > >
> > > > The order of options is also important here.  On systems with
> > > > arc4random, which is most of the BSDs, we use that, since, except
> > > > on MirBSD, it uses ChaCha20, which is extremely fast, and sits
> > > > entirely in userspace, avoiding a system call.  We then prefer
> > > > getrandom over getentropy, because the former has been available
> > > > longer on Linux, and finally, if none of those are available, we
> > > > use /dev/urandom, because most Unix-like operating systems provide
> > > > that API.  We prefer options that don't involve device files when
> > > > possible because those work in some restricted environments where
> > > > device files may not be available.
> > >
> > > I wonder if we'll need a low-quality fallback for older systems
> > > which don't even have /dev/urandom. Because it's going to be used in
> > > such a core part of the system (tempfiles), this basically becomes a
> > > hard requirement for using Git at all.
> > >
> > > I can't say I'm excited in general to be introducing a dependency
> > > like this, just because of the portability headaches. But it may be
> > > the least bad thing (especially if we can fall back to the existing behavior).
> > > One alternative would be to build on top of the system mkstemp(),
> > > which makes it libc's problem. I'm not sure if we'd run into problems
> > > there, though.
> >
> > None of /dev/urandom, /dev/random, or mkstemp are available on some
> > platforms, including NonStop. This is not a good dependency to add.
> > One variant PRNGD is used in ia64 OpenSSL, while the CPU random
> > generator in hardware is used on x86. I cannot get behind this at all.
> > Libc is also not used in or available to our port. I am very worried
> > about this direction.
> 
> I share Peff's lack of enthusiasm about the dependency situation. But making
> Git depend on having /dev/urandom available is simply not feasible, as you
> point out.
> 
> I wonder if the suitable fall-back should be the existing behavior of
> git_mkstemps_mode()? That leaves us in a somewhat-disappointing
> situation of not having fully resolved the DoS attack on all platforms.
> But it makes our dependency situation less complicated, and leaves things no
> worse off than they were before on platforms like NonStop.

The general advice on NonStop is to delegate handling DoS attacks to either SSH or firewalls (preferably). I have yet to see anyone publish a git service on that platform outside of using SSH anyway - and if they did, they would get a pretty fierce glare from me.
-Randall


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/2] Generate temporary files using a CSPRNG
  2021-11-16  3:35 [PATCH 0/2] Generate temporary files using a CSPRNG brian m. carlson
                   ` (2 preceding siblings ...)
  2021-11-16 15:44 ` [PATCH 0/2] Generate temporary files using a CSPRNG Jeff King
@ 2021-11-16 20:35 ` Ævar Arnfjörð Bjarmason
  2021-11-16 21:06   ` Jeff King
  3 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-16 20:35 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

On Tue, Nov 16 2021, brian m. carlson wrote:

> Currently, when we generate a temporary file name, we use the seconds,
> microseconds, and the PID to generate a unique value.  The resulting
> value, while changing frequently, is actually predictable and on some
> systems, it may be possible to cause a DoS by creating all potential
> temporary files when the temporary file is being created in TMPDIR.
>
> The solution to this is to use the system CSPRNG to generate the
> temporary file name.  This is the approach taken by FreeBSD, NetBSD, and
> OpenBSD, and glibc also recently switched to this approach from an
> approach that resembled ours in many ways.
>
> Even if this is not practically exploitable on many systems, it seems
> prudent to be at least as careful about temporary file generation as
> libc is.
>    
> This issue was mentioned on the security list and it was decided that
> this was not sensitive enough to warrant a coordinated disclosure, a
> sentiment with which I agree.  This is difficult to exploit on most
> systems, but I think it's still worth fixing.

I skimmed that report on the security list, and having skimmed this
patch series I think what's missing is something like this summary of
yours there (which I hope you don't mind me quoting):

    Now, in Git's case, I don't think our security model allows untrusted
    users to write directly into the repository, so I don't think this
    constitutes a vulnerability there.  We have a function that uses TMPDIR,
    which appears to be used for prepping temporary blobs in diffs and in
    GnuPG verification, which is definitely more questionable.

I tried testing this codepath real quick now with:

    diff --git a/wrapper.c b/wrapper.c
    index 36e12119d76..2f3755886fb 100644
    --- a/wrapper.c
    +++ b/wrapper.c
    @@ -497,6 +497,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
                            v /= num_letters;
                    }

    +               BUG("%s", pattern);
                    fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
                    if (fd >= 0)
                            return fd;

And then doing:

    grep BUG test-results/*.out

And the resulting output is all of the form:

    .git/objects/9f/tmp_obj_FOzEcZ
    .git/objects/pack/tmp_pack_fJC0RI

And a couple of:

    .git/info/refs_Lctaew

I.e. these are all cases where we're creating in-repo tempfiles, we're
not racing someone in /tmp/ for these, except perhaps in some cases I've
missed (but you allude to) where we presumably should just move those
into .git/tmp/, at least by default.

Doesn't that entirely solve this security problem going forward? If a
hostile actor can write into your .git/ they don't need to screw with
you in this way; they can just write executable aliases, or do the same
in .git/hooks/.

Unless, that is, we do have some use case for potentially racing others in
/tmp/, but then we could make that specifically configurable etc.

I really don't mind us having a better tempfile() function in principle,
but so far this sort of hardening just seems entirely unnecessary to me.

As seen from your implementation, it requires us to dip our toes into
seeding random data, which I'd think from a security maintenance
perspective we'd be much better off offloading to the OS going forward
if at all possible.

If there are cases where we actually need this hardening because we're
writing in a shared /tmp/ and not .git/, then surely we're better having
those API users call a differently named function, or to move those
users to using a .git/tmp/ unless they configure things otherwise?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/2] Generate temporary files using a CSPRNG
  2021-11-16 20:35 ` Ævar Arnfjörð Bjarmason
@ 2021-11-16 21:06   ` Jeff King
  2021-11-17  8:36     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2021-11-16 21:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: brian m. carlson, git

On Tue, Nov 16, 2021 at 09:35:59PM +0100, Ævar Arnfjörð Bjarmason wrote:

> I tried testing this codepath real quick now with:
>     
>     diff --git a/wrapper.c b/wrapper.c
>     index 36e12119d76..2f3755886fb 100644
>     --- a/wrapper.c
>     +++ b/wrapper.c
>     @@ -497,6 +497,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>                             v /= num_letters;
>                     }
>      
>     +               BUG("%s", pattern);
>                     fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
>                     if (fd >= 0)
>                             return fd;
>     
> And then doing:
> 
>     grep BUG test-results/*.out
> 
> And the resulting output is all of the form:
> 
>     .git/objects/9f/tmp_obj_FOzEcZ
>     .git/objects/pack/tmp_pack_fJC0RI
> 
> And a couple of:
> 
>     .git/info/refs_Lctaew
> 
> I.e. these are all cases where we're creating in-repo tempfiles, we're
> not racing someone in /tmp/ for these, except perhaps in some cases I've
> missed (but you allude to) where we presumably should just move those
> into .git/tmp/, at least by default.

Your patch is way too aggressive. By bailing via BUG(), most commands
will fail, so we never get to the interesting ones (e.g., we would not
ever get to the point of writing out a tag signature for gpg to verify,
because we'd barf when trying to create the tag in the first place).

Try:

diff --git a/wrapper.c b/wrapper.c
index 36e12119d7..5218a4b3bd 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -497,6 +497,10 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 			v /= num_letters;
 		}
 
+		{
+			static struct trace_key t = TRACE_KEY_INIT(TEMPFILE);
+			trace_printf_key(&t, "%s", pattern);
+		}
 		fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
 		if (fd >= 0)
 			return fd;

And then:

  GIT_TRACE_TEMPFILE=/tmp/foo make test
  grep ^/tmp /tmp/foo | wc -l

turns up hundreds of hits.

> If there are cases where we actually need this hardening because we're
> writing in a shared /tmp/ and not .git/, then surely we're better having
> those API users call a differently named function, or to move those
> users to using a .git/tmp/ unless they configure things otherwise?

Assuming you can write to .git/tmp means that conceptually read-only
operations (like verifying tags) require write access to the repository.

-Peff

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/2] Generate temporary files using a CSPRNG
  2021-11-16 15:44 ` [PATCH 0/2] Generate temporary files using a CSPRNG Jeff King
@ 2021-11-16 22:17   ` brian m. carlson
  2021-11-16 22:29     ` rsbecker
  0 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2021-11-16 22:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 2021-11-16 at 15:44:33, Jeff King wrote:
> On Tue, Nov 16, 2021 at 03:35:40AM +0000, brian m. carlson wrote:
> 
> > For those who are interested, I computed the probability of spurious
> > failure for the self-test mode like so:
> > 
> >   256 * (255/256)^65536
> > 
> > This Ruby one-liner estimates the probability at approximately 10^-108:
> > 
> >   ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'
> > 
> > If I have made an error in the calculation, please do feel free to point
> > it out.
> 
> Yes, I think your math is correct there.
> 
> A more interesting question is whether generating 64k of PRNG bytes per
> test run is going to be a problem for system entropy pools. For that
> matter, I guess the use of it for tempfiles will produce a similar
> burden, since we run so many commands. My understanding is that modern
> systems will just produce infinite output for /dev/urandom, etc, but I
> wonder if there are any systems left where that is not true (because
> they have a misguided notion that they need to stir in more "real"
> entropy bits).

I have specifically avoided invoking any sort of potentially blocking
CSPRNG for that reason.  /dev/urandom is specifically not supposed to
block, and on the systems that I mentioned, the way Go uses it would
indicate that it should not.  There is a system, which is Plan 9, where
Go uses /dev/random to seed an X9.17 generator, and there I assume there
is no /dev/urandom, but I also know full well that we are likely
completely broken on Plan 9 already, so this will be the least of the
required fixes.

RtlGenRandom is non-blocking, and as the commit message mentioned,
arc4random uses ChaCha20 in a non-blocking way on all systems I could
find, except MirBSD which uses RC4, also without blocking.  Linux's
CSPRNG is also non-blocking.

I've also looked at Rust's getrandom crate, which provides support for
various other systems, and I have no indication that any of the
interfaces I've provided are blocking in any way, since that crate would
not desire that behavior.  Looking at it just now, I did notice that
macOS supports getentropy, so if I need to do a reroll, I'll add an
option for that.

So I don't think we're likely to run into a problem here.  If we do run
into systems with that problem, we can add an option to use libbsd,
which provides arc4random and company (using ChaCha20).  The tricky part
is that when using libbsd, arc4random is not in <stdlib.h> (since that's
a system header file) and is instead in <bsd/stdlib.h>.  However, it's
an easy change if we run into some uncommon system where that's the
case.
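
For illustration, the header selection could be as small as the
fragment below. HAVE_LIBBSD is a hypothetical build knob, not something
defined by this series.

```c
#include <stddef.h>
#include <stdlib.h>        /* arc4random_buf() lives here on the BSDs */
#ifdef HAVE_LIBBSD         /* hypothetical knob; not defined by the patch */
#include <bsd/stdlib.h>    /* libbsd's home for arc4random_buf() */
#endif
```

The build would also need to link against libbsd when that knob is
set.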

If we don't like the test, we can avoid running it by default at the
risk of seeing breakage go uncaught.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 0/2] Generate temporary files using a CSPRNG
  2021-11-16 22:17   ` brian m. carlson
@ 2021-11-16 22:29     ` rsbecker
  0 siblings, 0 replies; 37+ messages in thread
From: rsbecker @ 2021-11-16 22:29 UTC (permalink / raw)
  To: 'brian m. carlson', 'Jeff King'; +Cc: git

On November 16, 2021 5:18 PM, brian m. carlson wrote:
> On 2021-11-16 at 15:44:33, Jeff King wrote:
> > On Tue, Nov 16, 2021 at 03:35:40AM +0000, brian m. carlson wrote:
> >
> > > For those who are interested, I computed the probability of spurious
> > > failure for the self-test mode like so:
> > >
> > >   256 * (255/256)^65536
> > >
> > > This Ruby one-liner estimates the probability at approximately 10^-108:
> > >
> > >   ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'
> > >
> > > If I have made an error in the calculation, please do feel free to
> > > point it out.
> >
> > Yes, I think your math is correct there.
> >
> > A more interesting question is whether generating 64k of PRNG bytes
> > per test run is going to be a problem for system entropy pools. For that
> > matter, I guess the use of it for tempfiles will produce a similar
> > burden, since we run so many commands. My understanding is that
> > modern systems will just produce infinite output for /dev/urandom, etc, but I
> > wonder if there are any systems left where that is not true (because
> > they have a misguided notion that they need to stir in more "real"
> > entropy bits).
> 
> I have specifically avoided invoking any sort of potentially blocking CSPRNG
> for that reason.  /dev/urandom is specifically not supposed to block, and on
> the systems that I mentioned, the way Go uses it would indicate that it
> should not.  There is a system, which is Plan 9, where Go uses /dev/random
> to seed an X9.17 generator, and there I assume there is no /dev/urandom,
> but I also know full well that we are likely completely broken on Plan 9
> already, so this will be the least of the required fixes.
> 
> RtlGenRandom is non-blocking, and as the commit message mentioned,
> arc4random uses ChaCha20 in a non-blocking way on all systems I could find,
> except MirBSD which uses RC4, also without blocking.  Linux's CSPRNG is also
> non-blocking.
> 
> I've also looked at Rust's getrandom crate, which provides support for
> various other systems, and I have no indication that any of the interfaces I've
> provided are blocking in any way, since that crate would not desire that
> behavior.  Looking at it just now, I did notice that macOS supports
> getentropy, so if I need to do a reroll, I'll add an option for that.
> 
> So I don't think we're likely to run into a problem here.  If we do run into
> systems with that problem, we can add an option to use libbsd, which
> provides arc4random and company (using ChaCha20).  The tricky part is that
> when using libbsd, arc4random is not in <stdlib.h> (since that's a system
> header file) and is instead in <bsd/stdlib.h>.  However, it's an easy change if
> we run into some uncommon system where that's the case.
> 
> If we don't like the test, we can avoid running it by default at the risk of
> seeing breakage go uncaught.

Adding these dependencies is also a problem. libbsd does not port to NonStop. Go is not available yet. Please stay at least somewhat POSIX-like. Begging because I do not want to lose git.
-Randall


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16 16:01     ` rsbecker
  2021-11-16 18:22       ` Taylor Blau
@ 2021-11-16 22:41       ` brian m. carlson
  2021-11-16 23:20         ` rsbecker
  1 sibling, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2021-11-16 22:41 UTC (permalink / raw)
  To: rsbecker; +Cc: 'Jeff King', git

[-- Attachment #1: Type: text/plain, Size: 3234 bytes --]

On 2021-11-16 at 16:01:20, rsbecker@nexbridge.com wrote:
> On November 16, 2021 10:31 AM, Jeff King wrote:
> > On Tue, Nov 16, 2021 at 03:35:41AM +0000, brian m. carlson wrote:
> > 
> > > The order of options is also important here.  On systems with
> > > arc4random, which is most of the BSDs, we use that, since, except on
> > > MirBSD, it uses ChaCha20, which is extremely fast, and sits entirely
> > > in userspace, avoiding a system call.  We then prefer getrandom over
> > > getentropy, because the former has been available longer on Linux, and
> > > finally, if none of those are available, we use /dev/urandom, because
> > > most Unix-like operating systems provide that API.  We prefer options
> > > that don't involve device files when possible because those work in
> > > some restricted environments where device files may not be available.
> > 
> > I wonder if we'll need a low-quality fallback for older systems which don't
> > even have /dev/urandom. Because it's going to be used in such a core part of
> > the system (tempfiles), this basically becomes a hard requirement for using
> > Git at all.
> > 
> > I can't say I'm excited in general to be introducing a dependency like this, just
> > because of the portability headaches. But it may be the least bad thing
> > (especially if we can fall back to the existing behavior).
> > One alternative would be to build on top of the system mkstemp(), which
> > makes it libc's problem. I'm not sure if we'd run into problems there, though.
> 
> None of /dev/urandom, /dev/random, or mkstemp are available on some
> platforms, including NonStop. This is not a good dependency to add.
> One variant PRNGD is used in ia64 OpenSSL, while the CPU random
> generator in hardware is used on x86. I cannot get behind this at all.
> Libc is also not used in or available to our port. I am very worried
> about this direction.

I'm really not excited about a fallback here, and I specifically did not
include one for that reason.  I'm happy to add an appropriate dependency
on an OpenSSL or libgcrypt PRNG if you're linking against that already
(e.g., for libcurl) or support for libbsd's arc4random or getentropy if
that will work on your system.  For example, how are you dealing with
TLS connections over HTTPS?  That library will almost certainly provide
the required primitives in a straightforward and portable way.

I do fundamentally believe every operating system and language
environment need to provide a readily available CSPRNG in 2021,
especially because in the vast majority of cases, hash tables must be
randomized to avoid hash DoS attacks on untrusted input.  I'm planning
to look into our hash tables in the future to see if they are vulnerable
to that kind of attack, and if so, we'll need to have a CSPRNG for basic
security reasons, and platforms that can't provide one would be subject
to a CVE.

If we really can't find a solution, I won't object to a patch on top
that adds an insecure fallback, but I don't want to put my name or
sign-off on such a patch because I think it's a mistake.  But I think we
almost certainly can, though.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16 22:41       ` brian m. carlson
@ 2021-11-16 23:20         ` rsbecker
  2021-11-17  0:47           ` Carlo Arenas
  2021-11-17  1:03           ` brian m. carlson
  0 siblings, 2 replies; 37+ messages in thread
From: rsbecker @ 2021-11-16 23:20 UTC (permalink / raw)
  To: 'brian m. carlson'; +Cc: 'Jeff King', git

On November 16, 2021 5:42 PM, brian m. carlson
> On 2021-11-16 at 16:01:20, rsbecker@nexbridge.com wrote:
> > On November 16, 2021 10:31 AM, Jeff King wrote:
> > > On Tue, Nov 16, 2021 at 03:35:41AM +0000, brian m. carlson wrote:
> > >
> > > > The order of options is also important here.  On systems with
> > > > arc4random, which is most of the BSDs, we use that, since, except
> > > > on MirBSD, it uses ChaCha20, which is extremely fast, and sits
> > > > entirely in userspace, avoiding a system call.  We then prefer
> > > > getrandom over getentropy, because the former has been available
> > > > longer on Linux, and finally, if none of those are available, we
> > > > use /dev/urandom, because most Unix-like operating systems provide
> > > > that API.  We prefer options that don't involve device files when
> > > > possible because those work in some restricted environments where
> > > > > device files may not be available.
> > >
> > > I wonder if we'll need a low-quality fallback for older systems
> > > which don't even have /dev/urandom. Because it's going to be used in
> > > such a core part of the system (tempfiles), this basically becomes a
> > > hard requirement for using Git at all.
> > >
> > > I can't say I'm excited in general to be introducing a dependency
> > > like this, just because of the portability headaches. But it may be
> > > the least bad thing (especially if we can fall back to the existing behavior).
> > > One alternative would be to build on top of the system mkstemp(),
> > > which makes it libc's problem. I'm not sure if we'd run into problems
> > > > there, though.
> >
> > None of /dev/urandom, /dev/random, or mkstemp are available on some
> > platforms, including NonStop. This is not a good dependency to add.
> > One variant PRNGD is used in ia64 OpenSSL, while the CPU random
> > generator in hardware is used on x86. I cannot get behind this at all.
> > Libc is also not used in or available to our port. I am very worried
> > about this direction.
> 
> I'm really not excited about a fallback here, and I specifically did not include
> one for that reason.  I'm happy to add an appropriate dependency on an
> OpenSSL or libgcrypt PRNG if you're linking against that already (e.g., for
> libcurl) or support for libbsd's arc4random or getentropy if that will work on
> your system.  For example, how are you dealing with TLS connections over
> HTTPS?  That library will almost certainly provide the required primitives in a
> straightforward and portable way.
> 
> I do fundamentally believe every operating system and language
> environment need to provide a readily available CSPRNG in 2021, especially
> because in the vast majority of cases, hash tables must be randomized to
> avoid hash DoS attacks on untrusted input.  I'm planning to look into our hash
> tables in the future to see if they are vulnerable to that kind of attack, and if
> so, we'll need to have a CSPRNG for basic security reasons, and platforms
> that can't provide one would be subject to a CVE.
> 
> If we really can't find a solution, I won't object to a patch on top that adds an
> insecure fallback, but I don't want to put my name or sign-off on such a patch
> because I think it's a mistake.  But I think we almost certainly can, though.

We do link with libcurl and use OpenSSL as a DLL to handle TLS. The underlying random source for the nonstop-* configurations as of OpenSSL 3.0 is a PRNG supplied by the vendor (HPE) on ia64 and the hardware rdrand* instructions on x86. I know that part of the OpenSSL code rather intimately.
--
Randall Becker
Also from the GTA


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16 23:20         ` rsbecker
@ 2021-11-17  0:47           ` Carlo Arenas
  2021-11-17  3:05             ` rsbecker
  2021-11-17  1:03           ` brian m. carlson
  1 sibling, 1 reply; 37+ messages in thread
From: Carlo Arenas @ 2021-11-17  0:47 UTC (permalink / raw)
  To: rsbecker; +Cc: brian m. carlson, Jeff King, git

On Tue, Nov 16, 2021 at 4:01 PM <rsbecker@nexbridge.com> wrote:
>
> We do link with libcurl and use OpenSSL as a DLL to handle TLS. The underlying random source for the nonstop-* configurations as of OpenSSL 3.0 is a PRNG supplied by the vendor (HPE) on ia64 and the hardware rdrand* instructions on x86. I know that part of the OpenSSL code rather intimately.

Older versions of OpenSSL exported (AFAIK) a usable version of
arc4random_buf() that could have helped here; it seems to still be
there in libressl[1], which is mostly API compatible and might be worth
looking into IMHO, even if, as you pointed out, it will need an
implementation similar to what OpenSSL does internally.

[1] https://cvsweb.openbsd.org/src/lib/libcrypto/arc4random/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16 23:20         ` rsbecker
  2021-11-17  0:47           ` Carlo Arenas
@ 2021-11-17  1:03           ` brian m. carlson
  2021-11-17  1:50             ` Carlo Arenas
  2021-11-17  3:03             ` rsbecker
  1 sibling, 2 replies; 37+ messages in thread
From: brian m. carlson @ 2021-11-17  1:03 UTC (permalink / raw)
  To: rsbecker; +Cc: 'Jeff King', git

[-- Attachment #1: Type: text/plain, Size: 891 bytes --]

On 2021-11-16 at 23:20:45, rsbecker@nexbridge.com wrote:
> We do link with libcurl and use OpenSSL as a DLL to handle TLS. The
> underlying random source for the nonstop-* configurations as of
> OpenSSL 3.0 is a PRNG supplied by the vendor (HPE) on ia64 and the
> hardware rdrand* instructions on x86. I know that part of the OpenSSL
> code rather intimately.

Great, as long as you don't define NO_OPENSSL, I think I can make this
work with OpenSSL by calling RAND_bytes, which will use whatever OpenSSL
uses.  I'll work on that for a v2 to see if that will meet the needs for
your platform, and if not, I'll try something else.

That should also have the pleasant side effect of making this more
portable even for those people who do have less common platforms, since
OpenSSL will likely be an option there.
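
A rough sketch of what such an OpenSSL-backed path could look like, for
illustration only and not taken from the series.  The function name
csprng_bytes_sketch and the USE_OPENSSL_CSPRNG build knob are
hypothetical; RAND_bytes() is the real OpenSSL call, which takes an int
length and returns 1 on success.  Without the knob, the sketch reads
/dev/urandom so that it still builds where OpenSSL headers are missing.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#ifdef USE_OPENSSL_CSPRNG	/* hypothetical build knob, not from the patch */
#include <openssl/rand.h>
#endif

/* Fill buf with len random bytes; return 0 on success, -1 on failure. */
int csprng_bytes_sketch(void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
#ifdef USE_OPENSSL_CSPRNG
	unsigned char *p = buf;

	while (len > 0) {
		/* RAND_bytes() takes an int, so large requests are split. */
		int chunk = len > INT_MAX ? INT_MAX : (int)len;

		if (RAND_bytes(p, chunk) != 1)
			return -1;
		p += chunk;
		len -= (size_t)chunk;
	}
	return 0;
#else
	/* Fallback for this sketch only: read from /dev/urandom. */
	FILE *f;
	size_t got;

	if (len == 0)
		return 0;
	f = fopen("/dev/urandom", "rb");
	if (!f)
		return -1;
	got = fread(buf, 1, len, f);
	fclose(f);
	return got == len ? 0 : -1;
#endif
}
```

A caller would check the return value and die rather than continue with
an unfilled buffer.
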
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  1:03           ` brian m. carlson
@ 2021-11-17  1:50             ` Carlo Arenas
  2021-11-17  3:04               ` Jeff King
  2021-11-17  3:03             ` rsbecker
  1 sibling, 1 reply; 37+ messages in thread
From: Carlo Arenas @ 2021-11-17  1:50 UTC (permalink / raw)
  To: brian m. carlson, rsbecker, Jeff King, git

On Tue, Nov 16, 2021 at 5:04 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2021-11-16 at 23:20:45, rsbecker@nexbridge.com wrote:
> > We do link with libcurl and use OpenSSL as a DLL to handle TLS. The
> > underlying random source for the nonstop-* configurations as of
> > OpenSSL 3.0 is a PRNG supplied by the vendor (HPE) on ia64 and the
> > hardware rdrand* instructions on x86. I know that part of the OpenSSL
> > code rather intimately.
>
> Great, as long as you don't define NO_OPENSSL, I think I can make this
> work with OpenSSL by calling RAND_bytes, which will use whatever OpenSSL
> uses.

note that RAND_bytes returns high-entropy bytes (like /dev/random) and
is therefore limited and prone to draining, blocking, and erroring when
drained, so if we are going this route, we will most likely need a second
layer on top that doesn't block (like arc4random does), and at that
point I would think we would rather use something battle-tested than
our own.

for the little amount of random data we need, it might be wiser to
fallback to something POSIX like lrand48 which is most likely to be
available, but of course your tests that consume lots of random data
will need to change.

Carlo

PS. Probably missing context as I don't know what was discussed
previously, but indeed making this the libc problem by using mkstemp
(plus some compatibility on top), like Peff mentioned seems like a
more straightforward "fix"

> I'll work on that for a v2 to see if that will meet the needs for
> your platform, and if not, I'll try something else.
>
> That should also have the pleasant side effect of making this more
> portable even for those people who do have less common platforms, since
> OpenSSL will likely be an option there.
> --
> brian m. carlson (he/him or they/them)
> Toronto, Ontario, CA

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  1:03           ` brian m. carlson
  2021-11-17  1:50             ` Carlo Arenas
@ 2021-11-17  3:03             ` rsbecker
  1 sibling, 0 replies; 37+ messages in thread
From: rsbecker @ 2021-11-17  3:03 UTC (permalink / raw)
  To: 'brian m. carlson'; +Cc: 'Jeff King', git

On November 16, 2021 8:03 PM, brian m. carlson wrote:
> On 2021-11-16 at 23:20:45, rsbecker@nexbridge.com wrote:
> > We do link with libcurl and use OpenSSL as a DLL to handle TLS. The
> > underlying random source for the nonstop-* configurations as of
> > OpenSSL 3.0 is a PRNG supplied by the vendor (HPE) on ia64 and the
> > hardware rdrand* instructions on x86. I know that part of the OpenSSL
> > code rather intimately.
> 
> Great, as long as you don't define NO_OPENSSL, I think I can make this work
> with OpenSSL by calling RAND_bytes, which will use whatever OpenSSL uses.
> I'll work on that for a v2 to see if that will meet the needs for your platform,
> and if not, I'll try something else.
> 
> That should also have the pleasant side effect of making this more portable
> even for those people who do have less common platforms, since OpenSSL
> will likely be an option there.

I checked config.mak.uname. We should be fine with that qualification.

Regards,
Randall


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  1:50             ` Carlo Arenas
@ 2021-11-17  3:04               ` Jeff King
  2021-11-17  3:12                 ` rsbecker
  2021-11-17  3:36                 ` Carlo Arenas
  0 siblings, 2 replies; 37+ messages in thread
From: Jeff King @ 2021-11-17  3:04 UTC (permalink / raw)
  To: Carlo Arenas; +Cc: brian m. carlson, rsbecker, git

On Tue, Nov 16, 2021 at 05:50:44PM -0800, Carlo Arenas wrote:

> for the little amount of random data we need, it might be wiser to
> fallback to something POSIX like lrand48 which is most likely to be
> available, but of course your tests that consume lots of random data
> will need to change.

Unfortunately that won't help. You have to seed lrand48 with something,
which usually means pid and/or timestamp. Which are predictable to an
attacker, which was the start of the whole conversation. You really need
_some_ source of entropy, and only the OS can provide that.
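
To make the point concrete, here is a minimal sketch (not code from git;
suffix_from_seed is a hypothetical name) of a suffix generator built on
srand48()/lrand48().  The output is a pure function of the seed, so
anyone who can guess the seed, for example a pid combined with a
timestamp, can recompute the same name in advance.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Derive a 6-letter suffix from a seed, as a naive fallback might. */
void suffix_from_seed(long seed, char out[7])
{
	static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
	int i;

	assert(out != NULL);
	srand48(seed);	/* all later output is determined by this value */
	for (i = 0; i < 6; i++)
		out[i] = letters[lrand48() % 26];
	out[6] = '\0';
}
```

Calling it twice with the same seed yields the same suffix, which is
exactly what an attacker pre-creating files would rely on.
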

> PS. Probably missing context as I don't know what was discussed
> previously, but indeed making this the libc problem by using mkstemp
> (plus some compatibility on top), like Peff mentioned seems like a
> more straightforward "fix"

It might be nice if it works. I don't recall all of the reasons that led
us to implement our own mkstemp in the first place. So the first step
would probably be digging in the history and the archive to find that
out, and whether it still applies.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  0:47           ` Carlo Arenas
@ 2021-11-17  3:05             ` rsbecker
  0 siblings, 0 replies; 37+ messages in thread
From: rsbecker @ 2021-11-17  3:05 UTC (permalink / raw)
  To: 'Carlo Arenas'
  Cc: 'brian m. carlson', 'Jeff King', git

On November 16, 2021 7:48 PM, Carlo Arenas wrote:
> On Tue, Nov 16, 2021 at 4:01 PM <rsbecker@nexbridge.com> wrote:
> >
> > We do link with libcurl and use OpenSSL as a DLL to handle TLS. The
> > underlying random source for the nonstop-* configurations as of OpenSSL
> > 3.0 is a PRNG supplied by the vendor (HPE) on ia64 and the hardware
> > rdrand* instructions on x86. I know that part of the OpenSSL code rather
> > intimately.
> 
> Older versions of OpenSSL exported (AFAIK) a usable version of
> arc4random_buf() that could have helped here; it seems to still be there in
> libressl[1] which is mostly API compatible and might be worth looking into
> IMHO even if as you pointed out will need an implementation similar to what
> OpenSSL does internally.
> 
> [1] https://cvsweb.openbsd.org/src/lib/libcrypto/arc4random/

I do not see arc4random being used in our builds going back to OpenSSL 1.0.2, which is as far back as I go anyway.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  3:04               ` Jeff King
@ 2021-11-17  3:12                 ` rsbecker
  2021-11-17  3:36                 ` Carlo Arenas
  1 sibling, 0 replies; 37+ messages in thread
From: rsbecker @ 2021-11-17  3:12 UTC (permalink / raw)
  To: 'Jeff King', 'Carlo Arenas'
  Cc: 'brian m. carlson', git

On November 16, 2021 10:04 PM, Jeff King wrote:
> On Tue, Nov 16, 2021 at 05:50:44PM -0800, Carlo Arenas wrote:
> 
> > for the little amount of random data we need, it might be wiser to
> > fallback to something POSIX like lrand48 which is most likely to be
> > available, but of course your tests that consume lots of random data
> > will need to change.
> 
> Unfortunately that won't help. You have to seed lrand48 with something,
> which usually means pid and/or timestamp. Which are predictable to an
> attacker, which was the start of the whole conversation. You really need
> _some_ source of entropy, and only the OS can provide that.
> 
> > PS. Probably missing context as I don't know what was discussed
> > previously, but indeed making this the libc problem by using mkstemp
> > (plus some compatibility on top), like Peff mentioned seems like a
> > more straightforward "fix"
> 
> It might be nice if it works. I don't recall all of the reasons that led us to
> implement our own mkstemp in the first place. So the first step would
> probably be digging in the history and the archive to find that out, and
> whether it still applies.

mkstemp is more recent than mktemp and not implemented everywhere, sadly, and despite my whining about it. That may be why. It is actually available on recent NonStop platforms, so no real issue. mkstemp does allocate a file descriptor, which can be expensive and not always desired.
--Randall


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  3:04               ` Jeff King
  2021-11-17  3:12                 ` rsbecker
@ 2021-11-17  3:36                 ` Carlo Arenas
  2021-11-17 20:01                   ` Jeff King
  1 sibling, 1 reply; 37+ messages in thread
From: Carlo Arenas @ 2021-11-17  3:36 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, rsbecker, git

On Tue, Nov 16, 2021 at 7:04 PM Jeff King <peff@peff.net> wrote:
>
> On Tue, Nov 16, 2021 at 05:50:44PM -0800, Carlo Arenas wrote:
>
> > for the little amount of random data we need, it might be wiser to
> > fallback to something POSIX like lrand48 which is most likely to be
> > available, but of course your tests that consume lots of random data
> > will need to change.
>
> Unfortunately that won't help. You have to seed lrand48 with something,
> which usually means pid and/or timestamp. Which are predictable to an
> attacker, which was the start of the whole conversation. You really need
> _some_ source of entropy, and only the OS can provide that.

again, showing my ignorance here; but that "something" doesn't need to
be guessable externally; e.g., git add could seed from the contents of
the file it is adding, or even better, mix that with the other
sources as a poor man's /dev/urandom

I agree though that having a true random source will require the OS,
but isn't it about generating 6 random letters?
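
For reference, a sketch of how a single random value can be turned into
those six letters, in the spirit of the "v /= num_letters" loop in
git_mkstemps_mode().  The function name fill_suffix and the exact
62-character alphabet are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a random value onto six characters drawn from a 62-letter alphabet. */
void fill_suffix(uint64_t v, char out[7])
{
	static const char letters[] =
		"abcdefghijklmnopqrstuvwxyz"
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"0123456789";
	const uint64_t num_letters = sizeof(letters) - 1;	/* 62 */
	int i;

	assert(out != NULL);
	for (i = 0; i < 6; i++) {
		out[i] = letters[v % num_letters];	/* lowest base-62 digit */
		v /= num_letters;
	}
	out[6] = '\0';
}
```

Six such letters cover 62^6 (about 5.7 * 10^10) names, so the amount of
randomness needed is small; what matters is that the input value cannot
be predicted.
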

Carlo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-16  3:35 ` [PATCH 1/2] wrapper: add a helper to generate numbers from " brian m. carlson
  2021-11-16 15:31   ` Jeff King
@ 2021-11-17  7:39   ` Junio C Hamano
  2021-11-17 23:01     ` brian m. carlson
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2021-11-17  7:39 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Finally, add a self-test option here to make sure that our buffer
> handling is correct and we aren't truncating data.  We simply read 64
> KiB and then make sure we've seen each byte.  The probability of this
> test failing spuriously is less than 10^-100.

I saw that 10^-100 math in the other message, and have no problem
with that, but I am not sure how such a test makes "sure that our
buffer handling is correct and we aren't truncating data."  If you
thought you were generating 64 KiB of random bytes but a bug caused you
to actually use 32 KiB of random bytes with 32 KiB of other garbage,
wouldn't you still have enough entropy left that you would be likely
to paint all 256 buckets?
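
A small sketch of the kind of coverage check being discussed (the name
all_byte_values_seen is hypothetical, and this is not the test helper
from the patch).  It only asks whether each of the 256 byte values
occurs at least once, so it says nothing about how much of the buffer
was really filled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if every byte value 0..255 occurs at least once in buf. */
int all_byte_values_seen(const unsigned char *buf, size_t len)
{
	unsigned char seen[256] = { 0 };
	size_t i, distinct = 0;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		if (!seen[buf[i]]) {
			seen[buf[i]] = 1;
			distinct++;
		}
	}
	return distinct == 256;
}
```

A 64 KiB buffer whose first 256 bytes happen to contain every value and
whose remainder is all zeros still passes, which illustrates the
concern: the check detects a completely unfilled buffer but not a
partially filled one.
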

I also agree with Peff's comment about making these look as if many
of them can be specified at once, when only one of them would
actually be in effect.  Giving one Makefile macro that the builder
can set to a single value would be much less confusing.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/2] Generate temporary files using a CSPRNG
  2021-11-16 21:06   ` Jeff King
@ 2021-11-17  8:36     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-17  8:36 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git


On Tue, Nov 16 2021, Jeff King wrote:

> On Tue, Nov 16, 2021 at 09:35:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> I tried testing this codepath real quick now with:
>>     
>>     diff --git a/wrapper.c b/wrapper.c
>>     index 36e12119d76..2f3755886fb 100644
>>     --- a/wrapper.c
>>     +++ b/wrapper.c
>>     @@ -497,6 +497,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>>                             v /= num_letters;
>>                     }
>>      
>>     +               BUG("%s", pattern);
>>                     fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
>>                     if (fd >= 0)
>>                             return fd;
>>     
>> And then doing:
>> 
>>     grep BUG test-results/*.out
>> 
>> And the resulting output is all of the form:
>> 
>>     .git/objects/9f/tmp_obj_FOzEcZ
>>     .git/objects/pack/tmp_pack_fJC0RI
>> 
>> And a couple of:
>> 
>>     .git/info/refs_Lctaew
>> 
>> I.e. these are all cases where we're creating in-repo tempfiles, we're
>> not racing someone in /tmp/ for these, except perhaps in some cases I've
>> missed (but you allude to) where we presumably should just move those
>> into .git/tmp/, at least by default.
>
> Your patch is way too aggressive. By bailing via BUG(), most commands
> will fail, so we never get to the interesting ones (e.g., we would not
> ever get to the point of writing out a tag signature for gpg to verify,
> because we'd barf when trying to create the tag in the first place).
>
> Try:
>
> diff --git a/wrapper.c b/wrapper.c
> index 36e12119d7..5218a4b3bd 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -497,6 +497,10 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>  			v /= num_letters;
>  		}
>  
> +		{
> +			static struct trace_key t = TRACE_KEY_INIT(TEMPFILE);
> +			trace_printf_key(&t, "%s", pattern);
> +		}
>  		fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
>  		if (fd >= 0)
>  			return fd;
>
> And then:
>
>   GIT_TRACE_TEMPFILE=/tmp/foo make test
>   grep ^/tmp /tmp/foo | wc -l
>
> turns up hundreds of hits.

Thanks, there's a long tail of these, but I came up with this crappy
one-liner one regex at a time while looking at it:

    cat /tmp/git_mkstemps_mode.trace | perl -pe 's[/[0-9a-f]{2}/][/HH/]; s[/incoming-\K[^/]+][XXX]; s[/tmp/\K[^_]+][XXX]; s/tmp_(idx|obj|pack)_\K[a-zA-Z0-9]+$/XXX/; s[/objects/\K../][$1??/]g; s[^/run/user.*/objects/][<systemd run/user>/objects/]; s[(vtag_tmp|pack_|refs_)\K.*][XXX]; '|sort|uniq -c|sort -nr|less

Which gives us:

    893 .git/objects/pack/tmp_pack_XXX
    836 ./objects/??/tmp_obj_XXX
    722 .git/objects/pack/tmp_idx_XXX
    401 <systemd run/user>/objects/incoming-XXX/HH/tmp_obj_XXX
    366 /run/user/1001/tmp/XXX_pack_XXX
    289 <systemd run/user>/objects/??/tmp_obj_XXX
    261 .git/info/refs_XXX
    258 /tmp/XXX_vtag_tmpXXX
    185 clone.git/objects/??/tmp_obj_XXX
     77 /tmp/XXX_file
     72 marks-test/.git/objects/??/tmp_obj_XXX
     71 <systemd run/user>/objects/pack/tmp_pack_XXX
     69 <systemd run/user>/objects/pack/tmp_idx_XXX
     34 objects/pack/tmp_pack_XXX
     34 objects/pack/tmp_idx_XXX
     25 /run/user/1001/tmp/XXX.git/objects/??/tmp_obj_XXX
     20 info/refs_XXX
     12 /tmp/XXX_text
     12 foo.git/objects/??/tmp_obj_XXX

I.e. this is stuff that's either already in .git, or a small handful of
special-cases such as "git verify-tag".

>> If there are cases where we actually need this hardening because we're
>> writing in a shared /tmp/ and not .git/, then surely we're better having
>> those API users call a differently named function, or to move those
>> users to using a .git/tmp/ unless they configure things otherwise?
>
> Assuming you can write to .git/tmp means that conceptually read-only
> operations (like verifying tags) require write access to the repository.

That leaves the "differently named function" which I think we should
really do in either case.

I.e. if I'm verifying lots of tags then I'm better off on a modern
systemd system using /run/user/`id -u`, as opposed to /tmp/ which is
often disk-backed. So being aware of $XDG_RUNTIME_DIR seems like a
sensible thing in either case.

And on those systems the DoS aspect of this becomes a non-issue, that
directory is only writable by one (non-super)user.
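
As a sketch of what "being aware of $XDG_RUNTIME_DIR" could mean in
code (pick_tmp_dir is a hypothetical helper, not an existing git
function, and the lookup order is an assumption):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Prefer the per-user runtime directory, then $TMPDIR, then the
 * caller-supplied fallback such as "/tmp".
 */
const char *pick_tmp_dir(const char *fallback)
{
	const char *dir;

	assert(fallback != NULL);
	dir = getenv("XDG_RUNTIME_DIR");
	if (dir && *dir)
		return dir;
	dir = getenv("TMPDIR");
	if (dir && *dir)
		return dir;
	return fallback;
}
```

Only callers that currently write into a shared temporary directory
would need such a helper; the in-repository tempfiles listed above are
unaffected.
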

I think there's a big advantage to having any tricky CSPRNG-implementing
code in its own corner like that.

It means that if, e.g., gpg learns some mode to do this that doesn't
require tempfiles, and we're confident we don't otherwise create things
in /tmp, we could drop it; and users who don't want git shipping a
CSPRNG could compile it out.

But I really don't see why it isn't an acceptable solution for git to
just die here if we fail to create the Nth tempfile in a row.

Or something simpler like having the "git verify-tag" code fall back to
writing in say $HOME/.cache/git, which is another simple way to avoid
the issue entirely in most cases.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  3:36                 ` Carlo Arenas
@ 2021-11-17 20:01                   ` Jeff King
  2021-11-17 20:19                     ` rsbecker
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2021-11-17 20:01 UTC (permalink / raw)
  To: Carlo Arenas; +Cc: brian m. carlson, rsbecker, git

On Tue, Nov 16, 2021 at 07:36:51PM -0800, Carlo Arenas wrote:

> > > for the little amount of random data we need, it might be wiser to
> > > fallback to something POSIX like lrand48 which is most likely to be
> > > available, but of course your tests that consume lots of random data
> > > will need to change.
> >
> > Unfortunately that won't help. You have to seed lrand48 with something,
> > which usually means pid and/or timestamp. Which are predictable to an
> > attacker, which was the start of the whole conversation. You really need
> > _some_ source of entropy, and only the OS can provide that.
> 
> again, showing my ignorance here; but that "something" doesn't need to
> be guessable externally; ex: git add could use as seed contents from
> the file that is adding, or even better mix it up with the other
> sources as a poor man's /dev/urandom

Those contents are still predictable. So you've made the attacker's job
a little harder (now they have to block tempfiles for, say, each tag
you're going to verify), but haven't changed the fundamental problem.

It definitely would help in _some_ threat models, but I think we should
strive for a solution that can be explained clearly as "nobody can DoS
your tempfiles" without complicated qualifications.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17 20:01                   ` Jeff King
@ 2021-11-17 20:19                     ` rsbecker
  2021-11-17 23:30                       ` brian m. carlson
  0 siblings, 1 reply; 37+ messages in thread
From: rsbecker @ 2021-11-17 20:19 UTC (permalink / raw)
  To: 'Jeff King', 'Carlo Arenas'
  Cc: 'brian m. carlson', git

On November 17, 2021 3:02 PM, Jeff King wrote:
> On Tue, Nov 16, 2021 at 07:36:51PM -0800, Carlo Arenas wrote:
> 
> > > > for the little amount of random data we need, it might be wiser to
> > > > fallback to something POSIX like lrand48 which is most likely to
> > > > be available, but of course your tests that consume lots of random
> > > > data will need to change.
> > >
> > > Unfortunately that won't help. You have to seed lrand48 with
> > > something, which usually means pid and/or timestamp. Which are
> > > predictable to an attacker, which was the start of the whole
> > > conversation. You really need _some_ source of entropy, and only the OS
> > > can provide that.
> >
> > again, showing my ignorance here; but that "something" doesn't need to
> > be guessable externally; ex: git add could use as seed contents from
> > the file that is adding, or even better mix it up with the other
> > sources as a poor man's /dev/urandom
> 
> Those contents are still predictable. So you've made the attacker's job a little
> harder (now they have to block tempfiles for, say, each tag you're going to
> verify), but haven't changed the fundamental problem.
> 
> It definitely would help in _some_ threat models, but I think we should strive
> for a solution that can be explained clearly as "nobody can DoS your
> tempfiles" without complicated qualifications.

I missed this one... lrand48 is also not generally available. I don’t think it is even available on Windows.

If we need a generalized solution, it probably needs to be abstracted in git-compat-util.h and compat/rand.[ch], so that the platform maintainers can plug in whatever decent platform randomization happens to be available, if any. We know that rand() is vulnerable, but it might be the only generally available fallback. Perhaps get the compat layer in place with a test suite that exercises the implementation before getting into the general git code base - maybe based on jitterentropy or sslrng. Agree on an interface, decide on a period of time to implement, send the word out that this needs to get done, and hope for the best. I have code that passes FIPS-140 for NonStop ia64 (-ish although not jitterentropy) and x86, and I'm happy to contribute some of this.
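
One possible shape for such a compat interface, as a sketch only.  All
names here (compat_rand_fn, compat_rand_register, compat_rand_bytes,
urandom_backend) are hypothetical and nothing like this exists in
compat/ today; the /dev/urandom backend merely stands in for whatever a
platform maintainer would plug in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical signature every platform backend would implement. */
typedef int (*compat_rand_fn)(void *buf, size_t len);

static compat_rand_fn backend;	/* stays NULL until a platform registers one */

void compat_rand_register(compat_rand_fn fn)
{
	backend = fn;
}

/* Return 0 on success, -1 if no backend is registered or it fails. */
int compat_rand_bytes(void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (!backend)
		return -1;
	return backend(buf, len);
}

/* Example backend for systems that do have /dev/urandom. */
int urandom_backend(void *buf, size_t len)
{
	FILE *f = fopen("/dev/urandom", "rb");
	size_t got;

	if (!f)
		return -1;
	got = fread(buf, 1, len, f);
	fclose(f);
	return got == len ? 0 : -1;
}
```

A test suite for the layer could then exercise compat_rand_bytes()
without knowing which source sits behind it.
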

Randall

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17  7:39   ` Junio C Hamano
@ 2021-11-17 23:01     ` brian m. carlson
  2021-11-18  7:19       ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2021-11-17 23:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


On 2021-11-17 at 07:39:08, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > Finally, add a self-test option here to make sure that our buffer
> > handling is correct and we aren't truncating data.  We simply read 64
> > KiB and then make sure we've seen each byte.  The probability of this
> > test failing spuriously is less than 10^-100.
> 
> I saw that 10^-100 math in the other message, and have no problem
> with that, but I am not sure how such a test makes "sure that our
> buffer handling is correct and we aren't truncating data."  If you
> thought you were generating 64kiB of random bytes but a bug caused you
> to actually use 32kiB of random bytes with 32kiB of other garbage,
> wouldn't you still have enough entropy left that you would be likely
> to paint all 256 buckets?

True, but our code processes smaller chunks at a time, which means that
theoretically we'd notice before then.  For example, getentropy(2) won't
process chunks larger than 256 bytes.
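
A minimal sketch of the chunking described above, not the code from the
patch: the names fill_in_chunks(), entropy_fn, fake_source() and
MAX_CHUNK are hypothetical, and the source here is a deterministic
stand-in that merely rejects requests over 256 bytes the way
getentropy(2) does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Largest request the stand-in source accepts, mirroring getentropy(2). */
#define MAX_CHUNK 256

/* Hypothetical callback type: fills buf with len bytes, returns 0 or -1. */
typedef int (*entropy_fn)(void *buf, size_t len);

/* Fill buf completely by issuing requests of at most MAX_CHUNK bytes. */
static int fill_in_chunks(entropy_fn source, void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(source != NULL);
	while (len) {
		size_t chunk = len < MAX_CHUNK ? len : MAX_CHUNK;

		if (source(p, chunk) < 0)
			return -1;
		/* Advance past what was just written; forgetting this is
		 * the kind of buffer-handling bug a self-test should catch. */
		p += chunk;
		len -= chunk;
	}
	return 0;
}

/* Deterministic fake source (NOT random), used only to exercise the loop. */
static int fake_source(void *buf, size_t len)
{
	if (len > MAX_CHUNK) {
		errno = EIO;
		return -1;
	}
	memset(buf, 0xAB, len);
	return 0;
}
```

Calling fill_in_chunks(fake_source, buf, 1024) issues four 256-byte
requests; a caller asking the source for 1024 bytes directly would get
an error instead.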

If we don't think there's value, I can just remove it.

> I also agree with Peff's comment about making these look as if many
> of them can be specified at once, when only one of them would
> actually be in effect.  Giving one Makefile macro that the builder
> can set to a single value would be much less confusing.

I can use one Makefile macro, sure.  I think we'll still need multiple
macros for the actual C code because we can't really do a string
comparison in the C preprocessor.
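
To illustrate the point about the preprocessor, here is a rough sketch.
The macro names and the csprng_backend_name() helper are hypothetical;
the assumption is that the Makefile would translate one user-visible
setting into exactly one -DHAVE_... flag.

```c
/*
 * Something like `#if CSPRNG_METHOD == "getentropy"` is not possible,
 * because #if only evaluates integer constant expressions.  Instead,
 * the build system is assumed to define exactly one of the
 * (hypothetical) macros below based on a single Makefile setting.
 */
static const char *csprng_backend_name(void)
{
#if defined(HAVE_ARC4RANDOM)
	return "arc4random";
#elif defined(HAVE_GETENTROPY)
	return "getentropy";
#elif defined(HAVE_OPENSSL_CSPRNG)
	return "openssl";
#else
	/* Fallback when the builder selected nothing. */
	return "/dev/urandom";
#endif
}
```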
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17 20:19                     ` rsbecker
@ 2021-11-17 23:30                       ` brian m. carlson
  2021-11-17 23:34                         ` rsbecker
  0 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2021-11-17 23:30 UTC (permalink / raw)
  To: rsbecker; +Cc: 'Jeff King', 'Carlo Arenas', git


On 2021-11-17 at 20:19:49, rsbecker@nexbridge.com wrote:
> I missed this one... lrand48 is also not generally available. I don’t
> think it is even available on Windows.
>
> If we need a generalized solution, it probably needs to be abstracted in
> git-compat-util.h and compat/rand.[ch], so that the platform maintainers
> can plug in whatever decent platform randomization happens to be
> available, if any. We know that rand() is vulnerable, but it might be the
> only generally available fallback. Perhaps get the compat layer in place
> with a test suite that exercises the implementation before getting into
> the general git code base - maybe based on jitterentropy or sslrng. Agree
> on an interface, decide on a period of time to implement, send the word
> out that this needs to get done, and hope for the best. I have code that
> passes FIPS-140 for NonStop ia64 (-ish although not jitterentropy) and
> x86, and I'm happy to contribute some of this.

I think in this case I'd like to try to stick with OpenSSL or other
standard interfaces if that's going to meet folks' needs.  I can write
an HMAC-DRBG, but getting entropy is the tricky part, and jitterentropy
approaches are controversial because it's not clear how unpredictable
they are.  I'm also specifically trying to avoid anything that's
architecture specific like RDRAND, since that means we have to carry
assembly code, and on some systems RDRAND is broken, which means that
you have to test for that and then pass the output into another CSPRNG.
I'm also not sure how maintainable such code is, since I don't think
there are many people on the list who would be familiar enough with
those algorithms to maintain it.  Plus there's always the rule, "Don't
write your own crypto."

Using OpenSSL or system-provided interfaces is much, much easier, it
means users can use Git in FIPS-certified environments, and it avoids us
ending up with subtly broken code in the future.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* RE: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17 23:30                       ` brian m. carlson
@ 2021-11-17 23:34                         ` rsbecker
  0 siblings, 0 replies; 37+ messages in thread
From: rsbecker @ 2021-11-17 23:34 UTC (permalink / raw)
  To: 'brian m. carlson'
  Cc: 'Jeff King', 'Carlo Arenas', git

On November 17, 2021 6:31 PM, brian m. carlson wrote:
> To: rsbecker@nexbridge.com
> Cc: 'Jeff King' <peff@peff.net>; 'Carlo Arenas' <carenas@gmail.com>;
> git@vger.kernel.org
> Subject: Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a
> CSPRNG
> 
> On 2021-11-17 at 20:19:49, rsbecker@nexbridge.com wrote:
> > I missed this one... lrand48 is also not generally available. I don’t
> > think it is even available on Windows.
> >
> > If we need a generalized solution, it probably needs to be abstracted
> > in git-compat-util.h and compat/rand.[ch], so that the platform
> > maintainers can plug in whatever decent platform randomization happens
> > to be available, if any. We know that rand() is vulnerable, but it
> > might be the only generally available fallback. Perhaps get the compat
> > layer in place with a test suite that exercises the implementation
> > before getting into the general git code base - maybe based on
> > jitterentropy or sslrng. Agree on an interface, decide on a period of
> > time to implement, send the word out that this needs to get done, and
> > hope for the best. I have code that passes FIPS-140 for NonStop ia64
> > (-ish although not jitterentropy) and x86, and I'm happy to contribute
> > some of this.
> 
> I think in this case I'd like to try to stick with OpenSSL or other standard
> interfaces if that's going to meet folks' needs.  I can write an HMAC-DRBG,
> but getting entropy is the tricky part, and jitterentropy approaches are
> controversial because it's not clear how unpredictable they are.  I'm also
> specifically trying to avoid anything that's architecture specific like RDRAND,
> since that means we have to carry assembly code, and on some systems
> RDRAND is broken, which means that you have to test for that and then pass
> the output into another CSPRNG.
> I'm also not sure how maintainable such code is, since I don't think there are
> many people on the list who would be familiar enough with those algorithms
> to maintain it.  Plus there's always the rule, "Don't write your own crypto."
> 
> Using OpenSSL or system-provided interfaces is much, much easier, it means
> users can use Git in FIPS-certified environments, and it avoids us ending up
> with subtly broken code in the future.

I agree wholeheartedly. git in FIPS-certified environments is one of my
actual goals - well, in this case, I am a proxy for my customers'.
Sticking with OpenSSL would be far preferable to me than basically
reimplementing what OpenSSL does. Even OpenSSH uses OpenSSL.

Regards,
Randall



* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-17 23:01     ` brian m. carlson
@ 2021-11-18  7:19       ` Junio C Hamano
  2021-11-18 22:16         ` brian m. carlson
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2021-11-18  7:19 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2021-11-17 at 07:39:08, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> 
>> > Finally, add a self-test option here to make sure that our buffer
>> > handling is correct and we aren't truncating data.  We simply read 64
>> > KiB and then make sure we've seen each byte.  The probability of this
>> > test failing spuriously is less than 10^-100.
>> 
>> I saw that 10^-100 math in the other message, and have no problem
>> with that, but I am not sure how such a test makes "sure that our
>> buffer handling is correct and we aren't truncating data."  If you
>> thought you were generating 64kiB of random bytes but a bug caused you
>> to actually use 32kiB of random bytes with 32kiB of other garbage,
>> wouldn't you still have enough entropy left that you would be likely
>> to paint all 256 buckets?
>
> True, but our code processes smaller chunks at a time, which means that
> theoretically we'd notice before then.  For example, getentropy(2) won't
> process chunks larger than 256 bytes.

Sorry, you lost me.

> If we don't think there's value, I can just remove it.

It is not that I do not think there is value.  I am not sure where
this code is getting its value from.

We grab 1k at a time and repeat that 64 times.  

Presumably csprng_bytes() grabs bytes from the underlying mechanism in
smaller chunks, but would not return until it fills the buffer---ah,
your "make sure our buffer handling is correct" is primarily about
the check that we get the full 1k bytes in the loop?  We ask for a 1k
chunk 64 times and we must get a full 1k chunk every time?

What I was wondering about was the other half of the check, ensuring
all buckets[] are painted that gave us the cute 10^-100 math.

+	int buckets[256] = { 0 };
+	unsigned char buf[1024];
+	unsigned long count = 64 * 1024;
+	int i;
+
+	while (count) {
+		if (csprng_bytes(buf, sizeof(buf)) < 0) {
+			perror("failed to read");
+			return 3;
+		}
+		for (i = 0; i < sizeof(buf); i++)
+			buckets[buf[i]]++;
+		count -= sizeof(buf);
+	}
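
For reference, the 10^-100 figure can be reproduced with a union bound,
under the assumption that all 65536 bytes are independent and uniformly
distributed.  This derivation is an illustration and is not taken from
the patch itself:

```latex
\[
  \Pr[\text{some byte value never appears}]
  \;\le\; 256 \left(\frac{255}{256}\right)^{65536}
  \;\approx\; 256\, e^{-256.5}
  \;\approx\; 10^{-109} \;<\; 10^{-100}.
\]
\[
  \text{With only } 32768 \text{ random bytes:}\quad
  256 \left(\frac{255}{256}\right)^{32768}
  \;\approx\; 256\, e^{-128.25}
  \;\approx\; 5 \times 10^{-54}.
\]
```

The second line shows why losing half of the random data would still
paint all 256 buckets with overwhelming probability.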


* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-18  7:19       ` Junio C Hamano
@ 2021-11-18 22:16         ` brian m. carlson
  2021-11-22  9:10           ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2021-11-18 22:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


On 2021-11-18 at 07:19:08, Junio C Hamano wrote:
> Presumably csprng_bytes() grabs bytes from the underlying mechanism in
> smaller chunks, but would not return until it fills the buffer---ah,
> your "make sure our buffer handling is correct" is primarily about
> the check that we get the full 1k bytes in the loop?  We ask for a 1k
> chunk 64 times and we must get a full 1k chunk every time?

Yes, that's what we'd expect to happen.

> What I was wondering about was the other half of the check, ensuring
> all buckets[] are painted that gave us the cute 10^-100 math.

Say the buffer handling is incorrect and we read only a few bytes
instead of the full 1 KiB.  Then we'll end up filling only some of the
buckets, and the check will fail much of the time, because we won't get
sufficient number of random bytes to fill all the buckets.

The check is that we got enough data that looks like random bytes over
the course of our requests.
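
A rough sketch of that failure mode, with several assumptions called
out: the buffer is zeroed before each request (the quoted patch does
not do this), the sources are deterministic counters standing in for
random bytes, and distinct_values(), short_fill() and full_fill() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 256
#define CHUNK 1024
#define ROUNDS 64

/* Hypothetical callback type: writes into buf, returns bytes written. */
typedef size_t (*fill_fn)(unsigned char *buf, size_t len);

/* Count how many distinct byte values show up over ROUNDS requests. */
static int distinct_values(fill_fn fill)
{
	unsigned long buckets[NBUCKETS] = { 0 };
	unsigned char buf[CHUNK];
	int round, seen = 0;
	size_t i;

	assert(fill != NULL);
	for (round = 0; round < ROUNDS; round++) {
		/* Assumption: zero the buffer so unwritten bytes are 0. */
		memset(buf, 0, sizeof(buf));
		fill(buf, sizeof(buf));
		for (i = 0; i < sizeof(buf); i++)
			buckets[buf[i]]++;
	}
	for (i = 0; i < NBUCKETS; i++)
		if (buckets[i])
			seen++;
	return seen;
}

/* Buggy source: writes only the first 3 bytes of each request. */
static size_t short_fill(unsigned char *buf, size_t len)
{
	static unsigned char next = 1;
	size_t i, n = len < 3 ? len : 3;

	for (i = 0; i < n; i++)
		buf[i] = next++;
	return n;
}

/* Correct source: writes every byte (a counter, not random data). */
static size_t full_fill(unsigned char *buf, size_t len)
{
	static unsigned char next;
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = next++;
	return len;
}
```

With the short source, 64 requests deliver at most 64 * 3 = 192 bytes,
so at most 193 distinct values (including the zero padding) can ever
be seen and the "all 256 buckets" check has to fail.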
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* Re: [PATCH 1/2] wrapper: add a helper to generate numbers from a CSPRNG
  2021-11-18 22:16         ` brian m. carlson
@ 2021-11-22  9:10           ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2021-11-22  9:10 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2021-11-18 at 07:19:08, Junio C Hamano wrote:
>> Presumably csprng_bytes() grabs bytes from the underlying mechanism in
>> smaller chunks, but would not return until it fills the buffer---ah,
>> your "make sure our buffer handling is correct" is primarily about
>> the check that we get the full 1k bytes in the loop?  We ask for a 1k
>> chunk 64 times and we must get a full 1k chunk every time?
>
> Yes, that's what we'd expect to happen.
>
>> What I was wondering about was the other half of the check, ensuring
>> all buckets[] are painted that gave us the cute 10^-100 math.
>
> Say the buffer handling is incorrect and we read only a few bytes
> instead of the full 1 KiB.  Then we'll end up filling only some of the
> buckets, and the check will fail much of the time, because we won't get
> sufficient number of random bytes to fill all the buckets.

... meaning (64 * a few bytes) is small enough such that some slots
in buckets[] will be left untouched (and the remainder of the 1kB is
untouched --- but buf[] is not initialized in any way, so
it's not like such an "oops, we only fed a few bytes" bug would
leave the rest as NUL or anything)?

> The check is that we got enough data that looks like random bytes over
> the course of our requests.

If the check were doing so, yes, I would have understood (whether I
agreed with it or not), but the check is "if we taint each and every
buckets[] slot even once, we are OK", not "buckets[] should be more or
less evenly touched", and that is why I do/did not understand the test.
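
To make the distinction concrete, here is a small sketch contrasting the
two checks.  The chi-squared statistic is only one possible reading of
"more or less evenly touched"; nobody in this thread proposed it, and
the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 256

/* The kind of check in the patch: was every byte value seen once? */
static int all_buckets_seen(const unsigned long *buckets)
{
	size_t i;

	for (i = 0; i < NBUCKETS; i++)
		if (!buckets[i])
			return 0;
	return 1;
}

/*
 * Pearson chi-squared statistic against a uniform expectation.  For
 * uniform data with 255 degrees of freedom its mean is 255; values far
 * above that indicate an uneven distribution.
 */
static double chi_squared(const unsigned long *buckets, unsigned long total)
{
	double expected, sum = 0.0;
	size_t i;

	assert(total > 0);
	expected = (double)total / NBUCKETS;
	for (i = 0; i < NBUCKETS; i++) {
		double d = (double)buckets[i] - expected;

		sum += d * d / expected;
	}
	return sum;
}
```

A histogram in which one value appears 65281 times and every other
value exactly once passes all_buckets_seen() but yields an enormous
chi_squared() result, which is the gap being pointed out here.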


Thread overview: 37+ messages
2021-11-16  3:35 [PATCH 0/2] Generate temporary files using a CSPRNG brian m. carlson
2021-11-16  3:35 ` [PATCH 1/2] wrapper: add a helper to generate numbers from " brian m. carlson
2021-11-16 15:31   ` Jeff King
2021-11-16 16:01     ` rsbecker
2021-11-16 18:22       ` Taylor Blau
2021-11-16 19:58         ` rsbecker
2021-11-16 22:41       ` brian m. carlson
2021-11-16 23:20         ` rsbecker
2021-11-17  0:47           ` Carlo Arenas
2021-11-17  3:05             ` rsbecker
2021-11-17  1:03           ` brian m. carlson
2021-11-17  1:50             ` Carlo Arenas
2021-11-17  3:04               ` Jeff King
2021-11-17  3:12                 ` rsbecker
2021-11-17  3:36                 ` Carlo Arenas
2021-11-17 20:01                   ` Jeff King
2021-11-17 20:19                     ` rsbecker
2021-11-17 23:30                       ` brian m. carlson
2021-11-17 23:34                         ` rsbecker
2021-11-17  3:03             ` rsbecker
2021-11-17  7:39   ` Junio C Hamano
2021-11-17 23:01     ` brian m. carlson
2021-11-18  7:19       ` Junio C Hamano
2021-11-18 22:16         ` brian m. carlson
2021-11-22  9:10           ` Junio C Hamano
2021-11-16  3:35 ` [PATCH 2/2] wrapper: use a CSPRNG to generate random file names brian m. carlson
2021-11-16 15:36   ` Jeff King
2021-11-16 18:28     ` Taylor Blau
2021-11-16 18:57       ` Junio C Hamano
2021-11-16 19:21         ` Jeff King
2021-11-16 19:33           ` Taylor Blau
2021-11-16 15:44 ` [PATCH 0/2] Generate temporary files using a CSPRNG Jeff King
2021-11-16 22:17   ` brian m. carlson
2021-11-16 22:29     ` rsbecker
2021-11-16 20:35 ` Ævar Arnfjörð Bjarmason
2021-11-16 21:06   ` Jeff King
2021-11-17  8:36     ` Ævar Arnfjörð Bjarmason
