[PATCH 0/7] read/write_in

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/7] read/write_in_full leftovers
@ 2017-09-25 20:26 Jeff King
  2017-09-25 20:27 ` [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
                   ` (7 more replies)
  0 siblings, 8 replies; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:26 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

This series addresses the bits leftover from the discussion two weeks
ago in:

  https://public-inbox.org/git/20170913170807.cyx7rrpoyhaauvol@sigill.intra.peff.net/

and its subthread. I don't think any of these is a real problem in
practice, so this can be considered as a cleanup. I'm on the fence over
whether the final one is a good idea. See the commit message for
details.

  [1/7]: files-backend: prefer "0" for write_in_full() error check
  [2/7]: notes-merge: drop dead zero-write code
  [3/7]: read_in_full: reset errno before reading
  [4/7]: get-tar-commit-id: prefer "!=" for read_in_full() error check
  [5/7]: worktree: use xsize_t to access file size
  [6/7]: worktree: check the result of read_in_full()
  [7/7]: add xread_in_full() helper

 builtin/get-tar-commit-id.c |  5 +----
 builtin/worktree.c          | 11 ++++++++---
 bulk-checkin.c              |  4 +---
 cache.h                     |  7 +++++++
 combine-diff.c              |  8 +-------
 csum-file.c                 |  6 +-----
 notes-merge.c               |  2 --
 pack-write.c                |  3 +--
 refs/files-backend.c        |  2 +-
 wrapper.c                   | 12 ++++++++++++
 10 files changed, 33 insertions(+), 27 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
@ 2017-09-25 20:27 ` Jeff King
  2017-09-25 21:59   ` Jonathan Nieder
  2017-09-25 20:27 ` [PATCH 2/7] notes-merge: drop dead zero-write code Jeff King
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:27 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Commit 06f46f237a (avoid "write_in_full(fd, buf, len) !=
len" pattern, 2017-09-13) converted this callsite from:

  write_in_full(...) != 1

to

  write_in_full(...) < 0

But during the conflict resolution in c50424a6f0 (Merge
branch 'jk/write-in-full-fix', 2017-09-25), this morphed
into

  write_in_full(...) < 1

This behaves as we want, but we prefer to avoid modeling the
"less than length" error-check which can be subtly buggy, as
shown in efacf609c8 (config: avoid "write_in_full(fd, buf,
len) < len" pattern, 2017-09-13).

Signed-off-by: Jeff King <peff@peff.net>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index dac33628b3..e35c64c001 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3007,7 +3007,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
 		} else if (update &&
 			   (write_in_full(get_lock_file_fd(&lock->lk),
 				oid_to_hex(&cb.last_kept_oid), GIT_SHA1_HEXSZ) < 0 ||
-			    write_str_in_full(get_lock_file_fd(&lock->lk), "\n") < 1 ||
+			    write_str_in_full(get_lock_file_fd(&lock->lk), "\n") < 0 ||
 			    close_ref_gently(lock) < 0)) {
 			status |= error("couldn't write %s",
 					get_lock_file_path(&lock->lk));
-- 
2.14.1.1148.ga2561536a1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/7] notes-merge: drop dead zero-write code
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
  2017-09-25 20:27 ` [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
@ 2017-09-25 20:27 ` Jeff King
  2017-09-25 21:59   ` Jonathan Nieder
  2017-09-25 20:29 ` [PATCH 3/7] read_in_full: reset errno before reading Jeff King
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:27 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

We call write_in_full() with a size that we know is greater
than zero. The return value can never be zero, then, since
write_in_full() converts such a failed write() into ENOSPC
and returns -1.  We can just drop this branch of the error
handling entirely.

Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 notes-merge.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/notes-merge.c b/notes-merge.c
index 597d43f65c..4352c34a6e 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -308,8 +308,6 @@ static void write_buf_to_worktree(const struct object_id *obj,
 			if (errno == EPIPE)
 				break;
 			die_errno("notes-merge");
-		} else if (!ret) {
-			die("notes-merge: disk full?");
 		}
 		size -= ret;
 		buf += ret;
-- 
2.14.1.1148.ga2561536a1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
  2017-09-25 20:27 ` [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
  2017-09-25 20:27 ` [PATCH 2/7] notes-merge: drop dead zero-write code Jeff King
@ 2017-09-25 20:29 ` Jeff King
  2017-09-25 22:09   ` Jonathan Nieder
  2017-09-25 20:29 ` [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check Jeff King
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:29 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Many callers of read_in_full() complain when we do not read
their full byte-count. But a check like:

  if (read_in_full(fd, buf, len) != len)
	  return error_errno("unable to read");

conflates two problem conditions:

  1. A real error from read().

  2. There were fewer than "len" bytes available.

In the first case, showing the user strerror(errno) is
useful. But in the second, we may see a random errno that
was set by some previous system call.

In an ideal world, callers would always distinguish between
these cases and give a useful message for each. But as an
easy way to make our imperfect world better, let's reset
errno to a known value. The best we can do is "0", which
will yield something like:

  unable to read: Success

That's not great, but at least it's deterministic and makes
it clear that we didn't see an error from read().

Signed-off-by: Jeff King <peff@peff.net>
---
 wrapper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/wrapper.c b/wrapper.c
index 61aba0b5c1..f55debc92d 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -314,6 +314,7 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
 	char *p = buf;
 	ssize_t total = 0;
 
+	errno = 0;
 	while (count > 0) {
 		ssize_t loaded = xread(fd, p, count);
 		if (loaded < 0)
-- 
2.14.1.1148.ga2561536a1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
                   ` (2 preceding siblings ...)
  2017-09-25 20:29 ` [PATCH 3/7] read_in_full: reset errno before reading Jeff King
@ 2017-09-25 20:29 ` Jeff King
  2017-09-25 22:10   ` Jonathan Nieder
  2017-09-25 20:30 ` [PATCH 5/7] worktree: use xsize_t to access file size Jeff King
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:29 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Comparing the result of read_in_full() using less-than is
potentially dangerous, as discussed in 561598cfcf
(read_pack_header: handle signed/unsigned comparison in read
result, 2017-09-13).

The instance in get-tar-commit-id is OK, because the
HEADERSIZE macro expands to a signed integer. But if it were
switched to an unsigned type like:

  size_t HEADERSIZE = ...;

this would be a bug. Let's use the more robust "!="
construct.

We can also drop the useless "n" variable while we're at it.

Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/get-tar-commit-id.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 6d9a79f9b3..259ad40339 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -20,14 +20,12 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 	struct ustar_header *header = (struct ustar_header *)buffer;
 	char *content = buffer + RECORDSIZE;
 	const char *comment;
-	ssize_t n;
 
 	if (argc != 1)
 		usage(builtin_get_tar_commit_id_usage);
 
-	n = read_in_full(0, buffer, HEADERSIZE);
-	if (n < HEADERSIZE)
-		die("git get-tar-commit-id: read error");
+	if (read_in_full(0, buffer, HEADERSIZE) != HEADERSIZE)
+		die_errno("git get-tar-commit-id: read error");
 	if (header->typeflag[0] != 'g')
 		return 1;
 	if (!skip_prefix(content, "52 comment=", &comment))
-- 
2.14.1.1148.ga2561536a1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 5/7] worktree: use xsize_t to access file size
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
                   ` (3 preceding siblings ...)
  2017-09-25 20:29 ` [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check Jeff King
@ 2017-09-25 20:30 ` Jeff King
  2017-09-25 22:11   ` Jonathan Nieder
  2017-09-25 20:31 ` [PATCH 6/7] worktree: check the result of read_in_full() Jeff King
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:30 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

To read the "gitdir" file into memory, we stat the file and
allocate a buffer. But we store the size in an "int", which
may be truncated. We should use a size_t and xsize_t(),
which will detect truncation.

An overflow is unlikely for a "gitdir" file, but it's a good
practice to model.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/worktree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index de26849f55..2f4a4ef9cd 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -38,7 +38,8 @@ static int prune_worktree(const char *id, struct strbuf *reason)
 {
 	struct stat st;
 	char *path;
-	int fd, len;
+	int fd;
+	size_t len;
 
 	if (!is_directory(git_path("worktrees/%s", id))) {
 		strbuf_addf(reason, _("Removing worktrees/%s: not a valid directory"), id);
@@ -56,7 +57,7 @@ static int prune_worktree(const char *id, struct strbuf *reason)
 			    id, strerror(errno));
 		return 1;
 	}
-	len = st.st_size;
+	len = xsize_t(st.st_size);
 	path = xmallocz(len);
 	read_in_full(fd, path, len);
 	close(fd);
-- 
2.14.1.1148.ga2561536a1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 6/7] worktree: check the result of read_in_full()
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
                   ` (4 preceding siblings ...)
  2017-09-25 20:30 ` [PATCH 5/7] worktree: use xsize_t to access file size Jeff King
@ 2017-09-25 20:31 ` Jeff King
  2017-09-25 22:14   ` Jonathan Nieder
  2017-09-25 20:33 ` [PATCH 7/7] add xread_in_full() helper Jeff King
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

We try to read "len" bytes into a buffer and just assume
that it happened correctly. In practice this should usually
be the case, since we just stat'd the file to get the
length.  But we could be fooled by transient errors or by
other processes racily truncating the file.

Let's be more careful. There's a slim chance this could
catch a real error, but it also prevents people and tools
from getting worried while reading the code.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/worktree.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 2f4a4ef9cd..87b3d70b0b 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf *reason)
 	}
 	len = xsize_t(st.st_size);
 	path = xmallocz(len);
-	read_in_full(fd, path, len);
+	if (read_in_full(fd, path, len) != len) {
+		strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did not match stat (%s)"),
+			    id, strerror(errno));
+		return 1;
+	}
 	close(fd);
 	while (len && (path[len - 1] == '\n' || path[len - 1] == '\r'))
 		len--;
-- 
2.14.1.1148.ga2561536a1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 7/7] add xread_in_full() helper
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
                   ` (5 preceding siblings ...)
  2017-09-25 20:31 ` [PATCH 6/7] worktree: check the result of read_in_full() Jeff King
@ 2017-09-25 20:33 ` Jeff King
  2017-09-25 22:16   ` Jonathan Nieder
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 20:33 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Many callers of read_in_full() expect to see exactly "len"
bytes, and die if that isn't the case. Since there are two
interesting error cases:

  1. We saw a read() error.

  2. We saw EOF with fewer bytes than expected.

we would ideally distinguish between them when reporting to
the user. Let's add a helper to make this easier. We can
convert cases that currently lump the two conditions
together (improving their error messages) and ones that
already distinguish (shortening their code).

There are a number of read_in_full() calls left that we
can't (or shouldn't) convert, for one or more of these
reasons:

 - Some are not trying to read an exact number in the first
   place. So they only care about case 1, and are fine.

 - Some call error() instead of die(). We could possibly
   accommodate these, at the expense of making our helper a
   bit more clunky.

 - Some print nothing and return a failing code up the
   stack, in which case somebody eventually looks at errno.
   Handling these with a helper would be hard. Ideally there
   would be some errno value that we could set for short
   read, but there's no standard one (ENODATA is kind-of
   accurate, but it's not really "no data"; it's "not enough
   data", and we wouldn't want to get confused with a real
   ENODATA error).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/get-tar-commit-id.c |  3 +--
 bulk-checkin.c              |  4 +---
 cache.h                     |  7 +++++++
 combine-diff.c              |  8 +-------
 csum-file.c                 |  6 +-----
 pack-write.c                |  3 +--
 wrapper.c                   | 11 +++++++++++
 7 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 259ad40339..0b97196afc 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -24,8 +24,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 	if (argc != 1)
 		usage(builtin_get_tar_commit_id_usage);
 
-	if (read_in_full(0, buffer, HEADERSIZE) != HEADERSIZE)
-		die_errno("git get-tar-commit-id: read error");
+	xread_in_full(0, buffer, HEADERSIZE, "stdin");
 	if (header->typeflag[0] != 'g')
 		return 1;
 	if (!skip_prefix(content, "52 comment=", &comment))
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9a1f6c49ab..a9743785fb 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -115,9 +115,7 @@ static int stream_to_pack(struct bulk_checkin_state *state,
 
 		if (size && !s.avail_in) {
 			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
-			if (read_in_full(fd, ibuf, rsize) != rsize)
-				die("failed to read %d bytes from '%s'",
-				    (int)rsize, path);
+			xread_in_full(fd, ibuf, rsize, path);
 			offset += rsize;
 			if (*already_hashed_to < offset) {
 				size_t hsize = offset - *already_hashed_to;
diff --git a/cache.h b/cache.h
index 49b083ee0a..deec532228 100644
--- a/cache.h
+++ b/cache.h
@@ -1757,6 +1757,13 @@ extern ssize_t read_in_full(int fd, void *buf, size_t count);
 extern ssize_t write_in_full(int fd, const void *buf, size_t count);
 extern ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
 
+/*
+ * Like read_in_full(), but die on errors or a failure to read exactly "count"
+ * bytes. The "desc" string will be inserted into the error message as "reading
+ * %s".
+ */
+extern void xread_in_full(int fd, void *buf, size_t count, const char *desc);
+
 static inline ssize_t write_str_in_full(int fd, const char *str)
 {
 	return write_in_full(fd, str, strlen(str));
diff --git a/combine-diff.c b/combine-diff.c
index 9e163d5ada..4bdf797a10 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1027,7 +1027,6 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			free_filespec(df);
 		} else if (0 <= (fd = open(elem->path, O_RDONLY))) {
 			size_t len = xsize_t(st.st_size);
-			ssize_t done;
 			int is_file, i;
 
 			elem->mode = canon_mode(st.st_mode);
@@ -1042,12 +1041,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 
 			result_size = len;
 			result = xmallocz(len);
-
-			done = read_in_full(fd, result, len);
-			if (done < 0)
-				die_errno("read error '%s'", elem->path);
-			else if (done < len)
-				die("early EOF '%s'", elem->path);
+			xread_in_full(fd, result, len, elem->path);
 
 			/* If not a fake symlink, apply filters, e.g. autocrlf */
 			if (is_file) {
diff --git a/csum-file.c b/csum-file.c
index a172199e44..8a9a6075a4 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -15,12 +15,8 @@ static void flush(struct sha1file *f, const void *buf, unsigned int count)
 {
 	if (0 <= f->check_fd && count)  {
 		unsigned char check_buffer[8192];
-		ssize_t ret = read_in_full(f->check_fd, check_buffer, count);
 
-		if (ret < 0)
-			die_errno("%s: sha1 file read error", f->name);
-		if (ret < count)
-			die("%s: sha1 file truncated", f->name);
+		xread_in_full(f->check_fd, check_buffer, count, f->name);
 		if (memcmp(buf, check_buffer, count))
 			die("sha1 file '%s' validation error", f->name);
 	}
diff --git a/pack-write.c b/pack-write.c
index a333ec6754..05a7defdd2 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -219,8 +219,7 @@ void fixup_pack_header_footer(int pack_fd,
 
 	if (lseek(pack_fd, 0, SEEK_SET) != 0)
 		die_errno("Failed seeking to start of '%s'", pack_name);
-	if (read_in_full(pack_fd, &hdr, sizeof(hdr)) != sizeof(hdr))
-		die_errno("Unable to reread header of '%s'", pack_name);
+	xread_in_full(pack_fd, &hdr, sizeof(hdr), pack_name);
 	if (lseek(pack_fd, 0, SEEK_SET) != 0)
 		die_errno("Failed seeking to start of '%s'", pack_name);
 	git_SHA1_Update(&old_sha1_ctx, &hdr, sizeof(hdr));
diff --git a/wrapper.c b/wrapper.c
index f55debc92d..66ba4dbff3 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -329,6 +329,17 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
 	return total;
 }
 
+void xread_in_full(int fd, void *buf, size_t count, const char *desc)
+{
+	ssize_t ret = read_in_full(fd, buf, count);
+
+	if (ret < 0)
+		die_errno("error reading %s", desc);
+	if (ret != count)
+		die("too few bytes while reading %s (expected %"PRIuMAX", got %"PRIuMAX")",
+		    desc, (uintmax_t)count, (uintmax_t)ret);
+}
+
 ssize_t write_in_full(int fd, const void *buf, size_t count)
 {
 	const char *p = buf;
-- 
2.14.1.1148.ga2561536a1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check
  2017-09-25 20:27 ` [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
@ 2017-09-25 21:59   ` Jonathan Nieder
  2017-09-25 23:13     ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 21:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> Commit 06f46f237a (avoid "write_in_full(fd, buf, len) !=
> len" pattern, 2017-09-13) converted this callsite from:
>
>   write_in_full(...) != 1
>
> to
>
>   write_in_full(...) < 0
>
> But during the conflict resolution in c50424a6f0 (Merge
> branch 'jk/write-in-full-fix', 2017-09-25), this morphed
> into
>
>   write_in_full(...) < 1
>
> This behaves as we want, but we prefer to avoid modeling the
> "less than length" error-check which can be subtly buggy, as
> shown in efacf609c8 (config: avoid "write_in_full(fd, buf,
> len) < len" pattern, 2017-09-13).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  refs/files-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Good eyes.  Thanks.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

By the way, the description from that merge commit

    Many codepaths did not diagnose write failures correctly when disks
    go full, due to their misuse of write_in_full() helper function,
    which have been corrected.

does not look accurate to me.  At least the "Many codepaths" part.
All of those changes except for three should be no-ops.  The scariest
one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c,
which is about overflowing a "long" on Windows instead of error
handling.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/7] notes-merge: drop dead zero-write code
  2017-09-25 20:27 ` [PATCH 2/7] notes-merge: drop dead zero-write code Jeff King
@ 2017-09-25 21:59   ` Jonathan Nieder
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 21:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> We call write_in_full() with a size that we know is greater
> than zero. The return value can never be zero, then, since
> write_in_full() converts such a failed write() into ENOSPC
> and returns -1.  We can just drop this branch of the error
> handling entirely.
>
> Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  notes-merge.c | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for tying up this loose end.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 20:29 ` [PATCH 3/7] read_in_full: reset errno before reading Jeff King
@ 2017-09-25 22:09   ` Jonathan Nieder
  2017-09-25 23:23     ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 22:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> In an ideal world, callers would always distinguish between
> these cases and give a useful message for each. But as an
> easy way to make our imperfect world better, let's reset
> errno to a known value. The best we can do is "0", which
> will yield something like:
>
>   unable to read: Success
>
> That's not great, but at least it's deterministic and makes
> it clear that we didn't see an error from read().

Yuck.  Can we set errno to something more specific instead?

read(2) also doesn't promise not to clobber errno on success.

How about something like the following?

-- >8 --
Subject: read_in_full: set errno for partial reads

Many callers of read_in_full() complain when we do not read their full
byte-count. But a check like:

  if (read_in_full(fd, buf, len) != len)
	return error_errno("unable to read");

conflates two problem conditions:

  1. A real error from read().
  2. There were fewer than "len" bytes available.

In the first case, showing the user strerror(errno) is useful. In the
second, we may see a random errno that was set by some previous system
call.

In an ideal world, callers would always distinguish between these
cases and give a useful message for each. But as an easy way to make
our imperfect world better, let's set errno in the second case to a
known value. There is no standard "Unexpected end of file" errno
value, so instead use EILSEQ, which yields a message like

  unable to read: Illegal byte sequence

That's not great, but at least it's deterministic and makes it
possible to reverse-engineer what went wrong.

Change-Id: I48138052f71b7c6cf35c2d00a10dc8febaca4f10
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 wrapper.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/wrapper.c b/wrapper.c
index 61aba0b5c1..1842a99b87 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
 		ssize_t loaded = xread(fd, p, count);
 		if (loaded < 0)
 			return -1;
-		if (loaded == 0)
+		if (loaded == 0) {
+			errno = EILSEQ;
 			return total;
+		}
 		count -= loaded;
 		p += loaded;
 		total += loaded;
-- 
2.14.1.821.g8fa685d3b7

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check
  2017-09-25 20:29 ` [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check Jeff King
@ 2017-09-25 22:10   ` Jonathan Nieder
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 22:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/get-tar-commit-id.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] worktree: use xsize_t to access file size
  2017-09-25 20:30 ` [PATCH 5/7] worktree: use xsize_t to access file size Jeff King
@ 2017-09-25 22:11   ` Jonathan Nieder
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 22:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> To read the "gitdir" file into memory, we stat the file and
> allocate a buffer. But we store the size in an "int", which
> may be truncated. We should use a size_t and xsize_t(),
> which will detect truncation.
>
> An overflow is unlikely for a "gitdir" file, but it's a good
> practice to model.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/worktree.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 6/7] worktree: check the result of read_in_full()
  2017-09-25 20:31 ` [PATCH 6/7] worktree: check the result of read_in_full() Jeff King
@ 2017-09-25 22:14   ` Jonathan Nieder
  2017-09-25 23:25     ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 22:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyễn Thái Ngọc Duy

Jeff King wrote:

> We try to read "len" bytes into a buffer and just assume
> that it happened correctly. In practice this should usually
> be the case, since we just stat'd the file to get the
> length.  But we could be fooled by transient errors or by
> other processes racily truncating the file.
>
> Let's be more careful. There's a slim chance this could
> catch a real error, but it also prevents people and tools
> from getting worried while reading the code.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/worktree.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 2f4a4ef9cd..87b3d70b0b 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf *reason)
>  	}
>  	len = xsize_t(st.st_size);
>  	path = xmallocz(len);
> -	read_in_full(fd, path, len);
> +	if (read_in_full(fd, path, len) != len) {
> +		strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did not match stat (%s)"),
> +			    id, strerror(errno));

I'm a little confused.  The 'if' condition checks for a read error but
the message says something about 'stat'.

If we're trying to double-check the 'stat' result, shouldn't we read
all the way to EOF in case the file got longer?

Puzzled,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] add xread_in_full() helper
  2017-09-25 20:33 ` [PATCH 7/7] add xread_in_full() helper Jeff King
@ 2017-09-25 22:16   ` Jonathan Nieder
  2017-09-25 23:27     ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 22:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> Many callers of read_in_full() expect to see exactly "len"
> bytes, and die if that isn't the case.

micronit: Can this be named read_in_full_or_die?

Otherwise it's too easy to mistake for a function like xread, which
has different semantics.

I realize that xmalloc, xmemdupz, etc use a different convention.
That's yet another reason to be explicit, IMHO.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check
  2017-09-25 21:59   ` Jonathan Nieder
@ 2017-09-25 23:13     ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-25 23:13 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 02:59:27PM -0700, Jonathan Nieder wrote:

> > But during the conflict resolution in c50424a6f0 (Merge
> > branch 'jk/write-in-full-fix', 2017-09-25), this morphed
> > into
> [...]
> Good eyes.  Thanks.

Sort of. :) I usually continually rebase my topics until they end up
empty. So I notice bits like this either during conflict resolution, or
when the topic is left unexpectedly non-empty.

It's not foolproof, though. Sometimes rebasing a topic on itself is so
painful that I "rebase --skip" liberally in the early parts, and would
miss any merge funniness.

> By the way, the description from that merge commit
> 
>     Many codepaths did not diagnose write failures correctly when disks
>     go full, due to their misuse of write_in_full() helper function,
>     which have been corrected.
> 
> does not look accurate to me.  At least the "Many codepaths" part.
> All of those changes except for three should be no-ops.  The scariest
> one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c,
> which is about overflowing a "long" on Windows instead of error
> handling.

The ones in config.c are definitely wrong, too. Though I'd agree that
"many" might be overstating it.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 22:09   ` Jonathan Nieder
@ 2017-09-25 23:23     ` Jeff King
  2017-09-25 23:33       ` Jonathan Nieder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 23:23 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 03:09:14PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > In an ideal world, callers would always distinguish between
> > these cases and give a useful message for each. But as an
> > easy way to make our imperfect world better, let's reset
> > errno to a known value. The best we can do is "0", which
> > will yield something like:
> >
> >   unable to read: Success
> >
> > That's not great, but at least it's deterministic and makes
> > it clear that we didn't see an error from read().
> 
> Yuck.  Can we set errno to something more specific instead?

Yes, but what? You've suggested EILSEQ here, but that feels a
bit...funny. I guess at least it's unlikely that read() would ever set
errno to that itself, so we can be reasonably sure that seeing it is a
sign of a short read. I thought ERANGE might be a possible candidate,
but it doesn't quite fit. Ditto for ENODATA.

I _would_ like to have something meaningful, as it would mean the whole
xread_in_full() nonsense from the final patch would be unnecessary. It
would simply be reasonable to show the errno after read_in_full.

Another question is whether EILSEQ (or the others) is actually defined
everywhere we compile. If it isn't, we'll have to define it ourselves
(but to what? strerror() won't return anything useful for it).

> read(2) also doesn't promise not to clobber errno on success.

Yes, though it's only a problem if you're using something other than 0.

Speaking of funny errno clobbering, with your patch:

> diff --git a/wrapper.c b/wrapper.c
> index 61aba0b5c1..1842a99b87 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
>  		ssize_t loaded = xread(fd, p, count);
>  		if (loaded < 0)
>  			return -1;
> -		if (loaded == 0)
> +		if (loaded == 0) {
> +			errno = EILSEQ;
>  			return total;
> +		}

If I do this:

  errno = 0;
  read_in_full(fd, buf, sizeof(buf));
  if (errno)
	die_errno("oops");

then we'll claim an error, even though there was none (remember that
it's only an error for _some_ callers to not read the whole length).

This may be sufficiently odd that we don't need to care about it. There
are some calls (like strtoul) which require this kind of clear-and-check
strategy with errno, but in general we frown on it for calls like
read().

We could also introduce a better helper like this:

  int read_exactly(int fd, void *buf, size_t count)
  {
	ssize_t ret = read_in_full(fd, buf, count);
	if (ret < 0)
		return -1;
	if (ret != count) {
		errno = EILSEQ;
		return -1;
	}
	return 0;
  }

Then we know that touching errno always coincides with an error return.
And it's shorter to check "< 0" compared to "!= count" in the caller.
But of course a caller which wants to distinguish the two cases for its
error messages then has to look at errno:

  if (read_exactly(fd, buf, len) < 0) {
	if (errno == EILSEQ) /* eek, now this abstraction is leaky */
		die("short read");
	else
		die_errno("read error");
  }

I dunno. All options seem pretty gross to me.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 6/7] worktree: check the result of read_in_full()
  2017-09-25 22:14   ` Jonathan Nieder
@ 2017-09-25 23:25     ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-25 23:25 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Nguyễn Thái Ngọc Duy

On Mon, Sep 25, 2017 at 03:14:26PM -0700, Jonathan Nieder wrote:

> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > index 2f4a4ef9cd..87b3d70b0b 100644
> > --- a/builtin/worktree.c
> > +++ b/builtin/worktree.c
> > @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf *reason)
> >  	}
> >  	len = xsize_t(st.st_size);
> >  	path = xmallocz(len);
> > -	read_in_full(fd, path, len);
> > +	if (read_in_full(fd, path, len) != len) {
> > +		strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did not match stat (%s)"),
> > +			    id, strerror(errno));
> 
> I'm a little confused.  The 'if' condition checks for a read error but
> the message says something about 'stat'.
> 
> If we're trying to double-check the 'stat' result, shouldn't we read
> all the way to EOF in case the file got longer?

If you wanted to really double-check the stat result, yes you'd want to
make sure there aren't extra bytes either. But really we're just making
sure we were able to read _enough_ bytes.

To be honest, I'm tempted to rip out the whole thing and replace it with
strbuf_read_file(), which seems more straightforward.

The function does seem to rely on the stat() to get the mtime, so we'd
have to leave that part in.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] add xread_in_full() helper
  2017-09-25 22:16   ` Jonathan Nieder
@ 2017-09-25 23:27     ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-25 23:27 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 03:16:08PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > Many callers of read_in_full() expect to see exactly "len"
> > bytes, and die if that isn't the case.
> 
> micronit: Can this be named read_in_full_or_die?
> 
> Otherwise it's too easy to mistake for a function like xread, which
> has different semantics.
> 
> I realize that xmalloc, xmemdupz, etc use a different convention.
> That's yet another reason to be explicit, IMHO.

Yeah, we've definitely misused the "x" prefix for different things. I
agree that being explicit probably can't hurt.

I wonder if it's worth calling it "read_exactly_or_die()" to emphasize
that not reading enough bytes is a die-able offense.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 23:23     ` Jeff King
@ 2017-09-25 23:33       ` Jonathan Nieder
  2017-09-25 23:37         ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 23:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

>> --- a/wrapper.c
>> +++ b/wrapper.c
>> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
>>  		ssize_t loaded = xread(fd, p, count);
>>  		if (loaded < 0)
>>  			return -1;
>> -		if (loaded == 0)
>> +		if (loaded == 0) {
>> +			errno = EILSEQ;
>>  			return total;
>> +		}
>
> If I do this:
>
>   errno = 0;
>   read_in_full(fd, buf, sizeof(buf));
>   if (errno)
> 	die_errno("oops");
>
> then we'll claim an error, even though there was none (remember that
> it's only an error for _some_ callers to not read the whole length).
>
> This may be sufficiently odd that we don't need to care about it. There
> are some calls (like strtoul) which require this kind of clear-and-check
> strategy with errno, but in general we frown on it for calls like
> read().

Correct.  Actually more than "frown on": except for with the few calls
like strtoul that are advertised to work this way, POSIX does not make
the guarantee the above code would rely on, at all.

So it's not just frowned upon: it's so unportable that the standard
calls it out as something that won't work.

> We could also introduce a better helper like this:
>
>   int read_exactly(int fd, void *buf, size_t count)
>   {
> 	ssize_t ret = read_in_full(fd, buf, count);
> 	if (ret < 0)
> 		return -1;
> 	if (ret != count) {
> 		errno = EILSEQ;
> 		return -1;
> 	}
> 	return 0;
>   }
>
> Then we know that touching errno always coincides with an error return.
> And it's shorter to check "< 0" compared to "!= count" in the caller.
> But of course a caller which wants to distinguish the two cases for its
> error messages then has to look at errno:

That sounds nice, but it doesn't solve the original problem of callers
using read_in_full that way.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 23:33       ` Jonathan Nieder
@ 2017-09-25 23:37         ` Jeff King
  2017-09-25 23:45           ` Jeff King
  2017-09-25 23:52           ` Jonathan Nieder
  0 siblings, 2 replies; 51+ messages in thread
From: Jeff King @ 2017-09-25 23:37 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote:

> > If I do this:
> >
> >   errno = 0;
> >   read_in_full(fd, buf, sizeof(buf));
> >   if (errno)
> > 	die_errno("oops");
> >
> > then we'll claim an error, even though there was none (remember that
> > it's only an error for _some_ callers to not read the whole length).
> >
> > This may be sufficiently odd that we don't need to care about it. There
> > are some calls (like strtoul) which require this kind of clear-and-check
> > strategy with errno, but in general we frown on it for calls like
> > read().
> 
> Correct.  Actually more than "frown on": except for with the few calls
> like strtoul that are advertised to work this way, POSIX does not make
> the guarantee the above code would rely on, at all.
> 
> So it's not just frowned upon: it's so unportable that the standard
> calls it out as something that won't work.

Is it unportable? Certainly read() is free reset errno to zero on
success. But is it allowed to set it to another random value?

I think we're getting pretty academic here, but I'm curious if you have
a good reference.

> > We could also introduce a better helper like this:
> >
> >   int read_exactly(int fd, void *buf, size_t count)
> >   {
> > 	ssize_t ret = read_in_full(fd, buf, count);
> > 	if (ret < 0)
> > 		return -1;
> > 	if (ret != count) {
> > 		errno = EILSEQ;
> > 		return -1;
> > 	}
> > 	return 0;
> >   }
> >
> > Then we know that touching errno always coincides with an error return.
> > And it's shorter to check "< 0" compared to "!= count" in the caller.
> > But of course a caller which wants to distinguish the two cases for its
> > error messages then has to look at errno:
> 
> That sounds nice, but it doesn't solve the original problem of callers
> using read_in_full that way.

Right. None of the options discussed in this patch can do so. They can
only take a caller which doesn't distinguish between the cases, and give
it a deterministic value in errno for the short-read case.

IMHO as long as it _is_ deterministic and recognize as not an error from
read(), that's the best we can do. Which is why I went with "0" in the
first place. Seeing "read error: success" is a common-ish idiom (to me
anyway) for "read didn't fail, but some user-space logic did", if only
because it often happens accidentally.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 23:37         ` Jeff King
@ 2017-09-25 23:45           ` Jeff King
  2017-09-25 23:55             ` Jonathan Nieder
  2017-09-25 23:52           ` Jonathan Nieder
  1 sibling, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-25 23:45 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 07:37:32PM -0400, Jeff King wrote:

> > Correct.  Actually more than "frown on": except for with the few calls
> > like strtoul that are advertised to work this way, POSIX does not make
> > the guarantee the above code would rely on, at all.
> > 
> > So it's not just frowned upon: it's so unportable that the standard
> > calls it out as something that won't work.
> 
> Is it unportable? Certainly read() is free reset errno to zero on
> success. But is it allowed to set it to another random value?
> 
> I think we're getting pretty academic here, but I'm curious if you have
> a good reference.

Answering my own question. POSIX says:

  No function in this volume of IEEE Std 1003.1-2001 shall set errno to
  0. The setting of errno after a successful call to a function is
  unspecified unless the description of that function specifies that
  errno shall not be modified.

So that does seem to outlaw errno-only checks for most functions.

It makes me wonder if the recent getdelim() fix is technically violating
this. It should instead be explicitly checking for feof().

> IMHO as long as it _is_ deterministic and recognize as not an error from
> read(), that's the best we can do. Which is why I went with "0" in the
> first place. Seeing "read error: success" is a common-ish idiom (to me
> anyway) for "read didn't fail, but some user-space logic did", if only
> because it often happens accidentally.

Another option I ran across from POSIX:

  [EOVERFLOW]
    The file is a regular file, nbyte is greater than 0, the starting
    position is before the end-of-file, and the starting position is
    greater than or equal to the offset maximum established in the open
    file description associated with fildes.

That's not _exactly_ what's going on here, but it's pretty close. And is
what you would get if you implemented read_exactly() in terms of
something like pread().

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 23:37         ` Jeff King
  2017-09-25 23:45           ` Jeff King
@ 2017-09-25 23:52           ` Jonathan Nieder
  1 sibling, 0 replies; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 23:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote:
>> Jef King wrote:

>>>   errno = 0;
>>>   read_in_full(fd, buf, sizeof(buf));
>>>   if (errno)
>>> 	die_errno("oops");
[...]
>>>                          in general we frown on it for calls like
>>> read().
>>
>> Correct.  Actually more than "frown on": except for with the few calls
>> like strtoul that are advertised to work this way, POSIX does not make
>> the guarantee the above code would rely on, at all.
>>
>> So it's not just frowned upon: it's so unportable that the standard
>> calls it out as something that won't work.
>
> Is it unportable? Certainly read() is free reset errno to zero on
> success. But is it allowed to set it to another random value?
>
> I think we're getting pretty academic here, but I'm curious if you have
> a good reference.

Yes, it is allowed to set it to another random value.  It's a common
thing for implementations to do when they hit a recoverable error,
which they sometimes do (e.g. think EFAULT, EINTR, ETIMEDOUT, or an
implementation calling strtoul and getting EINVAL or ERANGE).

glibc only recently started trying to avoid this kind of behavior,
because even though the standard allows it, users hate it.

POSIX.1-2008 XSI 2.3 "Error Numbers" says

	Some functions provide the error number in a variable accessed
	through the symbol errno, defined by including the <errno.h>
	header.  The value of errno should only be examined when it is
	indicated to be valid by a function's return value.

See http://austingroupbugs.net/view.php?id=374 for an example where
someone wanted to add a new guarantee of that kind to a function.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 23:45           ` Jeff King
@ 2017-09-25 23:55             ` Jonathan Nieder
  2017-09-26  0:01               ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-25 23:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

>   [EOVERFLOW]
>     The file is a regular file, nbyte is greater than 0, the starting
>     position is before the end-of-file, and the starting position is
>     greater than or equal to the offset maximum established in the open
>     file description associated with fildes.
>
> That's not _exactly_ what's going on here, but it's pretty close. And is
> what you would get if you implemented read_exactly() in terms of
> something like pread().

All that really matters is the strerror() that the user would see.

For EOVERFLOW, that is

	Value too large to be stored in data type

For EILSEQ, it is

	Illegal byte sequence

Neither is perfect, but the latter seems like a better fit for the kind
of file format errors we're describing here.  Of course, it's even
better if we fix the callers and don't try to wedge this into errno.

If you are okay with the too-long/too-short confusion in EOVERFLOW, then
there is EMSGSIZE:

	Message too long

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-25 23:55             ` Jonathan Nieder
@ 2017-09-26  0:01               ` Jeff King
  2017-09-26  0:06                 ` Stefan Beller
  2017-09-26  0:13                 ` Jonathan Nieder
  0 siblings, 2 replies; 51+ messages in thread
From: Jeff King @ 2017-09-26  0:01 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> >   [EOVERFLOW]
> >     The file is a regular file, nbyte is greater than 0, the starting
> >     position is before the end-of-file, and the starting position is
> >     greater than or equal to the offset maximum established in the open
> >     file description associated with fildes.
> >
> > That's not _exactly_ what's going on here, but it's pretty close. And is
> > what you would get if you implemented read_exactly() in terms of
> > something like pread().
> 
> All that really matters is the strerror() that the user would see.

Agreed, though I didn't think those were necessarily standardized.

> For EOVERFLOW, that is
> 
> 	Value too large to be stored in data type

Interestingly that does not seem to be a good description for the case
POSIX describes above.

> For EILSEQ, it is
> 
> 	Illegal byte sequence
> 
> Neither is perfect, but the latter seems like a better fit for the kind
> of file format errors we're describing here.

I find that one actively misleading. It's not the bytes we saw, it's the
lack of them.

> Of course, it's even
> better if we fix the callers and don't try to wedge this into errno.

Yes. This patch is just a stop-gap. Perhaps we should abandon it
entirely. But I couldn't find a way to fix all the callers. If you have
a function that just returns "-1" when it sees a read_in_full() error,
how does _its_ caller tell the difference?

I guess the answer is that the interface of the sub-function calling
read_in_full() needs to change.

> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
> there is EMSGSIZE:
> 
> 	Message too long

I don't really like that one either.

I actually prefer "0" of the all of the options discussed. At least it
is immediately clear that it is not a syscall error.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:01               ` Jeff King
@ 2017-09-26  0:06                 ` Stefan Beller
  2017-09-26  0:09                   ` Jeff King
  2017-09-26  0:13                 ` Jonathan Nieder
  1 sibling, 1 reply; 51+ messages in thread
From: Stefan Beller @ 2017-09-26  0:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git@vger.kernel.org

On Mon, Sep 25, 2017 at 5:01 PM, Jeff King <peff@peff.net> wrote:

>
>> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
>> there is EMSGSIZE:
>>
>>       Message too long
>
> I don't really like that one either.
>
> I actually prefer "0" of the all of the options discussed. At least it
> is immediately clear that it is not a syscall error.
>
> -Peff

After reading this discussion from the sideline, maybe

  ENODATA         No message is available on the STREAM head read
queue (POSIX.1)

is not too bad after all. Or

  ESPIPE          Invalid seek (POSIX.1)

Technically we do not seek, but one could imagine we'd seek there
to read?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:06                 ` Stefan Beller
@ 2017-09-26  0:09                   ` Jeff King
  2017-09-26  0:16                     ` Jonathan Nieder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-26  0:09 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Nieder, git@vger.kernel.org

On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote:

> >> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
> >> there is EMSGSIZE:
> >>
> >>       Message too long
> >
> > I don't really like that one either.
> >
> > I actually prefer "0" of the all of the options discussed. At least it
> > is immediately clear that it is not a syscall error.
> >
> > -Peff
> 
> After reading this discussion from the sideline, maybe
> 
>   ENODATA         No message is available on the STREAM head read
> queue (POSIX.1)
> 
> is not too bad after all. Or
> 
>   ESPIPE          Invalid seek (POSIX.1)
> 
> Technically we do not seek, but one could imagine we'd seek there
> to read?

ENODATA is not too bad. On my glibc system it yields "No data available"
from strerror(), which is at least comprehensible.

We're still left with the question of whether it is defined everywhere
(and what to fallback to when it isn't).

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:01               ` Jeff King
  2017-09-26  0:06                 ` Stefan Beller
@ 2017-09-26  0:13                 ` Jonathan Nieder
  2017-09-26  0:17                   ` Jeff King
  1 sibling, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-26  0:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote:

>> All that really matters is the strerror() that the user would see.
>
> Agreed, though I didn't think those were necessarily standardized.

The standard allows them to vary for the sake of internationalization,
but they are de facto constants (probably because of application authors
like us abusing them ;-)).

[...]
>> Of course, it's even
>> better if we fix the callers and don't try to wedge this into errno.
>
> Yes. This patch is just a stop-gap. Perhaps we should abandon it
> entirely. But I couldn't find a way to fix all the callers. If you have
> a function that just returns "-1" when it sees a read_in_full() error,
> how does _its_ caller tell the difference?
>
> I guess the answer is that the interface of the sub-function calling
> read_in_full() needs to change.

Yes. :/

>> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
>> there is EMSGSIZE:
>>
>> 	Message too long
>
> I don't really like that one either.
>
> I actually prefer "0" of the all of the options discussed. At least it
> is immediately clear that it is not a syscall error.

I have a basic aversion to ": Success" in error messages.  Whenever I
see such an error message, I stop trusting the program that produced it
not to be seriously buggy.  Maybe I'm the only one?

If no actual errno works, we could make a custom strerror-like function
with our own custom errors (making them negative to avoid clashing with
standard errno values), but this feels like overkill.

In the same spirit of misleadingly confusing too-long and too-short,
there is also ERANGE ("Result too large"), which doesn't work here.
There's also EPROTO "Protocol error", but that's about protocols, not
file formats.  More vague and therefor maybe more applicable is EBADMSG
"Bad message".  There's also ENOMSG "No message of the desired type".

If the goal is to support debugging, an error like EPIPE "Broken pipe"
or ESPIPE "Invalid seek" would be likely to lead me in the right
direction (wrong file size), even though it is misleading about how
the error surfaced.

We could also avoid trying to be cute and use something generic like
EIO "Input/output error".

Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:09                   ` Jeff King
@ 2017-09-26  0:16                     ` Jonathan Nieder
  2017-09-26  0:17                       ` Jonathan Nieder
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-26  0:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git@vger.kernel.org

On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote:

>> After reading this discussion from the sideline, maybe
>>
>>   ENODATA         No message is available on the STREAM head read
>> queue (POSIX.1)
>>
>> is not too bad after all. Or
>>
>>   ESPIPE          Invalid seek (POSIX.1)
>>
>> Technically we do not seek, but one could imagine we'd seek there
>> to read?
>
> ENODATA is not too bad. On my glibc system it yields "No data available"
> from strerror(), which is at least comprehensible.
>
> We're still left with the question of whether it is defined everywhere
> (and what to fallback to when it isn't).

So,

#ifndef EUNDERFLOW
#ifdef ENODATA
#define ENODATA EUNDERFLOW
#else
#define ENODATA ESPIPE
#endif
#endif

?  Windows has ESPIPE, and I'm not sure what other non-POSIX platform we
need to worry about.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:13                 ` Jonathan Nieder
@ 2017-09-26  0:17                   ` Jeff King
  2017-09-26  0:20                     ` Jonathan Nieder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-26  0:17 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 05:13:54PM -0700, Jonathan Nieder wrote:

> > I actually prefer "0" of the all of the options discussed. At least it
> > is immediately clear that it is not a syscall error.
> 
> I have a basic aversion to ": Success" in error messages.  Whenever I
> see such an error message, I stop trusting the program that produced it
> not to be seriously buggy.  Maybe I'm the only one?

That's a fair criticism, I think.

> If no actual errno works, we could make a custom strerror-like function
> with our own custom errors (making them negative to avoid clashing with
> standard errno values), but this feels like overkill.

Yes, I also thought about that. It feels pretty invasive (I guess we'd
wrap strerror() with our own custom function so that callers don't have
to remember which one to use).

> In the same spirit of misleadingly confusing too-long and too-short,
> there is also ERANGE ("Result too large"), which doesn't work here.
> There's also EPROTO "Protocol error", but that's about protocols, not
> file formats.  More vague and therefor maybe more applicable is EBADMSG
> "Bad message".  There's also ENOMSG "No message of the desired type".
> 
> If the goal is to support debugging, an error like EPIPE "Broken pipe"
> or ESPIPE "Invalid seek" would be likely to lead me in the right
> direction (wrong file size), even though it is misleading about how
> the error surfaced.
> 
> We could also avoid trying to be cute and use something generic like
> EIO "Input/output error".

I definitely would prefer to avoid EIO, or anything that an actual
read() might return.

What do you think of ENODATA? The glibc text for it is pretty
appropriate. If it's not available everywhere, we'd have to fallback to
something else (EIO? 0?). I don't know how esoteric it is. The likely
candidate to be lacking it is Windows.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:16                     ` Jonathan Nieder
@ 2017-09-26  0:17                       ` Jonathan Nieder
  2017-09-26  0:19                         ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-26  0:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git@vger.kernel.org

Jonathan Nieder wrote:
> On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:

>> ENODATA is not too bad. On my glibc system it yields "No data available"
>> from strerror(), which is at least comprehensible.
>>
>> We're still left with the question of whether it is defined everywhere
>> (and what to fallback to when it isn't).
>
> So,
>
> #ifndef EUNDERFLOW
> #ifdef ENODATA
> #define ENODATA EUNDERFLOW
> #else
> #define ENODATA ESPIPE
> #endif
> #endif
>
> ?

Uh, pretend I said

#ifndef EUNDERFLOW
# ifdef ENODATA
#  define EUNDERFLOW ENODATA
# else
#  define EUNDERFLOW ESPIPE
# endif
#endif

Sorry for the nonsense.
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:17                       ` Jonathan Nieder
@ 2017-09-26  0:19                         ` Jeff King
  2017-09-26  0:22                           ` Jonathan Nieder
  2017-09-26  4:11                           ` Junio C Hamano
  0 siblings, 2 replies; 51+ messages in thread
From: Jeff King @ 2017-09-26  0:19 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Stefan Beller, git@vger.kernel.org

On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote:

> Jonathan Nieder wrote:
> > On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:
> 
> >> ENODATA is not too bad. On my glibc system it yields "No data available"
> >> from strerror(), which is at least comprehensible.
> >>
> >> We're still left with the question of whether it is defined everywhere
> >> (and what to fallback to when it isn't).
> >
> > So,
> >
> > #ifndef EUNDERFLOW
> > #ifdef ENODATA
> > #define ENODATA EUNDERFLOW
> > #else
> > #define ENODATA ESPIPE
> > #endif
> > #endif
> >
> > ?
> 
> Uh, pretend I said
> 
> #ifndef EUNDERFLOW
> # ifdef ENODATA
> #  define EUNDERFLOW ENODATA
> # else
> #  define EUNDERFLOW ESPIPE
> # endif
> #endif

Right, I think our mails just crossed but I'm leaning in this direction.
I'd prefer to call it SHORT_READ_ERRNO or something, though. Your
"#ifndef EUNDERFLOW" had me thinking that this was something that a real
platform might have, but AFAICT you just made it up.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:17                   ` Jeff King
@ 2017-09-26  0:20                     ` Jonathan Nieder
  2017-09-26  0:25                       ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-26  0:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> I definitely would prefer to avoid EIO, or anything that an actual
> read() might return.
>
> What do you think of ENODATA? The glibc text for it is pretty
> appropriate. If it's not available everywhere, we'd have to fallback to
> something else (EIO? 0?). I don't know how esoteric it is. The likely
> candidate to be lacking it is Windows.

ENODATA with a fallback to ESPIPE sounds fine to me.

read() would never produce ESPIPE because it doesn't seek.

So that would be:

/* errno value to use for early EOF */
#ifndef ENODATA
#define ENODATA ESPIPE
#endif

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:19                         ` Jeff King
@ 2017-09-26  0:22                           ` Jonathan Nieder
  2017-09-26  4:11                           ` Junio C Hamano
  1 sibling, 0 replies; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-26  0:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git@vger.kernel.org

Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote:

>> #ifndef EUNDERFLOW
>> # ifdef ENODATA
>> #  define EUNDERFLOW ENODATA
>> # else
>> #  define EUNDERFLOW ESPIPE
>> # endif
>> #endif
>
> Right, I think our mails just crossed but I'm leaning in this direction.
> I'd prefer to call it SHORT_READ_ERRNO or something, though. Your
> "#ifndef EUNDERFLOW" had me thinking that this was something that a real
> platform might have, but AFAICT you just made it up.

Agreed.  Two of the risks of replying too quickly. :)

Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:20                     ` Jonathan Nieder
@ 2017-09-26  0:25                       ` Jeff King
  2017-09-26  0:28                         ` Jonathan Nieder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-26  0:25 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Sep 25, 2017 at 05:20:20PM -0700, Jonathan Nieder wrote:

> > What do you think of ENODATA? The glibc text for it is pretty
> > appropriate. If it's not available everywhere, we'd have to fallback to
> > something else (EIO? 0?). I don't know how esoteric it is. The likely
> > candidate to be lacking it is Windows.
> 
> ENODATA with a fallback to ESPIPE sounds fine to me.
> 
> read() would never produce ESPIPE because it doesn't seek.
> 
> So that would be:
> 
> /* errno value to use for early EOF */
> #ifndef ENODATA
> #define ENODATA ESPIPE
> #endif

Thanks, I'll re-roll with that (and thank you Stefan for mentioning
ENODATA again; I came across it early in my search but discounted it as
not quite right semantically. I should have been looking at the
strerror() text, which is what really matters).

What do you think of patch 7 in light of this? If the short-read case
gives us a sane errno, should we even bother trying to consistently
handle its error separately?

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:25                       ` Jeff King
@ 2017-09-26  0:28                         ` Jonathan Nieder
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Nieder @ 2017-09-26  0:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> What do you think of patch 7 in light of this? If the short-read case
> gives us a sane errno, should we even bother trying to consistently
> handle its error separately?

I like the read_exactly_or_die variant because it makes callers more
concise, but on the other hand a caller handling the error can write a
more meaningful error message with the right amount of context.

So I think you're right that it's better to drop patch 7.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  0:19                         ` Jeff King
  2017-09-26  0:22                           ` Jonathan Nieder
@ 2017-09-26  4:11                           ` Junio C Hamano
  2017-09-27  4:07                             ` Jeff King
  1 sibling, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2017-09-26  4:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Stefan Beller, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

>> #ifndef EUNDERFLOW
>> # ifdef ENODATA
>> #  define EUNDERFLOW ENODATA
>> # else
>> #  define EUNDERFLOW ESPIPE
>> # endif
>> #endif
>
> Right, I think our mails just crossed but I'm leaning in this direction.

Hmph, I may be slow (or may be skimming the exchanges too fast), but
what exactly is wrong with "0"?  As long as we do not have to tell
two or more "not exactly an error from syscall in errno" cases, I
would think "0" is the best value to use.

If the syserror message _is_ the issue, then we'd need to either
pick an existing errno that is available everywhere (with possibly
suboptimal message), or pick something and prepare a fallback for
platforms that lack the errno, so picking "0" as the value and use
whatever logic we would have used for the "fallback" would not sound
too bad.  I.e.

	if (read_in_full(..., size) != size)
		if (errno)
			die_errno("oops");
		else
			die("short read");

If the callsite were too numerous,

#define die_errno_or(msg1, msg2) if (errno) die_errno(msg1); else die(msg2)

perhaps?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-26  4:11                           ` Junio C Hamano
@ 2017-09-27  4:07                             ` Jeff King
  2017-09-27  4:27                               ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-27  4:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Stefan Beller, git@vger.kernel.org

On Tue, Sep 26, 2017 at 01:11:59PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> #ifndef EUNDERFLOW
> >> # ifdef ENODATA
> >> #  define EUNDERFLOW ENODATA
> >> # else
> >> #  define EUNDERFLOW ESPIPE
> >> # endif
> >> #endif
> >
> > Right, I think our mails just crossed but I'm leaning in this direction.
> 
> Hmph, I may be slow (or may be skimming the exchanges too fast), but
> what exactly is wrong with "0"?  As long as we do not have to tell
> two or more "not exactly an error from syscall in errno" cases, I
> would think "0" is the best value to use.

The main reason to avoid "0" is just that the "read error: Success"
message is a bit funny.

> If the syserror message _is_ the issue, then we'd need to either
> pick an existing errno that is available everywhere (with possibly
> suboptimal message), or pick something and prepare a fallback for
> platforms that lack the errno, so picking "0" as the value and use
> whatever logic we would have used for the "fallback" would not sound
> too bad.  I.e.
> 
> 	if (read_in_full(..., size) != size)
> 		if (errno)
> 			die_errno("oops");
> 		else
> 			die("short read");
> 
> If the callsite were too numerous,

You actually don't need errno for that. You can write:

  ret = read_in_full(..., size);
  if (ret < 0)
	die_errno("oops");
  else if (ret != size)
	die("short read");

So I think using errno as a sentinel value to tell between the two cases
doesn't have much value.

> #define die_errno_or(msg1, msg2) if (errno) die_errno(msg1); else die(msg2)
> 
> perhaps?

If you just care about dying, you can wrap the whole read_in_full() in a
helper (which is what my original patch 7 did). But I found there were
cases that didn't want to die. Of course _most_ of those don't bother
looking at errno in the first place. I think the only one that does is
index_core(), which calls error_errno().

So with two helpers (one for die and one for error), I think we could
cover the existing cases. Or we could convert the single error() case to
just handle the logic inline.

One of the reasons I dislike the helper I showed earlier is that it
involves assembling a lego sentence. But the alternative is requiring
the caller to write both sentences from scratch (and writing a good
sentence for the short read really is long: you expected X bytes but got
Y).

I guess just tweaking the errno value does not really give a good "short
read" error either. You get "unable to read foo: No data available",
without the X/Y information.

Sometimes a custom "short read" message can be better, if it's "could
not read enough bytes for packfile header" or similar. You don't need to
know the exact number then.

So I dunno. Maybe I have just talked myself into actually just fixing
all of the read_in_full() call sites manually.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] read_in_full: reset errno before reading
  2017-09-27  4:07                             ` Jeff King
@ 2017-09-27  4:27                               ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2017-09-27  4:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Stefan Beller, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> You actually don't need errno for that. You can write:
>
>   ret = read_in_full(..., size);
>   if (ret < 0)
> 	die_errno("oops");
>   else if (ret != size)
> 	die("short read");
>
> So I think using errno as a sentinel value to tell between the two cases
> doesn't have much value.
> ...
> One of the reasons I dislike the helper I showed earlier is that it
> involves assembling a lego sentence. But the alternative is requiring
> the caller to write both sentences from scratch (and writing a good
> sentence for the short read really is long: you expected X bytes but got
> Y).

All sensible considerations.  I probably should have read the entire
discussion twice before starting to type X-<.  Sorry about the
noise.

> Sometimes a custom "short read" message can be better, if it's "could
> not read enough bytes for packfile header" or similar. You don't need to
> know the exact number then.

That's true, too.  I dunno, either.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v2 0/7] read/write_in_full leftovers
  2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
                   ` (6 preceding siblings ...)
  2017-09-25 20:33 ` [PATCH 7/7] add xread_in_full() helper Jeff King
@ 2017-09-27  5:54 ` Jeff King
  2017-09-27  6:00   ` [PATCH v2 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
                     ` (7 more replies)
  7 siblings, 8 replies; 51+ messages in thread
From: Jeff King @ 2017-09-27  5:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

On Mon, Sep 25, 2017 at 04:26:47PM -0400, Jeff King wrote:

> This series addresses the bits leftover from the discussion two weeks
> ago in:
> 
>   https://public-inbox.org/git/20170913170807.cyx7rrpoyhaauvol@sigill.intra.peff.net/
> 
> and its subthread. I don't think any of these is a real problem in
> practice, so this can be considered as a cleanup. I'm on the fence over
> whether the final one is a good idea. See the commit message for
> details.
> 
>   [1/7]: files-backend: prefer "0" for write_in_full() error check
>   [2/7]: notes-merge: drop dead zero-write code
>   [3/7]: read_in_full: reset errno before reading
>   [4/7]: get-tar-commit-id: prefer "!=" for read_in_full() error check
>   [5/7]: worktree: use xsize_t to access file size
>   [6/7]: worktree: check the result of read_in_full()
>   [7/7]: add xread_in_full() helper

Here's a followup based on the discussion on v1. There are actually a
few changes here.

I dropped the "read_in_full() should set errno on short reads" idea (3/7
in the earlier series). It really is the caller's fault for looking at
errno when they know there hasn't been an error in the first place. We
should just bite the bullet and have the callers do the right thing.

I also dropped the "xread_in_full" helper (7/7 earlier). The lego
sentences it created just weren't worth the hassle. Instead, I've fixed
all of the relevant callers to provide good error messages for both
cases. It's a few more lines of code, and it's probably rare for users
to see these in the first place. But it doesn't hurt too much to be
thorough, and I think it's good to model correct error handling. This is
in patches 4 and 5 below.

There are a handful of other changes:

  [1/7]: files-backend: prefer "0" for write_in_full() error check
  [2/7]: notes-merge: drop dead zero-write code

    These two are the same as before.

  [3/7]: prefer "!=" when checking read_in_full() result

    Along with the get-tar-commit-id case, I found two other cases which
    aren't bugs, but which should be updated for style reasons.

  [4/7]: avoid looking at errno for short read_in_full() returns
  [5/7]: distinguish error versus short_read from read_in_full()

    These two are new, and add in the messages. The ones in patch 4 are
    actively wrong (they show a bogus errno). The ones in 5 are just
    less descriptive than they could be. So if we find the extra lines
    of code too much, we could drop 5 (or even convert the ones in 4 to
    just stop looking at errno).

  [6/7]: worktree: use xsize_t to access file size

    Same as before.

  [7/7]: worktree: check the result of read_in_full()

    Similar, but now handles errors and short_reads separately. It also
    avoids leaking descriptors and memory in the error path (noticed by
    Coverity).

 builtin/get-tar-commit-id.c |  6 ++++--
 builtin/worktree.c          | 24 +++++++++++++++++++++---
 bulk-checkin.c              |  5 ++++-
 csum-file.c                 |  2 +-
 notes-merge.c               |  2 --
 pack-write.c                |  7 ++++++-
 packfile.c                  | 11 +++++++++--
 pkt-line.c                  |  2 +-
 refs/files-backend.c        |  2 +-
 sha1_file.c                 | 11 ++++++++---
 10 files changed, 55 insertions(+), 17 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v2 1/7] files-backend: prefer "0" for write_in_full() error check
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
@ 2017-09-27  6:00   ` Jeff King
  2017-09-27  6:00   ` [PATCH v2 2/7] notes-merge: drop dead zero-write code Jeff King
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-27  6:00 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

Commit 06f46f237a (avoid "write_in_full(fd, buf, len) !=
len" pattern, 2017-09-13) converted this callsite from:

  write_in_full(...) != 1

to

  write_in_full(...) < 0

But during the conflict resolution in c50424a6f0 (Merge
branch 'jk/write-in-full-fix', 2017-09-25), this morphed
into

  write_in_full(...) < 1

This behaves as we want, but we prefer to avoid modeling the
"less than length" error-check which can be subtly buggy, as
shown in efacf609c8 (config: avoid "write_in_full(fd, buf,
len) < len" pattern, 2017-09-13).

Signed-off-by: Jeff King <peff@peff.net>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index dac33628b3..e35c64c001 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3007,7 +3007,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
 		} else if (update &&
 			   (write_in_full(get_lock_file_fd(&lock->lk),
 				oid_to_hex(&cb.last_kept_oid), GIT_SHA1_HEXSZ) < 0 ||
-			    write_str_in_full(get_lock_file_fd(&lock->lk), "\n") < 1 ||
+			    write_str_in_full(get_lock_file_fd(&lock->lk), "\n") < 0 ||
 			    close_ref_gently(lock) < 0)) {
 			status |= error("couldn't write %s",
 					get_lock_file_path(&lock->lk));
-- 
2.14.2.988.g01c8b37dde


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 2/7] notes-merge: drop dead zero-write code
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
  2017-09-27  6:00   ` [PATCH v2 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
@ 2017-09-27  6:00   ` Jeff King
  2017-09-27  6:00   ` [PATCH v2 3/7] prefer "!=" when checking read_in_full() result Jeff King
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-27  6:00 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

We call write_in_full() with a size that we know is greater
than zero. The return value can never be zero, then, since
write_in_full() converts such a failed write() into ENOSPC
and returns -1.  We can just drop this branch of the error
handling entirely.

Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 notes-merge.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/notes-merge.c b/notes-merge.c
index 597d43f65c..4352c34a6e 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -308,8 +308,6 @@ static void write_buf_to_worktree(const struct object_id *obj,
 			if (errno == EPIPE)
 				break;
 			die_errno("notes-merge");
-		} else if (!ret) {
-			die("notes-merge: disk full?");
 		}
 		size -= ret;
 		buf += ret;
-- 
2.14.2.988.g01c8b37dde


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 3/7] prefer "!=" when checking read_in_full() result
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
  2017-09-27  6:00   ` [PATCH v2 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
  2017-09-27  6:00   ` [PATCH v2 2/7] notes-merge: drop dead zero-write code Jeff King
@ 2017-09-27  6:00   ` Jeff King
  2017-09-27  6:59     ` Junio C Hamano
  2017-09-27  6:01   ` [PATCH v2 4/7] avoid looking at errno for short read_in_full() returns Jeff King
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-27  6:00 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

Comparing the result of read_in_full() using less-than is
potentially dangerous, as a negative return value may be
converted to an unsigned type and be considered a success.
This is discussed further in 561598cfcf (read_pack_header:
handle signed/unsigned comparison in read result,
2017-09-13).

Each of these instances is actually fine in practice:

 - in get-tar-commit-id, the HEADERSIZE macro expands to a
   signed integer. If it were switched to an unsigned type
   (e.g., a size_t), then it would be a bug.

 - the other two callers check for a short read only after
   handling a negative return separately. This is a fine
   practice, but we'd prefer to model "!=" as a general
   rule.

So all of these cases can be considered cleanups and not
actual bugfixes.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/get-tar-commit-id.c | 2 +-
 csum-file.c                 | 2 +-
 pkt-line.c                  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 6d9a79f9b3..cd3e656828 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -26,7 +26,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 		usage(builtin_get_tar_commit_id_usage);
 
 	n = read_in_full(0, buffer, HEADERSIZE);
-	if (n < HEADERSIZE)
+	if (n != HEADERSIZE)
 		die("git get-tar-commit-id: read error");
 	if (header->typeflag[0] != 'g')
 		return 1;
diff --git a/csum-file.c b/csum-file.c
index a172199e44..2adae04073 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -19,7 +19,7 @@ static void flush(struct sha1file *f, const void *buf, unsigned int count)
 
 		if (ret < 0)
 			die_errno("%s: sha1 file read error", f->name);
-		if (ret < count)
+		if (ret != count)
 			die("%s: sha1 file truncated", f->name);
 		if (memcmp(buf, check_buffer, count))
 			die("sha1 file '%s' validation error", f->name);
diff --git a/pkt-line.c b/pkt-line.c
index 647bbd3bce..93ea311443 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -258,7 +258,7 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 	}
 
 	/* And complain if we didn't get enough bytes to satisfy the read. */
-	if (ret < size) {
+	if (ret != size) {
 		if (options & PACKET_READ_GENTLE_ON_EOF)
 			return -1;
 
-- 
2.14.2.988.g01c8b37dde


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 4/7] avoid looking at errno for short read_in_full() returns
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
                     ` (2 preceding siblings ...)
  2017-09-27  6:00   ` [PATCH v2 3/7] prefer "!=" when checking read_in_full() result Jeff King
@ 2017-09-27  6:01   ` Jeff King
  2017-09-27  6:59     ` Junio C Hamano
  2017-09-27  6:02   ` [PATCH v2 5/7] distinguish error versus short read from read_in_full() Jeff King
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2017-09-27  6:01 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

When a caller tries to read a particular set of bytes via
read_in_full(), there are three possible outcomes:

  1. An error, in which case -1 is returned and errno is
     set.

  2. A short read, in which fewer bytes are returned and
     errno is unspecified (we never saw a read error, so we
     may have some random value from whatever syscall failed
     last).

  3. The full read completed successfully.

Many callers handle cases 1 and 2 together by just checking
the result against the requested size. If their combined
error path looks at errno (e.g., by calling die_errno), they
may report a nonsense value.

Let's fix these sites by having them distinguish between the
two error cases. That avoids the random errno confusion, and
lets us give more detailed error messages.

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-write.c |  7 ++++++-
 sha1_file.c  | 11 ++++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/pack-write.c b/pack-write.c
index a333ec6754..fea6284192 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -213,14 +213,19 @@ void fixup_pack_header_footer(int pack_fd,
 	git_SHA_CTX old_sha1_ctx, new_sha1_ctx;
 	struct pack_header hdr;
 	char *buf;
+	ssize_t read_result;
 
 	git_SHA1_Init(&old_sha1_ctx);
 	git_SHA1_Init(&new_sha1_ctx);
 
 	if (lseek(pack_fd, 0, SEEK_SET) != 0)
 		die_errno("Failed seeking to start of '%s'", pack_name);
-	if (read_in_full(pack_fd, &hdr, sizeof(hdr)) != sizeof(hdr))
+	read_result = read_in_full(pack_fd, &hdr, sizeof(hdr));
+	if (read_result < 0)
 		die_errno("Unable to reread header of '%s'", pack_name);
+	else if (read_result != sizeof(hdr))
+		die_errno("Unexpected short read for header of '%s'",
+			  pack_name);
 	if (lseek(pack_fd, 0, SEEK_SET) != 0)
 		die_errno("Failed seeking to start of '%s'", pack_name);
 	git_SHA1_Update(&old_sha1_ctx, &hdr, sizeof(hdr));
diff --git a/sha1_file.c b/sha1_file.c
index 5a2014811f..09ad64ce55 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1748,10 +1748,15 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
 		ret = index_mem(sha1, "", size, type, path, flags);
 	} else if (size <= SMALL_FILE_SIZE) {
 		char *buf = xmalloc(size);
-		if (size == read_in_full(fd, buf, size))
-			ret = index_mem(sha1, buf, size, type, path, flags);
+		ssize_t read_result = read_in_full(fd, buf, size);
+		if (read_result < 0)
+			ret = error_errno("read error while indexing %s",
+					  path ? path : "<unknown>");
+		else if (read_result != size)
+			ret = error("short read while indexing %s",
+				    path ? path : "<unknown>");
 		else
-			ret = error_errno("short read");
+			ret = index_mem(sha1, buf, size, type, path, flags);
 		free(buf);
 	} else {
 		void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
-- 
2.14.2.988.g01c8b37dde


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 5/7] distinguish error versus short read from read_in_full()
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
                     ` (3 preceding siblings ...)
  2017-09-27  6:01   ` [PATCH v2 4/7] avoid looking at errno for short read_in_full() returns Jeff King
@ 2017-09-27  6:02   ` Jeff King
  2017-09-27  6:02   ` [PATCH v2 6/7] worktree: use xsize_t to access file size Jeff King
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-27  6:02 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

Many callers of read_in_full() expect to see the exact
number of bytes requested, but their error handling lumps
together true read errors and short reads due to unexpected
EOF.

We can give more specific error messages by separating these
cases (showing errno when appropriate, and otherwise
describing the short read).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/get-tar-commit-id.c |  4 +++-
 bulk-checkin.c              |  5 ++++-
 packfile.c                  | 11 +++++++++--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index cd3e656828..2706fcfaf2 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -26,8 +26,10 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 		usage(builtin_get_tar_commit_id_usage);
 
 	n = read_in_full(0, buffer, HEADERSIZE);
+	if (n < 0)
+		die_errno("git get-tar-commit-id: read error");
 	if (n != HEADERSIZE)
-		die("git get-tar-commit-id: read error");
+		die_errno("git get-tar-commit-id: EOF before reading tar header");
 	if (header->typeflag[0] != 'g')
 		return 1;
 	if (!skip_prefix(content, "52 comment=", &comment))
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9a1f6c49ab..3310fd210a 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -115,7 +115,10 @@ static int stream_to_pack(struct bulk_checkin_state *state,
 
 		if (size && !s.avail_in) {
 			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
-			if (read_in_full(fd, ibuf, rsize) != rsize)
+			ssize_t read_result = read_in_full(fd, ibuf, rsize);
+			if (read_result < 0)
+				die_errno("failed to read from '%s'", path);
+			if (read_result != rsize)
 				die("failed to read %d bytes from '%s'",
 				    (int)rsize, path);
 			offset += rsize;
diff --git a/packfile.c b/packfile.c
index f69a5c8d60..eab7542487 100644
--- a/packfile.c
+++ b/packfile.c
@@ -442,6 +442,7 @@ static int open_packed_git_1(struct packed_git *p)
 	unsigned char sha1[20];
 	unsigned char *idx_sha1;
 	long fd_flag;
+	ssize_t read_result;
 
 	if (!p->index_data && open_pack_index(p))
 		return error("packfile %s index unavailable", p->pack_name);
@@ -483,7 +484,10 @@ static int open_packed_git_1(struct packed_git *p)
 		return error("cannot set FD_CLOEXEC");
 
 	/* Verify we recognize this pack file format. */
-	if (read_in_full(p->pack_fd, &hdr, sizeof(hdr)) != sizeof(hdr))
+	read_result = read_in_full(p->pack_fd, &hdr, sizeof(hdr));
+	if (read_result < 0)
+		return error_errno("error reading from %s", p->pack_name);
+	if (read_result != sizeof(hdr))
 		return error("file %s is far too short to be a packfile", p->pack_name);
 	if (hdr.hdr_signature != htonl(PACK_SIGNATURE))
 		return error("file %s is not a GIT packfile", p->pack_name);
@@ -500,7 +504,10 @@ static int open_packed_git_1(struct packed_git *p)
 			     p->num_objects);
 	if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
 		return error("end of packfile %s is unavailable", p->pack_name);
-	if (read_in_full(p->pack_fd, sha1, sizeof(sha1)) != sizeof(sha1))
+	read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1));
+	if (read_result < 0)
+		return error_errno("error reading from %s", p->pack_name);
+	if (read_result != sizeof(sha1))
 		return error("packfile %s signature is unavailable", p->pack_name);
 	idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40;
 	if (hashcmp(sha1, idx_sha1))
-- 
2.14.2.988.g01c8b37dde


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 6/7] worktree: use xsize_t to access file size
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
                     ` (4 preceding siblings ...)
  2017-09-27  6:02   ` [PATCH v2 5/7] distinguish error versus short read from read_in_full() Jeff King
@ 2017-09-27  6:02   ` Jeff King
  2017-09-27  6:02   ` [PATCH v2 7/7] worktree: check the result of read_in_full() Jeff King
  2017-09-27  6:57   ` [PATCH v2 0/7] read/write_in_full leftovers Junio C Hamano
  7 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-27  6:02 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

To read the "gitdir" file into memory, we stat the file and
allocate a buffer. But we store the size in an "int", which
may be truncated. We should use a size_t and xsize_t(),
which will detect truncation.

An overflow is unlikely for a "gitdir" file, but it's a good
practice to model.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/worktree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index de26849f55..2f4a4ef9cd 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -38,7 +38,8 @@ static int prune_worktree(const char *id, struct strbuf *reason)
 {
 	struct stat st;
 	char *path;
-	int fd, len;
+	int fd;
+	size_t len;
 
 	if (!is_directory(git_path("worktrees/%s", id))) {
 		strbuf_addf(reason, _("Removing worktrees/%s: not a valid directory"), id);
@@ -56,7 +57,7 @@ static int prune_worktree(const char *id, struct strbuf *reason)
 			    id, strerror(errno));
 		return 1;
 	}
-	len = st.st_size;
+	len = xsize_t(st.st_size);
 	path = xmallocz(len);
 	read_in_full(fd, path, len);
 	close(fd);
-- 
2.14.2.988.g01c8b37dde


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 7/7] worktree: check the result of read_in_full()
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
                     ` (5 preceding siblings ...)
  2017-09-27  6:02   ` [PATCH v2 6/7] worktree: use xsize_t to access file size Jeff King
@ 2017-09-27  6:02   ` Jeff King
  2017-09-27  6:57   ` [PATCH v2 0/7] read/write_in_full leftovers Junio C Hamano
  7 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-27  6:02 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Junio C Hamano

We try to read "len" bytes into a buffer and just assume
that it happened correctly. In practice this should usually
be the case, since we just stat'd the file to get the
length.  But we could be fooled by transient errors or by
other processes racily truncating the file.

Let's be more careful. There's a slim chance this could
catch a real error, but it also prevents people and tools
from getting worried while reading the code.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/worktree.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 2f4a4ef9cd..7b9307aa58 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -40,6 +40,7 @@ static int prune_worktree(const char *id, struct strbuf *reason)
 	char *path;
 	int fd;
 	size_t len;
+	ssize_t read_result;
 
 	if (!is_directory(git_path("worktrees/%s", id))) {
 		strbuf_addf(reason, _("Removing worktrees/%s: not a valid directory"), id);
@@ -59,8 +60,24 @@ static int prune_worktree(const char *id, struct strbuf *reason)
 	}
 	len = xsize_t(st.st_size);
 	path = xmallocz(len);
-	read_in_full(fd, path, len);
+
+	read_result = read_in_full(fd, path, len);
+	if (read_result < 0) {
+		strbuf_addf(reason, _("Removing worktrees/%s: unable to read gitdir file (%s)"),
+			    id, strerror(errno));
+		close(fd);
+		free(path);
+		return 1;
+	}
 	close(fd);
+
+	if (read_result != len) {
+		strbuf_addf(reason,
+			    _("Removing worktrees/%s: short read (expected %"PRIuMAX" bytes, read %"PRIuMAX")"),
+			    id, (uintmax_t)len, (uintmax_t)read_result);
+		free(path);
+		return 1;
+	}
 	while (len && (path[len - 1] == '\n' || path[len - 1] == '\r'))
 		len--;
 	if (!len) {
-- 
2.14.2.988.g01c8b37dde

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 0/7] read/write_in_full leftovers
  2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
                     ` (6 preceding siblings ...)
  2017-09-27  6:02   ` [PATCH v2 7/7] worktree: check the result of read_in_full() Jeff King
@ 2017-09-27  6:57   ` Junio C Hamano
  7 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2017-09-27  6:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> I dropped the "read_in_full() should set errno on short reads" idea (3/7
> in the earlier series). It really is the caller's fault for looking at
> errno when they know there hasn't been an error in the first place. We
> should just bite the bullet and have the callers do the right thing.
>
> I also dropped the "xread_in_full" helper (7/7 earlier). The lego
> sentences it created just weren't worth the hassle. Instead, I've fixed
> all of the relevant callers to provide good error messages for both
> cases. It's a few more lines of code, and it's probably rare for users
> to see these in the first place. But it doesn't hurt too much to be
> thorough, and I think it's good to model correct error handling. This is
> in patches 4 and 5 below.

Thanks for being thorough.  My comment on 3/7 might be taken as
contradicting with how 5/7 ties the loose ends up, but I do not care
too deeply either way.

Will queue.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 3/7] prefer "!=" when checking read_in_full() result
  2017-09-27  6:00   ` [PATCH v2 3/7] prefer "!=" when checking read_in_full() result Jeff King
@ 2017-09-27  6:59     ` Junio C Hamano
  2017-09-27  7:09       ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2017-09-27  6:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> diff --git a/csum-file.c b/csum-file.c
> index a172199e44..2adae04073 100644
> --- a/csum-file.c
> +++ b/csum-file.c
> @@ -19,7 +19,7 @@ static void flush(struct sha1file *f, const void *buf, unsigned int count)
>  
>  		if (ret < 0)
>  			die_errno("%s: sha1 file read error", f->name);
> -		if (ret < count)
> +		if (ret != count)
>  			die("%s: sha1 file truncated", f->name);

I personally find that this "ret < count" that comes after checking
if ret is negative expresses what it is checking in a more natural
way than "ret must be exactly count".

It is not a big deal, as either one needs to be read with an
understanding that read_in_full() would read at most "count" bytes
to see if the right condition is being checked to declare
"truncated" anyway.  But I somehow find

	ret = read up to count
	if (ret < 0)
		read failed
	if (ret < count)
		we failed to read as much as expected

a bit more natural.

> diff --git a/pkt-line.c b/pkt-line.c
> index 647bbd3bce..93ea311443 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -258,7 +258,7 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
>  	}
>  
>  	/* And complain if we didn't get enough bytes to satisfy the read. */
> -	if (ret < size) {
> +	if (ret != size) {
>  		if (options & PACKET_READ_GENTLE_ON_EOF)
>  			return -1;

Likewise, even though it is harder to see that this follows another
explicit check for "ret < 0".

Thanks.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 4/7] avoid looking at errno for short read_in_full() returns
  2017-09-27  6:01   ` [PATCH v2 4/7] avoid looking at errno for short read_in_full() returns Jeff King
@ 2017-09-27  6:59     ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2017-09-27  6:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> When a caller tries to read a particular set of bytes via
> read_in_full(), there are three possible outcomes:
>
>   1. An error, in which case -1 is returned and errno is
>      set.
>
>   2. A short read, in which fewer bytes are returned and
>      errno is unspecified (we never saw a read error, so we
>      may have some random value from whatever syscall failed
>      last).
>
>   3. The full read completed successfully.
>
> Many callers handle cases 1 and 2 together by just checking
> the result against the requested size. If their combined
> error path looks at errno (e.g., by calling die_errno), they
> may report a nonsense value.
>
> Let's fix these sites by having them distinguish between the
> two error cases. That avoids the random errno confusion, and
> lets us give more detailed error messages.

The resulting code might be more verbose but I personally think both
of them give a lot more clear error indication.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 3/7] prefer "!=" when checking read_in_full() result
  2017-09-27  6:59     ` Junio C Hamano
@ 2017-09-27  7:09       ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2017-09-27  7:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Nieder

On Wed, Sep 27, 2017 at 03:59:15PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > diff --git a/csum-file.c b/csum-file.c
> > index a172199e44..2adae04073 100644
> > --- a/csum-file.c
> > +++ b/csum-file.c
> > @@ -19,7 +19,7 @@ static void flush(struct sha1file *f, const void *buf, unsigned int count)
> >  
> >  		if (ret < 0)
> >  			die_errno("%s: sha1 file read error", f->name);
> > -		if (ret < count)
> > +		if (ret != count)
> >  			die("%s: sha1 file truncated", f->name);
> 
> I personally find that this "ret < count" that comes after checking
> if ret is negative expresses what it is checking in a more natural
> way than "ret must be exactly count".
> 
> It is not a big deal, as either one needs to be read with an
> understanding that read_in_full() would read at most "count" bytes
> to see if the right condition is being checked to declare
> "truncated" anyway.  But I somehow find
> 
> 	ret = read up to count
> 	if (ret < 0)
> 		read failed
> 	if (ret < count)
> 		we failed to read as much as expected
> 
> a bit more natural.

I actually do, too, and I wouldn't terribly mind to drop these. Mostly I
didn't want people blindly copying only the second half. The fact that
the second condition is OK only because the first condition is present
is somewhat subtle.

I also wondered if this would shut up -Wsign-compare. But it doesn't
seem to complain about the originals, which kind of surprises me.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2017-09-27  7:10 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-25 20:26 [PATCH 0/7] read/write_in_full leftovers Jeff King
2017-09-25 20:27 ` [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
2017-09-25 21:59   ` Jonathan Nieder
2017-09-25 23:13     ` Jeff King
2017-09-25 20:27 ` [PATCH 2/7] notes-merge: drop dead zero-write code Jeff King
2017-09-25 21:59   ` Jonathan Nieder
2017-09-25 20:29 ` [PATCH 3/7] read_in_full: reset errno before reading Jeff King
2017-09-25 22:09   ` Jonathan Nieder
2017-09-25 23:23     ` Jeff King
2017-09-25 23:33       ` Jonathan Nieder
2017-09-25 23:37         ` Jeff King
2017-09-25 23:45           ` Jeff King
2017-09-25 23:55             ` Jonathan Nieder
2017-09-26  0:01               ` Jeff King
2017-09-26  0:06                 ` Stefan Beller
2017-09-26  0:09                   ` Jeff King
2017-09-26  0:16                     ` Jonathan Nieder
2017-09-26  0:17                       ` Jonathan Nieder
2017-09-26  0:19                         ` Jeff King
2017-09-26  0:22                           ` Jonathan Nieder
2017-09-26  4:11                           ` Junio C Hamano
2017-09-27  4:07                             ` Jeff King
2017-09-27  4:27                               ` Junio C Hamano
2017-09-26  0:13                 ` Jonathan Nieder
2017-09-26  0:17                   ` Jeff King
2017-09-26  0:20                     ` Jonathan Nieder
2017-09-26  0:25                       ` Jeff King
2017-09-26  0:28                         ` Jonathan Nieder
2017-09-25 23:52           ` Jonathan Nieder
2017-09-25 20:29 ` [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check Jeff King
2017-09-25 22:10   ` Jonathan Nieder
2017-09-25 20:30 ` [PATCH 5/7] worktree: use xsize_t to access file size Jeff King
2017-09-25 22:11   ` Jonathan Nieder
2017-09-25 20:31 ` [PATCH 6/7] worktree: check the result of read_in_full() Jeff King
2017-09-25 22:14   ` Jonathan Nieder
2017-09-25 23:25     ` Jeff King
2017-09-25 20:33 ` [PATCH 7/7] add xread_in_full() helper Jeff King
2017-09-25 22:16   ` Jonathan Nieder
2017-09-25 23:27     ` Jeff King
2017-09-27  5:54 ` [PATCH v2 0/7] read/write_in_full leftovers Jeff King
2017-09-27  6:00   ` [PATCH v2 1/7] files-backend: prefer "0" for write_in_full() error check Jeff King
2017-09-27  6:00   ` [PATCH v2 2/7] notes-merge: drop dead zero-write code Jeff King
2017-09-27  6:00   ` [PATCH v2 3/7] prefer "!=" when checking read_in_full() result Jeff King
2017-09-27  6:59     ` Junio C Hamano
2017-09-27  7:09       ` Jeff King
2017-09-27  6:01   ` [PATCH v2 4/7] avoid looking at errno for short read_in_full() returns Jeff King
2017-09-27  6:59     ` Junio C Hamano
2017-09-27  6:02   ` [PATCH v2 5/7] distinguish error versus short read from read_in_full() Jeff King
2017-09-27  6:02   ` [PATCH v2 6/7] worktree: use xsize_t to access file size Jeff King
2017-09-27  6:02   ` [PATCH v2 7/7] worktree: check the result of read_in_full() Jeff King
2017-09-27  6:57   ` [PATCH v2 0/7] read/write_in_full leftovers Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).