[PATCH 0/18] snprintf cleanups

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/18] snprintf cleanups
@ 2017-03-28 19:42 Jeff King
  2017-03-28 19:45 ` [PATCH 01/18] do not check odb_mkstemp return value for errors Jeff King
                   ` (19 more replies)
  0 siblings, 20 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:42 UTC (permalink / raw)
  To: git

Our code base calls snprintf() into a fixed-size buffer in a bunch of
places. Sometimes we check the result, and sometimes we accept a silent
truncation. In some cases an overflow is easy given long input. In some
cases it's impossible. And in some cases it depends on how big PATH_MAX
is on your filesystem, and whether it's actually enforced. :)

This series attempts to give more predictable and consistent results by
removing arbitrary buffer limitations. It also tries to make further
audits of snprintf() easier by converting to xsnprintf() where
appropriate.

There are still some snprintf() calls left after this. A few are in code
that's in flux, or is being cleaned up in nearby series (several of my
recent cleanup series were split off from this). A few should probably
remain (e.g., git-daemon will refuse to consider a repo name larger than
PATH_MAX, which may be a reasonable defense against weird memory tricks.
I wouldn't be sad to see this turned into a strbuf with an explicit
length policy enforced separately, though). And there were a few that I
just didn't get around to converting (the dumb-http walker, for example,
but I think it may need a pretty involved audit overall).

It's a lot of patches, but hopefully they're all pretty straightforward
to read.

  [01/18]: do not check odb_mkstemp return value for errors
  [02/18]: odb_mkstemp: write filename into strbuf
  [03/18]: odb_mkstemp: use git_path_buf
  [04/18]: diff: avoid fixed-size buffer for patch-ids
  [05/18]: tag: use strbuf to format tag header
  [06/18]: fetch: use heap buffer to format reflog
  [07/18]: avoid using fixed PATH_MAX buffers for refs
  [08/18]: avoid using mksnpath for refs
  [09/18]: create_branch: move msg setup closer to point of use
  [10/18]: create_branch: use xstrfmt for reflog message
  [11/18]: name-rev: replace static buffer with strbuf
  [12/18]: receive-pack: print --pack-header directly into argv array
  [13/18]: replace unchecked snprintf calls with heap buffers
  [14/18]: combine-diff: replace malloc/snprintf with xstrfmt
  [15/18]: convert unchecked snprintf into xsnprintf
  [16/18]: transport-helper: replace checked snprintf with xsnprintf
  [17/18]: gc: replace local buffer with git_path
  [18/18]: daemon: use an argv_array to exec children

 bisect.c               |  8 +++++---
 branch.c               | 16 ++++++++--------
 builtin/checkout.c     |  5 ++---
 builtin/fetch.c        |  6 ++++--
 builtin/gc.c           |  8 +-------
 builtin/index-pack.c   | 22 ++++++++++++----------
 builtin/ls-remote.c    | 10 ++++++----
 builtin/name-rev.c     | 21 ++++++++++++---------
 builtin/notes.c        |  9 ++++-----
 builtin/receive-pack.c | 17 ++++++++++-------
 builtin/replace.c      | 50 +++++++++++++++++++++++++++-----------------------
 builtin/rev-parse.c    |  5 +++--
 builtin/tag.c          | 42 ++++++++++++++++++------------------------
 cache.h                |  7 +++++--
 combine-diff.c         |  7 ++++---
 daemon.c               | 38 +++++++++++++++++---------------------
 diff.c                 | 20 +++++++++++++-------
 environment.c          | 14 ++++++--------
 fast-import.c          |  9 +++++----
 grep.c                 |  4 ++--
 http.c                 | 10 +++++-----
 imap-send.c            |  2 +-
 pack-bitmap-write.c    | 14 +++++++-------
 pack-write.c           | 16 ++++++++--------
 refs.c                 | 44 ++++++++++++++++++++++++++------------------
 sha1_file.c            |  4 ++--
 submodule.c            |  2 +-
 transport-helper.c     |  5 +----
 28 files changed, 215 insertions(+), 200 deletions(-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/18] do not check odb_mkstemp return value for errors
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
@ 2017-03-28 19:45 ` Jeff King
  2017-03-28 19:45 ` [PATCH 02/18] odb_mkstemp: write filename into strbuf Jeff King
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:45 UTC (permalink / raw)
  To: git

The odb_mkstemp function does not return an error; it dies
on failure instead. But many of its callers compare the
resulting descriptor against -1 and die themselves.

Mostly this is just pointless, but it does raise a question
when looking at the callers: if they show the results of the
"template" buffer after a failure, what's in it? The answer
is: it doesn't matter, because it cannot happen.

So let's make that clear by removing the bogus error checks.
In bitmap_writer_finish(), we can drop the error-handling
code entirely. In the other two cases, it's shared with the
open() in another code path; we can just move the
error-check next to that open() call.

And while we're at it, let's flesh out the function's
docstring a bit to make the error behavior clear.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/index-pack.c | 7 ++++---
 cache.h              | 5 ++++-
 pack-bitmap-write.c  | 2 --
 pack-write.c         | 4 ++--
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 88d205f85..49e7175d9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -311,10 +311,11 @@ static const char *open_pack_file(const char *pack_name)
 			output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
 						"pack/tmp_pack_XXXXXX");
 			pack_name = xstrdup(tmp_file);
-		} else
+		} else {
 			output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
-		if (output_fd < 0)
-			die_errno(_("unable to create '%s'"), pack_name);
+			if (output_fd < 0)
+				die_errno(_("unable to create '%s'"), pack_name);
+		}
 		nothread_data.pack_fd = output_fd;
 	} else {
 		input_fd = open(pack_name, O_RDONLY);
diff --git a/cache.h b/cache.h
index db4120c23..acad7078d 100644
--- a/cache.h
+++ b/cache.h
@@ -1673,7 +1673,10 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 extern void pack_report(void);
 
 /*
- * Create a temporary file rooted in the object database directory.
+ * Create a temporary file rooted in the object database directory, or
+ * die on failure. The filename is taken from "pattern", which should have the
+ * usual "XXXXXX" trailer, and the resulting filename is written into the
+ * "template" buffer. Returns the open descriptor.
  */
 extern int odb_mkstemp(char *template, size_t limit, const char *pattern);
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 970559601..44492c346 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -517,8 +517,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XXXXXX");
 
-	if (fd < 0)
-		die_errno("unable to create '%s'", tmp_file);
 	f = sha1fd(fd, tmp_file);
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
diff --git a/pack-write.c b/pack-write.c
index 88bc7f9f7..19cb514ea 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -77,9 +77,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 		} else {
 			unlink(index_name);
 			fd = open(index_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+			if (fd < 0)
+				die_errno("unable to create '%s'", index_name);
 		}
-		if (fd < 0)
-			die_errno("unable to create '%s'", index_name);
 		f = sha1fd(fd, index_name);
 	}
 
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/18] odb_mkstemp: write filename into strbuf
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
  2017-03-28 19:45 ` [PATCH 01/18] do not check odb_mkstemp return value for errors Jeff King
@ 2017-03-28 19:45 ` Jeff King
  2017-03-28 19:45 ` [PATCH 03/18] odb_mkstemp: use git_path_buf Jeff King
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:45 UTC (permalink / raw)
  To: git

The odb_mkstemp() function expects the caller to provide a
fixed buffer to write the resulting tempfile name into. But
it creates the template using snprintf without checking the
return value. This means we could silently truncate the
filename.

In practice, it's unlikely that the truncation would end in
the template-pattern that mkstemp needs to open the file. So
we'd probably end up failing either way, unless the path was
specially crafted.

The simplest fix would be to notice the truncation and die.
However, we can observe that most callers immediately
xstrdup() the result anyway. So instead, let's switch to
using a strbuf, which is easier for them (and isn't a big
deal for the other 2 callers, who can just strbuf_release
when they're done with it).

Note that many of the callers used static buffers, but this
was purely to avoid putting a large buffer on the stack. We
never passed the static buffers out of the function, so
there's no complicated memory handling we need to change.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/index-pack.c |  6 +++---
 cache.h              |  2 +-
 environment.c        | 16 ++++++++--------
 fast-import.c        |  9 +++++----
 pack-bitmap-write.c  | 12 +++++++-----
 pack-write.c         | 12 ++++++------
 6 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 49e7175d9..f4af2ab37 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -307,10 +307,10 @@ static const char *open_pack_file(const char *pack_name)
 	if (from_stdin) {
 		input_fd = 0;
 		if (!pack_name) {
-			static char tmp_file[PATH_MAX];
-			output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
+			struct strbuf tmp_file = STRBUF_INIT;
+			output_fd = odb_mkstemp(&tmp_file,
 						"pack/tmp_pack_XXXXXX");
-			pack_name = xstrdup(tmp_file);
+			pack_name = strbuf_detach(&tmp_file, NULL);
 		} else {
 			output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
 			if (output_fd < 0)
diff --git a/cache.h b/cache.h
index acad7078d..1c1889428 100644
--- a/cache.h
+++ b/cache.h
@@ -1678,7 +1678,7 @@ extern void pack_report(void);
  * usual "XXXXXX" trailer, and the resulting filename is written into the
  * "template" buffer. Returns the open descriptor.
  */
-extern int odb_mkstemp(char *template, size_t limit, const char *pattern);
+extern int odb_mkstemp(struct strbuf *template, const char *pattern);
 
 /*
  * Generate the filename to be used for a pack file with checksum "sha1" and
diff --git a/environment.c b/environment.c
index 2fdba7622..88276790d 100644
--- a/environment.c
+++ b/environment.c
@@ -274,7 +274,7 @@ char *get_object_directory(void)
 	return git_object_dir;
 }
 
-int odb_mkstemp(char *template, size_t limit, const char *pattern)
+int odb_mkstemp(struct strbuf *template, const char *pattern)
 {
 	int fd;
 	/*
@@ -282,18 +282,18 @@ int odb_mkstemp(char *template, size_t limit, const char *pattern)
 	 * restrictive except to remove write permission.
 	 */
 	int mode = 0444;
-	snprintf(template, limit, "%s/%s",
-		 get_object_directory(), pattern);
-	fd = git_mkstemp_mode(template, mode);
+	strbuf_reset(template);
+	strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+	fd = git_mkstemp_mode(template->buf, mode);
 	if (0 <= fd)
 		return fd;
 
 	/* slow path */
 	/* some mkstemp implementations erase template on failure */
-	snprintf(template, limit, "%s/%s",
-		 get_object_directory(), pattern);
-	safe_create_leading_directories(template);
-	return xmkstemp_mode(template, mode);
+	strbuf_reset(template);
+	strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+	safe_create_leading_directories(template->buf);
+	return xmkstemp_mode(template->buf, mode);
 }
 
 int odb_pack_keep(const char *name)
diff --git a/fast-import.c b/fast-import.c
index 41a539f97..900b9626f 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -890,14 +890,15 @@ static struct tree_content *dup_tree_content(struct tree_content *s)
 
 static void start_packfile(void)
 {
-	static char tmp_file[PATH_MAX];
+	struct strbuf tmp_file = STRBUF_INIT;
 	struct packed_git *p;
 	struct pack_header hdr;
 	int pack_fd;
 
-	pack_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
-			      "pack/tmp_pack_XXXXXX");
-	FLEX_ALLOC_STR(p, pack_name, tmp_file);
+	pack_fd = odb_mkstemp(&tmp_file, "pack/tmp_pack_XXXXXX");
+	FLEX_ALLOC_STR(p, pack_name, tmp_file.buf);
+	strbuf_release(&tmp_file);
+
 	p->pack_fd = pack_fd;
 	p->do_not_close = 1;
 	pack_file = sha1fd(pack_fd, p->pack_name);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 44492c346..e313f4f2b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -508,16 +508,16 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  const char *filename,
 			  uint16_t options)
 {
-	static char tmp_file[PATH_MAX];
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	struct strbuf tmp_file = STRBUF_INIT;
 	struct sha1file *f;
 
 	struct bitmap_disk_header header;
 
-	int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XXXXXX");
+	int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX");
 
-	f = sha1fd(fd, tmp_file);
+	f = sha1fd(fd, tmp_file.buf);
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
@@ -537,9 +537,11 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	sha1close(f, NULL, CSUM_FSYNC);
 
-	if (adjust_shared_perm(tmp_file))
+	if (adjust_shared_perm(tmp_file.buf))
 		die_errno("unable to make temporary bitmap file readable");
 
-	if (rename(tmp_file, filename))
+	if (rename(tmp_file.buf, filename))
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
+
+	strbuf_release(&tmp_file);
 }
diff --git a/pack-write.c b/pack-write.c
index 19cb514ea..0c56ce98d 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -71,9 +71,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 		f = sha1fd_check(index_name);
 	} else {
 		if (!index_name) {
-			static char tmp_file[PATH_MAX];
-			fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_idx_XXXXXX");
-			index_name = xstrdup(tmp_file);
+			struct strbuf tmp_file = STRBUF_INIT;
+			fd = odb_mkstemp(&tmp_file, "pack/tmp_idx_XXXXXX");
+			index_name = strbuf_detach(&tmp_file, NULL);
 		} else {
 			unlink(index_name);
 			fd = open(index_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
@@ -326,11 +326,11 @@ int encode_in_pack_object_header(enum object_type type, uintmax_t size, unsigned
 
 struct sha1file *create_tmp_packfile(char **pack_tmp_name)
 {
-	char tmpname[PATH_MAX];
+	struct strbuf tmpname = STRBUF_INIT;
 	int fd;
 
-	fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
-	*pack_tmp_name = xstrdup(tmpname);
+	fd = odb_mkstemp(&tmpname, "pack/tmp_pack_XXXXXX");
+	*pack_tmp_name = strbuf_detach(&tmpname, NULL);
 	return sha1fd(fd, *pack_tmp_name);
 }
 
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/18] odb_mkstemp: use git_path_buf
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
  2017-03-28 19:45 ` [PATCH 01/18] do not check odb_mkstemp return value for errors Jeff King
  2017-03-28 19:45 ` [PATCH 02/18] odb_mkstemp: write filename into strbuf Jeff King
@ 2017-03-28 19:45 ` Jeff King
  2017-03-28 19:46 ` [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids Jeff King
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:45 UTC (permalink / raw)
  To: git

Since git_path_buf() is smart enough to replace "objects/"
with the correct object path, we can use it instead of
manually assembling the path. That's slightly shorter, and
will clean up any non-canonical bits in the path.

Signed-off-by: Jeff King <peff@peff.net>
---
 environment.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/environment.c b/environment.c
index 88276790d..a9bf5658a 100644
--- a/environment.c
+++ b/environment.c
@@ -282,16 +282,14 @@ int odb_mkstemp(struct strbuf *template, const char *pattern)
 	 * restrictive except to remove write permission.
 	 */
 	int mode = 0444;
-	strbuf_reset(template);
-	strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+	git_path_buf(template, "objects/%s", pattern);
 	fd = git_mkstemp_mode(template->buf, mode);
 	if (0 <= fd)
 		return fd;
 
 	/* slow path */
 	/* some mkstemp implementations erase template on failure */
-	strbuf_reset(template);
-	strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+	git_path_buf(template, "objects/%s", pattern);
 	safe_create_leading_directories(template->buf);
 	return xmkstemp_mode(template->buf, mode);
 }
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (2 preceding siblings ...)
  2017-03-28 19:45 ` [PATCH 03/18] odb_mkstemp: use git_path_buf Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:50   ` Jeff King
  2017-03-28 19:46 ` [PATCH 05/18] tag: use strbuf to format tag header Jeff King
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

To generate a patch id, we format the diff header into a
fixed-size buffer, and then feed the result to our sha1
computation. The fixed buffer has size '4*PATH_MAX + 20',
which in theory accomodates the four filenames plus some
extra data. Except:

  1. The filenames may not be constrained to PATH_MAX. The
     static value may not be a real limit on the current
     filesystem. Moreover, we may compute patch-ids for
     names stored only in git, without touching the current
     filesystem at all.

  2. The 20 bytes is not nearly enough to cover the
     extra content we put in the buffer.

As a result, the data we feed to the sha1 computation may be
truncated, and it's possible that a commit with a very long
filename could erroneously collide in the patch-id space
with another commit. For instance, if one commit modified
"really-long-filename/foo" and another modified "bar" in the
same directory.

In practice this is unlikely. Because the filenames are
repeated, and because there's a single cutoff at the end of
the buffer, the offending filename would have to be on the
order of four times larger than PATH_MAX.

But it's easy to fix by moving to a strbuf.

Technically this may change the output of patch-id for very
long filenames, but it's not worth making an exception for
this in the --stable output. It was a bug, and one that only
affected an unlikely set of paths.  And anyway, the exact
value would have varied from platform to platform depending
on the value of PATH_MAX, so there is no "stable" value.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 58cb72d7e..89b5dc890 100644
--- a/diff.c
+++ b/diff.c
@@ -4577,7 +4577,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 	int i;
 	git_SHA_CTX ctx;
 	struct patch_id_t data;
-	char buffer[PATH_MAX * 4 + 20];
+	struct strbuf buffer = STRBUF_INIT;
 
 	git_SHA1_Init(&ctx);
 	memset(&data, 0, sizeof(struct patch_id_t));
@@ -4607,10 +4607,11 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 		diff_fill_sha1_info(p->one);
 		diff_fill_sha1_info(p->two);
 
+		strbuf_reset(&buffer);
 		len1 = remove_space(p->one->path, strlen(p->one->path));
 		len2 = remove_space(p->two->path, strlen(p->two->path));
 		if (p->one->mode == 0)
-			len1 = snprintf(buffer, sizeof(buffer),
+			strbuf_addf(&buffer,
 					"diff--gita/%.*sb/%.*s"
 					"newfilemode%06o"
 					"---/dev/null"
@@ -4620,7 +4621,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 					p->two->mode,
 					len2, p->two->path);
 		else if (p->two->mode == 0)
-			len1 = snprintf(buffer, sizeof(buffer),
+			strbuf_addf(&buffer,
 					"diff--gita/%.*sb/%.*s"
 					"deletedfilemode%06o"
 					"---a/%.*s"
@@ -4630,7 +4631,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 					p->one->mode,
 					len1, p->one->path);
 		else
-			len1 = snprintf(buffer, sizeof(buffer),
+			strbuf_addf(&buffer,
 					"diff--gita/%.*sb/%.*s"
 					"---a/%.*s"
 					"+++b/%.*s",
@@ -4638,14 +4639,16 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 					len2, p->two->path,
 					len1, p->one->path,
 					len2, p->two->path);
-		git_SHA1_Update(&ctx, buffer, len1);
+		git_SHA1_Update(&ctx, buffer.buf, buffer.len);
 
 		if (diff_header_only)
 			continue;
 
 		if (fill_mmfile(&mf1, p->one) < 0 ||
-		    fill_mmfile(&mf2, p->two) < 0)
+		    fill_mmfile(&mf2, p->two) < 0) {
+			strbuf_release(&buffer);
 			return error("unable to read files to diff");
+		}
 
 		if (diff_filespec_is_binary(p->one) ||
 		    diff_filespec_is_binary(p->two)) {
@@ -4660,11 +4663,14 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 		xecfg.ctxlen = 3;
 		xecfg.flags = 0;
 		if (xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data,
-				  &xpp, &xecfg))
+				  &xpp, &xecfg)) {
+			strbuf_release(&buffer);
 			return error("unable to generate patch-id diff for %s",
 				     p->one->path);
+		}
 	}
 
+	strbuf_release(&buffer);
 	git_SHA1_Final(sha1, &ctx);
 	return 0;
 }
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/18] tag: use strbuf to format tag header
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (3 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 06/18] fetch: use heap buffer to format reflog Jeff King
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

We format the tag header into a fixed 1024-byte buffer. But
since the tag-name and tagger ident can be arbitrarily
large, we may unceremoniously die with "tag header too big".
Let's just use a strbuf instead.

Note that it looks at first glance like we can just format
this directly into the "buf" strbuf where it will ultimately
go. But that buffer may already contain the tag message, and
we have no easy way to prepend formatted data to a strbuf
(we can only splice in an already-generated buffer). This
isn't a performance-critical path, so going through an extra
buffer isn't a big deal.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/tag.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index ad29be692..045e7ad23 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -231,26 +231,22 @@ static void create_tag(const unsigned char *object, const char *tag,
 		       unsigned char *prev, unsigned char *result)
 {
 	enum object_type type;
-	char header_buf[1024];
-	int header_len;
+	struct strbuf header = STRBUF_INIT;
 	char *path = NULL;
 
 	type = sha1_object_info(object, NULL);
 	if (type <= OBJ_NONE)
 	    die(_("bad object type."));
 
-	header_len = snprintf(header_buf, sizeof(header_buf),
-			  "object %s\n"
-			  "type %s\n"
-			  "tag %s\n"
-			  "tagger %s\n\n",
-			  sha1_to_hex(object),
-			  typename(type),
-			  tag,
-			  git_committer_info(IDENT_STRICT));
-
-	if (header_len > sizeof(header_buf) - 1)
-		die(_("tag header too big."));
+	strbuf_addf(&header,
+		    "object %s\n"
+		    "type %s\n"
+		    "tag %s\n"
+		    "tagger %s\n\n",
+		    sha1_to_hex(object),
+		    typename(type),
+		    tag,
+		    git_committer_info(IDENT_STRICT));
 
 	if (!opt->message_given) {
 		int fd;
@@ -288,7 +284,8 @@ static void create_tag(const unsigned char *object, const char *tag,
 	if (!opt->message_given && !buf->len)
 		die(_("no tag message?"));
 
-	strbuf_insert(buf, 0, header_buf, header_len);
+	strbuf_insert(buf, 0, header.buf, header.len);
+	strbuf_release(&header);
 
 	if (build_tag_object(buf, opt->sign, result) < 0) {
 		if (path)
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/18] fetch: use heap buffer to format reflog
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (4 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 05/18] tag: use strbuf to format tag header Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs Jeff King
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

Part of the reflog content comes from the environment, which
can be much larger than our fixed buffer. Let's use a heap
buffer so we avoid truncating it.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fetch.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b5ad09d04..4ef7a08af 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -421,7 +421,7 @@ static int s_update_ref(const char *action,
 			struct ref *ref,
 			int check_old)
 {
-	char msg[1024];
+	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
@@ -431,7 +431,7 @@ static int s_update_ref(const char *action,
 		return 0;
 	if (!rla)
 		rla = default_rla.buf;
-	snprintf(msg, sizeof(msg), "%s: %s", rla, action);
+	msg = xstrfmt("%s: %s", rla, action);
 
 	transaction = ref_transaction_begin(&err);
 	if (!transaction ||
@@ -449,11 +449,13 @@ static int s_update_ref(const char *action,
 
 	ref_transaction_free(transaction);
 	strbuf_release(&err);
+	free(msg);
 	return 0;
 fail:
 	ref_transaction_free(transaction);
 	error("%s", err.buf);
 	strbuf_release(&err);
+	free(msg);
 	return df_conflict ? STORE_REF_ERROR_DF_CONFLICT
 			   : STORE_REF_ERROR_OTHER;
 }
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (5 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 06/18] fetch: use heap buffer to format reflog Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-04-17  6:00   ` Junio C Hamano
  2017-03-28 19:46 ` [PATCH 08/18] avoid using mksnpath " Jeff King
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

Many functions which handle refs use a PATH_MAX-sized buffer
to do so. This is mostly reasonable as we have to write
loose refs into the filesystem, and at least on Linux the 4K
PATH_MAX is big enough that nobody would care. But:

  1. The static PATH_MAX is not always the filesystem limit.

  2. On other platforms, PATH_MAX may be much smaller.

  3. As we move to alternate ref storage, we won't be bound
     by filesystem limits.

Let's convert these to heap buffers so we don't have to
worry about truncation or size limits.

We may want to eventually constrain ref lengths for sanity
and to prevent malicious names, but we should do so
consistently across all platforms, and in a central place
(like the ref code).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/checkout.c  |  5 ++---
 builtin/ls-remote.c | 10 ++++++----
 builtin/replace.c   | 50 +++++++++++++++++++++++++++-----------------------
 builtin/tag.c       | 15 ++++++---------
 4 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 81f07c3ef..93e8dfc82 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -889,11 +889,10 @@ static int check_tracking_name(struct remote *remote, void *cb_data)
 static const char *unique_tracking_name(const char *name, struct object_id *oid)
 {
 	struct tracking_name_data cb_data = { NULL, NULL, NULL, 1 };
-	char src_ref[PATH_MAX];
-	snprintf(src_ref, PATH_MAX, "refs/heads/%s", name);
-	cb_data.src_ref = src_ref;
+	cb_data.src_ref = xstrfmt("refs/heads/%s", name);
 	cb_data.dst_oid = oid;
 	for_each_remote(check_tracking_name, &cb_data);
+	free(cb_data.src_ref);
 	if (cb_data.unique)
 		return cb_data.dst_ref;
 	free(cb_data.dst_ref);
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 66cdd45cc..b2d7d5ce6 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -17,17 +17,19 @@ static const char * const ls_remote_usage[] = {
 static int tail_match(const char **pattern, const char *path)
 {
 	const char *p;
-	char pathbuf[PATH_MAX];
+	char *pathbuf;
 
 	if (!pattern)
 		return 1; /* no restriction */
 
-	if (snprintf(pathbuf, sizeof(pathbuf), "/%s", path) > sizeof(pathbuf))
-		return error("insanely long ref %.*s...", 20, path);
+	pathbuf = xstrfmt("/%s", path);
 	while ((p = *(pattern++)) != NULL) {
-		if (!wildmatch(p, pathbuf, 0, NULL))
+		if (!wildmatch(p, pathbuf, 0, NULL)) {
+			free(pathbuf);
 			return 1;
+		}
 	}
+	free(pathbuf);
 	return 0;
 }
 
diff --git a/builtin/replace.c b/builtin/replace.c
index f83e7b8fc..065515bab 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -93,26 +93,31 @@ typedef int (*each_replace_name_fn)(const char *name, const char *ref,
 static int for_each_replace_name(const char **argv, each_replace_name_fn fn)
 {
 	const char **p, *full_hex;
-	char ref[PATH_MAX];
+	struct strbuf ref = STRBUF_INIT;
+	size_t base_len;
 	int had_error = 0;
 	struct object_id oid;
 
+	strbuf_addstr(&ref, git_replace_ref_base);
+	base_len = ref.len;
+
 	for (p = argv; *p; p++) {
 		if (get_oid(*p, &oid)) {
 			error("Failed to resolve '%s' as a valid ref.", *p);
 			had_error = 1;
 			continue;
 		}
-		full_hex = oid_to_hex(&oid);
-		snprintf(ref, sizeof(ref), "%s%s", git_replace_ref_base, full_hex);
-		/* read_ref() may reuse the buffer */
-		full_hex = ref + strlen(git_replace_ref_base);
-		if (read_ref(ref, oid.hash)) {
+
+		strbuf_setlen(&ref, base_len);
+		strbuf_addstr(&ref, oid_to_hex(&oid));
+		full_hex = ref.buf + base_len;
+
+		if (read_ref(ref.buf, oid.hash)) {
 			error("replace ref '%s' not found.", full_hex);
 			had_error = 1;
 			continue;
 		}
-		if (fn(full_hex, ref, &oid))
+		if (fn(full_hex, ref.buf, &oid))
 			had_error = 1;
 	}
 	return had_error;
@@ -129,21 +134,18 @@ static int delete_replace_ref(const char *name, const char *ref,
 
 static void check_ref_valid(struct object_id *object,
 			    struct object_id *prev,
-			    char *ref,
-			    int ref_size,
+			    struct strbuf *ref,
 			    int force)
 {
-	if (snprintf(ref, ref_size,
-		     "%s%s", git_replace_ref_base,
-		     oid_to_hex(object)) > ref_size - 1)
-		die("replace ref name too long: %.*s...", 50, ref);
-	if (check_refname_format(ref, 0))
-		die("'%s' is not a valid ref name.", ref);
-
-	if (read_ref(ref, prev->hash))
+	strbuf_reset(ref);
+	strbuf_addf(ref, "%s%s", git_replace_ref_base, oid_to_hex(object));
+	if (check_refname_format(ref->buf, 0))
+		die("'%s' is not a valid ref name.", ref->buf);
+
+	if (read_ref(ref->buf, prev->hash))
 		oidclr(prev);
 	else if (!force)
-		die("replace ref '%s' already exists", ref);
+		die("replace ref '%s' already exists", ref->buf);
 }
 
 static int replace_object_oid(const char *object_ref,
@@ -154,7 +156,7 @@ static int replace_object_oid(const char *object_ref,
 {
 	struct object_id prev;
 	enum object_type obj_type, repl_type;
-	char ref[PATH_MAX];
+	struct strbuf ref = STRBUF_INIT;
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
 
@@ -167,16 +169,17 @@ static int replace_object_oid(const char *object_ref,
 		    object_ref, typename(obj_type),
 		    replace_ref, typename(repl_type));
 
-	check_ref_valid(object, &prev, ref, sizeof(ref), force);
+	check_ref_valid(object, &prev, &ref, force);
 
 	transaction = ref_transaction_begin(&err);
 	if (!transaction ||
-	    ref_transaction_update(transaction, ref, repl->hash, prev.hash,
+	    ref_transaction_update(transaction, ref.buf, repl->hash, prev.hash,
 				   0, NULL, &err) ||
 	    ref_transaction_commit(transaction, &err))
 		die("%s", err.buf);
 
 	ref_transaction_free(transaction);
+	strbuf_release(&ref);
 	return 0;
 }
 
@@ -280,7 +283,7 @@ static int edit_and_replace(const char *object_ref, int force, int raw)
 	char *tmpfile = git_pathdup("REPLACE_EDITOBJ");
 	enum object_type type;
 	struct object_id old, new, prev;
-	char ref[PATH_MAX];
+	struct strbuf ref = STRBUF_INIT;
 
 	if (get_oid(object_ref, &old) < 0)
 		die("Not a valid object name: '%s'", object_ref);
@@ -289,7 +292,8 @@ static int edit_and_replace(const char *object_ref, int force, int raw)
 	if (type < 0)
 		die("unable to get object type for %s", oid_to_hex(&old));
 
-	check_ref_valid(&old, &prev, ref, sizeof(ref), force);
+	check_ref_valid(&old, &prev, &ref, force);
+	strbuf_release(&ref);
 
 	export_object(&old, type, raw, tmpfile);
 	if (launch_editor(tmpfile, NULL, NULL) < 0)
diff --git a/builtin/tag.c b/builtin/tag.c
index 045e7ad23..0eb4e25c6 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -72,25 +72,22 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
 			     const void *cb_data)
 {
 	const char **p;
-	char ref[PATH_MAX];
+	struct strbuf ref = STRBUF_INIT;
 	int had_error = 0;
 	unsigned char sha1[20];
 
 	for (p = argv; *p; p++) {
-		if (snprintf(ref, sizeof(ref), "refs/tags/%s", *p)
-					>= sizeof(ref)) {
-			error(_("tag name too long: %.*s..."), 50, *p);
-			had_error = 1;
-			continue;
-		}
-		if (read_ref(ref, sha1)) {
+		strbuf_reset(&ref);
+		strbuf_addf(&ref, "refs/tags/%s", *p);
+		if (read_ref(ref.buf, sha1)) {
 			error(_("tag '%s' not found."), *p);
 			had_error = 1;
 			continue;
 		}
-		if (fn(*p, ref, sha1, cb_data))
+		if (fn(*p, ref.buf, sha1, cb_data))
 			had_error = 1;
 	}
+	strbuf_release(&ref);
 	return had_error;
 }
 
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/18] avoid using mksnpath for refs
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (6 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 09/18] create_branch: move msg setup closer to point of use Jeff King
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

Like the previous commit, we'd like to avoid the assumption
that refs fit into PATH_MAX-sized buffers. These callsites
have an extra twist, though: they write the refnames using
mksnpath. This does two things beyond a regular snprintf:

  1. It quietly writes "/bad-path/" when truncation occurs.
     This saves the caller having to check the error code,
     but if you aren't actually feeding the result to a
     system call (and we aren't here), it's questionable.

  2. It calls cleanup_path(), which removes leading
     instances of "./".  That's questionable when dealing
     with refnames, as we could silently canonicalize a
     syntactically bogus refname into a valid one.

Let's convert each case to use a strbuf. This is preferable
to xstrfmt() because we can reuse the same buffer as we
loop.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/refs.c b/refs.c
index e7606716d..9de6979ac 100644
--- a/refs.c
+++ b/refs.c
@@ -429,29 +429,31 @@ int expand_ref(const char *str, int len, unsigned char *sha1, char **ref)
 {
 	const char **p, *r;
 	int refs_found = 0;
+	struct strbuf fullref = STRBUF_INIT;
 
 	*ref = NULL;
 	for (p = ref_rev_parse_rules; *p; p++) {
-		char fullref[PATH_MAX];
 		unsigned char sha1_from_ref[20];
 		unsigned char *this_result;
 		int flag;
 
 		this_result = refs_found ? sha1_from_ref : sha1;
-		mksnpath(fullref, sizeof(fullref), *p, len, str);
-		r = resolve_ref_unsafe(fullref, RESOLVE_REF_READING,
+		strbuf_reset(&fullref);
+		strbuf_addf(&fullref, *p, len, str);
+		r = resolve_ref_unsafe(fullref.buf, RESOLVE_REF_READING,
 				       this_result, &flag);
 		if (r) {
 			if (!refs_found++)
 				*ref = xstrdup(r);
 			if (!warn_ambiguous_refs)
 				break;
-		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD")) {
-			warning("ignoring dangling symref %s.", fullref);
-		} else if ((flag & REF_ISBROKEN) && strchr(fullref, '/')) {
-			warning("ignoring broken ref %s.", fullref);
+		} else if ((flag & REF_ISSYMREF) && strcmp(fullref.buf, "HEAD")) {
+			warning("ignoring dangling symref %s.", fullref.buf);
+		} else if ((flag & REF_ISBROKEN) && strchr(fullref.buf, '/')) {
+			warning("ignoring broken ref %s.", fullref.buf);
 		}
 	}
+	strbuf_release(&fullref);
 	return refs_found;
 }
 
@@ -460,21 +462,22 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log)
 	char *last_branch = substitute_branch_name(&str, &len);
 	const char **p;
 	int logs_found = 0;
+	struct strbuf path = STRBUF_INIT;
 
 	*log = NULL;
 	for (p = ref_rev_parse_rules; *p; p++) {
 		unsigned char hash[20];
-		char path[PATH_MAX];
 		const char *ref, *it;
 
-		mksnpath(path, sizeof(path), *p, len, str);
-		ref = resolve_ref_unsafe(path, RESOLVE_REF_READING,
+		strbuf_reset(&path);
+		strbuf_addf(&path, *p, len, str);
+		ref = resolve_ref_unsafe(path.buf, RESOLVE_REF_READING,
 					 hash, NULL);
 		if (!ref)
 			continue;
-		if (reflog_exists(path))
-			it = path;
-		else if (strcmp(ref, path) && reflog_exists(ref))
+		if (reflog_exists(path.buf))
+			it = path.buf;
+		else if (strcmp(ref, path.buf) && reflog_exists(ref))
 			it = ref;
 		else
 			continue;
@@ -485,6 +488,7 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log)
 		if (!warn_ambiguous_refs)
 			break;
 	}
+	strbuf_release(&path);
 	free(last_branch);
 	return logs_found;
 }
@@ -944,6 +948,7 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 	static char **scanf_fmts;
 	static int nr_rules;
 	char *short_name;
+	struct strbuf resolved_buf = STRBUF_INIT;
 
 	if (!nr_rules) {
 		/*
@@ -1002,7 +1007,6 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 		 */
 		for (j = 0; j < rules_to_fail; j++) {
 			const char *rule = ref_rev_parse_rules[j];
-			char refname[PATH_MAX];
 
 			/* skip matched rule */
 			if (i == j)
@@ -1013,9 +1017,10 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 			 * (with this previous rule) to a valid ref
 			 * read_ref() returns 0 on success
 			 */
-			mksnpath(refname, sizeof(refname),
-				 rule, short_name_len, short_name);
-			if (ref_exists(refname))
+			strbuf_reset(&resolved_buf);
+			strbuf_addf(&resolved_buf, rule,
+				    short_name_len, short_name);
+			if (ref_exists(resolved_buf.buf))
 				break;
 		}
 
@@ -1023,10 +1028,13 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 		 * short name is non-ambiguous if all previous rules
 		 * haven't resolved to a valid ref
 		 */
-		if (j == rules_to_fail)
+		if (j == rules_to_fail) {
+			strbuf_release(&resolved_buf);
 			return short_name;
+		}
 	}
 
+	strbuf_release(&resolved_buf);
 	free(short_name);
 	return xstrdup(refname);
 }
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/18] create_branch: move msg setup closer to point of use
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (7 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 08/18] avoid using mksnpath " Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 10/18] create_branch: use xstrfmt for reflog message Jeff King
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

In create_branch() we write the reflog msg into a buffer in
the main function, but then use it only inside a
conditional. If you carefully follow the logic, you can
confirm that we never use the buffer uninitialized nor write
when it would not be used. But we can make this a lot more
obvious by simply moving the write step inside the
conditional.

Signed-off-by: Jeff King <peff@peff.net>
---
 branch.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/branch.c b/branch.c
index 5c12036b0..6d0ca94cc 100644
--- a/branch.c
+++ b/branch.c
@@ -234,7 +234,7 @@ void create_branch(const char *name, const char *start_name,
 {
 	struct commit *commit;
 	unsigned char sha1[20];
-	char *real_ref, msg[PATH_MAX + 20];
+	char *real_ref;
 	struct strbuf ref = STRBUF_INIT;
 	int forcing = 0;
 	int dont_change_ref = 0;
@@ -290,19 +290,20 @@ void create_branch(const char *name, const char *start_name,
 		die(_("Not a valid branch point: '%s'."), start_name);
 	hashcpy(sha1, commit->object.oid.hash);
 
-	if (forcing)
-		snprintf(msg, sizeof msg, "branch: Reset to %s",
-			 start_name);
-	else if (!dont_change_ref)
-		snprintf(msg, sizeof msg, "branch: Created from %s",
-			 start_name);
-
 	if (reflog)
 		log_all_ref_updates = LOG_REFS_NORMAL;
 
 	if (!dont_change_ref) {
 		struct ref_transaction *transaction;
 		struct strbuf err = STRBUF_INIT;
+		char msg[PATH_MAX + 20];
+
+		if (forcing)
+			snprintf(msg, sizeof msg, "branch: Reset to %s",
+				 start_name);
+		else
+			snprintf(msg, sizeof msg, "branch: Created from %s",
+				 start_name);
 
 		transaction = ref_transaction_begin(&err);
 		if (!transaction ||
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/18] create_branch: use xstrfmt for reflog message
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (8 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 09/18] create_branch: move msg setup closer to point of use Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 11/18] name-rev: replace static buffer with strbuf Jeff King
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

We generate a reflog message that contains some fixed text
plus a branch name, and use a buffer of size PATH_MAX + 20.
This mostly works if you assume that refnames are shorter
than PATH_MAX, but:

  1. That's not necessarily true. PATH_MAX is not always the
     filesystem's limit.

  2. The "20" is not sufficiently large for the fixed text
     anyway.

Let's just switch to a heap buffer so we don't have to even
care.

Signed-off-by: Jeff King <peff@peff.net>
---
 branch.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/branch.c b/branch.c
index 6d0ca94cc..ad5a2299b 100644
--- a/branch.c
+++ b/branch.c
@@ -296,14 +296,12 @@ void create_branch(const char *name, const char *start_name,
 	if (!dont_change_ref) {
 		struct ref_transaction *transaction;
 		struct strbuf err = STRBUF_INIT;
-		char msg[PATH_MAX + 20];
+		char *msg;
 
 		if (forcing)
-			snprintf(msg, sizeof msg, "branch: Reset to %s",
-				 start_name);
+			msg = xstrfmt("branch: Reset to %s", start_name);
 		else
-			snprintf(msg, sizeof msg, "branch: Created from %s",
-				 start_name);
+			msg = xstrfmt("branch: Created from %s", start_name);
 
 		transaction = ref_transaction_begin(&err);
 		if (!transaction ||
@@ -314,6 +312,7 @@ void create_branch(const char *name, const char *start_name,
 			die("%s", err.buf);
 		ref_transaction_free(transaction);
 		strbuf_release(&err);
+		free(msg);
 	}
 
 	if (real_ref && track)
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/18] name-rev: replace static buffer with strbuf
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (9 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 10/18] create_branch: use xstrfmt for reflog message Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 12/18] receive-pack: print --pack-header directly into argv array Jeff King
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

When name-rev needs to format an actual name, we do so into
a fixed-size buffer. That includes the actual ref tip, as
well as any traversal information. Since refs can exceed
1024 bytes, this means you can get a bogus result. E.g.,
doing:

   git tag $(perl -e 'print join("/", 1..1024)')
   git describe --contains HEAD^

results in ".../282/283", when it should be
".../1023/1024~1".

We can solve this by using a heap buffer. We'll use a
strbuf, which lets us write into the same buffer from our
loop without having to reallocate.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/name-rev.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 8bdc3eaa6..92a5d8a5d 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -238,10 +238,9 @@ static const char *get_exact_ref_match(const struct object *o)
 	return NULL;
 }
 
-/* returns a static buffer */
-static const char *get_rev_name(const struct object *o)
+/* may return a constant string or use "buf" as scratch space */
+static const char *get_rev_name(const struct object *o, struct strbuf *buf)
 {
-	static char buffer[1024];
 	struct rev_name *n;
 	struct commit *c;
 
@@ -258,10 +257,9 @@ static const char *get_rev_name(const struct object *o)
 		int len = strlen(n->tip_name);
 		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
 			len -= 2;
-		snprintf(buffer, sizeof(buffer), "%.*s~%d", len, n->tip_name,
-				n->generation);
-
-		return buffer;
+		strbuf_reset(buf);
+		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
+		return buf->buf;
 	}
 }
 
@@ -271,10 +269,11 @@ static void show_name(const struct object *obj,
 {
 	const char *name;
 	const struct object_id *oid = &obj->oid;
+	struct strbuf buf = STRBUF_INIT;
 
 	if (!name_only)
 		printf("%s ", caller_name ? caller_name : oid_to_hex(oid));
-	name = get_rev_name(obj);
+	name = get_rev_name(obj, &buf);
 	if (name)
 		printf("%s\n", name);
 	else if (allow_undefined)
@@ -283,6 +282,7 @@ static void show_name(const struct object *obj,
 		printf("%s\n", find_unique_abbrev(oid->hash, DEFAULT_ABBREV));
 	else
 		die("cannot describe '%s'", oid_to_hex(oid));
+	strbuf_release(&buf);
 }
 
 static char const * const name_rev_usage[] = {
@@ -294,6 +294,7 @@ static char const * const name_rev_usage[] = {
 
 static void name_rev_line(char *p, struct name_ref_data *data)
 {
+	struct strbuf buf = STRBUF_INIT;
 	int forty = 0;
 	char *p_start;
 	for (p_start = p; *p; p++) {
@@ -314,7 +315,7 @@ static void name_rev_line(char *p, struct name_ref_data *data)
 				struct object *o =
 					lookup_object(sha1);
 				if (o)
-					name = get_rev_name(o);
+					name = get_rev_name(o, &buf);
 			}
 			*(p+1) = c;
 
@@ -332,6 +333,8 @@ static void name_rev_line(char *p, struct name_ref_data *data)
 	/* flush */
 	if (p_start != p)
 		fwrite(p_start, p - p_start, 1, stdout);
+
+	strbuf_release(&buf);
 }
 
 int cmd_name_rev(int argc, const char **argv, const char *prefix)
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 12/18] receive-pack: print --pack-header directly into argv array
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (10 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 11/18] name-rev: replace static buffer with strbuf Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 13/18] replace unchecked snprintf calls with heap buffers Jeff King
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

After receive-pack reads the pack header from the client, it
feeds the already-read part to index-pack and unpack-objects
via their --pack-header command-line options.  To do so, we
format it into a fixed buffer, then duplicate it into the
child's argv_array.

Our buffer is long enough to handle any possible input, so
this isn't wrong. But it's more complicated than it needs to
be; we can just argv_array_pushf() the final value and avoid
the intermediate copy. This drops the magic number and is
more efficient, too.

Note that we need to push to the argv_array in order, which
means we can't do the push until we are in the "unpack-objects
versus index-pack" conditional.  Rather than duplicate the
slightly complicated format specifier, I pushed it into a
helper function.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/receive-pack.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index fb2a090a0..2ca93adef 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1634,12 +1634,17 @@ static const char *parse_pack_header(struct pack_header *hdr)
 
 static const char *pack_lockfile;
 
+static void push_header_arg(struct argv_array *args, struct pack_header *hdr)
+{
+	argv_array_pushf(args, "--pack_header=%"PRIu32",%"PRIu32,
+			ntohl(hdr->hdr_version), ntohl(hdr->hdr_entries));
+}
+
 static const char *unpack(int err_fd, struct shallow_info *si)
 {
 	struct pack_header hdr;
 	const char *hdr_err;
 	int status;
-	char hdr_arg[38];
 	struct child_process child = CHILD_PROCESS_INIT;
 	int fsck_objects = (receive_fsck_objects >= 0
 			    ? receive_fsck_objects
@@ -1653,9 +1658,6 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 			close(err_fd);
 		return hdr_err;
 	}
-	snprintf(hdr_arg, sizeof(hdr_arg),
-			"--pack_header=%"PRIu32",%"PRIu32,
-			ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries));
 
 	if (si->nr_ours || si->nr_theirs) {
 		alt_shallow_file = setup_temporary_shallow(si->shallow);
@@ -1679,7 +1681,8 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 	tmp_objdir_add_as_alternate(tmp_objdir);
 
 	if (ntohl(hdr.hdr_entries) < unpack_limit) {
-		argv_array_pushl(&child.args, "unpack-objects", hdr_arg, NULL);
+		argv_array_push(&child.args, "unpack-objects");
+		push_header_arg(&child.args, &hdr);
 		if (quiet)
 			argv_array_push(&child.args, "-q");
 		if (fsck_objects)
@@ -1697,8 +1700,8 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 	} else {
 		char hostname[256];
 
-		argv_array_pushl(&child.args, "index-pack",
-				 "--stdin", hdr_arg, NULL);
+		argv_array_pushl(&child.args, "index-pack", "--stdin", NULL);
+		push_header_arg(&child.args, &hdr);
 
 		if (gethostname(hostname, sizeof(hostname)))
 			xsnprintf(hostname, sizeof(hostname), "localhost");
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 13/18] replace unchecked snprintf calls with heap buffers
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (11 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 12/18] receive-pack: print --pack-header directly into argv array Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 14/18] combine-diff: replace malloc/snprintf with xstrfmt Jeff King
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

We'd prefer to avoid unchecked snprintf calls because
truncation can lead to unexpected results.

These are all cases where truncation shouldn't ever happen,
because the input to snprintf is fixed in size. That makes
them candidates for xsnprintf(), but it's simpler still to
just use the heap, and then nobody has to wonder if "100" is
big enough.

We'll use xstrfmt() where possible, and a strbuf when we need
the resulting size or to reuse the same buffer in a loop.

Signed-off-by: Jeff King <peff@peff.net>
---
 bisect.c             | 8 +++++---
 builtin/index-pack.c | 9 +++++----
 builtin/notes.c      | 9 ++++-----
 builtin/rev-parse.c  | 5 +++--
 4 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/bisect.c b/bisect.c
index 30808cadf..d12583eaa 100644
--- a/bisect.c
+++ b/bisect.c
@@ -200,6 +200,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 {
 	struct commit_list *p;
 	struct commit_dist *array = xcalloc(nr, sizeof(*array));
+	struct strbuf buf = STRBUF_INIT;
 	int cnt, i;
 
 	for (p = list, cnt = 0; p; p = p->next) {
@@ -217,17 +218,18 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 	}
 	QSORT(array, cnt, compare_commit_dist);
 	for (p = list, i = 0; i < cnt; i++) {
-		char buf[100]; /* enough for dist=%d */
 		struct object *obj = &(array[i].commit->object);
 
-		snprintf(buf, sizeof(buf), "dist=%d", array[i].distance);
-		add_name_decoration(DECORATION_NONE, buf, obj);
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "dist=%d", array[i].distance);
+		add_name_decoration(DECORATION_NONE, buf.buf, obj);
 
 		p->item = array[i].commit;
 		p = p->next;
 	}
 	if (p)
 		p->next = NULL;
+	strbuf_release(&buf);
 	free(array);
 	return list;
 }
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f4af2ab37..197c51912 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1443,10 +1443,11 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	if (!from_stdin) {
 		printf("%s\n", sha1_to_hex(sha1));
 	} else {
-		char buf[48];
-		int len = snprintf(buf, sizeof(buf), "%s\t%s\n",
-				   report, sha1_to_hex(sha1));
-		write_or_die(1, buf, len);
+		struct strbuf buf = STRBUF_INIT;
+
+		strbuf_addf(&buf, "%s\t%s\n", report, sha1_to_hex(sha1));
+		write_or_die(1, buf.buf, buf.len);
+		strbuf_release(&buf);
 
 		/*
 		 * Let's just mimic git-unpack-objects here and write
diff --git a/builtin/notes.c b/builtin/notes.c
index 0513f7455..7b891471c 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -554,7 +554,7 @@ static int append_edit(int argc, const char **argv, const char *prefix)
 	struct notes_tree *t;
 	unsigned char object[20], new_note[20];
 	const unsigned char *note;
-	char logmsg[100];
+	char *logmsg;
 	const char * const *usage;
 	struct note_data d = { 0, 0, NULL, STRBUF_INIT };
 	struct option options[] = {
@@ -618,17 +618,16 @@ static int append_edit(int argc, const char **argv, const char *prefix)
 		write_note_data(&d, new_note);
 		if (add_note(t, object, new_note, combine_notes_overwrite))
 			die("BUG: combine_notes_overwrite failed");
-		snprintf(logmsg, sizeof(logmsg), "Notes added by 'git notes %s'",
-			argv[0]);
+		logmsg = xstrfmt("Notes added by 'git notes %s'", argv[0]);
 	} else {
 		fprintf(stderr, _("Removing note for object %s\n"),
 			sha1_to_hex(object));
 		remove_note(t, object);
-		snprintf(logmsg, sizeof(logmsg), "Notes removed by 'git notes %s'",
-			argv[0]);
+		logmsg = xstrfmt("Notes removed by 'git notes %s'", argv[0]);
 	}
 	commit_notes(t, logmsg);
 
+	free(logmsg);
 	free_note_data(&d);
 	free_notes(t);
 	return 0;
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 9e53a1a7c..f54d7b502 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -213,13 +213,14 @@ static int show_abbrev(const unsigned char *sha1, void *cb_data)
 
 static void show_datestring(const char *flag, const char *datestr)
 {
-	static char buffer[100];
+	char *buffer;
 
 	/* date handling requires both flags and revs */
 	if ((filter & (DO_FLAGS | DO_REVS)) != (DO_FLAGS | DO_REVS))
 		return;
-	snprintf(buffer, sizeof(buffer), "%s%lu", flag, approxidate(datestr));
+	buffer = xstrfmt("%s%lu", flag, approxidate(datestr));
 	show(buffer);
+	free(buffer);
 }
 
 static int show_file(const char *arg, int output_prefix)
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 14/18] combine-diff: replace malloc/snprintf with xstrfmt
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (12 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 13/18] replace unchecked snprintf calls with heap buffers Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:46 ` [PATCH 15/18] convert unchecked snprintf into xsnprintf Jeff King
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

There's no need to use the magic "100" when a strbuf can do
it for us.

Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 59501db99..d3560573a 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -292,9 +292,10 @@ static char *grab_blob(const struct object_id *oid, unsigned int mode,
 	enum object_type type;
 
 	if (S_ISGITLINK(mode)) {
-		blob = xmalloc(100);
-		*size = snprintf(blob, 100,
-				 "Subproject commit %s\n", oid_to_hex(oid));
+		struct strbuf buf = STRBUF_INIT;
+		strbuf_addf(&buf, "Subproject commit %s\n", oid_to_hex(oid));
+		*size = buf.len;
+		blob = strbuf_detach(&buf, NULL);
 	} else if (is_null_oid(oid)) {
 		/* deleted blob */
 		*size = 0;
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 15/18] convert unchecked snprintf into xsnprintf
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (13 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 14/18] combine-diff: replace malloc/snprintf with xstrfmt Jeff King
@ 2017-03-28 19:46 ` Jeff King
  2017-03-28 19:47 ` [PATCH 16/18] transport-helper: replace checked snprintf with xsnprintf Jeff King
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:46 UTC (permalink / raw)
  To: git

These calls to snprintf should always succeed, because their
input is small and fixed. Let's use xsnprintf to make sure
this is the case (and to make auditing for actual truncation
easier).

These could be candidates for turning into heap buffers, but
they fall into a few broad categories that make it not worth
doing:

  - formatting single numbers is simple enough that we can
    see the result should fit

  - the size of a sha1 is likewise well-known, and I didn't
    want to cause unnecessary conflicts with the ongoing
    process to convert these constants to GIT_MAX_HEXSZ

  - the interface for curl_errorstr is dictated by curl

Signed-off-by: Jeff King <peff@peff.net>
---
 grep.c      |  4 ++--
 http.c      | 10 +++++-----
 imap-send.c |  2 +-
 sha1_file.c |  4 ++--
 submodule.c |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/grep.c b/grep.c
index 0dbdc1d00..39b4b60d2 100644
--- a/grep.c
+++ b/grep.c
@@ -1164,7 +1164,7 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
 	}
 	if (opt->linenum) {
 		char buf[32];
-		snprintf(buf, sizeof(buf), "%d", lno);
+		xsnprintf(buf, sizeof(buf), "%d", lno);
 		output_color(opt, buf, strlen(buf), opt->color_lineno);
 		output_sep(opt, sign);
 	}
@@ -1651,7 +1651,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 				     opt->color_filename);
 			output_sep(opt, ':');
 		}
-		snprintf(buf, sizeof(buf), "%u\n", count);
+		xsnprintf(buf, sizeof(buf), "%u\n", count);
 		opt->output(opt, buf, strlen(buf));
 		return 1;
 	}
diff --git a/http.c b/http.c
index 96d84bbed..8d94e2c63 100644
--- a/http.c
+++ b/http.c
@@ -1366,9 +1366,9 @@ static int handle_curl_result(struct slot_results *results)
 		 * FAILONERROR it is lost, so we can give only the numeric
 		 * status code.
 		 */
-		snprintf(curl_errorstr, sizeof(curl_errorstr),
-			 "The requested URL returned error: %ld",
-			 results->http_code);
+		xsnprintf(curl_errorstr, sizeof(curl_errorstr),
+			  "The requested URL returned error: %ld",
+			  results->http_code);
 	}
 
 	if (results->curl_result == CURLE_OK) {
@@ -1410,8 +1410,8 @@ int run_one_slot(struct active_request_slot *slot,
 {
 	slot->results = results;
 	if (!start_active_slot(slot)) {
-		snprintf(curl_errorstr, sizeof(curl_errorstr),
-			 "failed to start HTTP request");
+		xsnprintf(curl_errorstr, sizeof(curl_errorstr),
+			  "failed to start HTTP request");
 		return HTTP_START_FAILED;
 	}
 
diff --git a/imap-send.c b/imap-send.c
index 5c7e27a89..857591660 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -964,7 +964,7 @@ static struct imap_store *imap_open_store(struct imap_server_conf *srvc, char *f
 		int gai;
 		char portstr[6];
 
-		snprintf(portstr, sizeof(portstr), "%d", srvc->port);
+		xsnprintf(portstr, sizeof(portstr), "%d", srvc->port);
 
 		memset(&hints, 0, sizeof(hints));
 		hints.ai_socktype = SOCK_STREAM;
diff --git a/sha1_file.c b/sha1_file.c
index 71063890f..43990dec7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3762,8 +3762,8 @@ static int for_each_file_in_obj_subdir(int subdir_nr,
 			char hex[GIT_SHA1_HEXSZ+1];
 			struct object_id oid;
 
-			snprintf(hex, sizeof(hex), "%02x%s",
-				 subdir_nr, de->d_name);
+			xsnprintf(hex, sizeof(hex), "%02x%s",
+				  subdir_nr, de->d_name);
 			if (!get_oid_hex(hex, &oid)) {
 				if (obj_cb) {
 					r = obj_cb(&oid, path->buf, data);
diff --git a/submodule.c b/submodule.c
index 3200b7bb2..e11ea7d0a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1221,7 +1221,7 @@ static int find_first_merges(struct object_array *result, const char *path,
 	memset(&rev_opts, 0, sizeof(rev_opts));
 
 	/* get all revisions that merge commit a */
-	snprintf(merged_revision, sizeof(merged_revision), "^%s",
+	xsnprintf(merged_revision, sizeof(merged_revision), "^%s",
 			oid_to_hex(&a->object.oid));
 	init_revisions(&revs, NULL);
 	rev_opts.submodule = path;
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 16/18] transport-helper: replace checked snprintf with xsnprintf
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (14 preceding siblings ...)
  2017-03-28 19:46 ` [PATCH 15/18] convert unchecked snprintf into xsnprintf Jeff King
@ 2017-03-28 19:47 ` Jeff King
  2017-03-28 19:47 ` [PATCH 17/18] gc: replace local buffer with git_path Jeff King
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:47 UTC (permalink / raw)
  To: git

We can use xsnprintf to do our truncation check with less
code. The error message isn't as specific, but the point is
that this isn't supposed to trigger in the first place
(because our buffer is big enough to handle any int).

Signed-off-by: Jeff King <peff@peff.net>
---
 transport-helper.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index dc90a1fb7..36408046e 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -347,14 +347,11 @@ static int set_helper_option(struct transport *transport,
 static void standard_options(struct transport *t)
 {
 	char buf[16];
-	int n;
 	int v = t->verbose;
 
 	set_helper_option(t, "progress", t->progress ? "true" : "false");
 
-	n = snprintf(buf, sizeof(buf), "%d", v + 1);
-	if (n >= sizeof(buf))
-		die("impossibly large verbosity value");
+	xsnprintf(buf, sizeof(buf), "%d", v + 1);
 	set_helper_option(t, "verbosity", buf);
 
 	switch (t->family) {
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 17/18] gc: replace local buffer with git_path
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (15 preceding siblings ...)
  2017-03-28 19:47 ` [PATCH 16/18] transport-helper: replace checked snprintf with xsnprintf Jeff King
@ 2017-03-28 19:47 ` Jeff King
  2017-03-28 19:48 ` [PATCH 18/18] daemon: use an argv_array to exec children Jeff King
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:47 UTC (permalink / raw)
  To: git

We probe the "17/" loose object directory for auto-gc, and
use a local buffer to format the path. We can just use
git_path() for this. It handles paths of any length
(reducing our error handling). And because we feed the
result straight to a system call, we can just use the static
variant.

Note that git_path also knows the string "objects/" is
special, and will replace it with git_object_directory()
when necessary.

Another alternative would be to use sha1_file_name() for the
pretend object "170000...", but that ends up being more
hassle for no gain, as we have to truncate the final path
component.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/gc.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c2c61a57b..2daede782 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -135,8 +135,6 @@ static int too_many_loose_objects(void)
 	 * distributed, we can check only one and get a reasonable
 	 * estimate.
 	 */
-	char path[PATH_MAX];
-	const char *objdir = get_object_directory();
 	DIR *dir;
 	struct dirent *ent;
 	int auto_threshold;
@@ -146,11 +144,7 @@ static int too_many_loose_objects(void)
 	if (gc_auto_threshold <= 0)
 		return 0;
 
-	if (sizeof(path) <= snprintf(path, sizeof(path), "%s/17", objdir)) {
-		warning(_("insanely long object directory %.*s"), 50, objdir);
-		return 0;
-	}
-	dir = opendir(path);
+	dir = opendir(git_path("objects/17"));
 	if (!dir)
 		return 0;
 
-- 
2.12.2.845.g55fcf8b10


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 18/18] daemon: use an argv_array to exec children
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (16 preceding siblings ...)
  2017-03-28 19:47 ` [PATCH 17/18] gc: replace local buffer with git_path Jeff King
@ 2017-03-28 19:48 ` Jeff King
  2017-03-28 22:33 ` [PATCH 0/18] snprintf cleanups Junio C Hamano
  2017-03-29  7:10 ` [PATCH] Makefile: detect errors in running spatch Jeff King
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:48 UTC (permalink / raw)
  To: git

Our struct child_process already has its own argv_array.
Let's use that to avoid having to format options into
separate buffers.

Note that we'll need to declare the child process outside of
the run_service_command() helper to do this. But that opens
up a further simplification, which is that the helper can
append to our argument list, saving each caller from
specifying "." manually.

Signed-off-by: Jeff King <peff@peff.net>
---
Of all the patches in this series, this is the one where I was most
undecided on whether the result was more readable or not.

My ulterior motive, of course, was to get rid of the unchecked
snprintf() into timebuf. But it could also just become an xsnprintf()
with no further changes.

 daemon.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/daemon.c b/daemon.c
index 473e6b6b6..f70d27b82 100644
--- a/daemon.c
+++ b/daemon.c
@@ -449,46 +449,42 @@ static void copy_to_log(int fd)
 	fclose(fp);
 }
 
-static int run_service_command(const char **argv)
+static int run_service_command(struct child_process *cld)
 {
-	struct child_process cld = CHILD_PROCESS_INIT;
-
-	cld.argv = argv;
-	cld.git_cmd = 1;
-	cld.err = -1;
-	if (start_command(&cld))
+	argv_array_push(&cld->args, ".");
+	cld->git_cmd = 1;
+	cld->err = -1;
+	if (start_command(cld))
 		return -1;
 
 	close(0);
 	close(1);
 
-	copy_to_log(cld.err);
+	copy_to_log(cld->err);
 
-	return finish_command(&cld);
+	return finish_command(cld);
 }
 
 static int upload_pack(void)
 {
-	/* Timeout as string */
-	char timeout_buf[64];
-	const char *argv[] = { "upload-pack", "--strict", NULL, ".", NULL };
-
-	argv[2] = timeout_buf;
-
-	snprintf(timeout_buf, sizeof timeout_buf, "--timeout=%u", timeout);
-	return run_service_command(argv);
+	struct child_process cld = CHILD_PROCESS_INIT;
+	argv_array_pushl(&cld.args, "upload-pack", "--strict", NULL);
+	argv_array_pushf(&cld.args, "--timeout=%u", timeout);
+	return run_service_command(&cld);
 }
 
 static int upload_archive(void)
 {
-	static const char *argv[] = { "upload-archive", ".", NULL };
-	return run_service_command(argv);
+	struct child_process cld = CHILD_PROCESS_INIT;
+	argv_array_push(&cld.args, "upload-archive");
+	return run_service_command(&cld);
 }
 
 static int receive_pack(void)
 {
-	static const char *argv[] = { "receive-pack", ".", NULL };
-	return run_service_command(argv);
+	struct child_process cld = CHILD_PROCESS_INIT;
+	argv_array_push(&cld.args, "receive-pack");
+	return run_service_command(&cld);
 }
 
 static struct daemon_service daemon_service[] = {
-- 
2.12.2.845.g55fcf8b10

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids
  2017-03-28 19:46 ` [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids Jeff King
@ 2017-03-28 19:50   ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-28 19:50 UTC (permalink / raw)
  To: git

On Tue, Mar 28, 2017 at 03:46:18PM -0400, Jeff King wrote:

> As a result, the data we feed to the sha1 computation may be
> truncated, and it's possible that a commit with a very long
> filename could erroneously collide in the patch-id space
> with another commit. For instance, if one commit modified
> "really-long-filename/foo" and another modified "bar" in the
> same directory.
> 
> In practice this is unlikely. Because the filenames are
> repeated, and because there's a single cutoff at the end of
> the buffer, the offending filename would have to be on the
> order of four times larger than PATH_MAX.
> 
> But it's easy to fix by moving to a strbuf.

The other obvious solution is to avoid formatting the string entirely,
and just feed the pieces to the sha1 computation. Unfortunately that
still involves formatting the modes (into a fixed-size buffer!) since we
need the sha1 of their string representations.

That patch is below for reference. I'm not sure if it's more readable or
not.

---
diff --git a/diff.c b/diff.c
index a628ac3a9..435c734f4 100644
--- a/diff.c
+++ b/diff.c
@@ -4570,6 +4570,19 @@ static void patch_id_consume(void *priv, char *line, unsigned long len)
 	data->patchlen += new_len;
 }
 
+static void patch_id_add_string(git_SHA_CTX *ctx, const char *str)
+{
+	git_SHA1_Update(ctx, str, strlen(str));
+}
+
+static void patch_id_add_mode(git_SHA_CTX *ctx, unsigned mode)
+{
+	/* large enough for 2^32 in octal */
+	char buf[12];
+	int len = xsnprintf(buf, sizeof(buf), "%06o", mode);
+	git_SHA1_Update(ctx, buf, len);
+}
+
 /* returns 0 upon success, and writes result into sha1 */
 static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only)
 {
@@ -4577,7 +4590,6 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 	int i;
 	git_SHA_CTX ctx;
 	struct patch_id_t data;
-	char buffer[PATH_MAX * 4 + 20];
 
 	git_SHA1_Init(&ctx);
 	memset(&data, 0, sizeof(struct patch_id_t));
@@ -4609,36 +4621,30 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 
 		len1 = remove_space(p->one->path, strlen(p->one->path));
 		len2 = remove_space(p->two->path, strlen(p->two->path));
-		if (p->one->mode == 0)
-			len1 = snprintf(buffer, sizeof(buffer),
-					"diff--gita/%.*sb/%.*s"
-					"newfilemode%06o"
-					"---/dev/null"
-					"+++b/%.*s",
-					len1, p->one->path,
-					len2, p->two->path,
-					p->two->mode,
-					len2, p->two->path);
-		else if (p->two->mode == 0)
-			len1 = snprintf(buffer, sizeof(buffer),
-					"diff--gita/%.*sb/%.*s"
-					"deletedfilemode%06o"
-					"---a/%.*s"
-					"+++/dev/null",
-					len1, p->one->path,
-					len2, p->two->path,
-					p->one->mode,
-					len1, p->one->path);
-		else
-			len1 = snprintf(buffer, sizeof(buffer),
-					"diff--gita/%.*sb/%.*s"
-					"---a/%.*s"
-					"+++b/%.*s",
-					len1, p->one->path,
-					len2, p->two->path,
-					len1, p->one->path,
-					len2, p->two->path);
-		git_SHA1_Update(&ctx, buffer, len1);
+		patch_id_add_string(&ctx, "diff--git");
+		patch_id_add_string(&ctx, "a/");
+		git_SHA1_Update(&ctx, p->one->path, len1);
+		patch_id_add_string(&ctx, "b/");
+		git_SHA1_Update(&ctx, p->two->path, len2);
+
+		if (p->one->mode == 0) {
+			patch_id_add_string(&ctx, "newfilemode");
+			patch_id_add_mode(&ctx, p->two->mode);
+			patch_id_add_string(&ctx, "---/dev/null");
+			patch_id_add_string(&ctx, "+++b/");
+			git_SHA1_Update(&ctx, p->two->path, len2);
+		} else if (p->two->mode == 0) {
+			patch_id_add_string(&ctx, "deletedfilemode");
+			patch_id_add_mode(&ctx, p->one->mode);
+			patch_id_add_string(&ctx, "---a/");
+			git_SHA1_Update(&ctx, p->one->path, len1);
+			patch_id_add_string(&ctx, "+++/dev/null");
+		} else {
+			patch_id_add_string(&ctx, "---a/");
+			git_SHA1_Update(&ctx, p->one->path, len1);
+			patch_id_add_string(&ctx, "+++b/");
+			git_SHA1_Update(&ctx, p->two->path, len2);
+		}
 
 		if (diff_header_only)
 			continue;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/18] snprintf cleanups
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (17 preceding siblings ...)
  2017-03-28 19:48 ` [PATCH 18/18] daemon: use an argv_array to exec children Jeff King
@ 2017-03-28 22:33 ` Junio C Hamano
  2017-03-29  3:41   ` Jeff King
  2017-03-29  7:10 ` [PATCH] Makefile: detect errors in running spatch Jeff King
  19 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-03-28 22:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> It's a lot of patches, but hopefully they're all pretty straightforward
> to read.

Yes, quite a lot of changes.  I didn't see anything questionable in
there.

As to the "patch-id" thing, I find the alternate one slightly easier
to read.  Also, exactly because this is not a performance critical
codepath, it may be better if patch_id_add_string() filtered out
whitespaces; that would allow the source to express things in more
natural way, e.g.

		patch_id_addf(&ctx, "new file mode");
		patch_id_addf(&ctx, "%06o", p->two->mode);
		patch_id_addf(&ctx, "--- /dev/null");
		patch_id_addf(&ctx, "+++ b/%.*s", len2, p->two->path);

Or I may be going overboard by bringing "addf" into the mix X-<.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/18] snprintf cleanups
  2017-03-28 22:33 ` [PATCH 0/18] snprintf cleanups Junio C Hamano
@ 2017-03-29  3:41   ` Jeff King
  2017-03-29 16:05     ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-03-29  3:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Mar 28, 2017 at 03:33:48PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It's a lot of patches, but hopefully they're all pretty straightforward
> > to read.
> 
> Yes, quite a lot of changes.  I didn't see anything questionable in
> there.
> 
> As to the "patch-id" thing, I find the alternate one slightly easier
> to read.  Also, exactly because this is not a performance critical
> codepath, it may be better if patch_id_add_string() filtered out
> whitespaces; that would allow the source to express things in more
> natural way, e.g.
> 
> 		patch_id_addf(&ctx, "new file mode");
> 		patch_id_addf(&ctx, "%06o", p->two->mode);
> 		patch_id_addf(&ctx, "--- /dev/null");
> 		patch_id_addf(&ctx, "+++ b/%.*s", len2, p->two->path);
> 
> Or I may be going overboard by bringing "addf" into the mix X-<.

I think there are two things going on in your example.

One is that obviously patch_id_addf() removes the spaces from the
result. But we could do that now by keeping the big strbuf_addf(), and
then just walking the result and feeding non-spaces.

The second is that your addf means we are back to formatting everything
into a buffer again. And it has to be dynamic to handle the final line
there, because "len2" isn't bounded. At which point we may as well go
back to sticking it all in one big strbuf (your example also breaks it
down line by line, but we could do that with separte strbuf_addf calls,
too).

Or you have to reimplement the printf format-parsing yourself, and write
into the sha1 instead of into the buffers. But that's probably insane.

I think the "no extra buffer with whitespace" combo is more like:

  void patch_id_add_buf(git_SHA1_CTX *ctx, const char *buf, size_t len)
  {
	for (; len > 0; buf++, len--) {
		if (!isspace(*buf))
			git_SHA1_Update(ctx, buf, 1);
	}
  }

  void patch_id_add_str(git_SHA1_CTX *ctx, const char *str)
  {
	patch_id_add_buf(ctx, strlen(str));
  }

  void patch_id_add_mode(git_SHA1_CTX *ctx, unsigned mode)
  {
	char buf[16]; /* big enough... */
	int len = xsnprintf(buf, "%06o", mode);
	patch_id_add_buf(ctx, buf, len);
  }

  patch_id_add_str(&ctx, "new file mode");
  patch_id_add_mode(&ctx, p->two->mode);
  patch_id_add_str(&ctx, "--- /dev/null");
  patch_id_add_str(&ctx, "+++ b/");
  patch_id_add_buf(&ctx, p->two->path, len2);

I dunno. I wondered if feeding single bytes to the sha1 update might
actually be noticeably slower, because I would assume that internally it
generally copies data in larger chunks. I didn't measure it, though.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] Makefile: detect errors in running spatch
  2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
                   ` (18 preceding siblings ...)
  2017-03-28 22:33 ` [PATCH 0/18] snprintf cleanups Junio C Hamano
@ 2017-03-29  7:10 ` Jeff King
  19 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-29  7:10 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Junio C Hamano

The "make coccicheck" target runs spatch against each source
file. But it does so in a for loop, so "make" never sees the
exit code of spatch. Worse, it redirects stderr to a log
file, so the user has no indication of any failure. And then
to top it all off, because we touched the patch file's
mtime, make will refuse to repeat the command because it
think the target is up-to-date.

So for example:

  $ make coccicheck SPATCH=does-not-exist
      SPATCH contrib/coccinelle/free.cocci
      SPATCH contrib/coccinelle/qsort.cocci
      SPATCH contrib/coccinelle/xstrdup_or_null.cocci
      SPATCH contrib/coccinelle/swap.cocci
      SPATCH contrib/coccinelle/strbuf.cocci
      SPATCH contrib/coccinelle/object_id.cocci
      SPATCH contrib/coccinelle/array.cocci
  $ make coccicheck SPATCH=does-not-exist
  make: Nothing to be done for 'coccicheck'.

With this patch, you get:

  $ make coccicheck SPATCH=does-not-exist
       SPATCH contrib/coccinelle/free.cocci
  /bin/sh: 4: does-not-exist: not found
  Makefile:2338: recipe for target 'contrib/coccinelle/free.cocci.patch' failed
  make: *** [contrib/coccinelle/free.cocci.patch] Error 1

It also dumps the log on failure, so any errors from spatch
itself (like syntax errors in our .cocci files) will be seen
by the user.

Signed-off-by: Jeff King <peff@peff.net>
---
This is a verbatim repost of:

  http://public-inbox.org/git/20170310083117.cbflqx7zbe4s7cqv@sigill.intra.peff.net/

I think this is a strict improvement over the status quo. The
conversation in that thread turned to possible refactorings, but since
those didn't materialize, I think we should apply this in the meantime.

 Makefile | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 9f8b35ad4..9b36068ac 100644
--- a/Makefile
+++ b/Makefile
@@ -2348,9 +2348,17 @@ check: common-cmds.h
 C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
 %.cocci.patch: %.cocci $(C_SOURCES)
 	@echo '    ' SPATCH $<; \
+	ret=0; \
 	for f in $(C_SOURCES); do \
-		$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS); \
-	done >$@ 2>$@.log; \
+		$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
+			{ ret=$$?; break; }; \
+	done >$@+ 2>$@.log; \
+	if test $$ret != 0; \
+	then \
+		cat $@.log; \
+		exit 1; \
+	fi; \
+	mv $@+ $@; \
 	if test -s $@; \
 	then \
 		echo '    ' SPATCH result: $@; \
-- 
2.12.2.920.gc31091ce4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/18] snprintf cleanups
  2017-03-29  3:41   ` Jeff King
@ 2017-03-29 16:05     ` Junio C Hamano
  2017-03-30  6:27       ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-03-29 16:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Tue, Mar 28, 2017 at 03:33:48PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > It's a lot of patches, but hopefully they're all pretty straightforward
>> > to read.
>> 
>> Yes, quite a lot of changes.  I didn't see anything questionable in
>> there.
>> 
>> As to the "patch-id" thing, I find the alternate one slightly easier
>> to read.  Also, exactly because this is not a performance critical
>> codepath, it may be better if patch_id_add_string() filtered out
>> whitespaces; that would allow the source to express things in more
>> natural way, e.g.
>> 
>> 		patch_id_addf(&ctx, "new file mode");
>> 		patch_id_addf(&ctx, "%06o", p->two->mode);
>> 		patch_id_addf(&ctx, "--- /dev/null");
>> 		patch_id_addf(&ctx, "+++ b/%.*s", len2, p->two->path);
>> 
>> Or I may be going overboard by bringing "addf" into the mix X-<.
>
> I think there are two things going on in your example.
>
> One is that obviously patch_id_addf() removes the spaces from the
> result. But we could do that now by keeping the big strbuf_addf(), and
> then just walking the result and feeding non-spaces.
>
> The second is that your addf means we are back to formatting everything
> into a buffer again....

You are right to point out that I was blinded by the ugliness of
words stuck together without spaces in between, which was inherited
from the original code, and failed to see the sole point of this
series, which is to remove truncation without adding unnecessary
allocation and freeing.

Thanks for straighten my thinking out.  I think the seeming
ugliness, if it ever becomes a real problem, should be handled
outside this series after the dust settles.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/18] snprintf cleanups
  2017-03-29 16:05     ` Junio C Hamano
@ 2017-03-30  6:27       ` Jeff King
  2017-03-30 17:24         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-03-30  6:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Mar 29, 2017 at 09:05:33AM -0700, Junio C Hamano wrote:

> > I think there are two things going on in your example.
> >
> > One is that obviously patch_id_addf() removes the spaces from the
> > result. But we could do that now by keeping the big strbuf_addf(), and
> > then just walking the result and feeding non-spaces.
> >
> > The second is that your addf means we are back to formatting everything
> > into a buffer again....
> 
> You are right to point out that I was blinded by the ugliness of
> words stuck together without spaces in between, which was inherited
> from the original code, and failed to see the sole point of this
> series, which is to remove truncation without adding unnecessary
> allocation and freeing.
> 
> Thanks for straighten my thinking out.  I think the seeming
> ugliness, if it ever becomes a real problem, should be handled
> outside this series after the dust settles.

Yeah, the no-spaces thing should almost certainly wait.

There is still the minor question of whether skipping the strbuf
entirely is nicer, even if you still have to feed it strings without
spaces (i.e., what I posted in my initial reply).

I'm OK either with the series I posted, or wrapping up the alternative
in a commit message.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/18] snprintf cleanups
  2017-03-30  6:27       ` Jeff King
@ 2017-03-30 17:24         ` Junio C Hamano
  2017-03-30 18:26           ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-03-30 17:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Wed, Mar 29, 2017 at 09:05:33AM -0700, Junio C Hamano wrote:
>
>> > I think there are two things going on in your example.
>> >
>> > One is that obviously patch_id_addf() removes the spaces from the
>> > result. But we could do that now by keeping the big strbuf_addf(), and
>> > then just walking the result and feeding non-spaces.
>> >
>> > The second is that your addf means we are back to formatting everything
>> > into a buffer again....
>> 
>> You are right to point out that I was blinded by the ugliness of
>> words stuck together without spaces in between, which was inherited
>> from the original code, and failed to see the sole point of this
>> series, which is to remove truncation without adding unnecessary
>> allocation and freeing.
>> 
>> Thanks for straighten my thinking out.  I think the seeming
>> ugliness, if it ever becomes a real problem, should be handled
>> outside this series after the dust settles.
>
> Yeah, the no-spaces thing should almost certainly wait.
>
> There is still the minor question of whether skipping the strbuf
> entirely is nicer, even if you still have to feed it strings without
> spaces (i.e., what I posted in my initial reply).
>
> I'm OK either with the series I posted, or wrapping up the alternative
> in a commit message.

I do find the updated one easier to follow (if anything it is more
compact); I do not think it is worth a reroll, but it is easy enough
to replace the patch part of the original with the updated patch and
tweak "it's easy to fix by moving to a strbuf" in its log message to
something like "But it's easy to eliminate the allocation with a few
helper functions, and it makes the result easier to follow", so I am
tempted to go that route myself...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/18] snprintf cleanups
  2017-03-30 17:24         ` Junio C Hamano
@ 2017-03-30 18:26           ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-03-30 18:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Mar 30, 2017 at 10:24:36AM -0700, Junio C Hamano wrote:

> > I'm OK either with the series I posted, or wrapping up the alternative
> > in a commit message.
> 
> I do find the updated one easier to follow (if anything it is more
> compact); I do not think it is worth a reroll, but it is easy enough
> to replace the patch part of the original with the updated patch and
> tweak "it's easy to fix by moving to a strbuf" in its log message to
> something like "But it's easy to eliminate the allocation with a few
> helper functions, and it makes the result easier to follow", so I am
> tempted to go that route myself...

Yeah, I think that's all that would be needed. Here it is wrapped up as
a patch:

-- >8 --
Subject: [PATCH] diff: avoid fixed-size buffer for patch-ids

To generate a patch id, we format the diff header into a
fixed-size buffer, and then feed the result to our sha1
computation. The fixed buffer has size '4*PATH_MAX + 20',
which in theory accommodates the four filenames plus some
extra data. Except:

  1. The filenames may not be constrained to PATH_MAX. The
     static value may not be a real limit on the current
     filesystem. Moreover, we may compute patch-ids for
     names stored only in git, without touching the current
     filesystem at all.

  2. The 20 bytes is not nearly enough to cover the
     extra content we put in the buffer.

As a result, the data we feed to the sha1 computation may be
truncated, and it's possible that a commit with a very long
filename could erroneously collide in the patch-id space
with another commit. For instance, if one commit modified
"really-long-filename/foo" and another modified "bar" in the
same directory.

In practice this is unlikely. Because the filenames are
repeated, and because there's a single cutoff at the end of
the buffer, the offending filename would have to be on the
order of four times larger than PATH_MAX.

We could fix this by moving to a strbuf. However, we can
observe that the purpose of formatting this in the first
place is to feed it to git_SHA1_Update(). So instead, let's
just feed each part of the formatted string directly. This
actually ends up more readable, and we can even factor out
some duplicated bits from the various conditional branches.

Technically this may change the output of patch-id for very
long filenames, but it's not worth making an exception for
this in the --stable output. It was a bug, and one that only
affected an unlikely set of paths.  And anyway, the exact
value would have varied from platform to platform depending
on the value of PATH_MAX, so there is no "stable" value.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c | 68 ++++++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/diff.c b/diff.c
index 58cb72d7e..92d78e459 100644
--- a/diff.c
+++ b/diff.c
@@ -4570,6 +4570,19 @@ static void patch_id_consume(void *priv, char *line, unsigned long len)
 	data->patchlen += new_len;
 }
 
+static void patch_id_add_string(git_SHA_CTX *ctx, const char *str)
+{
+	git_SHA1_Update(ctx, str, strlen(str));
+}
+
+static void patch_id_add_mode(git_SHA_CTX *ctx, unsigned mode)
+{
+	/* large enough for 2^32 in octal */
+	char buf[12];
+	int len = xsnprintf(buf, sizeof(buf), "%06o", mode);
+	git_SHA1_Update(ctx, buf, len);
+}
+
 /* returns 0 upon success, and writes result into sha1 */
 static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only)
 {
@@ -4577,7 +4590,6 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 	int i;
 	git_SHA_CTX ctx;
 	struct patch_id_t data;
-	char buffer[PATH_MAX * 4 + 20];
 
 	git_SHA1_Init(&ctx);
 	memset(&data, 0, sizeof(struct patch_id_t));
@@ -4609,36 +4621,30 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 
 		len1 = remove_space(p->one->path, strlen(p->one->path));
 		len2 = remove_space(p->two->path, strlen(p->two->path));
-		if (p->one->mode == 0)
-			len1 = snprintf(buffer, sizeof(buffer),
-					"diff--gita/%.*sb/%.*s"
-					"newfilemode%06o"
-					"---/dev/null"
-					"+++b/%.*s",
-					len1, p->one->path,
-					len2, p->two->path,
-					p->two->mode,
-					len2, p->two->path);
-		else if (p->two->mode == 0)
-			len1 = snprintf(buffer, sizeof(buffer),
-					"diff--gita/%.*sb/%.*s"
-					"deletedfilemode%06o"
-					"---a/%.*s"
-					"+++/dev/null",
-					len1, p->one->path,
-					len2, p->two->path,
-					p->one->mode,
-					len1, p->one->path);
-		else
-			len1 = snprintf(buffer, sizeof(buffer),
-					"diff--gita/%.*sb/%.*s"
-					"---a/%.*s"
-					"+++b/%.*s",
-					len1, p->one->path,
-					len2, p->two->path,
-					len1, p->one->path,
-					len2, p->two->path);
-		git_SHA1_Update(&ctx, buffer, len1);
+		patch_id_add_string(&ctx, "diff--git");
+		patch_id_add_string(&ctx, "a/");
+		git_SHA1_Update(&ctx, p->one->path, len1);
+		patch_id_add_string(&ctx, "b/");
+		git_SHA1_Update(&ctx, p->two->path, len2);
+
+		if (p->one->mode == 0) {
+			patch_id_add_string(&ctx, "newfilemode");
+			patch_id_add_mode(&ctx, p->two->mode);
+			patch_id_add_string(&ctx, "---/dev/null");
+			patch_id_add_string(&ctx, "+++b/");
+			git_SHA1_Update(&ctx, p->two->path, len2);
+		} else if (p->two->mode == 0) {
+			patch_id_add_string(&ctx, "deletedfilemode");
+			patch_id_add_mode(&ctx, p->one->mode);
+			patch_id_add_string(&ctx, "---a/");
+			git_SHA1_Update(&ctx, p->one->path, len1);
+			patch_id_add_string(&ctx, "+++/dev/null");
+		} else {
+			patch_id_add_string(&ctx, "---a/");
+			git_SHA1_Update(&ctx, p->one->path, len1);
+			patch_id_add_string(&ctx, "+++b/");
+			git_SHA1_Update(&ctx, p->two->path, len2);
+		}
 
 		if (diff_header_only)
 			continue;
-- 
2.12.2.922.g77065305a


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs
  2017-03-28 19:46 ` [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs Jeff King
@ 2017-04-17  6:00   ` Junio C Hamano
  2017-04-18  3:18     ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-04-17  6:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> diff --git a/builtin/replace.c b/builtin/replace.c
> index f83e7b8fc..065515bab 100644
> --- a/builtin/replace.c
> +++ b/builtin/replace.c
> @@ -93,26 +93,31 @@ typedef int (*each_replace_name_fn)(const char *name, const char *ref,
>  static int for_each_replace_name(const char **argv, each_replace_name_fn fn)
>  {
>  	const char **p, *full_hex;
> -	char ref[PATH_MAX];
> +	struct strbuf ref = STRBUF_INIT;
> +	size_t base_len;
>  	int had_error = 0;
>  	struct object_id oid;
>  
> +	strbuf_addstr(&ref, git_replace_ref_base);
> +	base_len = ref.len;
> +
>  	for (p = argv; *p; p++) {
>  		if (get_oid(*p, &oid)) {
>  			error("Failed to resolve '%s' as a valid ref.", *p);
>  			had_error = 1;
>  			continue;
>  		}
> -		full_hex = oid_to_hex(&oid);
> -		snprintf(ref, sizeof(ref), "%s%s", git_replace_ref_base, full_hex);
> -		/* read_ref() may reuse the buffer */
> -		full_hex = ref + strlen(git_replace_ref_base);
> -		if (read_ref(ref, oid.hash)) {
> +
> +		strbuf_setlen(&ref, base_len);
> +		strbuf_addstr(&ref, oid_to_hex(&oid));
> +		full_hex = ref.buf + base_len;
> +
> +		if (read_ref(ref.buf, oid.hash)) {
>  			error("replace ref '%s' not found.", full_hex);
>  			had_error = 1;
>  			continue;
>  		}
> -		if (fn(full_hex, ref, &oid))
> +		if (fn(full_hex, ref.buf, &oid))
>  			had_error = 1;
>  	}
>  	return had_error;

Don't we need to strbuf_release(&ref) before leaving this function?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs
  2017-04-17  6:00   ` Junio C Hamano
@ 2017-04-18  3:18     ` Jeff King
  2017-04-18  4:55       ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-04-18  3:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Apr 16, 2017 at 11:00:25PM -0700, Junio C Hamano wrote:

> > diff --git a/builtin/replace.c b/builtin/replace.c
> > index f83e7b8fc..065515bab 100644
> > --- a/builtin/replace.c
> > +++ b/builtin/replace.c
> > @@ -93,26 +93,31 @@ typedef int (*each_replace_name_fn)(const char *name, const char *ref,
> >  static int for_each_replace_name(const char **argv, each_replace_name_fn fn)
> [...]
> 
> Don't we need to strbuf_release(&ref) before leaving this function?

Yes, good catch. Squashable patch is below.

I'm not sure how I missed that one. I double-checked the other hunks in
the patch and they are all fine.

diff --git a/builtin/replace.c b/builtin/replace.c
index 065515bab..ab17668f4 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -120,6 +120,7 @@ static int for_each_replace_name(const char **argv, each_replace_name_fn fn)
 		if (fn(full_hex, ref.buf, &oid))
 			had_error = 1;
 	}
+	strbuf_release(&ref);
 	return had_error;
 }
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs
  2017-04-18  3:18     ` Jeff King
@ 2017-04-18  4:55       ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-04-18  4:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Sun, Apr 16, 2017 at 11:00:25PM -0700, Junio C Hamano wrote:
>
>> > diff --git a/builtin/replace.c b/builtin/replace.c
>> > index f83e7b8fc..065515bab 100644
>> > --- a/builtin/replace.c
>> > +++ b/builtin/replace.c
>> > @@ -93,26 +93,31 @@ typedef int (*each_replace_name_fn)(const char *name, const char *ref,
>> >  static int for_each_replace_name(const char **argv, each_replace_name_fn fn)
>> [...]
>> 
>> Don't we need to strbuf_release(&ref) before leaving this function?
>
> Yes, good catch. Squashable patch is below.
>
> I'm not sure how I missed that one. I double-checked the other hunks in
> the patch and they are all fine.

Heh, I do not know how I missed that one while queuing, and while
advancing the topic to 'next'.  The third time I read over before
merging to 'master' was when I noticed it.

>
> diff --git a/builtin/replace.c b/builtin/replace.c
> index 065515bab..ab17668f4 100644
> --- a/builtin/replace.c
> +++ b/builtin/replace.c
> @@ -120,6 +120,7 @@ static int for_each_replace_name(const char **argv, each_replace_name_fn fn)
>  		if (fn(full_hex, ref.buf, &oid))
>  			had_error = 1;
>  	}
> +	strbuf_release(&ref);
>  	return had_error;
>  }
>  

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-04-18  4:55 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-28 19:42 [PATCH 0/18] snprintf cleanups Jeff King
2017-03-28 19:45 ` [PATCH 01/18] do not check odb_mkstemp return value for errors Jeff King
2017-03-28 19:45 ` [PATCH 02/18] odb_mkstemp: write filename into strbuf Jeff King
2017-03-28 19:45 ` [PATCH 03/18] odb_mkstemp: use git_path_buf Jeff King
2017-03-28 19:46 ` [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids Jeff King
2017-03-28 19:50   ` Jeff King
2017-03-28 19:46 ` [PATCH 05/18] tag: use strbuf to format tag header Jeff King
2017-03-28 19:46 ` [PATCH 06/18] fetch: use heap buffer to format reflog Jeff King
2017-03-28 19:46 ` [PATCH 07/18] avoid using fixed PATH_MAX buffers for refs Jeff King
2017-04-17  6:00   ` Junio C Hamano
2017-04-18  3:18     ` Jeff King
2017-04-18  4:55       ` Junio C Hamano
2017-03-28 19:46 ` [PATCH 08/18] avoid using mksnpath " Jeff King
2017-03-28 19:46 ` [PATCH 09/18] create_branch: move msg setup closer to point of use Jeff King
2017-03-28 19:46 ` [PATCH 10/18] create_branch: use xstrfmt for reflog message Jeff King
2017-03-28 19:46 ` [PATCH 11/18] name-rev: replace static buffer with strbuf Jeff King
2017-03-28 19:46 ` [PATCH 12/18] receive-pack: print --pack-header directly into argv array Jeff King
2017-03-28 19:46 ` [PATCH 13/18] replace unchecked snprintf calls with heap buffers Jeff King
2017-03-28 19:46 ` [PATCH 14/18] combine-diff: replace malloc/snprintf with xstrfmt Jeff King
2017-03-28 19:46 ` [PATCH 15/18] convert unchecked snprintf into xsnprintf Jeff King
2017-03-28 19:47 ` [PATCH 16/18] transport-helper: replace checked snprintf with xsnprintf Jeff King
2017-03-28 19:47 ` [PATCH 17/18] gc: replace local buffer with git_path Jeff King
2017-03-28 19:48 ` [PATCH 18/18] daemon: use an argv_array to exec children Jeff King
2017-03-28 22:33 ` [PATCH 0/18] snprintf cleanups Junio C Hamano
2017-03-29  3:41   ` Jeff King
2017-03-29 16:05     ` Junio C Hamano
2017-03-30  6:27       ` Jeff King
2017-03-30 17:24         ` Junio C Hamano
2017-03-30 18:26           ` Jeff King
2017-03-29  7:10 ` [PATCH] Makefile: detect errors in running spatch Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).