git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Erik Faye-Lund <kusmabite@gmail.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int"
Date: Fri, 10 Dec 2021 06:41:25 -0500	[thread overview]
Message-ID: <YbM85W3N0ySi5k+H@coredump.intra.peff.net> (raw)
In-Reply-To: <211210.86lf0sdah1.gmgdl@evledraar.gmail.com>

On Fri, Dec 10, 2021 at 11:22:59AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > Dropping the st_mult() does nothing to fix the actual problem (which is
> > that this function should use a more appropriate type), but introduces
> > new failure modes.
> 
> Yes you're entirely right. I had some stupid blinders on while writing
> this. FWIW I think I was experimenting with some local macros and
> conflated a testing of the overflow of n*n in gdb with the caste'd
> version, which you rightly point out here won't have the overflow issue
> at all. Sorry.

I'm not sure if this is helpful or not, but this is the minimal fix I
came up with that runs the testcase I showed earlier. It's basically
just swapping out "int" for "ssize_t" for any variables we use to index
the arrays (though note a few are themselves held in arrays, and we have
to cross some function boundaries).

I won't be surprised if it doesn't hit all cases, or if it even hits a
few it doesn't need to (e.g., should "phase" be dragged along with "i"
and "j" in the first hunk?). I mostly did guess-and-check on the
test-case, fixing whatever segfaulted and then running again until it
worked. I didn't even really read the code very carefully.

I think you _did_ do more of that careful reading, and broke down the
refactorings into separate patches in your series. Which is good. So I
think what we'd want is to pick out those parts of your series that end
up switching the variable type. My goal in sharing this here is just to
show that the end result of the fix can (and IMHO should) be around this
same order of magnitude.

---
diff --git a/linear-assignment.c b/linear-assignment.c
index ecffc09be6..3efa30c50b 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -13,11 +13,11 @@
  * i is `cost[j + column_count * i].
  */
 void compute_assignment(int column_count, int row_count, int *cost,
-			int *column2row, int *row2column)
+			ssize_t *column2row, ssize_t *row2column)
 {
 	int *v, *d;
 	int *free_row, free_count = 0, saved_free_count, *pred, *col;
-	int i, j, phase;
+	ssize_t i, j, phase;
 
 	if (column_count < 2) {
 		memset(column2row, 0, sizeof(int) * column_count);
@@ -31,7 +31,7 @@ void compute_assignment(int column_count, int row_count, int *cost,
 
 	/* column reduction */
 	for (j = column_count - 1; j >= 0; j--) {
-		int i1 = 0;
+		ssize_t i1 = 0;
 
 		for (i = 1; i < row_count; i++)
 			if (COST(j, i1) > COST(j, i))
@@ -51,7 +51,7 @@ void compute_assignment(int column_count, int row_count, int *cost,
 	/* reduction transfer */
 	ALLOC_ARRAY(free_row, row_count);
 	for (i = 0; i < row_count; i++) {
-		int j1 = row2column[i];
+		ssize_t j1 = row2column[i];
 		if (j1 == -1)
 			free_row[free_count++] = i;
 		else if (j1 < -1)
@@ -74,13 +74,13 @@ void compute_assignment(int column_count, int row_count, int *cost,
 
 	/* augmenting row reduction */
 	for (phase = 0; phase < 2; phase++) {
-		int k = 0;
+		ssize_t k = 0;
 
 		saved_free_count = free_count;
 		free_count = 0;
 		while (k < saved_free_count) {
 			int u1, u2;
-			int j1 = 0, j2, i0;
+			ssize_t j1 = 0, j2, i0;
 
 			i = free_row[k++];
 			u1 = COST(j1, i) - v[j1];
@@ -130,7 +130,7 @@ void compute_assignment(int column_count, int row_count, int *cost,
 	ALLOC_ARRAY(pred, column_count);
 	ALLOC_ARRAY(col, column_count);
 	for (free_count = 0; free_count < saved_free_count; free_count++) {
-		int i1 = free_row[free_count], low = 0, up = 0, last, k;
+		ssize_t i1 = free_row[free_count], low = 0, up = 0, last, k;
 		int min, c, u1;
 
 		for (j = 0; j < column_count; j++) {
@@ -192,7 +192,7 @@ void compute_assignment(int column_count, int row_count, int *cost,
 		/* augmentation */
 		do {
 			if (j < 0)
-				BUG("negative j: %d", j);
+				BUG("negative j: %"PRIdMAX, (intmax_t)j);
 			i = pred[j];
 			column2row[j] = i;
 			SWAP(j, row2column[i]);
diff --git a/linear-assignment.h b/linear-assignment.h
index 1dfea76629..7005521d61 100644
--- a/linear-assignment.h
+++ b/linear-assignment.h
@@ -14,7 +14,7 @@
  * row_count).
  */
 void compute_assignment(int column_count, int row_count, int *cost,
-			int *column2row, int *row2column);
+			ssize_t *column2row, ssize_t *row2column);
 
 /* The maximal cost in the cost matrix (to prevent integer overflows). */
 #define COST_MAX (1<<16)
diff --git a/range-diff.c b/range-diff.c
index cac89a2f4f..f1e1e27bf9 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -308,9 +308,10 @@ static int diffsize(const char *a, const char *b)
 static void get_correspondences(struct string_list *a, struct string_list *b,
 				int creation_factor)
 {
-	int n = a->nr + b->nr;
-	int *cost, c, *a2b, *b2a;
-	int i, j;
+	size_t n = a->nr + b->nr;
+	int *cost, c;
+	ssize_t *a2b, *b2a;
+	size_t i, j;
 
 	ALLOC_ARRAY(cost, st_mult(n, n));
 	ALLOC_ARRAY(a2b, n);

  reply	other threads:[~2021-12-10 11:41 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 19:19 [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 01/10] string-list API: change "nr" and "alloc" to "size_t" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int" Ævar Arnfjörð Bjarmason
2021-12-10  3:39   ` Jeff King
2021-12-10 10:22     ` Ævar Arnfjörð Bjarmason
2021-12-10 11:41       ` Jeff King [this message]
2021-12-10 12:31         ` Ævar Arnfjörð Bjarmason
2021-12-10 19:24           ` Phillip Wood
2021-12-14 14:34           ` Jeff King
2021-12-10 14:27         ` Johannes Schindelin
2021-12-10 14:58           ` Ævar Arnfjörð Bjarmason
2021-12-11 14:01             ` Johannes Schindelin
2021-12-12 17:44               ` Ævar Arnfjörð Bjarmason
2021-12-14 14:42           ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 03/10] range-diff.c: use "size_t" to refer to "struct string_list"'s "nr" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 04/10] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 05/10] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 06/10] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 07/10] linear-assignment.c: convert a macro to a "static inline" function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 08/10] linear-assignment.c: detect signed add/mul on GCC and Clang Ævar Arnfjörð Bjarmason
2021-12-10  3:56   ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 09/10] linear-assignment.c: add and use intprops.h from Gnulib Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 10/10] linear-assignment.c: use "intmax_t" instead of "int" Ævar Arnfjörð Bjarmason
2021-12-10  4:00   ` Jeff King
2021-12-10 12:30 ` [RFC PATCH v2 0/5] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-10 12:30   ` [RFC PATCH v2 1/5] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-14 13:36     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 2/5] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-14 13:39     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 3/5] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-14 13:40     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 4/5] range-diff.c: rename "n" to "column_count" in get_correspondences() Ævar Arnfjörð Bjarmason
2021-12-14 13:42     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 5/5] range-diff: fix integer overflow & segfault on cost[i + n * j] Ævar Arnfjörð Bjarmason
2021-12-14 14:04     ` Jeff King
2021-12-10 14:31 ` [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Johannes Schindelin
2021-12-10 15:07   ` Ævar Arnfjörð Bjarmason
2021-12-21 23:22   ` Philip Oakley
2021-12-21 23:36     ` Ævar Arnfjörð Bjarmason
2021-12-22 20:50       ` Johannes Schindelin
2021-12-22 21:11         ` Jeff King
2021-12-24 11:15       ` Philip Oakley
2021-12-24 16:46         ` Ævar Arnfjörð Bjarmason
2021-12-24 18:31           ` Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbM85W3N0ySi5k+H@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=kusmabite@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).