RE: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* RE: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
@ 2019-07-30 17:08 Randall S. Becker
  2019-07-30 17:31 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Randall S. Becker @ 2019-07-30 17:08 UTC (permalink / raw)
  To: 'Junio C Hamano', git; +Cc: git-packagers

Hi All,

A preview of the situation with testing 2.23.0.rc0 on NonStop is not great. We have had some new failures right off the bat on our NonStop platforms. This is a preview of what we find within the first bit of testing. The tests run a long time, so more to come.

t0016: oidmap

Subtest 6 had an ordering issue. We do not know whether the problem is the code or the test result not keeping up with the code changes.
--- expect      2019-07-30 16:56:36 +0000
+++ actual      2019-07-30 16:56:36 +0000
@@ -1,6 +1,6 @@
 NULL
 NULL
 NULL
+7c7cd714e262561f73f3079dfca4e8724682ac21 3
 139b20d8e6c5b496de61f033f642d0e3dbff528d 2
 d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 1
-7c7cd714e262561f73f3079dfca4e8724682ac21 3


t0066: dir-iterator

Subtest 4 depends on a non-portable error code. ENOENT is not guaranteed to be the same on all systems. It is 4002 on NonStop and 2 on many other machines.
--- expected-inexistent-path-output     2019-07-30 16:58:51 +0000
+++ actual-inexistent-path-output       2019-07-30 16:58:50 +0000
@@ -1 +1 @@
-dir_iterator_begin failure: 2
+dir_iterator_begin failure: 4002

Subtest 5 also depends on a non-portable error code. ENOTDIR is not guaranteed to be the same on all systems. It is 4020 on NonStop and 20 on many other machines.
--- expected-non-dir-output     2019-07-30 16:58:51 +0000
+++ actual-non-dir-output       2019-07-30 16:58:51 +0000
@@ -1 +1 @@
-dir_iterator_begin failure: 20
+dir_iterator_begin failure: 4020

Randall

-- Brief whoami:
 NonStop developer since approximately 211288444200000000
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 17:08 [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop Randall S. Becker
@ 2019-07-30 17:31 ` Junio C Hamano
  2019-07-30 18:09   ` Matheus Tavares Bernardino
  2019-07-30 18:10   ` Randall S. Becker
  2019-07-30 19:45 ` Jeff King
  2019-07-30 19:49 ` Todd Zullinger
  2 siblings, 2 replies; 24+ messages in thread
From: Junio C Hamano @ 2019-07-30 17:31 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git, git-packagers

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

> t0066: dir-iterator
>
> Subtest 4 depends on a non-portable error code. ENOENT is not guaranteed ...
> Subtest 5 also depends on a non-portable error code. ENOTDIR is not gua...

Yikes, and sorry.  I've become somewhat complacent after relying on
how good our other reviewers are, pretty much ignored the new code
in fringes like t/helper/, and failed catch an obvious amateurish
mistake like this one.

I do not think of a portable way to map an int ENOENT to a string
"ENOENT", but there are only only two errors test-dir-iterator test
code cares about, so perhaps a patch like the following may be
sufficient.

I wonder if a tool like sparse can help us catch a pattern that
feeds errno to "%d" format.

 t/helper/test-dir-iterator.c | 11 ++++++++++-
 t/t0066-dir-iterator.sh      |  4 ++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index a5b96cb0dc..c7c30664da 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,6 +4,15 @@
 #include "iterator.h"
 #include "dir-iterator.h"
 
+static const char *error_name(int error_number)
+{
+	switch (error_number) {
+	case ENOENT: return "ENOENT";
+	case ENOTDIR: return "ENOTDIR";
+	default: return "ESOMETHINGELSE";
+	}
+}
+
 /*
  * usage:
  * tool-test dir-iterator [--follow-symlinks] [--pedantic] directory_path
@@ -31,7 +40,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 	diter = dir_iterator_begin(path.buf, flags);
 
 	if (!diter) {
-		printf("dir_iterator_begin failure: %d\n", errno);
+		printf("dir_iterator_begin failure: %s\n", error_name(errno));
 		exit(EXIT_FAILURE);
 	}
 
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 9354d3f1ed..92910e4e6c 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -55,13 +55,13 @@ test_expect_success 'dir-iterator should list files in the correct order' '
 test_expect_success 'begin should fail upon inexistent paths' '
 	test_must_fail test-tool dir-iterator ./inexistent-path \
 		>actual-inexistent-path-output &&
-	echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+	echo "dir_iterator_begin failure: ENOENT" >expected-inexistent-path-output &&
 	test_cmp expected-inexistent-path-output actual-inexistent-path-output
 '
 
 test_expect_success 'begin should fail upon non directory paths' '
 	test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
-	echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+	echo "dir_iterator_begin failure: ENOTDIR" >expected-non-dir-output &&
 	test_cmp expected-non-dir-output actual-non-dir-output
 '
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 17:31 ` Junio C Hamano
@ 2019-07-30 18:09   ` Matheus Tavares Bernardino
  2019-07-30 18:10   ` Randall S. Becker
  1 sibling, 0 replies; 24+ messages in thread
From: Matheus Tavares Bernardino @ 2019-07-30 18:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Randall S. Becker, git, git-packagers

Hi, Junio and Randall

On Tue, Jul 30, 2019 at 2:31 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
>
> > t0066: dir-iterator
> >
> > Subtest 4 depends on a non-portable error code. ENOENT is not guaranteed ...
> > Subtest 5 also depends on a non-portable error code. ENOTDIR is not gua...
>
> Yikes, and sorry.  I've become somewhat complacent after relying on
> how good our other reviewers are, pretty much ignored the new code
> in fringes like t/helper/, and failed catch an obvious amateurish
> mistake like this one.
>
> I do not think of a portable way to map an int ENOENT to a string
> "ENOENT", but there are only only two errors test-dir-iterator test
> code cares about, so perhaps a patch like the following may be
> sufficient.

Sorry for that. I totally overlooked the non-portability of error
codes when making the patch. But the proposed fix seems very good. And
having the literal string at the test instead of a number is much more
significant, as well.

> I wonder if a tool like sparse can help us catch a pattern that
> feeds errno to "%d" format.
>
>  t/helper/test-dir-iterator.c | 11 ++++++++++-
>  t/t0066-dir-iterator.sh      |  4 ++--
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
> index a5b96cb0dc..c7c30664da 100644
> --- a/t/helper/test-dir-iterator.c
> +++ b/t/helper/test-dir-iterator.c
> @@ -4,6 +4,15 @@
>  #include "iterator.h"
>  #include "dir-iterator.h"
>
> +static const char *error_name(int error_number)
> +{
> +       switch (error_number) {
> +       case ENOENT: return "ENOENT";
> +       case ENOTDIR: return "ENOTDIR";
> +       default: return "ESOMETHINGELSE";
> +       }
> +}
> +
>  /*
>   * usage:
>   * tool-test dir-iterator [--follow-symlinks] [--pedantic] directory_path
> @@ -31,7 +40,7 @@ int cmd__dir_iterator(int argc, const char **argv)
>         diter = dir_iterator_begin(path.buf, flags);
>
>         if (!diter) {
> -               printf("dir_iterator_begin failure: %d\n", errno);
> +               printf("dir_iterator_begin failure: %s\n", error_name(errno));
>                 exit(EXIT_FAILURE);
>         }
>
> diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
> index 9354d3f1ed..92910e4e6c 100755
> --- a/t/t0066-dir-iterator.sh
> +++ b/t/t0066-dir-iterator.sh
> @@ -55,13 +55,13 @@ test_expect_success 'dir-iterator should list files in the correct order' '
>  test_expect_success 'begin should fail upon inexistent paths' '
>         test_must_fail test-tool dir-iterator ./inexistent-path \
>                 >actual-inexistent-path-output &&
> -       echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
> +       echo "dir_iterator_begin failure: ENOENT" >expected-inexistent-path-output &&
>         test_cmp expected-inexistent-path-output actual-inexistent-path-output
>  '
>
>  test_expect_success 'begin should fail upon non directory paths' '
>         test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
> -       echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
> +       echo "dir_iterator_begin failure: ENOTDIR" >expected-non-dir-output &&
>         test_cmp expected-non-dir-output actual-non-dir-output
>  '
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 17:31 ` Junio C Hamano
  2019-07-30 18:09   ` Matheus Tavares Bernardino
@ 2019-07-30 18:10   ` Randall S. Becker
  2019-07-30 18:35     ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Randall S. Becker @ 2019-07-30 18:10 UTC (permalink / raw)
  To: 'Junio C Hamano'; +Cc: git, git-packagers

On July 30, 2019 1:32 PM, Junio C Hamano wrote:
> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> 
> > t0066: dir-iterator
> >
> > Subtest 4 depends on a non-portable error code. ENOENT is not
> guaranteed ...
> > Subtest 5 also depends on a non-portable error code. ENOTDIR is not
gua...
> 
> Yikes, and sorry.  I've become somewhat complacent after relying on how
> good our other reviewers are, pretty much ignored the new code in fringes
> like t/helper/, and failed catch an obvious amateurish mistake like this
one.
> 
> I do not think of a portable way to map an int ENOENT to a string
"ENOENT",
> but there are only only two errors test-dir-iterator test code cares
about, so
> perhaps a patch like the following may be sufficient.
> 
> I wonder if a tool like sparse can help us catch a pattern that feeds
errno to
> "%d" format.
> 
>  t/helper/test-dir-iterator.c | 11 ++++++++++-
>  t/t0066-dir-iterator.sh      |  4 ++--
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index
> a5b96cb0dc..c7c30664da 100644
> --- a/t/helper/test-dir-iterator.c
> +++ b/t/helper/test-dir-iterator.c
> @@ -4,6 +4,15 @@
>  #include "iterator.h"
>  #include "dir-iterator.h"
> 
> +static const char *error_name(int error_number) {
> +	switch (error_number) {
> +	case ENOENT: return "ENOENT";
> +	case ENOTDIR: return "ENOTDIR";
> +	default: return "ESOMETHINGELSE";
> +	}
> +}
> +
>  /*
>   * usage:
>   * tool-test dir-iterator [--follow-symlinks] [--pedantic] directory_path
@@ -
> 31,7 +40,7 @@ int cmd__dir_iterator(int argc, const char **argv)
>  	diter = dir_iterator_begin(path.buf, flags);
> 
>  	if (!diter) {
> -		printf("dir_iterator_begin failure: %d\n", errno);
> +		printf("dir_iterator_begin failure: %s\n",
error_name(errno));
>  		exit(EXIT_FAILURE);
>  	}
> 
> diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh index
> 9354d3f1ed..92910e4e6c 100755
> --- a/t/t0066-dir-iterator.sh
> +++ b/t/t0066-dir-iterator.sh
> @@ -55,13 +55,13 @@ test_expect_success 'dir-iterator should list files in
> the correct order' '
>  test_expect_success 'begin should fail upon inexistent paths' '
>  	test_must_fail test-tool dir-iterator ./inexistent-path \
>  		>actual-inexistent-path-output &&
> -	echo "dir_iterator_begin failure: 2" >expected-inexistent-path-
> output &&
> +	echo "dir_iterator_begin failure: ENOENT"
> +>expected-inexistent-path-output &&
>  	test_cmp expected-inexistent-path-output actual-inexistent-path-
> output  '
> 
>  test_expect_success 'begin should fail upon non directory paths' '
>  	test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output
> &&
> -	echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
> +	echo "dir_iterator_begin failure: ENOTDIR" >expected-non-dir-
> output &&
>  	test_cmp expected-non-dir-output actual-non-dir-output  '
> 

Seems reasonable. Better than trying to use strerror(), which previously
(I'm not sure whether it was this project or another) had a similar mapping
issue because the error text does not match either.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 18:10   ` Randall S. Becker
@ 2019-07-30 18:35     ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2019-07-30 18:35 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git, git-packagers

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

> On July 30, 2019 1:32 PM, Junio C Hamano wrote:
>> 
>> I wonder if a tool like sparse can help us catch a pattern that feeds
> errno to
>> "%d" format.
>>  ...
> Seems reasonable. Better than trying to use strerror(), which previously
> (I'm not sure whether it was this project or another) had a similar mapping
> issue because the error text does not match either.

Yup, that was also us X-<.

Here is what I queued for now.

-- >8 --
Subject: [PATCH] test-dir-iterator: do not assume errno values

A few tests printed 'errno' as an integer and compared with
hardcoded integers; this is obviously not portable.

A two things to note are:

 - the string obtained by strerror() is not portable, and cannot be
   used for the purpose of these tests.

 - there unfortunately isn't a portable way to map error numbers to
   error names.

As we only care about a few selected errors, just map the error
number to the name before emitting for comparison.

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/helper/test-dir-iterator.c | 11 ++++++++++-
 t/t0066-dir-iterator.sh      |  4 ++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index a5b96cb0dc..c7c30664da 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,6 +4,15 @@
 #include "iterator.h"
 #include "dir-iterator.h"
 
+static const char *error_name(int error_number)
+{
+	switch (error_number) {
+	case ENOENT: return "ENOENT";
+	case ENOTDIR: return "ENOTDIR";
+	default: return "ESOMETHINGELSE";
+	}
+}
+
 /*
  * usage:
  * tool-test dir-iterator [--follow-symlinks] [--pedantic] directory_path
@@ -31,7 +40,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 	diter = dir_iterator_begin(path.buf, flags);
 
 	if (!diter) {
-		printf("dir_iterator_begin failure: %d\n", errno);
+		printf("dir_iterator_begin failure: %s\n", error_name(errno));
 		exit(EXIT_FAILURE);
 	}
 
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 9354d3f1ed..92910e4e6c 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -55,13 +55,13 @@ test_expect_success 'dir-iterator should list files in the correct order' '
 test_expect_success 'begin should fail upon inexistent paths' '
 	test_must_fail test-tool dir-iterator ./inexistent-path \
 		>actual-inexistent-path-output &&
-	echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+	echo "dir_iterator_begin failure: ENOENT" >expected-inexistent-path-output &&
 	test_cmp expected-inexistent-path-output actual-inexistent-path-output
 '
 
 test_expect_success 'begin should fail upon non directory paths' '
 	test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
-	echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+	echo "dir_iterator_begin failure: ENOTDIR" >expected-non-dir-output &&
 	test_cmp expected-non-dir-output actual-non-dir-output
 '
 
-- 
2.23.0-rc0-134-gc2d418fd54



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 17:08 [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop Randall S. Becker
  2019-07-30 17:31 ` Junio C Hamano
@ 2019-07-30 19:45 ` Jeff King
  2019-07-30 20:25   ` Randall S. Becker
  2019-07-30 19:49 ` Todd Zullinger
  2 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2019-07-30 19:45 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: 'Junio C Hamano', git, git-packagers

On Tue, Jul 30, 2019 at 01:08:37PM -0400, Randall S. Becker wrote:

> t0016: oidmap
> 
> Subtest 6 had an ordering issue. We do not know whether the problem is the code or the test result not keeping up with the code changes.
> --- expect      2019-07-30 16:56:36 +0000
> +++ actual      2019-07-30 16:56:36 +0000
> @@ -1,6 +1,6 @@
>  NULL
>  NULL
>  NULL
> +7c7cd714e262561f73f3079dfca4e8724682ac21 3
>  139b20d8e6c5b496de61f033f642d0e3dbff528d 2
>  d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 1
> -7c7cd714e262561f73f3079dfca4e8724682ac21 3

This one is very curious. It's iterating a hash, which _seems_ like it
would produce non-deterministic output. But neither this test nor the
hashmap test it is based on sorts the output, and they pass consistently
for me. I assume that's because while hash ordering is not guaranteed,
it happens to be the same as long the pattern of inserts is the same
(with our implementation, which does not do any hash randomization).

But I am scratching my head as to what could be different on your
platform that would cause a different ordering (especially when the
hashmap test this is based on doesn't get one!).

I guess in some sense it may not be worth tracking down, and we should
just sort the output of a hash iteration unconditionally when comparing
it to expected output.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 17:08 [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop Randall S. Becker
  2019-07-30 17:31 ` Junio C Hamano
  2019-07-30 19:45 ` Jeff King
@ 2019-07-30 19:49 ` Todd Zullinger
  2019-07-30 20:02   ` Jeff King
  2019-07-30 20:27   ` Randall S. Becker
  2 siblings, 2 replies; 24+ messages in thread
From: Todd Zullinger @ 2019-07-30 19:49 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Junio C Hamano', Christian Couder, SZEDER Gábor,
	Jeff King, git, git-packagers

Hi,

[added Christian, SZEDER, and Jeff to Cc as author and
helpers on the newly-added t0016-oidmap]

Randall S. Becker wrote:
> A preview of the situation with testing 2.23.0.rc0 on
> NonStop is not great. We have had some new failures right
> off the bat on our NonStop platforms. This is a preview of
> what we find within the first bit of testing. The tests
> run a long time, so more to come.
> 
> t0016: oidmap
> 
> Subtest 6 had an ordering issue. We do not know whether
> the problem is the code or the test result not keeping up
> with the code changes.
>
> --- expect      2019-07-30 16:56:36 +0000
> +++ actual      2019-07-30 16:56:36 +0000
> @@ -1,6 +1,6 @@
>  NULL
>  NULL
>  NULL
> +7c7cd714e262561f73f3079dfca4e8724682ac21 3
>  139b20d8e6c5b496de61f033f642d0e3dbff528d 2
>  d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 1
> -7c7cd714e262561f73f3079dfca4e8724682ac21 3

I hit the same failure while building for Fedora on the
s390x architecture.  I have not dug into it much yet, but
perhaps this is an endianess issue?

-- 
Todd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 19:49 ` Todd Zullinger
@ 2019-07-30 20:02   ` Jeff King
  2019-07-30 20:39     ` Junio C Hamano
  2019-07-30 20:56     ` SZEDER Gábor
  2019-07-30 20:27   ` Randall S. Becker
  1 sibling, 2 replies; 24+ messages in thread
From: Jeff King @ 2019-07-30 20:02 UTC (permalink / raw)
  To: Todd Zullinger
  Cc: René Scharfe, Randall S. Becker, 'Junio C Hamano',
	Christian Couder, SZEDER Gábor, git, git-packagers

On Tue, Jul 30, 2019 at 03:49:38PM -0400, Todd Zullinger wrote:

> > Subtest 6 had an ordering issue. We do not know whether
> > the problem is the code or the test result not keeping up
> > with the code changes.
> >
> > --- expect      2019-07-30 16:56:36 +0000
> > +++ actual      2019-07-30 16:56:36 +0000
> > @@ -1,6 +1,6 @@
> >  NULL
> >  NULL
> >  NULL
> > +7c7cd714e262561f73f3079dfca4e8724682ac21 3
> >  139b20d8e6c5b496de61f033f642d0e3dbff528d 2
> >  d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 1
> > -7c7cd714e262561f73f3079dfca4e8724682ac21 3
> 
> I hit the same failure while building for Fedora on the
> s390x architecture.  I have not dug into it much yet, but
> perhaps this is an endianess issue?

Ah, of course. Our oid hashing is done by just picking off the first
bytes of the sha1, and it doesn't care about endianness (because these
are just internal-to-memory hashes).

We _could_ reconcile that like this:

diff --git a/hashmap.h b/hashmap.h
index 8424911566..493229ac54 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -116,19 +116,11 @@ unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len);
  * Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
  * for use in hash tables. Cryptographic hashes are supposed to have
  * uniform distribution, so in contrast to `memhash()`, this just copies
- * the first `sizeof(int)` bytes without shuffling any bits. Note that
- * the results will be different on big-endian and little-endian
- * platforms, so they should not be stored or transferred over the net.
+ * the first `sizeof(int)` bytes without shuffling any bits.
  */
 static inline unsigned int oidhash(const struct object_id *oid)
 {
-	/*
-	 * Equivalent to 'return *(unsigned int *)oid->hash;', but safe on
-	 * platforms that don't support unaligned reads.
-	 */
-	unsigned int hash;
-	memcpy(&hash, oid->hash, sizeof(hash));
-	return hash;
+	return get_be32(oid->hash);
 }
 
 /*
diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
index bbe719e950..6656db9d69 100755
--- a/t/t0016-oidmap.sh
+++ b/t/t0016-oidmap.sh
@@ -93,9 +93,9 @@ put three 3
 iterate" "NULL
 NULL
 NULL
+$(git rev-parse three) 3
 $(git rev-parse two) 2
-$(git rev-parse one) 1
-$(git rev-parse three) 3"
+$(git rev-parse one) 1"
 
 '
 

which not only fixes this test but any other hash-based oddities. I
wonder if it's appreciably less efficient. I'll bet I could nerd-snipe
René into doing a bunch of measurements and explorations of the
disassembled code. ;)

-Peff

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 19:45 ` Jeff King
@ 2019-07-30 20:25   ` Randall S. Becker
  0 siblings, 0 replies; 24+ messages in thread
From: Randall S. Becker @ 2019-07-30 20:25 UTC (permalink / raw)
  To: 'Jeff King'; +Cc: 'Junio C Hamano', git, git-packagers

On July 30, 2019 3:45 PM, Jeff King wrote:
> To: Randall S. Becker <rsbecker@nexbridge.com>
> Cc: 'Junio C Hamano' <gitster@pobox.com>; git@vger.kernel.org; git-
> packagers@googlegroups.com
> Subject: Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
> 
> On Tue, Jul 30, 2019 at 01:08:37PM -0400, Randall S. Becker wrote:
> 
> > t0016: oidmap
> >
> > Subtest 6 had an ordering issue. We do not know whether the problem is
> the code or the test result not keeping up with the code changes.
> > --- expect      2019-07-30 16:56:36 +0000
> > +++ actual      2019-07-30 16:56:36 +0000
> > @@ -1,6 +1,6 @@
> >  NULL
> >  NULL
> >  NULL
> > +7c7cd714e262561f73f3079dfca4e8724682ac21 3
> >  139b20d8e6c5b496de61f033f642d0e3dbff528d 2
> >  d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 1
> > -7c7cd714e262561f73f3079dfca4e8724682ac21 3
> 
> This one is very curious. It's iterating a hash, which _seems_ like it would
> produce non-deterministic output. But neither this test nor the hashmap test
> it is based on sorts the output, and they pass consistently for me. I assume
> that's because while hash ordering is not guaranteed, it happens to be the
> same as long the pattern of inserts is the same (with our implementation,
> which does not do any hash randomization).
> 
> But I am scratching my head as to what could be different on your platform
> that would cause a different ordering (especially when the hashmap test this
> is based on doesn't get one!).
> 
> I guess in some sense it may not be worth tracking down, and we should just
> sort the output of a hash iteration unconditionally when comparing it to
> expected output.

Definitely a head scratcher. Is it possible that the bucket() function, which uses

	key->hash & (map->tablesize - 1);

might better use

	key->hash % (map->tablesize - 1);

I have not seen a bucket computation done this way before so that surprised me, not that it should make a difference on hash-determinism. The only thing that might is an uninitialized stack variable, which on this platform's C compiler will not initialize. Global statics are always 0 unless otherwise specified, but I'm not sure about stack-local (but there's nothing wrong I can see in hashmap.c on those points.

Sorting the output seems like a safe option, providing that the hash is itself demonstrably solid otherwise.

Cheers,
Randall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 19:49 ` Todd Zullinger
  2019-07-30 20:02   ` Jeff King
@ 2019-07-30 20:27   ` Randall S. Becker
  1 sibling, 0 replies; 24+ messages in thread
From: Randall S. Becker @ 2019-07-30 20:27 UTC (permalink / raw)
  To: 'Todd Zullinger'
  Cc: 'Junio C Hamano', 'Christian Couder',
	'SZEDER Gábor', 'Jeff King', git,
	git-packagers

On July 30, 2019 3:50 PM, Todd Zullinger wrote:
> To: Randall S. Becker <rsbecker@nexbridge.com>
> Cc: 'Junio C Hamano' <gitster@pobox.com>; Christian Couder
> <chriscool@tuxfamily.org>; SZEDER Gábor <szeder.dev@gmail.com>; Jeff
> King <peff@peff.net>; git@vger.kernel.org; git-
> packagers@googlegroups.com
> Subject: Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
> 
> Hi,
> 
> [added Christian, SZEDER, and Jeff to Cc as author and helpers on the
newly-
> added t0016-oidmap]
> 
> Randall S. Becker wrote:
> > A preview of the situation with testing 2.23.0.rc0 on NonStop is not
> > great. We have had some new failures right off the bat on our NonStop
> > platforms. This is a preview of what we find within the first bit of
> > testing. The tests run a long time, so more to come.
> >
> > t0016: oidmap
> >
> > Subtest 6 had an ordering issue. We do not know whether the problem is
> > the code or the test result not keeping up with the code changes.
> >
> > --- expect      2019-07-30 16:56:36 +0000
> > +++ actual      2019-07-30 16:56:36 +0000
> > @@ -1,6 +1,6 @@
> >  NULL
> >  NULL
> >  NULL
> > +7c7cd714e262561f73f3079dfca4e8724682ac21 3
> >  139b20d8e6c5b496de61f033f642d0e3dbff528d 2
> >  d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 1
> > -7c7cd714e262561f73f3079dfca4e8724682ac21 3
> 
> I hit the same failure while building for Fedora on the s390x
architecture.  I
> have not dug into it much yet, but perhaps this is an endianess issue?

My platforms are BIGendian, so perhaps that's it. The bucket calculation
uses bit arithmetic so that could contribute.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 20:02   ` Jeff King
@ 2019-07-30 20:39     ` Junio C Hamano
  2019-07-30 20:56     ` SZEDER Gábor
  1 sibling, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2019-07-30 20:39 UTC (permalink / raw)
  To: Jeff King
  Cc: Todd Zullinger, René Scharfe, Randall S. Becker,
	Christian Couder, SZEDER Gábor, git, git-packagers

Jeff King <peff@peff.net> writes:

> @@ -116,19 +116,11 @@ unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len);
>   * Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
>   * for use in hash tables. Cryptographic hashes are supposed to have
>   * uniform distribution, so in contrast to `memhash()`, this just copies
> - * the first `sizeof(int)` bytes without shuffling any bits. Note that
> - * the results will be different on big-endian and little-endian
> - * platforms, so they should not be stored or transferred over the net.

;-)

> + * the first `sizeof(int)` bytes without shuffling any bits.
>   */
>  static inline unsigned int oidhash(const struct object_id *oid)
>  {
> -	/*
> -	 * Equivalent to 'return *(unsigned int *)oid->hash;', but safe on
> -	 * platforms that don't support unaligned reads.
> -	 */
> -	unsigned int hash;
> -	memcpy(&hash, oid->hash, sizeof(hash));
> -	return hash;
> +	return get_be32(oid->hash);
>  }
>  
>  /*
> diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
> index bbe719e950..6656db9d69 100755
> --- a/t/t0016-oidmap.sh
> +++ b/t/t0016-oidmap.sh
> @@ -93,9 +93,9 @@ put three 3
>  iterate" "NULL
>  NULL
>  NULL
> +$(git rev-parse three) 3
>  $(git rev-parse two) 2
> -$(git rev-parse one) 1
> -$(git rev-parse three) 3"
> +$(git rev-parse one) 1"
>  
>  '
>
> which not only fixes this test but any other hash-based oddities. I
> wonder if it's appreciably less efficient. I'll bet I could nerd-snipe
> René into doing a bunch of measurements and explorations of the
> disassembled code. ;)

I'd rather see us go in the direction of discouraging people from
relying on the hash order.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 20:02   ` Jeff King
  2019-07-30 20:39     ` Junio C Hamano
@ 2019-07-30 20:56     ` SZEDER Gábor
  2019-07-31  0:59       ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: SZEDER Gábor @ 2019-07-30 20:56 UTC (permalink / raw)
  To: Jeff King
  Cc: Todd Zullinger, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

On Tue, Jul 30, 2019 at 04:02:03PM -0400, Jeff King wrote:
> On Tue, Jul 30, 2019 at 03:49:38PM -0400, Todd Zullinger wrote:
> 
> > > Subtest 6 had an ordering issue. We do not know whether
> > > the problem is the code or the test result not keeping up
> > > with the code changes.
> > >
> > > --- expect      2019-07-30 16:56:36 +0000
> > > +++ actual      2019-07-30 16:56:36 +0000
> > > @@ -1,6 +1,6 @@
> > >  NULL
> > >  NULL
> > >  NULL
> > > +7c7cd714e262561f73f3079dfca4e8724682ac21 3
> > >  139b20d8e6c5b496de61f033f642d0e3dbff528d 2
> > >  d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 1
> > > -7c7cd714e262561f73f3079dfca4e8724682ac21 3
> > 
> > I hit the same failure while building for Fedora on the
> > s390x architecture.  I have not dug into it much yet, but
> > perhaps this is an endianess issue?
> 
> Ah, of course. Our oid hashing is done by just picking off the first
> bytes of the sha1, and it doesn't care about endianness (because these
> are just internal-to-memory hashes).

Yeah.

> We _could_ reconcile that like this:

Do we really want that, though?  It's a hashmap, after all, and the
order of iteration over various hashmap implementations tends to be
arbitrary.  So an argument could be made that this test is overly
specific by expecting a particular order of elements (and perhaps by
checking the elements' oid as well), and it would be sufficient to
check that it iterates over all elements, no matter the order (IOW
sorting 'actual' before the comparison).

OTOH, this is not just any hashmap, but an oidmap, and I could imagine
that there might be use cases where it would be beneficial if the
iteration order were to match the oid order (but don't know whether we
actually have such a use case).

> I
> wonder if it's appreciably less efficient. I'll bet I could nerd-snipe
> René into doing a bunch of measurements and explorations of the
> disassembled code. ;)

Maybe it shows up in an oidmap-specific performance test, but with all
that's usually going on in Git hashmap performance tends to be
negligible (e.g. it's rarely visible in flame graphs).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-30 20:56     ` SZEDER Gábor
@ 2019-07-31  0:59       ` Jeff King
  2019-07-31  1:23         ` Jeff King
  2019-07-31 16:57         ` Junio C Hamano
  0 siblings, 2 replies; 24+ messages in thread
From: Jeff King @ 2019-07-31  0:59 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Todd Zullinger, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

On Tue, Jul 30, 2019 at 10:56:24PM +0200, SZEDER Gábor wrote:

> > Ah, of course. Our oid hashing is done by just picking off the first
> > bytes of the sha1, and it doesn't care about endianness (because these
> > are just internal-to-memory hashes).
> 
> Yeah.
> 
> > We _could_ reconcile that like this:
> 
> Do we really want that, though?  It's a hashmap, after all, and the
> order of iteration over various hashmap implementations tends to be
> arbitrary.  So an argument could be made that this test is overly
> specific by expecting a particular order of elements (and perhaps by
> checking the elements' oid as well), and it would be sufficient to
> check that it iterates over all elements, no matter the order (IOW
> sorting 'actual' before the comparison).

I'd agree that this test is being overly specific. I guess what I'm
feeling is a vague notion that it might be better if Git behaves
deterministically regardless of endian-ness. Not because it _should_
matter for this test, but there could literally be a bug on big-endian
platforms that nobody knows about because it's very rare for anybody to
test there.

I admit that's pretty hand-wavy though. And there may actually be a
benefit in finding such a bug, because it means that some part of the
code (or a test) is relying on something it ought not to.

> OTOH, this is not just any hashmap, but an oidmap, and I could imagine
> that there might be use cases where it would be beneficial if the
> iteration order were to match the oid order (but don't know whether we
> actually have such a use case).

I don't think we can promise anything about iteration order. This test
is relying on the order at least being deterministic between runs, but
if we added a new entry and had to grow the table, all bets are off.

So regardless of the endian thing above, it probably does make sense for
any hashmap iteration output to be sorted before comparing. That goes
for t0011, too; it doesn't have this endian thing, but it looks to be
relying on hash order that could change if we swapped out hash
functions.

> > I
> > wonder if it's appreciably less efficient. I'll bet I could nerd-snipe
> > René into doing a bunch of measurements and explorations of the
> > disassembled code. ;)
> 
> Maybe it shows up in an oidmap-specific performance test, but with all
> that's usually going on in Git hashmap performance tends to be
> negligible (e.g. it's rarely visible in flame graphs).

That's my guess, too, but data trumps guesses (you'll note that I'm not
volunteering to _collect_ the data, though, which perhaps gives a sense
of how invested I am in it. ;) ).

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  0:59       ` Jeff King
@ 2019-07-31  1:23         ` Jeff King
  2019-07-31  1:27           ` Jeff King
  2019-07-31  1:59           ` Todd Zullinger
  2019-07-31 16:57         ` Junio C Hamano
  1 sibling, 2 replies; 24+ messages in thread
From: Jeff King @ 2019-07-31  1:23 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Todd Zullinger, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

On Tue, Jul 30, 2019 at 08:59:33PM -0400, Jeff King wrote:

> > OTOH, this is not just any hashmap, but an oidmap, and I could imagine
> > that there might be use cases where it would be beneficial if the
> > iteration order were to match the oid order (but don't know whether we
> > actually have such a use case).
> 
> I don't think we can promise anything about iteration order. This test
> is relying on the order at least being deterministic between runs, but
> if we added a new entry and had to grow the table, all bets are off.
> 
> So regardless of the endian thing above, it probably does make sense for
> any hashmap iteration output to be sorted before comparing. That goes
> for t0011, too; it doesn't have this endian thing, but it looks to be
> relying on hash order that could change if we swapped out hash
> functions.

So here's an actual patch.

-- >8 --
Subject: [PATCH] t: sort output of hashmap iteration

The iteration order of a hashmap is undefined, and may depend on things
like the exact set of items added, or the table has been grown or
shrunk. In the case of an oidmap, it even depends on endianness, because
we take the oid hash by casting sha1 bytes directly into an unsigned
int.

Let's sort the test-tool output from any hash iterators. In the case of
t0011, this is just future-proofing. But for t0016, it actually fixes a
reported failure on the big-endian s390 and nonstop ports.

I didn't bother to teach the helper functions to optionally sort output.
They are short enough that it's simpler to just repeat them inline for
the iteration tests than it is to add a --sort option.

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t0011-hashmap.sh | 58 ++++++++++++++++++++++++++++------------------
 t/t0016-oidmap.sh  | 30 +++++++++++++++---------
 2 files changed, 55 insertions(+), 33 deletions(-)

diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 9c96b3e3b1..5343ffd3f9 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -170,31 +170,45 @@ NULL
 '
 
 test_expect_success 'iterate' '
-
-test_hashmap "put key1 value1
-put key2 value2
-put fooBarFrotz value3
-iterate" "NULL
-NULL
-NULL
-key2 value2
-key1 value1
-fooBarFrotz value3"
-
+	test-tool hashmap >actual.raw <<-\EOF &&
+	put key1 value1
+	put key2 value2
+	put fooBarFrotz value3
+	iterate
+	EOF
+
+	cat >expect <<-\EOF &&
+	NULL
+	NULL
+	NULL
+	fooBarFrotz value3
+	key1 value1
+	key2 value2
+	EOF
+
+	sort <actual.raw >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'iterate (case insensitive)' '
-
-test_hashmap "put key1 value1
-put key2 value2
-put fooBarFrotz value3
-iterate" "NULL
-NULL
-NULL
-fooBarFrotz value3
-key2 value2
-key1 value1" ignorecase
-
+	test-tool hashmap ignorecase >actual.raw <<-\EOF &&
+	put key1 value1
+	put key2 value2
+	put fooBarFrotz value3
+	iterate
+	EOF
+
+	cat >expect <<-\EOF &&
+	NULL
+	NULL
+	NULL
+	fooBarFrotz value3
+	key1 value1
+	key2 value2
+	EOF
+
+	sort <actual.raw >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'grow / shrink' '
diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
index bbe719e950..31f8276ba8 100755
--- a/t/t0016-oidmap.sh
+++ b/t/t0016-oidmap.sh
@@ -86,17 +86,25 @@ NULL"
 '
 
 test_expect_success 'iterate' '
-
-test_oidmap "put one 1
-put two 2
-put three 3
-iterate" "NULL
-NULL
-NULL
-$(git rev-parse two) 2
-$(git rev-parse one) 1
-$(git rev-parse three) 3"
-
+	test-tool oidmap >actual.raw <<-\EOF &&
+	put one 1
+	put two 2
+	put three 3
+	iterate
+	EOF
+
+	# sort "expect" too so we do not rely on the order of particular oids
+	sort >expect <<-EOF &&
+	NULL
+	NULL
+	NULL
+	$(git rev-parse one) 1
+	$(git rev-parse two) 2
+	$(git rev-parse three) 3
+	EOF
+
+	sort <actual.raw >actual &&
+	test_cmp expect actual
 '
 
 test_done
-- 
2.23.0.rc0.426.gbdee707ba7


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  1:23         ` Jeff King
@ 2019-07-31  1:27           ` Jeff King
  2019-07-31  1:59           ` Todd Zullinger
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff King @ 2019-07-31  1:27 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Todd Zullinger, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

On Tue, Jul 30, 2019 at 09:23:36PM -0400, Jeff King wrote:

>  test_expect_success 'iterate (case insensitive)' '
> -
> -test_hashmap "put key1 value1
> -put key2 value2
> -put fooBarFrotz value3
> -iterate" "NULL
> -NULL
> -NULL
> -fooBarFrotz value3
> -key2 value2
> -key1 value1" ignorecase

By the way, I found it curious that this test is repeated both with and
without ignorecase, but in the case-insensitive we never have any
case-equivalent names!

I guess it is testing that we preserved the name of fooBarFrotz, but I
would think that this:

diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 5343ffd3f9..2807dff644 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -195,14 +195,15 @@ test_expect_success 'iterate (case insensitive)' '
 	put key1 value1
 	put key2 value2
 	put fooBarFrotz value3
+	put foobarfrotz value4
 	iterate
 	EOF

 	cat >expect <<-\EOF &&
 	NULL
 	NULL
 	NULL
-	fooBarFrotz value3
+	fooBarFrotz value4
 	key1 value1
 	key2 value2
 	EOF

on top of my other patch would be a good addition. However, it yields
this result:

--- expect	2019-07-31 01:26:07.284231899 +0000
+++ actual	2019-07-31 01:26:07.284231899 +0000
@@ -1,6 +1,7 @@
 NULL
 NULL
 NULL
-fooBarFrotz value4
+foobarfrotz value4
 key1 value1
 key2 value2
+value3

The first line is a matter of taste I think (whether to keep the
original case or replace it with the new key). But the empty-keyed
retention of value3 seems wrong. I'm not sure if this is a bug in the
actual hashmap code, or just a weirdness with the way test-tool prints
it.

I'm going offline for a bit, so if anybody feels like following up on
the mystery, be my guest. Otherwise I may take a look later tonight.

-Peff

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  1:23         ` Jeff King
  2019-07-31  1:27           ` Jeff King
@ 2019-07-31  1:59           ` Todd Zullinger
  2019-07-31  3:27             ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Todd Zullinger @ 2019-07-31  1:59 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

Jeff King wrote:
> On Tue, Jul 30, 2019 at 08:59:33PM -0400, Jeff King wrote:
> 
>>> OTOH, this is not just any hashmap, but an oidmap, and I could imagine
>>> that there might be use cases where it would be beneficial if the
>>> iteration order were to match the oid order (but don't know whether we
>>> actually have such a use case).
>> 
>> I don't think we can promise anything about iteration order. This test
>> is relying on the order at least being deterministic between runs, but
>> if we added a new entry and had to grow the table, all bets are off.
>> 
>> So regardless of the endian thing above, it probably does make sense for
>> any hashmap iteration output to be sorted before comparing. That goes
>> for t0011, too; it doesn't have this endian thing, but it looks to be
>> relying on hash order that could change if we swapped out hash
>> functions.
> 
> So here's an actual patch.

At the risk of showing my complete lack of knowledge about
these tests, I was wondering if the order mattered for the
other tests in t0011 and t0016.  I had assumed it didn't and
had something like this for testing (and a similar change to
test_oidmap() in t0016):

 diff --git i/t/t0011-hashmap.sh w/t/t0011-hashmap.sh
 index 9c96b3e3b1..9ed1c4f14d 100755
 --- i/t/t0011-hashmap.sh
 +++ w/t/t0011-hashmap.sh
 @@ -4,8 +4,8 @@ test_description='test hashmap and string hash functions'
  . ./test-lib.sh
  
  test_hashmap() {
 -	echo "$1" | test-tool hashmap $3 > actual &&
 -	echo "$2" > expect &&
 +	echo "$1" | test-tool hashmap $3 | sort >actual &&
 +	echo "$2" | sort >expect &&
  	test_cmp expect actual
  }

You've got a more comprehensive patch and a proper commit
message, so this is really just a matter of curiosity.

-- 
Todd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  1:59           ` Todd Zullinger
@ 2019-07-31  3:27             ` Jeff King
  2019-07-31  3:53               ` Jeff King
                                 ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Jeff King @ 2019-07-31  3:27 UTC (permalink / raw)
  To: Todd Zullinger
  Cc: SZEDER Gábor, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

On Tue, Jul 30, 2019 at 09:59:17PM -0400, Todd Zullinger wrote:

> At the risk of showing my complete lack of knowledge about
> these tests, I was wondering if the order mattered for the
> other tests in t0011 and t0016.  I had assumed it didn't and
> had something like this for testing (and a similar change to
> test_oidmap() in t0016):
> 
>  diff --git i/t/t0011-hashmap.sh w/t/t0011-hashmap.sh
>  index 9c96b3e3b1..9ed1c4f14d 100755
>  --- i/t/t0011-hashmap.sh
>  +++ w/t/t0011-hashmap.sh
>  @@ -4,8 +4,8 @@ test_description='test hashmap and string hash functions'
>   . ./test-lib.sh
>   
>   test_hashmap() {
>  -	echo "$1" | test-tool hashmap $3 > actual &&
>  -	echo "$2" > expect &&
>  +	echo "$1" | test-tool hashmap $3 | sort >actual &&
>  +	echo "$2" | sort >expect &&
>   	test_cmp expect actual
>   }
> 
> You've got a more comprehensive patch and a proper commit
> message, so this is really just a matter of curiosity.

I think the order does matter for those ones. E.g., the ones that run
"get" want to make sure they're seeing the values in the same order in
which they were requested.

One thing that makes it all a bit funky is that the "put" lines also
output the old value (which is what all those NULLs) are. And I think
that solves my "value3" puzzlement from earlier. It is not part of the
iteration at all, but rather the result of the duplicate "put".

That would perhaps be clearer if the "hashmap" tool actually did the
sorting itself (so we'd sort _just_ the iteration, not the whole
output). Something like this, though I'm on the fence about whether it
is worth it:

diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index aaf17b0ddf..9f6901666e 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -2,6 +2,7 @@
 #include "git-compat-util.h"
 #include "hashmap.h"
 #include "strbuf.h"
+#include "string-list.h"
 
 struct test_entry
 {
@@ -221,10 +222,18 @@ int cmd__hashmap(int argc, const char **argv)
 
 		} else if (!strcmp("iterate", cmd)) {
 
+			struct string_list sorted = STRING_LIST_INIT_NODUP;
+			struct string_list_item *item;
 			struct hashmap_iter iter;
 			hashmap_iter_init(&map, &iter);
-			while ((entry = hashmap_iter_next(&iter)))
-				printf("%s %s\n", entry->key, get_value(entry));
+			while ((entry = hashmap_iter_next(&iter))) {
+				item = string_list_append(&sorted, entry->key);
+				item->util = (void *)get_value(entry);
+			}
+			string_list_sort(&sorted);
+			for_each_string_list_item(item, &sorted)
+				printf("%s %s\n", item->string, (const char *)item->util);
+			string_list_clear(&sorted, 0);
 
 		} else if (!strcmp("size", cmd)) {
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  3:27             ` Jeff King
@ 2019-07-31  3:53               ` Jeff King
  2019-07-31 17:17                 ` Junio C Hamano
  2019-07-31  4:06               ` [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop René Scharfe
  2019-07-31  6:04               ` Todd Zullinger
  2 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2019-07-31  3:53 UTC (permalink / raw)
  To: Todd Zullinger
  Cc: SZEDER Gábor, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

On Tue, Jul 30, 2019 at 11:27:35PM -0400, Jeff King wrote:

> That would perhaps be clearer if the "hashmap" tool actually did the
> sorting itself (so we'd sort _just_ the iteration, not the whole
> output). Something like this, though I'm on the fence about whether it
> is worth it:
> [...]

And here it is for reference with the matching change in test-oidmap,
and the adjustments necessary for the test scripts (from master, not
from my earlier patch). I think I prefer the simpler "just sort it all"
version I posted with the commit message.

The post-image in t0016 may be a new low: a command substitution in a DQ
string fed to echo, itself inside a command substitution in a here-doc.
Yuck. :)

---
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index aaf17b0ddf..9f6901666e 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -2,6 +2,7 @@
 #include "git-compat-util.h"
 #include "hashmap.h"
 #include "strbuf.h"
+#include "string-list.h"
 
 struct test_entry
 {
@@ -221,10 +222,18 @@ int cmd__hashmap(int argc, const char **argv)
 
 		} else if (!strcmp("iterate", cmd)) {
 
+			struct string_list sorted = STRING_LIST_INIT_NODUP;
+			struct string_list_item *item;
 			struct hashmap_iter iter;
 			hashmap_iter_init(&map, &iter);
-			while ((entry = hashmap_iter_next(&iter)))
-				printf("%s %s\n", entry->key, get_value(entry));
+			while ((entry = hashmap_iter_next(&iter))) {
+				item = string_list_append(&sorted, entry->key);
+				item->util = (void *)get_value(entry);
+			}
+			string_list_sort(&sorted);
+			for_each_string_list_item(item, &sorted)
+				printf("%s %s\n", item->string, (const char *)item->util);
+			string_list_clear(&sorted, 0);
 
 		} else if (!strcmp("size", cmd)) {
 
diff --git a/t/helper/test-oidmap.c b/t/helper/test-oidmap.c
index 0acf99931e..5fd059fe90 100644
--- a/t/helper/test-oidmap.c
+++ b/t/helper/test-oidmap.c
@@ -94,10 +94,19 @@ int cmd__oidmap(int argc, const char **argv)
 
 		} else if (!strcmp("iterate", cmd)) {
 
+			struct string_list sorted = STRING_LIST_INIT_DUP;
+			struct string_list_item *item;
 			struct oidmap_iter iter;
 			oidmap_iter_init(&map, &iter);
-			while ((entry = oidmap_iter_next(&iter)))
-				printf("%s %s\n", oid_to_hex(&entry->entry.oid), entry->name);
+			while ((entry = oidmap_iter_next(&iter))) {
+				item = string_list_append(&sorted,
+							  oid_to_hex(&entry->entry.oid));
+				item->util = entry->name;
+			}
+			string_list_sort(&sorted);
+			for_each_string_list_item(item, &sorted)
+				printf("%s %s\n", item->string, (const char *)item->util);
+			string_list_clear(&sorted, 0);
 
 		} else {
 
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 9c96b3e3b1..1adcd7762d 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -177,9 +177,9 @@ put fooBarFrotz value3
 iterate" "NULL
 NULL
 NULL
-key2 value2
+fooBarFrotz value3
 key1 value1
-fooBarFrotz value3"
+key2 value2"
 
 '
 
@@ -192,8 +192,8 @@ iterate" "NULL
 NULL
 NULL
 fooBarFrotz value3
-key2 value2
-key1 value1" ignorecase
+key1 value1
+key2 value2" ignorecase
 
 '
 
diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
index bbe719e950..abeaa64159 100755
--- a/t/t0016-oidmap.sh
+++ b/t/t0016-oidmap.sh
@@ -93,9 +93,14 @@ put three 3
 iterate" "NULL
 NULL
 NULL
-$(git rev-parse two) 2
-$(git rev-parse one) 1
-$(git rev-parse three) 3"
+$(
+	# sort to avoid relying on exact oids
+	{
+		echo "$(git rev-parse one) 1"
+		echo "$(git rev-parse two) 2"
+		echo "$(git rev-parse three) 3"
+	} | sort
+)"
 
 '
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  3:27             ` Jeff King
  2019-07-31  3:53               ` Jeff King
@ 2019-07-31  4:06               ` René Scharfe
  2019-07-31  4:30                 ` Jeff King
  2019-07-31  6:04               ` Todd Zullinger
  2 siblings, 1 reply; 24+ messages in thread
From: René Scharfe @ 2019-07-31  4:06 UTC (permalink / raw)
  To: Jeff King, Todd Zullinger
  Cc: SZEDER Gábor, Randall S. Becker, 'Junio C Hamano',
	Christian Couder, git, git-packagers

Am 31.07.19 um 05:27 schrieb Jeff King:
> One thing that makes it all a bit funky is that the "put" lines also
> output the old value (which is what all those NULLs) are. And I think
> that solves my "value3" puzzlement from earlier. It is not part of the
> iteration at all, but rather the result of the duplicate "put".
>
> That would perhaps be clearer if the "hashmap" tool actually did the
> sorting itself (so we'd sort _just_ the iteration, not the whole
> output). Something like this, though I'm on the fence about whether it
> is worth it:

We already have a few other tests that sort and compare.  Perhaps it's
time for a test_cmp_ignore_order?

And perhaps something like this might even be worth implementing as a
diff option?  https://github.com/l0b0/diff-ignore-moved-lines has
post-processing script for that..

René

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  4:06               ` [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop René Scharfe
@ 2019-07-31  4:30                 ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2019-07-31  4:30 UTC (permalink / raw)
  To: René Scharfe
  Cc: Todd Zullinger, SZEDER Gábor, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

On Wed, Jul 31, 2019 at 06:06:02AM +0200, René Scharfe wrote:

> Am 31.07.19 um 05:27 schrieb Jeff King:
> > One thing that makes it all a bit funky is that the "put" lines also
> > output the old value (which is what all those NULLs) are. And I think
> > that solves my "value3" puzzlement from earlier. It is not part of the
> > iteration at all, but rather the result of the duplicate "put".
> >
> > That would perhaps be clearer if the "hashmap" tool actually did the
> > sorting itself (so we'd sort _just_ the iteration, not the whole
> > output). Something like this, though I'm on the fence about whether it
> > is worth it:
> 
> We already have a few other tests that sort and compare.  Perhaps it's
> time for a test_cmp_ignore_order?

That would be OK with me, but I think it would just replace the calls to
"sort". To do the "just sort the iteration, but not the rest of the
output that I'm talking about above", the change _has_ to go into
test-hashmap.c (but I think I still favor just sorting the whole thing
for simplicity).

> And perhaps something like this might even be worth implementing as a
> diff option?  https://github.com/l0b0/diff-ignore-moved-lines has
> post-processing script for that..

Given that we don't actually care that much about looking at the
resulting diff (and so we don't care about preserving order), I think
just sorting both sides gives equivalent results and is much simpler.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  3:27             ` Jeff King
  2019-07-31  3:53               ` Jeff King
  2019-07-31  4:06               ` [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop René Scharfe
@ 2019-07-31  6:04               ` Todd Zullinger
  2 siblings, 0 replies; 24+ messages in thread
From: Todd Zullinger @ 2019-07-31  6:04 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, René Scharfe, Randall S. Becker,
	'Junio C Hamano', Christian Couder, git, git-packagers

Jeff King wrote:
> On Tue, Jul 30, 2019 at 09:59:17PM -0400, Todd Zullinger wrote:
>> At the risk of showing my complete lack of knowledge about
>> these tests, I was wondering if the order mattered for the
>> other tests in t0011 and t0016.
[...]
>> You've got a more comprehensive patch and a proper commit
>> message, so this is really just a matter of curiosity.
> 
> I think the order does matter for those ones. E.g., the ones that run
> "get" want to make sure they're seeing the values in the same order in
> which they were requested.

Ahh, thanks for clarifying that (and saving me from sending
the version I had which would have incorrectly sorted all
the test_{hashmap,oidmap} output. :)

FWIW, I applied your patch for sorting hashmap iterations
(<20190731012336.GA13880@sigill.intra.peff.net>¹) and ran it
through the Fedora build system.  All architectures passed,
as expected.

¹ https://public-inbox.org/git/20190731012336.GA13880@sigill.intra.peff.net/

-- 
Todd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  0:59       ` Jeff King
  2019-07-31  1:23         ` Jeff King
@ 2019-07-31 16:57         ` Junio C Hamano
  1 sibling, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2019-07-31 16:57 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, Todd Zullinger, René Scharfe,
	Randall S. Becker, Christian Couder, git, git-packagers

Jeff King <peff@peff.net> writes:

> So regardless of the endian thing above, it probably does make sense for
> any hashmap iteration output to be sorted before comparing. That goes
> for t0011, too; it doesn't have this endian thing, but it looks to be
> relying on hash order that could change if we swapped out hash
> functions.

Sounds sensible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop
  2019-07-31  3:53               ` Jeff King
@ 2019-07-31 17:17                 ` Junio C Hamano
  2019-07-31 21:22                   ` non-cryptographic hash algorithms in git Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2019-07-31 17:17 UTC (permalink / raw)
  To: Jeff King
  Cc: Todd Zullinger, SZEDER Gábor, René Scharfe,
	Randall S. Becker, Christian Couder, git, git-packagers

Jeff King <peff@peff.net> writes:

> And here it is for reference with the matching change in test-oidmap,
> and the adjustments necessary for the test scripts (from master, not
> from my earlier patch). I think I prefer the simpler "just sort it all"
> version I posted with the commit message.

Yeah, let's go with that version.

I am wondering if we should follow suit to certain language's hash
implementation to make sure the iteration order is unpredictable to
catch bad scripts ;-)  Perhaps that is not worth it, either.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* non-cryptographic hash algorithms in git
  2019-07-31 17:17                 ` Junio C Hamano
@ 2019-07-31 21:22                   ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2019-07-31 21:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[breaking the thread since we're way off the original topic, and this
 subject might get more interesting attention]

On Wed, Jul 31, 2019 at 10:17:01AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > And here it is for reference with the matching change in test-oidmap,
> > and the adjustments necessary for the test scripts (from master, not
> > from my earlier patch). I think I prefer the simpler "just sort it all"
> > version I posted with the commit message.
> 
> Yeah, let's go with that version.
> 
> I am wondering if we should follow suit to certain language's hash
> implementation to make sure the iteration order is unpredictable to
> catch bad scripts ;-)  Perhaps that is not worth it, either.

That would be a nice side effect, but the real benefit is that it makes
hash-collision denial-of-service attacks harder. I experimented with
this some when I looked at swapping out the xdiff hash algorithm[1] for
murmur, siphash, or similar, but I could never get the performance quite
on par with what we have now.

I haven't pursued randomization that much because git's diff engine is
a ready-made DoS machine in the first place. If you're diffing untrusted
input, you have to be ready to cut it off after using too much CPU and
say "nope, this one is too big". That may be less true for our general
purpose hashmap, though.

-Peff

[1] I didn't dig up the emails, but this was several years ago, when we
    realized that XDL_FAST_HASH didn't actually work well. I recently
    found out about xxhash:

      https://cyan4973.github.io/xxHash/

    which looks promising. But I haven't gotten around to plugging it in
    and timing the result. Anybody else is welcome to beat me to it. :)

    IIRC, one really tricky thing about our diff code is that it finds
    the newline for each line while it's hashing. That makes it hard to
    plug in an existing hash implementation without going over each line
    twice, and many of them perform poorly if you hand them a byte at a
    time.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-07-31 21:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-30 17:08 [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop Randall S. Becker
2019-07-30 17:31 ` Junio C Hamano
2019-07-30 18:09   ` Matheus Tavares Bernardino
2019-07-30 18:10   ` Randall S. Becker
2019-07-30 18:35     ` Junio C Hamano
2019-07-30 19:45 ` Jeff King
2019-07-30 20:25   ` Randall S. Becker
2019-07-30 19:49 ` Todd Zullinger
2019-07-30 20:02   ` Jeff King
2019-07-30 20:39     ` Junio C Hamano
2019-07-30 20:56     ` SZEDER Gábor
2019-07-31  0:59       ` Jeff King
2019-07-31  1:23         ` Jeff King
2019-07-31  1:27           ` Jeff King
2019-07-31  1:59           ` Todd Zullinger
2019-07-31  3:27             ` Jeff King
2019-07-31  3:53               ` Jeff King
2019-07-31 17:17                 ` Junio C Hamano
2019-07-31 21:22                   ` non-cryptographic hash algorithms in git Jeff King
2019-07-31  4:06               ` [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop René Scharfe
2019-07-31  4:30                 ` Jeff King
2019-07-31  6:04               ` Todd Zullinger
2019-07-31 16:57         ` Junio C Hamano
2019-07-30 20:27   ` Randall S. Becker

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).