From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS22989 209.51.188.0/24 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 9F60F1F4B4 for ; Wed, 7 Apr 2021 22:24:49 +0000 (UTC) Received: from localhost ([::1]:39108 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lUGbQ-0006DN-EO for normalperson@yhbt.net; Wed, 07 Apr 2021 18:24:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56420) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lUGbN-0006DG-Ey for bug-gnulib@gnu.org; Wed, 07 Apr 2021 18:24:45 -0400 Received: from vmicros1.altlinux.org ([194.107.17.57]:45506) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lUGbK-0004BG-OC for bug-gnulib@gnu.org; Wed, 07 Apr 2021 18:24:45 -0400 Received: from mua.local.altlinux.org (mua.local.altlinux.org [192.168.1.14]) by vmicros1.altlinux.org (Postfix) with ESMTP id 78DA972C8B1; Thu, 8 Apr 2021 01:24:40 +0300 (MSK) Received: by mua.local.altlinux.org (Postfix, from userid 508) id 654307CC69B; Thu, 8 Apr 2021 01:24:40 +0300 (MSK) Date: Thu, 8 Apr 2021 01:24:40 +0300 From: "Dmitry V. Levin" To: Bruno Haible Subject: Re: [PATCH] tests: fix test-execute with GNU make jobserver Message-ID: <20210407222440.GA1491@altlinux.org> References: <20210407021550.GA19950@altlinux.org> <2966714.KBtFPX2tBE@omega> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2966714.KBtFPX2tBE@omega> Received-SPF: pass client-ip=194.107.17.57; envelope-from=ldv@altlinux.org; helo=vmicros1.altlinux.org X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: bug-gnulib@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Gnulib discussion list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: bug-gnulib@gnu.org Errors-To: bug-gnulib-bounces+normalperson=yhbt.net@gnu.org Sender: "bug-gnulib" Hi Bruno, On Wed, Apr 07, 2021 at 11:30:47PM +0200, Bruno Haible wrote: > Hi Dmitry, > > Thanks for the proposed patch. > > > On POSIX systems the GNU make jobserver is implemented as a pipe, > > and these two unexpected descriptors make test-execute-child fail. > > This problem description is a bit technical. It took me a bit of work > to translate your description into a "how to reproduce". Namely, > > $ make -j 8 check Just "make -j2 check", in fact, any parallelism would do, but it's not a race, it's not due to any parallelized execution, it's a side effect of the GNU make jobserver being enabled. > produces this output: > > Test case 14: 0 1 2 3 4 10 > test-execute-main.c:295: assertion 'ret == 0' failed > Aborted (core dumped) > test-execute.sh: test case 14 failed > Test case 15: 0 1 2 3 4 10 > test-execute-main.c:305: assertion 'ret == 0' failed > Aborted (core dumped) > test-execute.sh: test case 15 failed > Test case 16: 0 1 2 3 4 > test-execute-main.c:315: assertion 'ret == 0' failed > Aborted (core dumped) > test-execute.sh: test case 16 failed > > Now, there are several questions: > > 1) Why is GNU make passing these descriptors 3 and 4 to the child > process? The Makefile rule looks like this: > > test-execute.sh.log: test-execute.sh > @p='test-execute.sh'; \ > b='test-execute.sh'; \ > $(am__check_pre) $(LOG_DRIVER) --test-name "$$f" \ > --log-file $$b.log --trs-file $$b.trs \ > $(am__common_driver_flags) $(AM_LOG_DRIVER_FLAGS) $(LOG_DRIVER_FLAGS) -- $(LOG_COMPILE) \ > "$$tst" $(AM_TESTS_FD_REDIRECT) > > and it does not start with '+'. (Cf. [1].) > > Oh, I see: $(am__check_pre) expands to something that contains > $(TESTS_ENVIRONMENT), and $(TESTS_ENVIRONMENT) contains > MAKE='$(MAKE)' > and that occurrence of $(MAKE) makes 'make' think that the rule may > execute a sub-make. > > 2) You suggest to deal with the file descriptors 3..19. Is this reliable? Yes, it is reliable as it mirrors for (fd = 0; fd < 20; fd++) piece of code in tests/test-execute-child.c file. > The MAKEFLAGS variable in the environment is defined as > MAKEFLAGS="w -j8 --jobserver-auth=3,4 -- ..." > So we could parse this string and look for the --jobserver-auth > option. > > Oh, wait: this option was named differently in older GNU make versions.[2] Since tests/test-execute-child.c explicitly checks for any unexpected descriptors, I think the most appropriate thing to do is to make sure that no unexpected descriptors are inherited by the child. The jobserver was the thing that triggered the test error, but there could be other external sources of inherited descriptors, so the test should rather be robust and make sure these external sources do not affect the test. > 3) You suggest to set the cloexec bit on these file descriptors, so that > test-execute-child will have them closed. However, it is simpler to just > close() them in test-execute-parent already, since test-execute-parent > does not need these file descriptors either. Yes, this is another option, in fact, it's more reliable on GNU/Linux: The test currently relies on the fact that descriptor #10 is not inherited, so if this is not the case, the test will fail. If, however, all descriptors in the range are closed, the test succeeds. I'd choose a simple close() rather than a bit more complex cloexec if tests/test-execute-child.c did not contain this uncommented hunk: for (fd = 0; fd < 20; fd++) #ifdef __NetBSD__ if (fd != 3) #endif If for some reason __NetBSD__ is allowed to have descriptor #3 inherited, then it shouldn't be closed in tests/test-execute-main.c either. Bruno, while we are here, could you shed some light on this __NetBSD__ exception, ideally as a comment in tests/test-execute-child.c, please? Maybe the reason is not strong enough to avoid closing this descriptor at all costs. > 4) How about other environments? I saw 'test-execute.sh' also fail in > some recent Linux distro with KDE desktop. Apparently 'plasma' is passing > one or two open file descriptors to all children (that is, to all processes > running in this desktop environment). Do you think this is worth reporting? If it isn't documented, then I'd say it's a bug, please report it. > 5) Will glibc at some point export the close_range() function [3]? Yes, glibc will provide close_range and closefrom functions sooner or later, the patches are being reviewed: https://sourceware.org/pipermail/libc-alpha/2021-March/123532.html https://sourceware.org/pipermail/libc-alpha/2021-March/123535.html If the review goes smoothly, they are likely to be available in glibc 2.34. > If so, Gnulib should probably provide a substitute close_range(), and > test-execute-main.c should better call this function, rather than rolling > its own loop. Yes, I think it makes sense to provide this API in gnulib, although it won't be able to address all issues the syscall is intended to address. I don't think, however, that the test-execute fix should wait for this to happen. -- ldv