From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=-5.5 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 75D0F1F670 for ; Tue, 19 Oct 2021 14:13:33 +0000 (UTC) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 36ECD385841C for ; Tue, 19 Oct 2021 14:13:32 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 36ECD385841C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1634652812; bh=qrm7SpAtUVIYHC5RIgG/CNO9IiJJN7hJgV+HJ0a8ejk=; h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=uL8z2L0YCXSCAUlrWG/IWkqW39bLMwT82+LUHpnz1I5xTJtn7Hb15ZYjBnpAIXO8L p/4mTJiqq8AomHeYzIAdCWQF6XNz34dXFQ4LjIOVJx2h+8NgWa5jHzif6BNHJyh3oG WFwQWPGfjLA/zd2diuz22vibOmpozccUytLXC7MM= Received: from mail-ua1-x933.google.com (mail-ua1-x933.google.com [IPv6:2607:f8b0:4864:20::933]) by sourceware.org (Postfix) with ESMTPS id 46781385800E for ; Tue, 19 Oct 2021 14:12:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 46781385800E Received: by mail-ua1-x933.google.com with SMTP id a17so37602uax.12 for ; Tue, 19 Oct 2021 07:12:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=qrm7SpAtUVIYHC5RIgG/CNO9IiJJN7hJgV+HJ0a8ejk=; b=f46FvYYdvI07KBwQy2Jgv5klq/4QsQ3ybMnUVe1Ad+0H3vyjIjzMn2lycNr2XeqO2i 38JVRDHf2lpqqAsQRethqELthkbHg0cVedffpDQEgIYw2FCtxABH+fkmI00xk7xLC9B7 lOBrcK+IQqT1omry9WxrdpsSuM1OxveGcUUXJKLgXg3k+CS3gmuL31fkVb8bQdVQ5wRY NzcR+m7O2OH0btgxryZ0rSZCAK/dmn8i6cDnokOJ/YI1ApEJo/9KQGIejQTMPaJuAdgj MRJPoBy0R4yitTiXl4XZU+U8B8wjvrYurqABI7xO/zThfQsUHSGzQkacMa6lICQG4PpP jDoQ== X-Gm-Message-State: AOAM531xZeLRqoXloAfbR8rrnzjAUAiqiNOPHAA/bL6rnq/V7hKSioZ4 LZlfQzKxZSULnlhLSdA1l/Vydg== X-Google-Smtp-Source: ABdhPJx0UjrOLrjoQAQYt6Oa34hJFEBzxntytKi5FhIUXpRuQpOWLR/1YLFh8Jm17UQTggSKNMEIDA== X-Received: by 2002:a9f:2aca:: with SMTP id d10mr189847uaj.26.1634652767737; Tue, 19 Oct 2021 07:12:47 -0700 (PDT) Received: from ?IPV6:2804:431:c7ca:2654:a356:8d7d:aee2:9edb? ([2804:431:c7ca:2654:a356:8d7d:aee2:9edb]) by smtp.gmail.com with ESMTPSA id m48sm11242249vkf.50.2021.10.19.07.12.46 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 19 Oct 2021 07:12:47 -0700 (PDT) Message-ID: Date: Tue, 19 Oct 2021 11:12:45 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.1.2 Subject: Re: [PATCH v7 2/2] BZ #17645, fix slow DSO sorting behavior in dynamic loader -- Algorithm changes Content-Language: en-US To: Chung-Lin Tang , GNU C Library , Florian Weimer References: <30aa9fbd-fc08-9a6f-d34b-accb47abd8af@codesourcery.com> <334c6848-e805-4666-add5-e611f1d7c255@linaro.org> <4fefc9b5-64fa-1384-075c-4b3363d95941@codesourcery.com> <83afca9e-8938-3093-c968-8769776094e9@codesourcery.com> In-Reply-To: <83afca9e-8938-3093-c968-8769776094e9@codesourcery.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Adhemerval Zanella via Libc-alpha Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+e=80x24.org@sourceware.org Sender: "Libc-alpha" On 15/10/2021 12:22, Chung-Lin Tang wrote: > Hi Adhemerval, > > On 2021/10/6 1:15 AM, Adhemerval Zanella wrote: >>> I'm not sure if defining 'enum dso_sort_algorithm' is really needed? >>> Is __builtin_unreachable() that undesirable? >> The tunables code should already sanitize the input and returns the >> possible algorithms as a type, instead of a number.  I think it is >> clear to work with predefined type, than hint the compiler explicitly. > > Okay, understood. > >>> Also, the final code should probably add __glibc_likely() to the default >>> algorithm case. >> Sometime I think we abuse the __glibc_{un}likely, specially on short code >> like that.  But please reinstate it if you see that it does improve code >> generation. > > I have re-done some microbenchmarking using hp-timing facilities, and it appears > that __glibc_likely() helps a "little" bit overall (slightly more consistent > on powerpc64le, a little less on my own x86_64 desktop), at worst no apparent > improvement. So I've added it back. > > The attached patch is basically the version on the azanella/dso-opt branch, plus > the above __glibc_likely() addition, plus a little bit of editing of comments. > Rebased and re-tested, is this okay for master? Thanks, the patch does look ok and I have tested on some architectures (aarch64, armhf, powerpc64le, powerpc64, powerpc, and sparc64). I only saw a failure on powerpc64le, however it is unrelated to this patch: elf/tst-dso-ordering9.test-result sbrk() failure while processing tunables sbrk() failure while processing tunables This is an issue with the tunables framework which does not have a fallback mechanism to allocate memory if sbrk() fails (I recall someone already reported this on some mips environment). I will try to check this. I just note we still need three things for this patch: 1. A proper patch description. 2. The documentation for the new tunable. 3. A NEWS entry. For 1. you can use the original message on first RFC. Below I adjusted it with some changes over the version, so you can use it if you think it is ok (below). For 2. I think a brief description of both algorithm the shortcomings the new one tries to overcome should be suffice. Something like (please fell free to improve it if you think I am missing something): diff --git a/manual/tunables.texi b/manual/tunables.texi index 658547c613..1bcf5f7c5e 100644 --- a/manual/tunables.texi +++ b/manual/tunables.texi @@ -309,6 +309,16 @@ changed once allocated at process startup. The default allocation of optional static TLS is 512 bytes and is allocated in every thread. @end deftp +@deftp Tunable glibc.rtld.dynamic_sort +Sets the algorithm to use on DSO sorting. The value of @samp{1} sorts +according to dependencies of the program and the cycle detection might +cause performance issues since the algorithm is O(n^3). The value of +@samp{2} uses a different algorithm that implements a topological sort +by depth-first search to obtain the Reverse-Post Order sequence of the +DSO depedencies. It does not show the performance issues of @samp{1} +algorithm. + +The default value of this tunable is @samp{1}. @node Elision Tunables @section Elision Tunables For 3. we can describe briefly the defaults and the new algorithm (again please fell free to improve it if you think I am missing something): diff --git a/NEWS b/NEWS index 220d327071..795a686b3e 100644 --- a/NEWS +++ b/NEWS @@ -51,6 +51,11 @@ Major new features: * The ISO C2X macro _PRINTF_NAN_LEN_MAX has been added to . +* A new tunable, glibc.rtld.dynamic_sort, can be used to select diferent + DSO sorting algorithms. The default one '1' might show performance + issues when the chain of dependencies has cycles, and the new algorithm + '2' fixes by using topological sorting. + Deprecated and removed features, and other changes affecting compatibility: * The r_version update in the debugger interface makes the glibc binary So please send a v8 with changes so I can finally push it to master. This is mainly to avoid maintainers to 'tune' the patch before apply, ideally we apply the patch as-is from patchwork. I hope that we can get the new algorithm as default for 2.35. --- This part is the actual code changes. While the past attempts appeared to be either (1) more sophisticated graph algorithms, with attempts to add Tarjan SCC, or (2) modifying of heuristics to augment the old algorithm to behave more reasonably, here I have basically adhered to the KISS principle. The main algorithm here is simply depth-first search (DFS) to obtain the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1 bitfield is added to struct link_map to more elegantly facilitate such a search. I also have experimented with doing an iterative version, but it is obviously harder to understand, and actually slower when measured by hp-timing.h facilities. I have chosen to use simple recursive DFS, for clarity and performance (Some measures were however taken to curb recursion depth) The DFS algorithm is applied to the input maps[nmap-1] backwards towards maps[0]. This has the effect of a more "shallow" recursion depth in general since the input is in BFS. Also, when combined with the natural order of processing l_initfini[] at each node, this creates a resulting output sorting closer to the intuitive "left-to-right" order in most cases. Per-the discussion in #15311 about relocation dependencies overriding link dependencies, similar to comments #7,#9 there, a second pass of link-dependency-only sorting has been added in that case. Detection of existence of reldeps is done during the first DFS traversal pass, to cull away unneeded cases of this 2nd sorting pass. This also allows discarding of the simple limited cycle detection (i.e. X has reldep on Y, Y links to X) in the current algorithm. A testcase expressing the #15311 case has also been added. On the further general issue of circular dependencies causing SCCs across shared objects, the ELF specification explicitly states that behavior in this case is undefined, although I have found at least one reference describing Solaris' behavior here as basically retaining the received original ordering of those objects [1]. While quite well defined, I'm a little unsure this is the reasonable behavior, as this seems to imply a single circular dependency link will nullify correct topological dependency behavior for the majority of nodes within that SCC. The Tarjan SCC algorthm has been mentioned multiple times in these related BZ issues. i It could be said that the Tarjan algorithm is a generalization of post-order DFS traversal; some might say that's a understatement, but the phases of the node visiting and processing really look analogous. It would be possible to extend and implement it mostly within the confines of the code of my patch, but considering the undefined status in the spec, arguably some ambiguities of proper reasonable behavior, and the much more bookkeeping in a piece of code that will be repeatedly executed an incredible number of times across all systems, of which only applies to quite rare cases, I have refrained from adding that kind of processing in this patch, though such issues may be revisited later. Other two notable implementation adjustments related to this _dl_sort_maps() change are: (1) The additional pass of sorting in dl_open_worker() right before relocation has been removed. _dl_map_object_deps() already does a pass of sorting, and simply collecting objects by that order is adequate. Sorting again in that place before relocation appears to be redundant. (2) The use of two char arrays 'used' and 'done' in _dl_close_worker to represent two per-map attributes has been changed to simply use the two bits in the 'l_reserved' field in struct link_map to implement. This also allows discarding the clunky 'used' array sorting that _dl_sort_maps had to (sometimes) do along the way. Checked on x86_64, i686, powerpc64le, powerpc64, powerpc, aarch64, and armhf. [1] https://docs.oracle.com/cd/E19957-01/806-0641/6j9vuquip/index.html (section "Initialization and Termination Routines")