Subject: Re: [PATCH v3 3/7] stdlib: Optimization qsort{_r} swap implementation (BZ #19305)
From: Adhemerval Zanella via Libc-alpha
Reply-To: Adhemerval Zanella
To: Noah Goldstein
Cc: GNU C Library
Date: Fri, 15 Oct 2021 14:56:19 -0300
References: <20210903171144.952737-1-adhemerval.zanella@linaro.org> <20210903171144.952737-4-adhemerval.zanella@linaro.org> <788ad382-68eb-cf90-0bf7-681ed54177da@linaro.org>
List-Id: Libc-alpha mailing list

On 15/10/2021 14:45, Noah Goldstein wrote:
> swap is in the inner loop. Seems like a pretty critical component to have fully
> optimized. The aarch64 version looks good, but the x86_64 version seems
> to be lacking. Not arguing for an arch specific version, but if the directives
> can add value to x86_64 without detracting from aarch64 seems like a zero
> cost improvement.
>
> Since size is non-constant for the tail I don't see how we are going
> avoid 3x memcpy calls. Although that can be another patch if it
> gets values.
>
>>
>> [1] https://godbolt.org/z/v7e4xxqGa

Maybe use a byte copy in the tail to avoid the memcpy calls [1]. Another
option might be to tune SWAP_GENERIC_SIZE to make the tail less costly
(16 should be ok for most architectures, although some might do better
with larger values).

[1] https://godbolt.org/z/G76dcej16
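
For reference, a rough sketch of the byte-copy tail idea. This is not the code from the patch or from the godbolt link: the function name swap_generic_sketch is hypothetical, and the value 16 for SWAP_GENERIC_SIZE is only the starting point suggested above. Full chunks use memcpy with a compile-time constant size, which compilers can usually expand inline; the remainder, whose size is only known at run time, is swapped byte by byte, so no memcpy call with a variable size is needed.

```c
#include <stddef.h>
#include <string.h>

/* Assumed chunk size; 16 is only the starting point suggested in this
   thread and may need tuning per architecture.  */
#define SWAP_GENERIC_SIZE 16

/* Hypothetical helper, not the function from the patch: swap SIZE bytes
   between the non-overlapping objects A and B.  */
static void
swap_generic_sketch (void *a, void *b, size_t size)
{
  unsigned char *pa = a;
  unsigned char *pb = b;
  unsigned char tmp[SWAP_GENERIC_SIZE];

  /* Full chunks: the size is a compile-time constant, so the three
     memcpy calls can usually be expanded inline by the compiler.  */
  while (size >= SWAP_GENERIC_SIZE)
    {
      memcpy (tmp, pa, SWAP_GENERIC_SIZE);
      memcpy (pa, pb, SWAP_GENERIC_SIZE);
      memcpy (pb, tmp, SWAP_GENERIC_SIZE);
      pa += SWAP_GENERIC_SIZE;
      pb += SWAP_GENERIC_SIZE;
      size -= SWAP_GENERIC_SIZE;
    }

  /* Tail: fewer than SWAP_GENERIC_SIZE bytes remain and the count is
     only known at run time, so swap one byte at a time.  */
  while (size-- > 0)
    {
      unsigned char t = *pa;
      *pa++ = *pb;
      *pb++ = t;
    }
}
```

With this layout the tail loop runs at most SWAP_GENERIC_SIZE - 1 times, so a larger SWAP_GENERIC_SIZE means fewer chunk iterations for big elements but a potentially longer byte loop for the remainder.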