From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.1 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id E7D0620248 for ; Sat, 23 Mar 2019 15:00:45 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:to:cc:subject:message-id:references :mime-version:content-type:in-reply-to; q=dns; s=default; b=VC2R SubP8CHKRnqg5b1D1tqzglZj0IQF1pX2ggi7fmWhA8I9rWLpMGEkFY8neajiWQLY 1oHmTl3IuhnLj6+tTosyNHRCUHpr9nsFLfYOYOKEN9oqcos+6XkHU412pXyCvFc8 m/MPbmrvAde9hGeAPKS90w0FZIu54skMMVVHldQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:to:cc:subject:message-id:references :mime-version:content-type:in-reply-to; s=default; bh=lwWU5DjBD7 HVJu5AGIWqQexDerQ=; b=d0YFSAJcR3R3tu20v+YD8UTy6IvOyzDJhEsGbixf2+ sYjr7mchRhVo9jLw5QJv6+fPENqzq9SYMjSQJ77UtbqqakgcWmOObWA+H2j0vLp1 r6FuOwybb7NNQTcjDd3EVM6Zq6NE4MWPRrOcXK5anFZJ8NEv8ZVAMh71v3pGWQ+q o= Received: (qmail 3902 invoked by alias); 23 Mar 2019 15:00:41 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 3590 invoked by uid 89); 23 Mar 2019 15:00:23 -0000 Authentication-Results: sourceware.org; auth=none X-HELO: smtpout1.mo528.mail-out.ovh.net Date: Sat, 23 Mar 2019 12:00:07 -0300 From: "Gabriel F. T. Gomes" To: Adhemerval Zanella CC: Subject: Re: [PATCH v2 1/6] wcsmbs: Add wcscpy loop unroll option Message-ID: <20190323150007.mxeaankezp6abslb@tereshkova.lan> References: <20190313140317.8894-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20190313140317.8894-1-adhemerval.zanella@linaro.org> User-Agent: NeoMutt/20180716 X-Ovh-Tracer-Id: 5331417537641565897 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrgedutddrjeefgdejgecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjpdevjffgvefmvefgnecuuegrihhlohhuthemucehtddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd On Wed, Mar 13 2019, Adhemerval Zanella wrote: > This allows an architecture the use the old generic implementation > and also set explicit loop unrolling. s/the use/to use/ > * include/loop_unroll.h: New file. > * wcsmbs/wcscpy (__wcscpy): Add option to use loop unrolling > besides generic implementation. OK. > +/* Loop unroll macro to be used for explicity force loop unrolling with a > + configurable number or iterations. The idea is to make the loop unrolling > + independent of whether compiler is able to unrolling through specific > + optimizations options (-funroll-loops or -funroll-all-loops). > + > + For instance, to implement strcpy with SRC being the source input and > + DEST the destination buffer, it is expected the macro to be used in this > + way: > + > + #define ITERATION(index) \ > + ({ char c = *str++; *dest++ = c; c != '\0' }) > + > + while (1) > + UNROLL_REPEAT (4, ITERATION) > + > + The loop will be manually unrolled 4 times. Another option is to do > + the index update after the tests: > + > + #define ITERATION(index) \ > + ({ char c = *(str + index); *(dest + index) = c; c != '\0' }) > + #define UPDATE(n) \ > + str += n; dst += n > + > + while (1) > + UNROLL_REPEAT_UPDATE (4, ITERATION, UPDATE) > + > + The loop will be manually unrolled 4 times and the SRC and DEST pointers > + will be update only after last iteration. > + > + Currently both macros unrolls the loop 8 times at maximum. */ Thanks for writing this detailed explanation. > +#define UNROLL_REPEAT_1(X) if (!X(0)) break; > +#define UNROLL_REPEAT_2(X) UNROLL_REPEAT_1(X) if (!X(1)) break; > +#define UNROLL_REPEAT_3(X) UNROLL_REPEAT_2(X) if (!X(2)) break; > +#define UNROLL_REPEAT_4(X) UNROLL_REPEAT_3(X) if (!X(3)) break; > +#define UNROLL_REPEAT_5(X) UNROLL_REPEAT_4(X) if (!X(4)) break; > +#define UNROLL_REPEAT_6(X) UNROLL_REPEAT_5(X) if (!X(5)) break; > +#define UNROLL_REPEAT_7(X) UNROLL_REPEAT_6(X) if (!X(6)) break; > +#define UNROLL_REPEAT_8(X) UNROLL_REPEAT_7(X) if (!X(7)) break; ^ ^ Missing space between macro call and parentheses (twice in each line). > +#define UNROLL_REPEAT__(N, X) UNROLL_EXPAND(UNROLL_REPEAT_ ## N) (X) ^ I think that this is OK according to the style guidelines, as it just expands the name to UNROLL_REPEAT_1, UNROLL_REPEAT_2, etc. But maybe I just didn't quite understand the guidelines. > +#define UNROLL_REPEAT(N, X) \ > + (void) ({ \ > + UNROLL_REPEAT_(UNROLL_EXPAND(N), X); \ ^ Missing space. > + /* Some architectures might have costly tail function call (powerpc > + for instance) where wmemcpy call overhead for smalls sizes might > + be costly than just unroll the main loop. */ s/be costly than just unroll/be more costly than just unrolling/ ... I think. Looks good to me with these changes. Reviewed-by: Gabriel F. T. Gomes