unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: "Gabriel F. T. Gomes" <gabriel@inconstante.eti.br>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: <libc-alpha@sourceware.org>
Subject: Re: [PATCH v2 1/6] wcsmbs: Add wcscpy loop unroll option
Date: Sat, 23 Mar 2019 12:00:07 -0300	[thread overview]
Message-ID: <20190323150007.mxeaankezp6abslb@tereshkova.lan> (raw)
In-Reply-To: <20190313140317.8894-1-adhemerval.zanella@linaro.org>

On Wed, Mar 13 2019, Adhemerval Zanella wrote:
> This allows an architecture the use the old generic implementation
> and also set explicit loop unrolling.

s/the use/to use/

> 	* include/loop_unroll.h: New file.
> 	* wcsmbs/wcscpy (__wcscpy): Add option to use loop unrolling
> 	besides generic implementation.

OK.

> +/* Loop unroll macro to be used for explicity force loop unrolling with a
> +   configurable number or iterations.  The idea is to make the loop unrolling
> +   independent of whether compiler is able to unrolling through specific
> +   optimizations options (-funroll-loops or -funroll-all-loops).
> +
> +   For instance, to implement strcpy with SRC being the source input and 
> +   DEST the destination buffer, it is expected the macro to be used in this
> +   way:
> +
> +     #define ITERATION(index)	\
> +       ({ char c = *str++; *dest++ = c; c != '\0' })
> +
> +     while (1)
> +       UNROLL_REPEAT (4, ITERATION)
> +
> +   The loop will be manually unrolled 4 times.  Another option is to do
> +   the index update after the tests:
> +
> +     #define ITERATION(index)	\
> +       ({ char c = *(str + index); *(dest + index) = c; c != '\0' })
> +     #define UPDATE(n)		\
> +       str += n; dst += n
> +
> +     while (1)
> +       UNROLL_REPEAT_UPDATE (4, ITERATION, UPDATE)
> +
> +   The loop will be manually unrolled 4 times and the SRC and DEST pointers
> +   will be update only after last iteration.
> +
> +   Currently both macros unrolls the loop 8 times at maximum.  */

Thanks for writing this detailed explanation.

> +#define UNROLL_REPEAT_1(X)    if (!X(0)) break;
> +#define UNROLL_REPEAT_2(X)    UNROLL_REPEAT_1(X) if (!X(1)) break;
> +#define UNROLL_REPEAT_3(X)    UNROLL_REPEAT_2(X) if (!X(2)) break;
> +#define UNROLL_REPEAT_4(X)    UNROLL_REPEAT_3(X) if (!X(3)) break;
> +#define UNROLL_REPEAT_5(X)    UNROLL_REPEAT_4(X) if (!X(4)) break;
> +#define UNROLL_REPEAT_6(X)    UNROLL_REPEAT_5(X) if (!X(5)) break;
> +#define UNROLL_REPEAT_7(X)    UNROLL_REPEAT_6(X) if (!X(6)) break;
> +#define UNROLL_REPEAT_8(X)    UNROLL_REPEAT_7(X) if (!X(7)) break;
                                                ^         ^
Missing space between macro call and parentheses (twice in each line).

> +#define UNROLL_REPEAT__(N, X) UNROLL_EXPAND(UNROLL_REPEAT_ ## N) (X)
                                              ^
I think that this is OK according to the style guidelines, as it just
expands the name to UNROLL_REPEAT_1, UNROLL_REPEAT_2, etc.  But maybe I
just didn't quite understand the guidelines.

> +#define UNROLL_REPEAT(N, X)                \
> +  (void) ({                                \
> +    UNROLL_REPEAT_(UNROLL_EXPAND(N), X);   \
                     ^
Missing space.

> +  /* Some architectures might have costly tail function call (powerpc
> +     for instance) where wmemcpy call overhead for smalls sizes might
> +     be costly than just unroll the main loop.  */

s/be costly than just unroll/be more costly than just unrolling/

... I think.

Looks good to me with these changes.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>

      parent reply	other threads:[~2019-03-23 15:00 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-13 14:03 [PATCH v2 1/6] wcsmbs: Add wcscpy loop unroll option Adhemerval Zanella
2019-03-13 14:03 ` [PATCH v2 2/6] powerpc: Use generic wcscpy optimization Adhemerval Zanella
2019-03-23 14:56   ` Gabriel F. T. Gomes
2019-03-13 14:03 ` [PATCH v2 3/6] wcsmbs: Use loop_unroll on wcschr Adhemerval Zanella
2019-03-23 15:00   ` Gabriel F. T. Gomes
2019-03-13 14:03 ` [PATCH v2 4/6] powerpc: Use generic wcschr optimization Adhemerval Zanella
2019-03-23 15:00   ` Gabriel F. T. Gomes
2019-03-13 14:03 ` [PATCH v2 5/6] wcsmbs: Use loop_unroll on wcsrchr Adhemerval Zanella
2019-03-23 15:00   ` Gabriel F. T. Gomes
2019-03-13 14:03 ` [PATCH v2 6/6] powerpc: Use generic wcscpy optimization Adhemerval Zanella
2019-03-23 15:01   ` Gabriel F. T. Gomes
2019-03-23 15:00 ` Gabriel F. T. Gomes [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190323150007.mxeaankezp6abslb@tereshkova.lan \
    --to=gabriel@inconstante.eti.br \
    --cc=adhemerval.zanella@linaro.org \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).