From: Noah Goldstein via Libc-alpha
Reply-To: Noah Goldstein
To: Naohiro Tamura
Cc: GNU C Library, Wilco Dijkstra
Date: Tue, 20 Jul 2021 12:48:19 -0400
Subject: Re: [PATCH v2 2/5] benchtests: Add memset zero fill benchtest
List-Id: Libc-alpha mailing list
References: <20210713082214.307529-1-naohirot@fujitsu.com> <20210720063500.362313-1-naohirot@fujitsu.com>
In-Reply-To: <20210720063500.362313-1-naohirot@fujitsu.com>

On Tue, Jul 20, 2021 at 2:35 AM Naohiro Tamura wrote:

> Memset takes 0 as the second parameter in most cases.
> However, we cannot measure the zero fill performance by bench-memset.c
> and bench-memset-large.c precisely.
> X86_64 micro-architecture has some zero-over-zero optimization, and
> AArch64 micro-architecture also has some optimization for DC ZVA
> instruction.
> This patch provides bench-memset-zerofill.c which is suitable to
> analyze the zero fill performance by zero-over-zero and zero-over-one
> test cases from 16KB(L1), through L2 and L3, to 64MB(RAM).
> ---
>  benchtests/Makefile                |   2 +-
>  benchtests/bench-memset-zerofill.c | 128 +++++++++++++++++++++++++++++
>  2 files changed, 129 insertions(+), 1 deletion(-)
>  create mode 100644 benchtests/bench-memset-zerofill.c
>
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 1530939a8ce8..21b95c736190 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -53,7 +53,7 @@ string-benchset := memccpy memchr memcmp memcpy memmem memmove \
>                     strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
>                     strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok \
>                     strcoll memcpy-large memcpy-random memmove-large memset-large \
> -                   memcpy-walk memset-walk memmove-walk
> +                   memcpy-walk memset-walk memmove-walk memset-zerofill
>
>  # Build and run locale-dependent benchmarks only if we're building natively.
>  ifeq (no,$(cross-compiling))
> diff --git a/benchtests/bench-memset-zerofill.c b/benchtests/bench-memset-zerofill.c
> new file mode 100644
> index 000000000000..2579b6edd09e
> --- /dev/null
> +++ b/benchtests/bench-memset-zerofill.c
> @@ -0,0 +1,128 @@
> +/* Measure memset functions with zero fill data.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#define TEST_MAIN
> +#define TEST_NAME "memset"
> +#define START_SIZE (16 * 1024)
> +#define MIN_PAGE_SIZE (getpagesize () + 64 * 1024 * 1024)
> +#define TIMEOUT (20 * 60)
> +#include "bench-string.h"
> +
> +#include "json-lib.h"
> +
> +void *generic_memset (void *, int, size_t);
> +typedef void *(*proto_t) (void *, int, size_t);
> +
> +IMPL (MEMSET, 1)
> +IMPL (generic_memset, 0)
> +
> +static void
>

Do we want __attribute__((noinline, noclone))?

> +do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
> +             int c1 __attribute ((unused)), int c2 __attribute ((unused)),
> +             size_t n)
> +{
> +  size_t i, iters = 16;
>

I think 16 is probably too few iterations for reliable benchmarking.
Maybe `INNER_LOOP_ITERS` which is 8192

> +  timing_t start, stop, cur;
> +
> +  TIMING_NOW (start);
> +  for (i = 0; i < iters; i += 2)
> +    {
> +      CALL (impl, s, c1, n);
>

I am a bit worried that the overhead from the first call with `c1` will
distort the results. Is it possible to implement it with a nested loop
where you fill `s` with `c1` for `n * inner_loop_iterations` in the outer
loop and in the inner loop fill `c2` on `s + n * i`?

In that case maybe 16 for inner loop iterations and 512 for outer loop
iterations.

> +      CALL (impl, s, c2, n);
> +    }
> +  TIMING_NOW (stop);
> +
> +  TIMING_DIFF (cur, start, stop);
> +
> +  json_element_double (json_ctx, (double) cur / (double) iters);
> +}
> +
> +static void
> +do_test (json_ctx_t *json_ctx, size_t align, int c1, int c2, size_t len)
> +{
> +  align &= 63;
>

Can you make this `align &= getpagesize () - 1;`?

> +  if ((align + len) * sizeof (CHAR) > page_size)
> +    return;
> +
> +  json_element_object_begin (json_ctx);
> +  json_attr_uint (json_ctx, "length", len);
> +  json_attr_uint (json_ctx, "alignment", align);
> +  json_attr_int (json_ctx, "char1", c1);
> +  json_attr_int (json_ctx, "char2", c2);
> +  json_array_begin (json_ctx, "timings");
> +
> +  FOR_EACH_IMPL (impl, 0)
> +    {
> +      do_one_test (json_ctx, impl, (CHAR *) (buf1) + align, c1, c2, len);
> +      alloc_bufs ();
> +    }
> +
> +  json_array_end (json_ctx);
> +  json_element_object_end (json_ctx);
> +}
> +
> +int
> +test_main (void)
> +{
> +  json_ctx_t json_ctx;
> +  size_t i;
> +  int c1, c2;
> +
> +  test_init ();
> +
> +  json_init (&json_ctx, 0, stdout);
> +
> +  json_document_begin (&json_ctx);
> +  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
> +
> +  json_attr_object_begin (&json_ctx, "functions");
> +  json_attr_object_begin (&json_ctx, TEST_NAME);
> +  json_attr_string (&json_ctx, "bench-variant", "zerofill");
> +
> +  json_array_begin (&json_ctx, "ifuncs");
> +  FOR_EACH_IMPL (impl, 0)
> +    json_element_string (&json_ctx, impl->name);
> +  json_array_end (&json_ctx);
> +
> +  json_array_begin (&json_ctx, "results");
> +
> +  c2 = 0;
> +  for (c1 = 0; c1 < 2; c1++)
> +    for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
> +      {
> +        do_test (&json_ctx, 0, c1, c2, i);
> +        do_test (&json_ctx, 3, c1, c2, i);
> +      }
> +
> +  json_array_end (&json_ctx);
> +  json_attr_object_end (&json_ctx);
> +  json_attr_object_end (&json_ctx);
> +  json_document_end (&json_ctx);
> +
> +  return ret;
> +}
> +
> +#include
> +
> +#define libc_hidden_builtin_def(X)
> +#define libc_hidden_def(X)
> +#define libc_hidden_weak(X)
> +#define weak_alias(X,Y)
> +#undef MEMSET
> +#define MEMSET generic_memset
> +#include
> --
> 2.17.1
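
To make the nested-loop suggestion above concrete, here is a rough standalone sketch of the shape I have in mind. It is not glibc code: `memset_fn`, `fill_pass`, `time_fill_over`, `OUTER_ITERS` and `INNER_ITERS` are hypothetical names, the buffer comes from malloc rather than `buf1`, and C11 `timespec_get` stands in for `TIMING_NOW`/`TIMING_DIFF`. The point is only that the `c1` fill happens outside the timed region, so the measurement covers the `c2` fills alone.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical names; the counts are the ones floated in the review.  */
#define OUTER_ITERS 512
#define INNER_ITERS 16

typedef void *(*memset_fn) (void *, int, size_t);

/* Reading the buffer through a volatile keeps the fills from being
   optimized away.  */
static volatile unsigned char sink;

/* Timed part: fill INNER_ITERS consecutive n-byte chunks of s with c2.  */
static void
fill_pass (memset_fn impl, unsigned char *s, int c2, size_t n)
{
  for (size_t i = 0; i < INNER_ITERS; i++)
    impl (s + n * i, c2, n);
}

/* Average nanoseconds per n-byte c2 fill over memory that holds c1,
   or -1.0 if n is zero or the buffer cannot be allocated.  */
static double
time_fill_over (memset_fn impl, int c1, int c2, size_t n)
{
  size_t total = n * INNER_ITERS;
  unsigned char *s;
  double ns = 0.0;

  if (n == 0)
    return -1.0;
  s = malloc (total);
  if (s == NULL)
    return -1.0;

  for (size_t j = 0; j < OUTER_ITERS; j++)
    {
      struct timespec start, stop;

      /* Untimed: put the whole region back to c1.  */
      memset (s, c1, total);

      timespec_get (&start, TIME_UTC);
      fill_pass (impl, s, c2, n);
      timespec_get (&stop, TIME_UTC);

      ns += (double) (stop.tv_sec - start.tv_sec) * 1e9
            + (double) (stop.tv_nsec - start.tv_nsec);
      sink = s[total - 1];
    }

  free (s);
  return ns / ((double) OUTER_ITERS * INNER_ITERS);
}
```

Calling `time_fill_over (memset, 1, 0, 16 * 1024)` would give a zero-over-one number and `time_fill_over (memset, 0, 0, 16 * 1024)` a zero-over-zero one. Note that the buffer is `n * INNER_ITERS` bytes, so with 16 inner iterations the 64MB case would need 1GB; the inner count may have to shrink for the largest sizes.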