From: "Lucas A. M. Magalhaes via Libc-alpha"
Reply-To: "Lucas A. M. Magalhaes"
To: Noah Goldstein, Wilco Dijkstra, libc-alpha@sourceware.org, "naohirot@fujitsu.com"
Date: Mon, 13 Sep 2021 11:05:14 -0300
Subject: RE: [PATCH v3 2/5] benchtests: Add memset zero fill benchtest
Message-ID: <163154191414.705584.12050866556951422556@localhost.localdomain>
References: <20210805074733.433430-1-naohirot@fujitsu.com> <20210805075053.433538-1-naohirot@fujitsu.com> <163130642274.404689.6991051609396665932@localhost.localdomain>
List-Id: Libc-alpha mailing list

Quoting naohirot@fujitsu.com (2021-09-12 21:53:22)
> Hi Lucas,
>
> > From: Lucas A. M. Magalhaes
> > Sent: Saturday, September 11, 2021 5:40 AM
> >
> > Thanks for working on this. Please, correct me if I'm wrong but I guess you sent
> > an old version by mistake. This patch is lacking the bench-variant
> > implementations mentioned on the commit message.
>
> Thank you for the comment!
> I double checked the source code and confirmed it is the one I intended.
> 4 patterns are combination of json attribute "char1" and "char2".
> "char1" and "char2" varies 0 and 1 respectively.
>
> zero-over-zero: char1=0, char2=0
> zero-over-one: char1=0, char2=1
> one-over-zero: char1=1, char2=0
> one-over-one: char1=1, char2=1
>
> I made a comment inline too.
>

Thanks for clarifying, now I get it. Could you please add a comment in
the code explaining these patterns and the reason behind them?

With that said, this patch LGTM.

> BTW, could you review the patch "benchtests: Remove redundant assert.h" [1]
> that is reflected your comment [2] to other bench tests if you had time?
>
> [1] https://sourceware.org/pipermail/libc-alpha/2021-August/129840.html
> [2] https://sourceware.org/pipermail/libc-alpha/2021-July/128989.html
>
> >
> > Quoting Naohiro Tamura (2021-08-05 04:50:53)
> > > Memset takes 0 as the second parameter in most cases.
> > > However, we cannot measure the zero fill performance by
> > > bench-memset.c, bench-memset-large.c and bench-memset-walk.c
> > > precisely.
> > > X86_64 micro-architecture has some zero-over-zero optimization, and
> > > AArch64 micro-architecture also has some optimization for DC ZVA
> > > instruction.
> > > This patch provides bench-memset-zerofill.c which is suitable to
> > > analyze the zero fill performance by comparing among 4 patterns,
> > > zero-over-zero, zero-over-one, one-over-zero and one-over-one, from
> > > 256B to 64MB(RAM) through L1, L2 and L3 caches.
> > >
> > > The following commands are examples to analyze a JSON output,
> > > bench-memset-zerofill.out, by 'jq' and 'plot_strings.py'.
> > >
> > > 1) compare zero-over-zero performance
> > >
> > > $ cat bench-memset-zerofill.out | \
> > >   jq -r '
> > >     .functions.memset."bench-variant"="zerofill-0o0" |
> > >     del(.functions.memset.results[] | select(.char1 != 0 or .char2 != 0))
> > >   ' | \
> > >   plot_strings.py -l -p thru -v -
> > >
> > > 2) compare zero performance
> > >
> > > $ cat bench-memset-zerofill.out | \
> > >   jq -r '
> > >     .functions.memset."bench-variant"="zerofill-zero" |
> > >     del(.functions.memset.results[] | select(.char2 != 0))
> > >   ' | \
> > >   plot_strings.py -l -p thru -v -
> > >
> > > 3) compare nonzero performance
> > >
> > > $ cat bench-memset-zerofill.out | \
> > >   jq -r '
> > >     .functions.memset."bench-variant"="zerofill-nonzero" |
> > >     del(.functions.memset.results[] | select(.char2 == 0))
> > >   ' | \
> > >   plot_strings.py -l -p thru -v -
> > > ---
> > >  benchtests/Makefile                |   2 +-
> > >  benchtests/bench-memset-zerofill.c | 134 +++++++++++++++++++++++++++++
> > >  2 files changed, 135 insertions(+), 1 deletion(-)
> > >  create mode 100644 benchtests/bench-memset-zerofill.c
> > >
> > > diff --git a/benchtests/Makefile b/benchtests/Makefile
> > > index 1530939a8ce8..21b95c736190 100644
> > > --- a/benchtests/Makefile
> > > +++ b/benchtests/Makefile
> > > @@ -53,7 +53,7 @@ string-benchset := memccpy memchr memcmp memcpy memmem memmove \
> > >                    strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
> > >                    strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok \
> > >                    strcoll memcpy-large memcpy-random memmove-large memset-large \
> > > -                  memcpy-walk memset-walk memmove-walk
> > > +                  memcpy-walk memset-walk memmove-walk memset-zerofill
> > >
> > >  # Build and run locale-dependent benchmarks only if we're building natively.
> > >  ifeq (no,$(cross-compiling))
> > > diff --git a/benchtests/bench-memset-zerofill.c b/benchtests/bench-memset-zerofill.c
> > > new file mode 100644
> > > index 000000000000..7aa7fe048574
> > > --- /dev/null
> > > +++ b/benchtests/bench-memset-zerofill.c
> > > @@ -0,0 +1,134 @@
> > > +/* Measure memset functions with zero fill data.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   .  */
> > > +
> > > +#define TEST_MAIN
> > > +#define TEST_NAME "memset"
> > > +#define START_SIZE 256
> > > +#define MIN_PAGE_SIZE (getpagesize () + 64 * 1024 * 1024)
> > > +#define TIMEOUT (20 * 60)
> > > +#include "bench-string.h"
> > > +
> > > +#include "json-lib.h"
> > > +
> > > +void *generic_memset (void *, int, size_t);
> > > +typedef void *(*proto_t) (void *, int, size_t);
> > > +
> > > +IMPL (MEMSET, 1)
> > > +IMPL (generic_memset, 0)
> > > +
> > > +static void
> > > +__attribute__((noinline, noclone))
> > > +do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
> > > +             int c1 __attribute ((unused)), int c2 __attribute ((unused)),
> > > +             size_t n)
> > > +{
> > > +  size_t i, iters = 32;
> > > +  timing_t start, stop, cur, latency = 0;
> > > +
> > > +  CALL (impl, s, c2, n); // warm up
> > > +
> > > +  for (i = 0; i < iters; i++)
> > > +    {
> > > +      memset (s, c1, n); // alternation
> > > +
> > > +      TIMING_NOW (start);
> > > +
> > > +      CALL (impl, s, c2, n);
> > > +
> > > +      TIMING_NOW (stop);
> > > +      TIMING_DIFF (cur, start, stop);
> > > +      TIMING_ACCUM (latency, cur);
> > > +    }
> > > +
> > > +  json_element_double (json_ctx, (double) latency / (double) iters);
> > > +}
> > > +

Ok.

> > > +static void
> > > +do_test (json_ctx_t *json_ctx, size_t align, int c1, int c2, size_t len)
> > > +{
> > > +  align &= getpagesize () - 1;
> > > +  if ((align + len) * sizeof (CHAR) > page_size)
> > > +    return;
> > > +
> > > +  json_element_object_begin (json_ctx);
> > > +  json_attr_uint (json_ctx, "length", len);
> > > +  json_attr_uint (json_ctx, "alignment", align);
> > > +  json_attr_int (json_ctx, "char1", c1);
> > > +  json_attr_int (json_ctx, "char2", c2);
> > > +  json_array_begin (json_ctx, "timings");
> > > +
> > > +  FOR_EACH_IMPL (impl, 0)
> > > +    {
> > > +      do_one_test (json_ctx, impl, (CHAR *) (buf1) + align, c1, c2, len);
> > > +      alloc_bufs ();
> > > +    }
> > > +
> > > +  json_array_end (json_ctx);
> > > +  json_element_object_end (json_ctx);
> > > +}

Ok.
> > > +
> > > +int
> > > +test_main (void)
> > > +{
> > > +  json_ctx_t json_ctx;
> > > +  size_t i;
> > > +  int c1, c2;
> > > +
> > > +  test_init ();
> > > +
> > > +  json_init (&json_ctx, 0, stdout);
> > > +
> > > +  json_document_begin (&json_ctx);
> > > +  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
> > > +
> > > +  json_attr_object_begin (&json_ctx, "functions");
> > > +  json_attr_object_begin (&json_ctx, TEST_NAME);
> > > +  json_attr_string (&json_ctx, "bench-variant", "zerofill");
> > > +
> > > +  json_array_begin (&json_ctx, "ifuncs");
> > > +  FOR_EACH_IMPL (impl, 0)
> > > +    json_element_string (&json_ctx, impl->name);
> > > +  json_array_end (&json_ctx);
> > > +
> > > +  json_array_begin (&json_ctx, "results");
> > > +
> > > +  for (c1 = 0; c1 < 2; c1++)
> > > +    for (c2 = 0; c2 < 2; c2++)
> > > +      for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
> > > +        {
> > > +          do_test (&json_ctx, 0, c1, c2, i);
> > > +          do_test (&json_ctx, 3, c1, c2, i);
> > > +        }
> > > +
> > > +  json_array_end (&json_ctx);
> > > +  json_attr_object_end (&json_ctx);
> > > +  json_attr_object_end (&json_ctx);
> > > +  json_document_end (&json_ctx);
> > > +
> > > +  return ret;
> > > +}

Ok.

> > > +
> > > +#include
> > > +
> > > +#define libc_hidden_builtin_def(X)
> > > +#define libc_hidden_def(X)
> > > +#define libc_hidden_weak(X)
> > > +#define weak_alias(X,Y)
> > > +#undef MEMSET
> > > +#define MEMSET generic_memset
> > > +#include
> > > --
> > > 2.17.1
> > >
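
To illustrate the kind of comment asked for above, here is a minimal,
self-contained sketch of the four patterns and of the prefill-then-time
loop. It is not part of the patch: pattern_name and time_fill are
hypothetical helpers, the name mapping is copied from the char1/char2
list quoted earlier in this message, and clock_gettime is only a
stand-in for the benchtests TIMING_* macros.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: pattern name for a (char1, char2) pair, using
   the mapping given in the thread (zero-over-one is char1=0, char2=1).  */
static const char *
pattern_name (int char1, int char2)
{
  static const char *const names[2][2] = {
    { "zero-over-zero", "zero-over-one" },
    { "one-over-zero", "one-over-one" },
  };

  assert ((char1 == 0 || char1 == 1) && (char2 == 0 || char2 == 1));
  return names[char1][char2];
}

/* Call memset through a volatile pointer so that the compiler cannot
   drop the untimed prefill as a dead store.  */
static void *(*volatile fill) (void *, int, size_t) = memset;

/* Hypothetical stand-in for do_one_test: prefill BUF with CHAR1
   (untimed), then time a fill with CHAR2.  Returns the mean time of
   the timed fill in nanoseconds over ITERS iterations.  */
static double
time_fill (unsigned char *buf, size_t n, int char1, int char2, size_t iters)
{
  struct timespec start, stop;
  double total_ns = 0.0;

  assert (buf != NULL && iters > 0);

  fill (buf, char2, n); /* warm up */

  for (size_t i = 0; i < iters; i++)
    {
      fill (buf, char1, n); /* alternation, not timed */

      clock_gettime (CLOCK_MONOTONIC, &start);
      fill (buf, char2, n); /* timed fill */
      clock_gettime (CLOCK_MONOTONIC, &stop);

      total_ns += (double) (stop.tv_sec - start.tv_sec) * 1e9
                  + (double) (stop.tv_nsec - start.tv_nsec);
    }

  return total_ns / (double) iters;
}
```

Calling time_fill (buf, n, c1, c2, 32) for each c1 and c2 in {0, 1}
would give the same four measurements per size as the nested loops in
test_main, with pattern_name (c1, c2) as the label.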