From: Noah Goldstein via Libc-alpha
Reply-To: Noah Goldstein
Date: Wed, 21 Jul 2021 14:14:32 -0400
Subject: Re: [PATCH v2 2/5] benchtests: Add memset zero fill benchtest
To: "naohirot@fujitsu.com"
Cc: GNU C Library, Wilco Dijkstra
References: <20210713082214.307529-1-naohirot@fujitsu.com> <20210720063500.362313-1-naohirot@fujitsu.com>
List-Id: Libc-alpha mailing list

On Wed, Jul 21, 2021 at 9:07 AM naohirot@fujitsu.com wrote:

> Hi Noah,
>
> One typo in the updated code.
> Wrong:
> #define START_SIZE (16 * 1024)
> Right:
> #define BUF1PAGES 16
>
> Thanks
> Naohiro
> ________________________________________
> From: Tamura, Naohiro/田村 直広
> Sent: Wednesday, 21 July 2021 21:56
> To: Noah Goldstein
> Cc: Wilco Dijkstra; Lucas A. M. Magalhaes; GNU C Library
> Subject: RE: [PATCH v2 2/5] benchtests: Add memset zero fill benchtest
>
> Hi Noah,
>
> Thank you for the review.
>
> > > +#define TEST_MAIN
> > > +#define TEST_NAME "memset"
> > > +#define START_SIZE (16 * 1024)
> > > +#define MIN_PAGE_SIZE (getpagesize () + 64 * 1024 * 1024)
> > > +#define TIMEOUT (20 * 60)
> > > +#include "bench-string.h"
> > > +
> > > +#include "json-lib.h"
> > > +
> > > +void *generic_memset (void *, int, size_t);
> > > +typedef void *(*proto_t) (void *, int, size_t);
> > > +
> > > +IMPL (MEMSET, 1)
> > > +IMPL (generic_memset, 0)
> > > +
> > > +static void
> > Do we want __attribute__((noinline, noclone))?
>
> Yes, I'll add it.
>
> > > +do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
> > > +             int c1 __attribute ((unused)), int c2 __attribute ((unused)),
> > > +             size_t n)
> > > +{
> > > +  size_t i, iters = 16;
> >
> > I think 16 is probably too few iterations for reliable benchmarking.
> > Maybe `INNER_LOOP_ITERS` which is 8192
>
> I tried it. If it is changed to 8192, it hit the TIMEOUT (20 * 60) on
> a64fx.
> Please check the code below.
>
> > >
> > > +  timing_t start, stop, cur;
> > > +
> > > +  TIMING_NOW (start);
> > > +  for (i = 0; i < iters; i += 2)
> > > +    {
> > > +      CALL (impl, s, c1, n);
> > I am a bit worried that the overhead from the first call with `c1` will
> > distort the results.
> > Is it possible to implement it with a nested loop where you fill `s`
> > with `c1` for `n * inner_loop_iterations` in the outer loop and in the
> > inner loop fill `c2` on `s + n * i`?
> > In that case maybe 16 for inner loop iterations and 512 for outer loop
> > iterations.
>
> It seems that we have to set smaller number if this implementation is not
> wrong.
> Because it will take 99.4 minutes estimating from the case that
> "iters = 32" took 23.3 seconds.
> (8192/32*23.3/60=99.4)

I see. I think 16 for the inner loop makes sense.
From the x86_64 perspective this will keep the loop from running out of the
LSD which is necessary for accurate benchmarking. I guess then somewhere
between [2, 8] is reasonable for the outer loop?

> #define START_SIZE (16 * 1024)
> ...
> static void
> __attribute__((noinline, noclone))
> do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
>              int c1 __attribute ((unused)), int c2 __attribute ((unused)),
>              size_t n)
> {
>   size_t i, j, iters = INNER_LOOP_ITERS; // 32;
>   timing_t start, stop, cur, latency = 0;
>
>   for (i = 0; i < 512; i++) // for (i = 0; i < 2; i++)
>     {
>       CALL (impl, s, c1, n * 16);
>       TIMING_NOW (start);
>       for (j = 0; j < 16; j++)
>         CALL (impl, s + n * j, c2, n);
>       TIMING_NOW (stop);
>       TIMING_DIFF (cur, start, stop);
>       TIMING_ACCUM (latency, cur);
>     }
>

This looks good. But as you said, a much smaller value for outer loop.

>
>   json_element_double (json_ctx, (double) latency / (double) iters);
> }
>
> > > +      CALL (impl, s, c2, n);
> > > +    }
> > > +  TIMING_NOW (stop);
> > > +
> > > +  TIMING_DIFF (cur, start, stop);
> > > +
> > > +  json_element_double (json_ctx, (double) cur / (double) iters);
> > > +}
> > > +
> > > +static void
> > > +do_test (json_ctx_t *json_ctx, size_t align, int c1, int c2, size_t len)
> > > +{
> > > +  align &= 63;
> > Can you make this `align &= getpagesize () - 1;`?
>
> I'll change it.
>
> Thanks.
> Naohiro
>
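
For reference, the nested-loop timing pattern and the run-time estimate
discussed above can be sketched outside the benchtests framework. This is
only an illustration under stated assumptions: it calls plain memset and
POSIX clock_gettime instead of the CALL and TIMING_* macros, and the names
now_ns, estimate_minutes and bench_memset_nested are hypothetical, not part
of glibc. It also divides by outer * inner to report time per timed call,
which is a choice made for this sketch and not what the quoted patch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds.  */
static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

/* Linear extrapolation as used in the thread: scale a measured run time
   by the ratio of iteration counts and convert seconds to minutes.  */
static double
estimate_minutes (double target_iters, double measured_iters,
                  double measured_seconds)
{
  return target_iters / measured_iters * measured_seconds / 60.0;
}

/* Nested-loop timing sketch.  The outer loop refills the whole region
   with c1 (untimed); the inner loop writes c2 into consecutive n-byte
   chunks (timed).  BUF must hold at least n * inner bytes.  Returns the
   average nanoseconds per timed memset call.  */
static double
bench_memset_nested (unsigned char *buf, size_t n, int c1, int c2,
                     size_t outer, size_t inner)
{
  uint64_t total = 0;

  assert (outer > 0 && inner > 0);
  for (size_t i = 0; i < outer; i++)
    {
      memset (buf, c1, n * inner);      /* Untimed refill with c1.  */
      uint64_t start = now_ns ();
      for (size_t j = 0; j < inner; j++)
        memset (buf + n * j, c2, n);    /* Timed fill with c2.  */
      total += now_ns () - start;
    }
  return (double) total / (double) (outer * inner);
}
```

With these helpers, estimate_minutes (8192, 32, 23.3) reproduces the
8192/32*23.3/60 figure quoted above, and bench_memset_nested (buf, n, 1, 0,
4, 16) corresponds to an outer count inside the suggested [2, 8] range with
16 inner iterations.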