From: Wilco Dijkstra
To: DJ Delorie
CC: "libc-alpha@sourceware.org", nd
Subject: Re: [PATCH] Add malloc micro benchmark
Date: Thu, 14 Feb 2019 16:38:54 +0000

Hi DJ,

> Looks good to me, although I'd like some additional comments in the test
> code.

Thanks for the review - I've added some extra comments:

+/* Benchmark the malloc/free performance of a varying number of blocks of a
+   given size.  This enables performance tracking of the t-cache and fastbins.
+   It tests 3 different scenarios: single-threaded using main arena,
+   multi-threaded using thread-arena, and main arena with SINGLE_THREAD_P
+   false.  */

> +       else \
> +         for thr in 8 16 32 64 128 256 512 1024 2048 4096; do \
> +           echo "Running $${run} $${thr}"; \
> +           $(run-bench) $${thr} > $${run}-$${thr}.out; \
> +         done;\
> +       fi;\
>         done

> I wonder if this could be done more elegantly, but I'm OK with a simple
> approach for now.  If we end up adding many more such tests we might
> need to revisit this part.

The main concern was to get a clean state so that the test of a previous
block size doesn't affect subsequent results.

> +#define NUM_ITERS 1000000
> +#define NUM_ALLOCS 4
> +#define MAX_ALLOCS 1600

> How long does this test take to run, on average, compared to other
> tests?  Do we have to worry about increasing timeouts for slow hosts?

All the tests together finish in a fraction of the time taken by a single
test of bench-malloc-thread, so if anything we need to reduce the time of
that one by an order of magnitude (it takes ~5 minutes!).

> +static void
> +do_benchmark (malloc_args *args, int **arr)
> +{
> +  timing_t start, stop;
> +  size_t iters = args->iters;
> +  size_t size = args->size;
> +  int n = args->n;
> +
> +  TIMING_NOW (start);
> +
> +  for (int j = 0; j < iters; j++)
> +    {
> +      for (int i = 0; i < n; i++)
> +        arr[i] = malloc (size);
> +
> +      for (int i = 0; i < n; i++)
> +        free (arr[i]);
> +    }
> +
> +  TIMING_NOW (stop);
> +
> +  TIMING_DIFF (args->elapsed, start, stop);
> +}

> Simple loop, but doesn't test for malloc returning NULL.

Yeah, the benchmark doesn't need to care since the amount we allocate is tiny
(6.4MBytes).

Cheers,
Wilco

I've committed this:

Add a malloc micro benchmark to enable accurate testing of the various paths
in malloc and free.
The benchmark does a varying number of allocations of a given block size,
then frees them again.

It tests 3 different scenarios: single-threaded using main arena,
multi-threaded using thread-arena, main arena with SINGLE_THREAD_P false.

OK for commit?

ChangeLog:
2019-02-14  Wilco Dijkstra

	* benchtests/Makefile: Add malloc-simple benchmark.
	* benchtests/bench-malloc-simple.c: New benchmark.

--
diff --git a/benchtests/Makefile b/benchtests/Makefile
index 12036b1935dc7ea84b421f024d6fe3190ae35a6e..09f7cb8e475a312268eebb4d346edde70d22bb3d 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -90,7 +90,7 @@ CFLAGS-bench-trunc.c += -fno-builtin
 CFLAGS-bench-truncf.c += -fno-builtin
 
 ifeq (${BENCHSET},)
-bench-malloc := malloc-thread
+bench-malloc := malloc-thread malloc-simple
 else
 bench-malloc := $(filter malloc-%,${BENCHSET})
 endif
@@ -98,7 +98,7 @@ endif
 $(addprefix $(objpfx)bench-,$(bench-math)): $(libm)
 $(addprefix $(objpfx)bench-,$(math-benchset)): $(libm)
 $(addprefix $(objpfx)bench-,$(bench-pthread)): $(shared-thread-library)
-$(objpfx)bench-malloc-thread: $(shared-thread-library)
+$(addprefix $(objpfx)bench-,$(bench-malloc)): $(shared-thread-library)
 
 
 
@@ -165,7 +165,7 @@ bench-clean:
 ifneq ($(strip ${BENCHSET}),)
 VALIDBENCHSETNAMES := bench-pthread bench-math bench-string string-benchset \
    wcsmbs-benchset stdlib-benchset stdio-common-benchset math-benchset \
-   malloc-thread
+   malloc-thread malloc-simple
 INVALIDBENCHSETNAMES := $(filter-out ${VALIDBENCHSETNAMES},${BENCHSET})
 ifneq (${INVALIDBENCHSETNAMES},)
 $(info The following values in BENCHSET are invalid: ${INVALIDBENCHSETNAMES})
@@ -194,10 +194,18 @@ bench-set: $(binaries-benchset)
 
 bench-malloc: $(binaries-bench-malloc)
 	for run in $^; do \
+	  echo "$${run}"; \
+	  if [ `basename $${run}` = "bench-malloc-thread" ]; then \
 		for thr in 1 8 16 32; do \
 			echo "Running $${run} $${thr}"; \
-		$(run-bench) $${thr} > $${run}-$${thr}.out; \
-		done;\
+			$(run-bench) $${thr} > $${run}-$${thr}.out; \
+		done;\
+	  else \
+		for thr in 8 16 32 64 128 256 512 1024 2048 4096; do \
+		  echo "Running $${run} $${thr}"; \
+		  $(run-bench) $${thr} > $${run}-$${thr}.out; \
+		done;\
+	  fi;\
 	done
 
 # Build and execute the benchmark functions.  This target generates JSON
diff --git a/benchtests/bench-malloc-simple.c b/benchtests/bench-malloc-simple.c
new file mode 100644
index 0000000000000000000000000000000000000000..83203ff3187654a1710c9ef81016f854957b9d64
--- /dev/null
+++ b/benchtests/bench-malloc-simple.c
@@ -0,0 +1,188 @@
+/* Benchmark malloc and free functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/resource.h>
+#include
+#include "bench-timing.h"
+#include "json-lib.h"
+
+/* Benchmark the malloc/free performance of a varying number of blocks of a
+   given size.  This enables performance tracking of the t-cache and fastbins.
+   It tests 3 different scenarios: single-threaded using main arena,
+   multi-threaded using thread-arena, and main arena with SINGLE_THREAD_P
+   false.  */
+
+#define NUM_ITERS 200000
+#define NUM_ALLOCS 4
+#define MAX_ALLOCS 1600
+
+typedef struct
+{
+  size_t iters;
+  size_t size;
+  int n;
+  timing_t elapsed;
+} malloc_args;
+
+static void
+do_benchmark (malloc_args *args, int **arr)
+{
+  timing_t start, stop;
+  size_t iters = args->iters;
+  size_t size = args->size;
+  int n = args->n;
+
+  TIMING_NOW (start);
+
+  for (int j = 0; j < iters; j++)
+    {
+      for (int i = 0; i < n; i++)
+	arr[i] = malloc (size);
+
+      for (int i = 0; i < n; i++)
+	free (arr[i]);
+    }
+
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (args->elapsed, start, stop);
+}
+
+static malloc_args tests[3][NUM_ALLOCS];
+static int allocs[NUM_ALLOCS] = { 25, 100, 400, MAX_ALLOCS };
+
+static void *
+thread_test (void *p)
+{
+  int **arr = (int**)p;
+
+  /* Run benchmark multi-threaded.  */
+  for (int i = 0; i < NUM_ALLOCS; i++)
+    do_benchmark (&tests[2][i], arr);
+
+  return p;
+}
+
+void
+bench (unsigned long size)
+{
+  size_t iters = NUM_ITERS;
+  int **arr = (int**) malloc (MAX_ALLOCS * sizeof (void*));
+  unsigned long res;
+
+  TIMING_INIT (res);
+
+  for (int t = 0; t < 3; t++)
+    for (int i = 0; i < NUM_ALLOCS; i++)
+      {
+	tests[t][i].n = allocs[i];
+	tests[t][i].size = size;
+	tests[t][i].iters = iters / allocs[i];
+
+	/* Do a quick warmup run.  */
+	if (t == 0)
+	  do_benchmark (&tests[0][i], arr);
+      }
+
+  /* Run benchmark single threaded in main_arena.  */
+  for (int i = 0; i < NUM_ALLOCS; i++)
+    do_benchmark (&tests[0][i], arr);
+
+  /* Run benchmark in a thread_arena.  */
+  pthread_t t;
+  pthread_create (&t, NULL, thread_test, (void*)arr);
+  pthread_join (t, NULL);
+
+  /* Repeat benchmark in main_arena with SINGLE_THREAD_P == false.  */
+  for (int i = 0; i < NUM_ALLOCS; i++)
+    do_benchmark (&tests[1][i], arr);
+
+  free (arr);
+
+  json_ctx_t json_ctx;
+
+  json_init (&json_ctx, 0, stdout);
+
+  json_document_begin (&json_ctx);
+
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+
+  json_attr_object_begin (&json_ctx, "functions");
+
+  json_attr_object_begin (&json_ctx, "malloc");
+
+  char s[100];
+  double iters2 = iters;
+
+  json_attr_object_begin (&json_ctx, "");
+  json_attr_double (&json_ctx, "malloc_block_size", size);
+
+  struct rusage usage;
+  getrusage (RUSAGE_SELF, &usage);
+  json_attr_double (&json_ctx, "max_rss", usage.ru_maxrss);
+
+  for (int i = 0; i < NUM_ALLOCS; i++)
+    {
+      sprintf (s, "main_arena_st_allocs_%04d_time", allocs[i]);
+      json_attr_double (&json_ctx, s, tests[0][i].elapsed / iters2);
+    }
+
+  for (int i = 0; i < NUM_ALLOCS; i++)
+    {
+      sprintf (s, "main_arena_mt_allocs_%04d_time", allocs[i]);
+      json_attr_double (&json_ctx, s, tests[1][i].elapsed / iters2);
+    }
+
+  for (int i = 0; i < NUM_ALLOCS; i++)
+    {
+      sprintf (s, "thread_arena__allocs_%04d_time", allocs[i]);
+      json_attr_double (&json_ctx, s, tests[2][i].elapsed / iters2);
+    }
+
+  json_attr_object_end (&json_ctx);
+
+  json_attr_object_end (&json_ctx);
+
+  json_attr_object_end (&json_ctx);
+
+  json_document_end (&json_ctx);
+}
+
+static void usage (const char *name)
+{
+  fprintf (stderr, "%s: \n", name);
+  exit (1);
+}
+
+int
+main (int argc, char **argv)
+{
+  long val = 16;
+  if (argc == 2)
+    val = strtol (argv[1], NULL, 0);
+
+  if (argc > 2 || val <= 0)
+    usage (argv[0]);
+
+  bench (val);
+
+  return 0;
+}
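
A side note on the remark above that the loop doesn't test for malloc
returning NULL: the benchmark skips the check because the working set is
tiny. If a checked variant were ever wanted, it might look roughly like the
sketch below. This is not part of the patch. The names checked_benchmark and
rounds_for are hypothetical, and clock_gettime stands in for the
TIMING_NOW/TIMING_DIFF macros from bench-timing.h so that the snippet builds
on its own.

```c
/* Illustrative sketch only, not part of the committed benchmark.
   Assumptions: clock_gettime replaces glibc's TIMING_* macros, and the
   function names below are made up for this example.  */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Number of outer rounds so that rounds * n is about `total` malloc/free
   pairs, mirroring `iters / allocs[i]` in bench ().  */
static size_t
rounds_for (size_t total, int n)
{
  assert (n > 0);
  return total / (size_t) n;
}

/* Allocate n blocks of `size` bytes and free them again, `iters` times.
   Returns the elapsed time in nanoseconds, or -1 if malloc returned NULL.  */
static long long
checked_benchmark (size_t iters, size_t size, int n, void **arr)
{
  struct timespec start, stop;

  assert (n > 0 && arr != NULL);
  clock_gettime (CLOCK_MONOTONIC, &start);

  for (size_t j = 0; j < iters; j++)
    {
      for (int i = 0; i < n; i++)
        {
          arr[i] = malloc (size);
          if (arr[i] == NULL)
            {
              /* Free the blocks obtained in this round, then give up.  */
              while (i-- > 0)
                free (arr[i]);
              return -1;
            }
        }

      for (int i = 0; i < n; i++)
        free (arr[i]);
    }

  clock_gettime (CLOCK_MONOTONIC, &stop);

  return (stop.tv_sec - start.tv_sec) * 1000000000LL
         + (stop.tv_nsec - start.tv_nsec);
}
```

Calling checked_benchmark (rounds_for (200000, 25), 16, 25, arr) with arr
holding at least 25 pointers would correspond to the 25-allocation case for a
16-byte block; dividing the result by 200000 gives the time per malloc/free
pair, the same normalisation the JSON output uses. The extra branch sits
inside the timed loop, so it adds a small cost to every allocation, which is
one reason to leave it out of a micro benchmark.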