From: abush wang
Date: Fri, 26 Apr 2024 12:03:20 +0800
Subject: x86-64: strlen-evex performance degradation compared to strlen-avx2
To: "H.J. Lu", abushwang via Libc-alpha

Hi, H.J.
When I compared glibc performance between 2.28 and 2.38,
I found a performance regression in strlen.
The difference comes from the implementation strlen resolves to: __strlen_avx2 in 2.28 and __strlen_evex in 2.38.

```
2.28
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
42 ENTRY (STRLEN)

2.38
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
79 ENTRY_P2ALIGN (STRLEN, 6)
```
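Because glibc picks the strlen variant at run time through an IFUNC resolver that looks at CPU features, it may help to record the glibc version and the relevant CPU features next to each measurement. A minimal sketch, assuming GCC or Clang on x86-64 and assuming (not confirmed here) that AVX2, BMI2, AVX512VL and AVX512BW are the features that steer the AVX2 vs EVEX choice; `report_strlen_context` is a hypothetical helper name:

```c
#include <stdio.h>
#include <gnu/libc-version.h>   /* gnu_get_libc_version (glibc-specific) */

/* Hypothetical helper: print the running glibc version and whether the CPU
   reports the features assumed to influence strlen's IFUNC selection.
   __builtin_cpu_supports reports CPU support only; glibc's own "usable"
   checks (OS state, tunables) may give a different answer.  */
static void report_strlen_context(void) {
    printf("glibc version: %s\n", gnu_get_libc_version());
#if defined(__x86_64__)
    __builtin_cpu_init();  /* only strictly needed inside constructors */
    printf("avx2=%d bmi2=%d avx512vl=%d avx512bw=%d\n",
           __builtin_cpu_supports("avx2") != 0,
           __builtin_cpu_supports("bmi2") != 0,
           __builtin_cpu_supports("avx512vl") != 0,
           __builtin_cpu_supports("avx512bw") != 0);
#endif
}
```

Calling it at the start of the test program below would tie each cycle count to a specific glibc version and feature set.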
This is my test program:
```
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 100

/* Read the time-stamp counter.  The "memory" clobber keeps the compiler
   from moving the strlen calls across the timestamp reads.  */
static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ (
        "rdtsc" : "=a"(lo), "=d"(hi) : : "memory"
    );
    return ((uint64_t)hi << 32) | lo;
}

int main(int argc, char *argv[]) {
    char *input_str[MAX_STRINGS];
    size_t lengths[MAX_STRINGS];
    int num_strings = 0; // Number of input strings
    uint64_t start_cycles, end_cycles;

    // Parse command line arguments and store pointers in input_str array
    for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
        input_str[num_strings] = argv[i];
        num_strings++;
    }

    // Avoid dividing by zero below when no strings are given
    if (num_strings == 0) {
        fprintf(stderr, "usage: %s STRING...\n", argv[0]);
        return EXIT_FAILURE;
    }

    // Measure the strlen operation for each string
    start_cycles = rdtsc();
    for (int i = 0; i < num_strings; ++i) {
        lengths[i] = strlen(input_str[i]);
    }
    end_cycles = rdtsc();

    unsigned long long total_cycle = end_cycles - start_cycles;
    unsigned long long av_cycle = total_cycle / num_strings;

    // Print the total cycles taken for the strlen operations
    printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);

    // Print the recorded lengths
    printf("Lengths of the input strings:\n");
    for (int i = 0; i < num_strings; ++i) {
        printf("String %d length: %zu\n", i, lengths[i]);
    }

    return 0;
}
```
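The program above times a single pass over five short strings, so costs that are not strlen itself (cold caches, first-call symbol resolution, interrupts) may account for part of the difference. Note also that RDTSC counts TSC ticks, which on modern CPUs advance at a fixed reference rate rather than at the core clock. A minimal sketch of a less noisy measurement, assuming GCC or Clang on x86-64 (`__rdtsc` and `_mm_lfence` from `<x86intrin.h>`): discard one warm-up pass, repeat the timed pass, and keep the minimum. `min_strlen_ticks` is a hypothetical helper name, and calling strlen through a volatile function pointer adds an indirect call that the original loop does not have.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence (GCC/Clang, x86-64 only) */

/* Hypothetical helper: minimum TSC ticks for one pass of strlen over all
   n strings, taken over `trials` (>= 1) timed passes after one warm-up
   pass.  The summed lengths go into *sink so the calls stay observable.  */
static uint64_t min_strlen_ticks(char *const *strs, int n, int trials,
                                 size_t *sink) {
    /* Calling through a volatile pointer stops the compiler from hoisting
       or merging the strlen calls across passes.  */
    size_t (*volatile len_fn)(const char *) = strlen;
    uint64_t best = UINT64_MAX;

    for (int t = 0; t <= trials; ++t) {   /* pass 0 is the warm-up */
        size_t total = 0;
        _mm_lfence();                      /* let earlier work complete */
        uint64_t start = __rdtsc();
        _mm_lfence();                      /* start strlen after the read */
        for (int i = 0; i < n; ++i)
            total += len_fn(strs[i]);
        _mm_lfence();                      /* finish strlen before the read */
        uint64_t end = __rdtsc();

        *sink += total;
        if (t > 0 && end - start < best)   /* ignore the warm-up pass */
            best = end - start;
    }
    return best;
}
```

In the test program, `min_strlen_ticks(input_str, num_strings, 1000, &sink) / num_strings` could stand in for the single timed loop; the minimum is used on the reasoning that outside interference can only add time to a pass.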

These are the results:
```
2.28
./strlen_test str1 str2 str3 str4 str5
Total cycles: 1468 av cycle: 293
Lengths of the input strings:
String 0 length: 4
String 1 length: 4
String 2 length: 4
String 3 length: 4
String 4 length: 4

2.38
./strlen_test str1 str2 str3 str4 str5
Total cycles: 1814 av cycle: 362
Lengths of the input strings:
String 0 length: 4
String 1 length: 4
String 2 length: 4
String 3 length: 4
String 4 length: 4
```
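In relative terms, the two runs above differ by about 23.6%: (1814 - 1468) / 1468 ≈ 0.236, or roughly 69 extra cycles per call (362.8 vs 293.6 before the program truncates the averages to 362 and 293). Each figure comes from a single run of five calls, so this is an estimate of the gap rather than a settled number. The same arithmetic as a small C helper (`percent_change` is a hypothetical name):

```c
/* Relative change, in percent, from a baseline cycle count to a new one.  */
static double percent_change(double baseline, double current) {
    return (current - baseline) / baseline * 100.0;
}
```

For example, `percent_change(1468, 1814)` (totals) and `percent_change(293.6, 362.8)` (exact per-call averages) both come out at about 23.57, since both pairs share the same divisor of five calls.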

Thanks,
abush