On Thu, Apr 25, 2024 at 9:03 PM abush wang <abushwangs@gmail.com> wrote:
>
> Hi, H.J.
> When I compared glibc performance between 2.28 and 2.38,
> I found a performance degradation in strlen.
> The difference comes from 2.28 dispatching to __strlen_avx2
> while 2.38 dispatches to __strlen_evex:
>
> ```
> 2.28
> __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
> 42 ENTRY (STRLEN)
>
>
> 2.38
> __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
> 79 ENTRY_P2ALIGN (STRLEN, 6)
> ```
>
> This is my test program:
> ```
> #include <stdio.h>
> #include <stdlib.h>
> #include <stdint.h>
> #include <string.h>
>
> #define MAX_STRINGS 100
>
> /* Read the time-stamp counter. RDTSC is not serializing, so
>    out-of-order execution can blur very short measurements. */
> static inline uint64_t rdtsc(void) {
>     uint32_t lo, hi;
>     __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
>     return ((uint64_t)hi << 32) | lo;
> }
>
> int main(int argc, char *argv[]) {
>     char *input_str[MAX_STRINGS];
>     size_t lengths[MAX_STRINGS];
>     int num_strings = 0; // Number of input strings
>     uint64_t start_cycles, end_cycles;
>
>     // Parse command line arguments and store pointers in input_str array
>     for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
>         input_str[num_strings] = argv[i];
>         num_strings++;
>     }
>
>     // Avoid dividing by zero below when no strings are given
>     if (num_strings == 0) {
>         fprintf(stderr, "usage: %s str1 [str2 ...]\n", argv[0]);
>         return 1;
>     }
>
>     // Measure the strlen operation for each string
>     start_cycles = rdtsc();
>     for (int i = 0; i < num_strings; ++i) {
>         lengths[i] = strlen(input_str[i]);
>     }
>     end_cycles = rdtsc();
>
>     unsigned long long total_cycle = end_cycles - start_cycles;
>     unsigned long long av_cycle = total_cycle / num_strings;
>     // Print the total cycles taken for the strlen operations
>     printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);
>
>     // Print the recorded lengths
>     printf("Lengths of the input strings:\n");
>     for (int i = 0; i < num_strings; ++i) {
>         printf("String %d length: %zu\n", i, lengths[i]);
>     }
>
>     return 0;
> }
> ```
>
> These are the results:
> ```
> 2.28
> ./strlen_test str1 str2 str3 str4 str5
> Total cycles: 1468 av cycle: 293
> Lengths of the input strings:
> String 0 length: 4
> String 1 length: 4
> String 2 length: 4
> String 3 length: 4
> String 4 length: 4
>
> 2.38
> ./strlen_test str1 str2 str3 str4 str5
> Total cycles: 1814 av cycle: 362
> Lengths of the input strings:
> String 0 length: 4
> String 1 length: 4
> String 2 length: 4
> String 3 length: 4
> String 4 length: 4
> ```
>
> Thanks,
> abush
I'm not sure how you are measuring the performance of the strlen function.
Are you drawing a performance conclusion from these two runs?
2.28
Total cycles: 1468 av cycle: 293
2.38
Total cycles: 1814 av cycle: 362
Please use the glibc microbenchmarks (benchtests) to see whether you can reproduce the performance drop.
Which processors did you use? Sunil, Noah, can we reproduce it?
--
H.J.