On Fri, Apr 26, 2024 at 6:30 AM H.J. Lu wrote:
> On Thu, Apr 25, 2024 at 9:03 PM abush wang wrote:
> >
> > Hi, H.J.
> > When I test glibc performance between 2.28 and 2.38,
> > I found there is a performance degradation about strlen.
> > In fact, this difference comes from __strlen_avx2 and __strlen_evex
> >
> > ```
> > 2.28
> > __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
> > 42 ENTRY (STRLEN)
> >
> >
> > 2.38
> > __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
> > 79 ENTRY_P2ALIGN (STRLEN, 6)
> > ```
> >
> > This is my test:
> > ```
> > #include <stdint.h>
> > #include <stdio.h>
> > #include <string.h>
> >
> > #define MAX_STRINGS 100
> >
> > uint64_t rdtsc() {
> >     uint32_t lo, hi;
> >     __asm__ __volatile__ (
> >         "rdtsc" : "=a"(lo), "=d"(hi)
> >     );
> >     return ((uint64_t)hi << 32) | lo;
> > }
> >
> > int main(int argc, char *argv[]) {
> >     char *input_str[MAX_STRINGS];
> >     size_t lengths[MAX_STRINGS];
> >     int num_strings = 0; // Number of input strings
> >     uint64_t start_cycles, end_cycles;
> >
> >     // Parse command line arguments and store pointers in input_str array
> >     for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
> >         input_str[num_strings] = argv[i];
> >         num_strings++;
> >     }
> >
> >     // Measure the strlen operation for each string
> >     start_cycles = rdtsc();
> >     for (int i = 0; i < num_strings; ++i) {
> >         lengths[i] = strlen(input_str[i]);
> >     }
> >     end_cycles = rdtsc();
> >
> >     unsigned long long total_cycle = end_cycles - start_cycles;
> >     unsigned long long av_cycle = total_cycle / num_strings;
> >     // Print the total cycles taken for the strlen operations
> >     printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);
> >
> >     // Print the recorded lengths
> >     printf("Lengths of the input strings:\n");
> >     for (int i = 0; i < num_strings; ++i) {
> >         printf("String %d length: %zu\n", i, lengths[i]);
> >     }
> >
> >     return 0;
> > }
> > ```
> >
> > This is result
> > ```
> > 2.28
> > ./strlen_test str1 str2 str3 str4 str5
> > Total cycles: 1468 av cycle: 293
> > Lengths of the input strings:
> > String 0 length: 4
> > String 1 length: 4
> > String 2 length: 4
> > String 3 length: 4
> > String 4 length: 4
> >
> > 2.38
> > ./strlen_test str1 str2 str3 str4 str5
> > Total cycles: 1814 av cycle: 362
> > Lengths of the input strings:
> > String 0 length: 4
> > String 1 length: 4
> > String 2 length: 4
> > String 3 length: 4
> > String 4 length: 4
> > ```
> >
> > Thanks,
> > abush
>

I'm not sure how you are measuring the performance of the strlen function.
Are you drawing performance conclusions from just these 2 runs?

2.28
Total cycles: 1468 av cycle: 293
2.38
Total cycles: 1814 av cycle: 362

Please use the glibc microbenchmarks to see whether you can reproduce the
perf drop.

>
> Which processors did you use? Sunil, Noah, can we reproduce it?
>
> --
> H.J.
>