On Wed, Dec 5, 2018 at 9:33 AM Carlos O'Donell wrote:
>
> On 12/5/18 9:29 AM, H.J. Lu wrote:
> > To optimize for multi-node NUMA system, I need a very fast way to identify which
> > node the current process is running on.  getcpu:
> >
> > NAME
> >        getcpu - determine CPU and NUMA node on which the calling thread is
> >        running
> >
> > SYNOPSIS
> >        #include
> >
> >        int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
> >
> >        Note: There is no glibc wrapper for this system call; see NOTES.
> >
> > DESCRIPTION
> >        The getcpu() system call identifies the processor and node on which the
> >        calling thread or process is currently running and writes them into the
> >        integers pointed to by the cpu and node arguments.  The processor is a
> >        unique small integer identifying a CPU.  The node is a unique small
> >        identifier identifying a NUMA node.  When either cpu or node is NULL
> >        nothing is written to the respective pointer.
> >
> > returns such info.  But syscall () is too slow.  I'd like to add a wrapper to
> > glibc.  Any comments?
>
> I don't object to adding syscall wrappers, but your comment appears to indicate
> that the glibc wrapper will be doing something more than just wrapping, what
> do you have in mind?

I am enclosing a patch to add getcpu.  Testing on x86-64, x32 and i686 is done.
I am running build-many-glibcs.py as we speak.

> On some architectures we might have vdso support for this, while on others it
> will be a syscall. What's wrong with just using the fastest mechanism possible
> and that's it?
>
> I see that on x86 you have a vdso vgetcpu, and that lsl is one instruction and
> loads the cpunode mask and ccpunode bits in one shot (atomic). So this should
> work fine, but for all other callers I assume this will be a syscall.

I need vdso getcpu to avoid syscall.  I am working on a NUMA spinlock library
which depends on a very fast getcpu.

-- 
H.J.
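
For reference, a minimal sketch of what a caller has to do today without a
glibc wrapper: go through the generic syscall () function, as the quoted man
page notes.  The helper name raw_getcpu is hypothetical and is not taken from
the enclosed patch.  Every call made this way enters the kernel, which is the
overhead a vdso-backed getcpu is meant to avoid.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_getcpu */
#include <unistd.h>        /* syscall () */

/* Hypothetical helper, not part of the patch: invoke the raw getcpu
   system call.  The third argument (tcache) is passed as NULL.
   Returns 0 on success, -1 with errno set on failure.  Either pointer
   may be NULL, in which case that value is not written.  */
int
raw_getcpu (unsigned *cpu, unsigned *node)
{
  return (int) syscall (SYS_getcpu, cpu, node, NULL);
}
```

Calling raw_getcpu (&cpu, &node) fills in both values.  The result is only a
snapshot, since the thread may migrate to another CPU or node right after the
call returns.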