On Wed, Dec 5, 2018 at 9:33 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 12/5/18 9:29 AM, H.J. Lu wrote:
> > To optimize for multi-node NUMA systems, I need a very fast way to identify which
> > node the current process is running on.  getcpu:
> >
> > NAME
> >        getcpu  -  determine  CPU  and NUMA node on which the calling thread is
> >        running
> >
> > SYNOPSIS
> >        #include <linux/getcpu.h>
> >
> >        int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
> >
> >        Note: There is no glibc wrapper for this system call; see NOTES.
> >
> > DESCRIPTION
> >        The getcpu() system call identifies the processor and node on which the
> >        calling thread or process is currently running and writes them into the
> >        integers pointed to by the cpu and node arguments.  The processor is  a
> >        unique  small  integer  identifying  a CPU.  The node is a unique small
> >        identifier identifying a NUMA node.  When either cpu or  node  is  NULL
> >        nothing is written to the respective pointer.
> >
> > returns such info.  But syscall () is too slow.  I'd like to add a wrapper to
> > glibc.   Any comments?
>
> I don't object to adding syscall wrappers, but your comment appears to indicate
> that the glibc wrapper will be doing something more than just wrapping. What
> do you have in mind?

I am enclosing a patch to add getcpu.  Testing on x86-64, x32 and i686
is done.  I am running build-many-glibcs.py as we speak.

> On some architectures we might have vdso support for this, while on others it
> will be a syscall. What's wrong with just using the fastest mechanism possible
> and that's it?
>
> I see that on x86 you have a vdso vgetcpu, and that lsl is one instruction and
> loads the cpunode mask and cpunode bits in one shot (atomic). So this should
> work fine, but for all other callers I assume this will be a syscall.

I need the vDSO getcpu to avoid the syscall overhead.  I am working on a
NUMA spinlock library which depends on a very fast getcpu.
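
For reference, here is a minimal sketch of what callers can do today without
a getcpu wrapper.  The helper names are hypothetical and are not taken from
the enclosed patch.  The first helper goes through the generic syscall ()
path, which is the slow case described above.  The second uses the existing
sched_getcpu (), which reports only the CPU number and not the NUMA node, so
it is not enough for a NUMA-aware lock.

```c
#define _GNU_SOURCE
#include <sched.h>        /* sched_getcpu */
#include <stddef.h>       /* NULL */
#include <sys/syscall.h>  /* SYS_getcpu */
#include <unistd.h>       /* syscall */

/* Hypothetical helper: ask the kernel for the current CPU and NUMA node
   through the generic syscall () entry point.  Either pointer may be NULL,
   in which case nothing is written through it.  Returns 0 on success and
   -1 on failure.  */
static int
getcpu_via_syscall (unsigned int *cpu, unsigned int *node)
{
  /* The third argument (tcache) is passed as NULL here.  */
  return (int) syscall (SYS_getcpu, cpu, node, NULL);
}

/* Hypothetical helper: existing glibc interface that returns only the CPU
   number (or -1 on failure); the node is not reported.  */
static int
cpu_only (void)
{
  return sched_getcpu ();
}
```

Note that the thread may migrate between the call and the use of the result,
so both values are only a hint, whichever path is used.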

--
H.J.