unofficial mirror of libc-alpha@sourceware.org
* [RFC PATCH 00/11] Library OS support
@ 2019-09-11 21:03 Isaku Yamahata
  2019-09-11 21:03 ` [RFC PATCH 01/11] x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE Isaku Yamahata
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:03 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch series adds Library OS (LibOS for short) support to glibc.
This is the first version of the series.
Feedback is more than welcome.
I'll give a remote presentation at GNU Cauldron 2019 on 13 September.


Why LibOS support?
==================
Recently many Library OS projects have appeared, and some of them are
already deployed in the field. Typically they use a modified libc so that
the LibOS, instead of the kernel, gets control to process system calls.
Each project makes such modifications independently, which duplicates effort.
This series aims to upstream those common modifications to glibc.
Also, some projects adopt other libc implementations for various reasons.
With LibOS support, glibc can gain a larger user base.


What is LibOS?
==============
A LibOS implements OS functionality as a library that executes in the
application's address space. Its entry point is usually invoked with a
function call instead of a special instruction such as syscall.
The common use cases are containers (unikernel or sandbox),
compatibility (e.g. SGX support), and/or performance (e.g. avoiding
kernel overhead).
There are many academic papers and industrial white papers on the topic.


What does LibOS do?
===================
LibOSes have some behaviors in common.

bootup:
At boot, the LibOS is started by the kernel and then takes control.
The LibOS, instead of the kernel, loads the interpreter (ld.so) and the
target application binary.

several hooks:
A small number of hooks are needed to inject special logic.
For example, a LibOS wants to know when ld.so loads or unloads a shared
library so that it can inject extra logic.

heap allocation:
A LibOS has its own virtual address layout, which restricts the space
available for the heap, so the heap allocator of libc needs to be aware
of it. If the heap allocator requests too large an area, the request
fails with ENOMEM.

Thread Control Block:
A LibOS also needs its own thread control block in addition to the
pthread TCB (tcbhead_t).
One way is to modify tcbhead_t at the source code level. Another way is
to use %gs.

Hooking system call:
For the LibOS to take control of system calls instead of the kernel, each
system call instruction is replaced with a function call into the LibOS.
There are two problems: how to identify system call instructions, and
how to replace them with function calls.

The approach varies among LibOSes. There are two major ones.
- modify libc at the source code level
  Prepare shared libraries specialized for the LibOS and use them instead
  of those installed on the system.
  This assumes that executables are usually dynamically linked and that
  the shared libraries (ld.so, libc.so, libpthread.so, etc.) can be easily
  replaced.
  The downside is that this can't easily be applied to statically linked
  binaries.
- analyze opcodes and replace system call instructions somehow
  This can be done at load time, at execution time, or offline.
  The technique applies to both dynamically and statically linked binaries.
  The downside is that such logic is complex and fragile, and the code
  tends to be huge. An existing binary analysis framework can be used.
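
To illustrate why opcode analysis is fragile, here is a minimal sketch of
the most naive form of it: scanning a code buffer for the two-byte x86-64
syscall encoding (0f 05). The function name find_syscall_bytes is
hypothetical and is not part of this series. Because it does not decode
instruction boundaries, it can report bytes that belong to another
instruction's immediate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive scan (illustration only): report every offset where the bytes
   0x0f 0x05, the x86-64 `syscall` encoding, occur.  No instruction
   decoding is done, so a match may lie inside an immediate or a
   displacement of some other instruction.  Returns the number of
   matches; at most max_offsets of them are stored in offsets[].  */
static size_t
find_syscall_bytes (const uint8_t *code, size_t len,
                    size_t *offsets, size_t max_offsets)
{
  size_t found = 0;
  for (size_t i = 0; i + 1 < len; i++)
    if (code[i] == 0x0f && code[i + 1] == 0x05)
      {
        if (found < max_offsets)
          offsets[found] = i;
        found++;
      }
  return found;
}
```

For the byte sequence "mov $0x27,%eax; syscall; mov $0x50f,%eax; ret"
the scan reports two matches, and the second one is a false positive
inside the immediate of the second mov.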


High level direction
====================
Single version of binary:
For library maintenance, a single version of the binary should serve both
the native case (traditional tool stack) and the LibOS case.
We shouldn't have multiple versions of the binary,
e.g. a version for native, a version for LibOS X, and more.
Multiple versions won't scale, and maintaining them won't be sustainable.
This also means the approach should be agnostic to the LibOS.

Minimize maintenance cost in glibc/overhead for native case:
For maintainability in glibc, the change to glibc should be minimized, and
the complexity/burden should be put on the LibOSes when a trade-off is
needed. Also, the overhead (CPU cycles and memory space) for the native
case should be minimized because the native case has the largest user base.


Proposal
========
adding initialization hooks:
Add stub functions as weak symbols so that a LibOS can interpose them.
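
As a minimal sketch of the weak-stub idea: the library provides a weak
default that does nothing and calls it unconditionally. The name
example_libos_hook and the caller load_object are hypothetical; the
argument list only mirrors the shape of __libos_map_library from patch 04.

```c
#include <assert.h>

/* Hypothetical hook, for illustration only.  The weak default does
   nothing.  A strong definition of the same symbol supplied at static
   link time replaces it; for shared objects, a definition found earlier
   in the symbol lookup order may take its place instead.  */
__attribute__ ((weak)) int
example_libos_hook (int fd, const char *name, unsigned long load_address)
{
  (void) fd;
  (void) name;
  (void) load_address;
  return 0;                     /* native case: nothing to do */
}

/* Caller side: the library calls the hook unconditionally, so the
   native case pays only for one call to an empty function.  */
static int
load_object (int fd, const char *name, unsigned long load_address)
{
  return example_libos_hook (fd, name, load_address);
}
```

The native cost is then one call to an empty function per hook site,
which is the trade-off this proposal accepts for initialization hooks.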

heap allocator:
Introduce a new tunable that specifies the heap size so that a LibOS can
tell the heap allocator how large the heap may be.

hp-timing (profiling rtld):
Introduce a weak symbol to disable rtld profiling.

new .note:
Introduce a new note that describes LibOS support, so that a LibOS can
easily check whether a given glibc binary supports LibOS or not.
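
A rough sketch of how a LibOS loader might look for such a note, assuming
the standard ELF note header (namesz, descsz, type, then padded name and
padded descriptor). The names note_view and find_note are hypothetical,
and the padding unit is a parameter because it differs between note
layouts. The type value used in the usage below is NT_LIBOS from patch 02.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one note found in a note section.  */
struct note_view
{
  uint32_t type;
  const uint8_t *desc;
  uint32_t descsz;
};

static size_t
align_up (size_t v, size_t align)
{
  return (v + align - 1) & ~(align - 1);
}

/* Find the first note whose owner name and type match.  `align` is the
   padding unit of the section and must be a power of two; the buffer is
   assumed to start at an aligned address.  Returns 1 and fills *out on
   success, 0 if no note matches or a note is truncated.  */
static int
find_note (const uint8_t *buf, size_t len, size_t align,
           const char *owner, uint32_t type, struct note_view *out)
{
  size_t off = 0;
  while (off + 12 <= len)
    {
      uint32_t namesz, descsz, ntype;
      memcpy (&namesz, buf + off, 4);
      memcpy (&descsz, buf + off + 4, 4);
      memcpy (&ntype, buf + off + 8, 4);

      size_t name_off = off + 12;
      size_t desc_off = align_up (name_off + namesz, align);
      if (desc_off > len || descsz > len - desc_off)
        return 0;               /* truncated note */

      if (ntype == type && namesz == strlen (owner) + 1
          && memcmp (buf + name_off, owner, namesz) == 0)
        {
          out->type = ntype;
          out->desc = buf + desc_off;
          out->descsz = descsz;
          return 1;
        }
      off = align_up (desc_off + descsz, align);
    }
  return 0;
}
```

A caller would pass the contents of the note section or PT_NOTE segment,
for example find_note (buf, len, 4, "LibOS", 0x4f62694c, &nv), and then
interpret the descriptor according to the layout the note defines.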

hooking system call:
To let a LibOS use binary editing, create a list of system call
instructions and add nop instructions for binary editing.
Introduce a .libos.instruction.syscall section for the list.

x86 instructions have variable length, and the syscall instruction is
2 bytes long. A jump/call with a 4-byte offset requires 5 bytes, which
complicates binary editing. To make binary editing easier, nop
instructions are added around the syscall instruction.
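
The byte arithmetic above can be sketched as follows, assuming a site
that consists of syscall followed by three one-byte nops (2 + 3 = 5
bytes, the size of call rel32). The function name patch_syscall_site is
hypothetical. A real LibOS would also have to make the page writable
and bridge the syscall register convention to its own entry point,
which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Overwrite `syscall; nop; nop; nop` (0f 05 90 90 90) at `site` with a
   5-byte `call rel32` to `target`.  `site_addr` is the run-time address
   of site[0].  Returns 0 on success, -1 if the bytes at the site are not
   the expected pattern or the target is out of rel32 range.  */
static int
patch_syscall_site (uint8_t *site, uint64_t site_addr, uint64_t target)
{
  static const uint8_t expected[5] = { 0x0f, 0x05, 0x90, 0x90, 0x90 };
  if (memcmp (site, expected, sizeof expected) != 0)
    return -1;

  /* rel32 is relative to the address of the next instruction.  */
  int64_t disp = (int64_t) (target - (site_addr + 5));
  if (disp < INT32_MIN || disp > INT32_MAX)
    return -1;

  uint32_t rel = (uint32_t) (int32_t) disp;
  site[0] = 0xe8;               /* opcode of call rel32 */
  site[1] = rel & 0xff;         /* displacement, little endian */
  site[2] = (rel >> 8) & 0xff;
  site[3] = (rel >> 16) & 0xff;
  site[4] = (rel >> 24) & 0xff;
  return 0;
}
```

Without the padding, the 2-byte syscall could not hold the 5-byte call,
and the editor would have to relocate the following instructions.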

Alternatives for system call hooking:
- weak symbol function
  The extreme option is to replace each system call instruction with a
  call to a syscall function and make that function a weak symbol.
  Then the LibOS needs to hook only the syscall function.
  This may incur function call overhead in the native case.
- SDT marker
  The SDT marker is optimized for its own usage and is not suitable for
  hooking system call instructions.
  For example, only a single nop is inserted, and the size of .note is
  32+ bytes per marker.


Analysis and benchmark
======================
The number of syscall instructions and space overhead:
I counted the number of syscall instructions and the space overhead
in my environment.
Although the actual numbers may vary depending on the environment,
the results won't be greatly different.

Library              |File size |# of   |nop    |List size| space
                     |(stripped)|syscall|(N * 3)|(N * 16) |   sum
                     |          |  (N)  |bytes  |bytes    | bytes
---------------------+----------+-------+-------+---------+------
libc.so.6            |      1.7M|    701|   2103|    11216| 13319
libpthread.so.0      |      112K|    208|    624|     3328|  3952
ld-linux-x86-64.so.2 |      162K|     36|    108|      576|   684
librt.so.1           |       31K|     29|     87|      464|   551
libnal.so            |       15K|      3|      9|       48|    57
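
The per-library sum follows directly from the two per-site costs in the
table header (3 bytes of nop and 16 bytes of list entry per syscall
instruction). A small sketch, with hypothetical names, that reproduces
the last column:

```c
#include <assert.h>
#include <stddef.h>

/* Per-site costs taken from the table header above.  */
enum { NOP_BYTES_PER_SITE = 3, LIST_BYTES_PER_SITE = 16 };

/* Space overhead in bytes for a library with nsyscalls syscall sites.  */
static size_t
libos_space_overhead (size_t nsyscalls)
{
  return nsyscalls * NOP_BYTES_PER_SITE + nsyscalls * LIST_BYTES_PER_SITE;
}
```

For example, libos_space_overhead (36) gives the 684 bytes listed for
ld-linux-x86-64.so.2.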


overhead of function call or nop:
I measured the time of an N-iteration loop of the gettid system call for
each variant below.
The difference was less than OS noise, so I couldn't get a meaningful
result.
Probably the instruction sequences are small enough that they all fit in
the CPU icache. With real applications the effect might be different.

- syscall(SYS_gettid); <function call>
- asm("syscall\n": "=a"(ret): "0"(SYS_gettid)); <base line>
- asm("syscall\n nop * <3>\n": "=a"(ret): "0"(SYS_gettid));
- asm("syscall\n nop * <10>\n": "=a"(ret): "0"(SYS_gettid));
- asm("jmp 1f\n nop *<1>\n 1:\n syscall\n": "=a"(ret): "0"(SYS_gettid));
- asm("jmp 1f\n nop *<8>\n 1:\n syscall\n": "=a"(ret): "0"(SYS_gettid));
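
A minimal sketch of one such measurement, here for the "syscall plus
3 nops" variant. This is not the bench-nop.c from this series: the names
gettid_raw and time_loop_ns are hypothetical, and the clobber list
(rcx, r11, memory) is added here because the syscall instruction
overwrites those registers. On non-x86-64 targets the sketch falls back
to the syscall() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* gettid via a raw syscall instruction followed by three nops.  */
static inline long
gettid_raw (void)
{
#ifdef __x86_64__
  long ret;
  __asm__ volatile ("syscall\n\tnop\n\tnop\n\tnop"
                    : "=a" (ret)
                    : "0" ((long) SYS_gettid)
                    : "rcx", "r11", "memory");
  return ret;
#else
  return syscall (SYS_gettid);
#endif
}

/* Elapsed wall-clock nanoseconds for `iterations` gettid calls.  */
static int64_t
time_loop_ns (long iterations)
{
  struct timespec start, end;
  clock_gettime (CLOCK_MONOTONIC, &start);
  for (long i = 0; i < iterations; i++)
    (void) gettid_raw ();
  clock_gettime (CLOCK_MONOTONIC, &end);
  return (int64_t) (end.tv_sec - start.tv_sec) * 1000000000
         + (end.tv_nsec - start.tv_nsec);
}
```

Comparing variants means repeating the loop with a different asm body
and a large iteration count; as noted above, the differences may well
stay below OS noise.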


Impact on LibOSes
=================
A LibOS may need an update to implement new logic to
  - inject symbols for hooking functions
  - hook syscall instructions somehow based on this proposal
    Although it's up to the LibOS how/when to do it, the LibOS may need
    an update.
If a LibOS wants to stick to its own way without glibc's help, that's okay.

This change to glibc may break the existing heuristics of LibOSes.
Some LibOSes expect a specific opcode sequence that includes the syscall
instruction for binary editing. This proposal changes that sequence,
so such heuristics may break.
glibc doesn't guarantee anything about such opcode sequences, and this
shows the weak point of such heuristics.
This proposal introduces an explicit contract between glibc and LibOS.


full disclosure
I'm working on the Graphene LibOS project[1]. I tried to keep the above
discussion generic and applicable to LibOS projects in general,
but I may be biased. Feedback from other LibOS projects is also more
than welcome.
[1] https://grapheneproject.io

Isaku Yamahata (11):
  x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE
  elf: add macro to define note section for LibOS
  elf: define note section for LibOS
  elf: add stub functions for LibOS support
  elf: add hook, __libos_map_library to dl-open.c
  elf/rtld: introduce runtime option to disable HP_TIMING_INLINE
  malloc: make arena size configurable on startup
  x86-64: replace syscall instruction with SYSCALL_INST macro
  x86-64: add nop instruction after syscall instruction
  x86-64: make the number of nops after syscall configurable
  benchtests: simple benchmark to measure nop effects

 benchtests/bench-nop.c                        | 128 ++++++++++++++++++
 configure.ac                                  |  19 +++
 elf/Makefile                                  |   3 +-
 elf/Versions                                  |   6 +
 elf/dl-load.c                                 |  22 +--
 elf/dl-tunables.list                          |   5 +
 elf/libos.c                                   |  36 +++++
 elf/libos.h                                   |  98 ++++++++++++++
 elf/rtld.c                                    |  20 ++-
 malloc/arena.c                                |  17 +--
 malloc/malloc.c                               |  25 ++++
 malloc/malloc.h                               |   1 +
 .../unix/sysv/linux/x86_64/____longjmp_chk.S  |   2 +-
 .../unix/sysv/linux/x86_64/__start_context.S  |   2 +-
 sysdeps/unix/sysv/linux/x86_64/cancellation.S |   2 +-
 sysdeps/unix/sysv/linux/x86_64/clone.S        |   4 +-
 sysdeps/unix/sysv/linux/x86_64/getcontext.S   |   4 +-
 sysdeps/unix/sysv/linux/x86_64/setcontext.S   |   2 +-
 sysdeps/unix/sysv/linux/x86_64/sigaction.c    |   2 +-
 sysdeps/unix/sysv/linux/x86_64/swapcontext.S  |   4 +-
 sysdeps/unix/sysv/linux/x86_64/syscall.S      |   2 +-
 sysdeps/unix/sysv/linux/x86_64/sysdep.h       |  50 +++++--
 sysdeps/unix/sysv/linux/x86_64/vfork.S        |   2 +-
 sysdeps/unix/sysv/linux/x86_64/x32/times.c    |   2 +-
 sysdeps/x86_64/dl-machine.h                   |   2 +
 sysdeps/x86_64/nptl/tls.h                     |   2 +-
 26 files changed, 418 insertions(+), 44 deletions(-)
 create mode 100644 benchtests/bench-nop.c
 create mode 100644 elf/libos.c
 create mode 100644 elf/libos.h

-- 
2.17.1



* [RFC PATCH 01/11] x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
@ 2019-09-11 21:03 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 02/11] elf: add macro to define note section for LibOS Isaku Yamahata
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:03 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

Make elf_machine_lazy_rel() check for R_X86_64_NONE and ignore it during
lazy relocation.
A LibOS may also perform relocation itself and change the relocation
entry to NONE, so elf_machine_lazy_rel() may see R_X86_64_NONE on this
code path, just as elf_machine_rela() can.

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 sysdeps/x86_64/dl-machine.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index 95a13b35b5..50db45c082 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -578,6 +578,8 @@ elf_machine_lazy_rel (struct link_map *map,
 	value = ((ElfW(Addr) (*) (void)) value) ();
       *reloc_addr = value;
     }
+  else if (__glibc_unlikely (r_type == R_X86_64_NONE))
+    return;
   else
     _dl_reloc_bad_type (map, r_type, 1);
 }
-- 
2.17.1



* [RFC PATCH 02/11] elf: add macro to define note section for LibOS
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
  2019-09-11 21:03 ` [RFC PATCH 01/11] x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 03/11] elf: " Isaku Yamahata
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch adds a macro that defines a note section for LibOS.
Later patches will use this macro.

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 elf/libos.h | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 elf/libos.h

diff --git a/elf/libos.h b/elf/libos.h
new file mode 100644
index 0000000000..0610c212ff
--- /dev/null
+++ b/elf/libos.h
@@ -0,0 +1,84 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _LIBOS_H
+#define _LIBOS_H    1
+
+#include <link.h>
+
+#define STRINGIFY(name) STRINGIFY_1(name)
+#define STRINGIFY_1(name)   #name
+
+#define NT_LIBOS    0x4f62694c    /* "LibO" in little endian */
+#define ELF_NOTE_LIBOS  "LibOS"
+#define LIBOS_NOTE_VERSION  1
+#define LIBOS_NOTE_FUNCTION 2
+#define LIBOS_NOTE_VARIABLE 3
+
+struct libos_note_desc {
+    ElfW(Word) name_sz;
+    ElfW(Word) type;
+    ElfW(Addr) ptr;
+    ElfW(Word) sz;
+    /* name */
+};
+
+#if __SIZEOF_PTRDIFF_T__  == 8
+#define LIBOS_NOTES(n_name, type, ptr, sz, name)                \
+    __asm__ (                                                   \
+        "   .pushsection .note.libos." n_name ",\"a\",@note\n"  \
+        "   .balign 8\n"                                        \
+        "   .long 1f - 0f\n"    /* name length */               \
+        "   .long 5f - 2f\n"    /* desc size */                 \
+        "   .long " STRINGIFY(NT_LIBOS) "\n"  /* note type*/    \
+        "0: .asciz \"libos\"\n"                                 \
+        "1:\n"                                                  \
+        "   .balign 8\n"                                        \
+        "2:\n"                                                  \
+        "   .long 4f - 3f\n"                                    \
+        "   .long " STRINGIFY(type) "\n"                        \
+        "   .quad " STRINGIFY(ptr) "\n"                         \
+        "   .quad " STRINGIFY(sz) "\n"                          \
+        "3: .asciz \"" name "\"\n"                              \
+        "4:\n"                                                  \
+        "   .balign 8\n"                                        \
+        "5:\n"                                                  \
+        "   .popsection\n")
+#elif __SIZEOF_PTRDIFF_T__  == 4
+#define LIBOS_NOTES(n_name, type, ptr, sz, name)                \
+    __asm__ (                                                   \
+        "   .pushsection .note.libos." n_name ",\"a\",@note\n"  \
+        "   .balign 4\n"                                        \
+        "   .long 1f - 0f\n"    /* name length */               \
+        "   .long 5f - 2f\n"    /* desc size */                 \
+        "   .long " STRINGIFY(NT_LIBOS) "\n"  /* note type*/    \
+        "0: .asciz \"libos\"\n"                                 \
+        "1:\n"                                                  \
+        "   .balign 4\n"                                        \
+        "2:\n"                                                  \
+        "   .long 4f - 3f\n"                                    \
+        "   .long " STRINGIFY(type) "\n"                        \
+        "   .long " STRINGIFY(ptr) "\n"                         \
+        "   .long " STRINGIFY(sz) "\n"                          \
+        "3: .asciz \"" name "\"\n"                              \
+        "4:\n"                                                  \
+        "   .balign 4\n"                                        \
+        "5:\n"                                                  \
+        "   .popsection\n")
+#endif
+
+#endif /* libos.h */
-- 
2.17.1



* [RFC PATCH 03/11] elf: define note section for LibOS
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
  2019-09-11 21:03 ` [RFC PATCH 01/11] x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 02/11] elf: add macro to define note section for LibOS Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 04/11] elf: add stub functions for LibOS support Isaku Yamahata
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

Define a note section for LibOS that describes the glibc release/version
and the ABI for LibOS.
A LibOS can easily check it when loading ld.so.

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 elf/Makefile |  3 ++-
 elf/Versions |  3 +++
 elf/libos.c  | 26 ++++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 elf/libos.c

diff --git a/elf/Makefile b/elf/Makefile
index d470e41402..275105b0de 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -33,7 +33,8 @@ dl-routines	= $(addprefix dl-,load lookup object reloc deps hwcaps \
 				  runtime init fini debug misc \
 				  version profile tls origin scope \
 				  execstack open close trampoline \
-				  exception sort-maps)
+				  exception sort-maps) \
+		libos
 ifeq (yes,$(use-ldconfig))
 dl-routines += dl-cache
 endif
diff --git a/elf/Versions b/elf/Versions
index 3b09901f6c..b9b4ae168a 100644
--- a/elf/Versions
+++ b/elf/Versions
@@ -78,5 +78,8 @@ ld {
 
     # Set value of a tunable.
     __tunable_get_val;
+
+    # libos
+    __libos_release; __libos_version; __libos_abi;
   }
 }
diff --git a/elf/libos.c b/elf/libos.c
new file mode 100644
index 0000000000..8fe3df4944
--- /dev/null
+++ b/elf/libos.c
@@ -0,0 +1,26 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <libos.h>
+#include "../version.h"
+
+const char * __libos_release = RELEASE;
+const char * __libos_version = VERSION;
+const uint64_t __libos_abi = 0;
+LIBOS_NOTES("versions", LIBOS_NOTE_VERSION, __libos_release, __WORDSIZE / 8, "release");
+LIBOS_NOTES("versions", LIBOS_NOTE_VERSION, __libos_version, __WORDSIZE / 8, "version");
+LIBOS_NOTES("versions", LIBOS_NOTE_VERSION, __libos_abi, 8, "abi");
-- 
2.17.1



* [RFC PATCH 04/11] elf: add stub functions for LibOS support
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (2 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 03/11] elf: " Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 05/11] elf: add hook, __libos_map_library to dl-open.c Isaku Yamahata
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch adds a stub function for LibOS support, which will
be used by a later patch.
The impact on the traditional runtime is a single stub function defined
as a weak symbol, so that a LibOS can inject its own function at runtime.
In the statically linked case, dynamic symbol interposition isn't usable.
For that case, the symbol address is recorded in a note section
and nop instructions are added so that a LibOS can overwrite them with
a jump instruction.

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 elf/Versions |  2 ++
 elf/libos.c  | 10 ++++++++++
 elf/libos.h  | 14 ++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/elf/Versions b/elf/Versions
index b9b4ae168a..619676afef 100644
--- a/elf/Versions
+++ b/elf/Versions
@@ -81,5 +81,7 @@ ld {
 
     # libos
     __libos_release; __libos_version; __libos_abi;
+    # stub symbols for libos support
+    __libos_map_library;
   }
 }
diff --git a/elf/libos.c b/elf/libos.c
index 8fe3df4944..8f6036283f 100644
--- a/elf/libos.c
+++ b/elf/libos.c
@@ -15,6 +15,8 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <errno.h>
+
 #include <libos.h>
 #include "../version.h"
 
@@ -24,3 +26,11 @@ const uint64_t __libos_abi = 0;
 LIBOS_NOTES("versions", LIBOS_NOTE_VERSION, __libos_release, __WORDSIZE / 8, "release");
 LIBOS_NOTES("versions", LIBOS_NOTE_VERSION, __libos_version, __WORDSIZE / 8, "version");
 LIBOS_NOTES("versions", LIBOS_NOTE_VERSION, __libos_abi, 8, "abi");
+
+int __attribute__((weak)) __libos_map_library (int fd, const char * name,
+                                               unsigned long load_address)
+{
+  NOP_FILL;
+  return 0;
+}
+LIBOS_NOTES("functions", LIBOS_NOTE_FUNCTION, __libos_map_library, 0, "__libos_map_library");
diff --git a/elf/libos.h b/elf/libos.h
index 0610c212ff..6624e8d3a7 100644
--- a/elf/libos.h
+++ b/elf/libos.h
@@ -81,4 +81,18 @@ struct libos_note_desc {
         "   .popsection\n")
 #endif
 
+#ifdef __x86_64__
+  /* 16 bytes space for 8 bytes offset jump */
+# define NOP_FILL                                               \
+  do {                                                          \
+      /* ".nops 16, 1" requires relatively recent gas */        \
+      __asm__ volatile ("nop;nop;nop;nop;nop;nop;nop;nop;\n");  \
+      __asm__ volatile ("nop;nop;nop;nop;nop;nop;nop;nop;\n");  \
+  } while (0)
+#else
+# define NOP_FILL   /* nothing */
+#endif
+
+extern int __libos_map_library (int fd, const char * name, unsigned long load_address);
+
 #endif /* libos.h */
-- 
2.17.1



* [RFC PATCH 05/11] elf: add hook, __libos_map_library to dl-open.c
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (3 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 04/11] elf: add stub functions for LibOS support Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 06/11] elf/rtld: introduce runtime option to disable HP_TIMING_INLINE Isaku Yamahata
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch adds a hook that is called when a shared library is loaded.
The impact on the traditional runtime is one stub function call.
A LibOS can inject its own symbol and interact with the debugger.

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 elf/dl-load.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index 5abeb867f1..6a811f41e0 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -72,6 +72,7 @@ struct filebuf
 #include <dl-sysdep-open.h>
 #include <dl-prop.h>
 #include <not-cancel.h>
+#include <libos.h>
 
 #include <endian.h>
 #if BYTE_ORDER == BIG_ENDIAN
@@ -1293,15 +1294,6 @@ cannot enable executable stack as shared object requires");
   if (l->l_tls_initimage != NULL)
     l->l_tls_initimage = (char *) l->l_tls_initimage + l->l_addr;
 
-  /* We are done mapping in the file.  We no longer need the descriptor.  */
-  if (__glibc_unlikely (__close_nocancel (fd) != 0))
-    {
-      errstring = N_("cannot close file descriptor");
-      goto call_lose_errno;
-    }
-  /* Signal that we closed the file.  */
-  fd = -1;
-
   /* If this is ET_EXEC, we should have loaded it as lt_executable.  */
   assert (type != ET_EXEC || l->l_type == lt_executable);
 
@@ -1397,6 +1389,18 @@ cannot enable executable stack as shared object requires");
     }
 #endif
 
+  /* register the library to libos */
+  __libos_map_library(fd, l->l_name, l->l_addr);
+
+  /* We are done mapping in the file.  We no longer need the descriptor.  */
+  if (__glibc_unlikely (__close_nocancel (fd) != 0))
+    {
+      errstring = N_("cannot close file descriptor");
+      goto call_lose_errno;
+    }
+  /* Signal that we closed the file.  */
+  fd = -1;
+
   return l;
 }
 \f
-- 
2.17.1



* [RFC PATCH 06/11] elf/rtld: introduce runtime option to disable HP_TIMING_INLINE
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (4 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 05/11] elf: add hook, __libos_map_library to dl-open.c Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 07/11] malloc: make arena size configurable on startup Isaku Yamahata
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch introduces a runtime option, __hp_timing_disabled, as a
weak symbol to disable rtld profiling with HP_TIMING_INLINE.
Because some LibOSes don't support rdtsc/rdtscp (e.g. in an SGX enclave),
this allows a LibOS to disable HP_TIMING profiling on startup.
The impact on the traditional runtime is "if (__hp_timing_disabled)".

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 elf/Versions |  1 +
 elf/rtld.c   | 20 +++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/elf/Versions b/elf/Versions
index 619676afef..d7d12d7aba 100644
--- a/elf/Versions
+++ b/elf/Versions
@@ -83,5 +83,6 @@ ld {
     __libos_release; __libos_version; __libos_abi;
     # stub symbols for libos support
     __libos_map_library;
+    __hp_timing_disabled;
   }
 }
diff --git a/elf/rtld.c b/elf/rtld.c
index c9490ff694..8d759dfa8c 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -46,6 +46,8 @@
 
 #include <assert.h>
 
+#include "libos.h"
+
 /* Only enables rtld profiling for architectures which provides non generic
    hp-timing support.  The generic support requires either syscall
    (clock_gettime), which will incur in extra overhead on loading time.
@@ -58,9 +60,18 @@
 # define RTLD_TIMING_SET(var, value) (var) = (value)
 # define RTLD_TIMING_REF(var)        &(var)
 
+bool __hp_timing_disabled __attribute__((weak))= false;
+# define HP_TIMING_DISABLED __hp_timing_disabled
+LIBOS_NOTES("variables", LIBOS_NOTE_VARIABLE,
+            __hp_timing_disabled, 1, "__hp_timing_disabled");
+
 static inline void
 rtld_timer_start (hp_timing_t *var)
 {
+  if (HP_TIMING_DISABLED) {
+    memset(var, 0, sizeof(*var));
+    return;
+  }
   HP_TIMING_NOW (*var);
 }
 
@@ -68,6 +79,8 @@ static inline void
 rtld_timer_stop (hp_timing_t *var, hp_timing_t start)
 {
   hp_timing_t stop;
+  if (HP_TIMING_DISABLED)
+    return;
   HP_TIMING_NOW (stop);
   HP_TIMING_DIFF (*var, start, stop);
 }
@@ -76,6 +89,8 @@ static inline void
 rtld_timer_accum (hp_timing_t *sum, hp_timing_t start)
 {
   hp_timing_t stop;
+  if (HP_TIMING_DISABLED)
+    return;
   rtld_timer_stop (&stop, start);
   HP_TIMING_ACCUM_NT(*sum, stop);
 }
@@ -87,6 +102,7 @@ rtld_timer_accum (hp_timing_t *sum, hp_timing_t start)
 # define rtld_timer_start(var)
 # define rtld_timer_stop(var, start)
 # define rtld_timer_accum(sum, start)
+# define HP_TIMING_DISABLED false
 #endif
 
 /* Avoid PLT use for our local calls at startup.  */
@@ -2748,6 +2764,8 @@ static void
 print_statistics_item (const char *title, hp_timing_t time,
 		       hp_timing_t total)
 {
+  if (HP_TIMING_DISABLED)
+    return;
   char cycles[HP_TIMING_PRINT_SIZE];
   HP_TIMING_PRINT (cycles, sizeof (cycles), time);
 
@@ -2779,7 +2797,7 @@ __attribute ((noinline))
 print_statistics (const hp_timing_t *rtld_total_timep)
 {
 #if HP_TIMING_INLINE
-  {
+  if (!HP_TIMING_DISABLED) {
     char cycles[HP_TIMING_PRINT_SIZE];
     HP_TIMING_PRINT (cycles, sizeof (cycles), *rtld_total_timep);
     _dl_debug_printf ("\nruntime linker statistics:\n"
-- 
2.17.1



* [RFC PATCH 07/11] malloc: make arena size configurable on startup
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (5 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 06/11] elf/rtld: introduce runtime option to disable HP_TIMING_INLINE Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-12  1:03   ` DJ Delorie
  2019-09-11 21:04 ` [RFC PATCH 08/11] x86-64: replace syscall instruction with SYSCALL_INST macro Isaku Yamahata
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch is mainly to show the idea and to get feedback.
There is probably a better implementation.

This patch introduces a tunable that makes the heap allocator friendly
to LibOS.
It gives the LibOS a way to adjust the allocation size.
A LibOS may have its own virtual address space layout (due to software
or hardware, e.g. SGX) and, as a result, may have a limited heap.
If the heap allocator tries to allocate too much memory, it fails with
ENOMEM on startup.

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 elf/dl-tunables.list |  5 +++++
 malloc/arena.c       | 17 +++++++----------
 malloc/malloc.c      | 25 +++++++++++++++++++++++++
 malloc/malloc.h      |  1 +
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 525c3767b5..31aaecd8bd 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -64,6 +64,11 @@ glibc {
       env_alias: MALLOC_MMAP_MAX_
       security_level: SXID_IGNORE
     }
+    heap_max {
+      type: INT_32
+      env_alias: MALLOC_HEAP_MAX_
+      security_level: SXID_IGNORE
+    }
     arena_max {
       type: SIZE_T
       env_alias: MALLOC_ARENA_MAX
diff --git a/malloc/arena.c b/malloc/arena.c
index a32eb403ec..acd03c53eb 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -26,14 +26,7 @@
 
 /* Compile-time constants.  */
 
-#define HEAP_MIN_SIZE (32 * 1024)
-#ifndef HEAP_MAX_SIZE
-# ifdef DEFAULT_MMAP_THRESHOLD_MAX
-#  define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)
-# else
-#  define HEAP_MAX_SIZE (1024 * 1024) /* must be a power of two */
-# endif
-#endif
+#define HEAP_MAX_SIZE	(mp_.heap_max)
 
 /* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps
    that are dynamically created for multi-threaded programs.  The
@@ -226,6 +219,7 @@ TUNABLE_CALLBACK (__name) (tunable_val_t *valp)				      \
 
 TUNABLE_CALLBACK_FNDECL (set_mmap_threshold, size_t)
 TUNABLE_CALLBACK_FNDECL (set_mmaps_max, int32_t)
+TUNABLE_CALLBACK_FNDECL (set_heap_max, int32_t)
 TUNABLE_CALLBACK_FNDECL (set_top_pad, size_t)
 TUNABLE_CALLBACK_FNDECL (set_perturb_byte, int32_t)
 TUNABLE_CALLBACK_FNDECL (set_trim_threshold, size_t)
@@ -316,6 +310,7 @@ ptmalloc_init (void)
   TUNABLE_GET (mmap_threshold, size_t, TUNABLE_CALLBACK (set_mmap_threshold));
   TUNABLE_GET (trim_threshold, size_t, TUNABLE_CALLBACK (set_trim_threshold));
   TUNABLE_GET (mmap_max, int32_t, TUNABLE_CALLBACK (set_mmaps_max));
+  TUNABLE_GET (heap_max, int32_t, TUNABLE_CALLBACK (set_heap_max));
   TUNABLE_GET (arena_max, size_t, TUNABLE_CALLBACK (set_arena_max));
   TUNABLE_GET (arena_test, size_t, TUNABLE_CALLBACK (set_arena_test));
 # if USE_TCACHE
@@ -365,6 +360,8 @@ ptmalloc_init (void)
                     __libc_mallopt (M_MMAP_MAX, atoi (&envline[10]));
                   else if (memcmp (envline, "ARENA_MAX", 9) == 0)
                     __libc_mallopt (M_ARENA_MAX, atoi (&envline[10]));
+                  else if (memcmp (envline, "HEAP_MAX_", 9) == 0)
+                    __libc_mallopt (M_HEAP_MAX, atoi (&envline[10]));
                 }
               break;
             case 10:
@@ -472,7 +469,7 @@ new_heap (size_t size, size_t top_pad)
      mapping (on Linux, this is the case for all non-writable mappings
      anyway). */
   p2 = MAP_FAILED;
-  if (aligned_heap_area)
+  if (aligned_heap_area && !mp_.heap_max_specified)
     {
       p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
                           MAP_NORESERVE);
@@ -493,7 +490,7 @@ new_heap (size_t size, size_t top_pad)
           ul = p2 - p1;
           if (ul)
             __munmap (p1, ul);
-          else
+          else if (!mp_.heap_max_specified)
             aligned_heap_area = p2 + HEAP_MAX_SIZE;
           __munmap (p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
         }
diff --git a/malloc/malloc.c b/malloc/malloc.c
index fe973770a6..f776bb2452 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -981,6 +981,13 @@ int      __posix_memalign(void **, size_t, size_t);
 #define DEFAULT_MMAP_MAX       (65536)
 #endif
 
+#ifdef DEFAULT_MMAP_THRESHOLD_MAX
+# define DEFAULT_HEAP_MAX_SIZE	(2 * DEFAULT_MMAP_THRESHOLD_MAX)
+#else
+# define DEFAULT_HEAP_MAX_SIZE	(1024 * 1024)	/* must be power of two */
+#endif
+#define HEAP_MIN_SIZE		(32 * 1024)
+
 #include <malloc.h>
 
 #ifndef RETURN_ADDRESS
@@ -1713,6 +1720,8 @@ struct malloc_par
      it manually, at which point we need to disable any
      dynamic behavior. */
   int no_dyn_threshold;
+  int heap_max;
+  int heap_max_specified;
 
   /* Statistics */
   INTERNAL_SIZE_T mmapped_mem;
@@ -1766,6 +1775,8 @@ static struct malloc_par mp_ =
   .top_pad = DEFAULT_TOP_PAD,
   .n_mmaps_max = DEFAULT_MMAP_MAX,
   .mmap_threshold = DEFAULT_MMAP_THRESHOLD,
+  .heap_max = DEFAULT_HEAP_MAX_SIZE,
+  .heap_max_specified = 0,
   .trim_threshold = DEFAULT_TRIM_THRESHOLD,
 #define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))
   .arena_test = NARENAS_FROM_NCORES (1)
@@ -5052,6 +5063,16 @@ do_set_mmaps_max (int32_t value)
   return 1;
 }
 
+static __always_inline int
+do_set_heap_max (int32_t value)
+{
+  LIBC_PROBE (memory_mallopt_heap_max, 3, value, mp_.heap_max,
+	      mp_.no_dyn_threshold);
+  mp_.heap_max = value;
+  mp_.heap_max_specified = 1;
+  return 1;
+}
+
 static __always_inline int
 do_set_mallopt_check (int32_t value)
 {
@@ -5166,6 +5187,10 @@ __libc_mallopt (int param_number, int value)
       do_set_mmaps_max (value);
       break;
 
+    case M_HEAP_MAX:
+      do_set_heap_max (value);
+      break;
+
     case M_CHECK_ACTION:
       do_set_mallopt_check (value);
       break;
diff --git a/malloc/malloc.h b/malloc/malloc.h
index 70d8282bdc..30b93281aa 100644
--- a/malloc/malloc.h
+++ b/malloc/malloc.h
@@ -124,6 +124,7 @@ extern struct mallinfo mallinfo (void) __THROW;
 #define M_PERTURB           -6
 #define M_ARENA_TEST        -7
 #define M_ARENA_MAX         -8
+#define M_HEAP_MAX          -9
 
 /* General SVID/XPG interface to tunable parameters. */
 extern int mallopt (int __param, int __val) __THROW;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 08/11] x86-64: replace syscall instruction with SYSCALL_INST macro
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (6 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 07/11] malloc: make arena size configurable on startup Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 09/11] x86-64: add nop instruction after syscall instrunction Isaku Yamahata
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch is a preparation so that the syscall instruction can be
easily replaced with other instructions.
This patch doesn't change the generated instructions. The next patch
introduces that change.

LibOS hooks system calls and redirects control to itself so that
it can handle them instead of the kernel.
This patch makes such a change easier.

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 .../unix/sysv/linux/x86_64/____longjmp_chk.S  |  2 +-
 .../unix/sysv/linux/x86_64/__start_context.S  |  2 +-
 sysdeps/unix/sysv/linux/x86_64/cancellation.S |  2 +-
 sysdeps/unix/sysv/linux/x86_64/clone.S        |  4 ++--
 sysdeps/unix/sysv/linux/x86_64/getcontext.S   |  4 ++--
 sysdeps/unix/sysv/linux/x86_64/setcontext.S   |  2 +-
 sysdeps/unix/sysv/linux/x86_64/sigaction.c    |  2 +-
 sysdeps/unix/sysv/linux/x86_64/swapcontext.S  |  4 ++--
 sysdeps/unix/sysv/linux/x86_64/syscall.S      |  2 +-
 sysdeps/unix/sysv/linux/x86_64/sysdep.h       | 22 ++++++++++++-------
 sysdeps/unix/sysv/linux/x86_64/vfork.S        |  2 +-
 sysdeps/unix/sysv/linux/x86_64/x32/times.c    |  2 +-
 sysdeps/x86_64/nptl/tls.h                     |  2 +-
 13 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S b/sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S
index 568bd66dc6..0b1f3ff075 100644
--- a/sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S
+++ b/sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S
@@ -89,7 +89,7 @@ ENTRY(____longjmp_chk)
 	xorl	%edi, %edi
 	lea	-sizeSS(%rsp), %RSI_LP
 	movl	$__NR_sigaltstack, %eax
-	syscall
+	SYSCALL_INST
 	/* Without working sigaltstack we cannot perform the test.  */
 	testl	%eax, %eax
 	jne	.Lok2
diff --git a/sysdeps/unix/sysv/linux/x86_64/__start_context.S b/sysdeps/unix/sysv/linux/x86_64/__start_context.S
index a51454d06d..8b8b4551ef 100644
--- a/sysdeps/unix/sysv/linux/x86_64/__start_context.S
+++ b/sysdeps/unix/sysv/linux/x86_64/__start_context.S
@@ -52,7 +52,7 @@ ENTRY(__push___start_context)
 	movl	$ARCH_CET_ALLOC_SHSTK, %edi
 	movl	$__NR_arch_prctl, %eax
 	/* The new shadow stack base is returned in __ssp[1].  */
-	syscall
+	SYSCALL_INST
 	testq	%rax, %rax
 	jne	L(hlt)		/* This should never happen.  */
 
diff --git a/sysdeps/unix/sysv/linux/x86_64/cancellation.S b/sysdeps/unix/sysv/linux/x86_64/cancellation.S
index bb4910764a..ad5c4985c0 100644
--- a/sysdeps/unix/sysv/linux/x86_64/cancellation.S
+++ b/sysdeps/unix/sysv/linux/x86_64/cancellation.S
@@ -98,7 +98,7 @@ ENTRY(__pthread_disable_asynccancel)
 	xorq	%r10, %r10
 	addq	$CANCELHANDLING, %rdi
 	LOAD_PRIVATE_FUTEX_WAIT (%esi)
-	syscall
+	SYSCALL_INST
 	movl	%fs:CANCELHANDLING, %eax
 	jmp	3b
 END(__pthread_disable_asynccancel)
diff --git a/sysdeps/unix/sysv/linux/x86_64/clone.S b/sysdeps/unix/sysv/linux/x86_64/clone.S
index 4fe755421f..cc7ef29555 100644
--- a/sysdeps/unix/sysv/linux/x86_64/clone.S
+++ b/sysdeps/unix/sysv/linux/x86_64/clone.S
@@ -73,7 +73,7 @@ ENTRY (__clone)
 	/* End FDE now, because in the child the unwind info will be
 	   wrong.  */
 	cfi_endproc;
-	syscall
+	SYSCALL_INST
 
 	testq	%rax,%rax
 	jl	SYSCALL_ERROR_LABEL
@@ -96,7 +96,7 @@ L(thread_start):
 	/* Call exit with return value from function call. */
 	movq	%rax, %rdi
 	movl	$SYS_ify(exit), %eax
-	syscall
+	SYSCALL_INST
 	cfi_endproc;
 
 	cfi_startproc;
diff --git a/sysdeps/unix/sysv/linux/x86_64/getcontext.S b/sysdeps/unix/sysv/linux/x86_64/getcontext.S
index 8d74d033e2..60199eb326 100644
--- a/sysdeps/unix/sysv/linux/x86_64/getcontext.S
+++ b/sysdeps/unix/sysv/linux/x86_64/getcontext.S
@@ -73,7 +73,7 @@ ENTRY(__getcontext)
 	mov	%RSP_LP, %RSI_LP
 	movl	$ARCH_CET_STATUS, %edi
 	movl	$__NR_arch_prctl, %eax
-	syscall
+	SYSCALL_INST
 	testq	%rax, %rax
 	jz	L(continue_no_err)
 
@@ -125,7 +125,7 @@ L(no_shstk):
 #endif
 	movl	$_NSIG8,%r10d
 	movl	$__NR_rt_sigprocmask, %eax
-	syscall
+	SYSCALL_INST
 	cmpq	$-4095, %rax		/* Check %rax for error.  */
 	jae	SYSCALL_ERROR_LABEL	/* Jump to error handler if error.  */
 
diff --git a/sysdeps/unix/sysv/linux/x86_64/setcontext.S b/sysdeps/unix/sysv/linux/x86_64/setcontext.S
index bd89b77ec6..969928fe58 100644
--- a/sysdeps/unix/sysv/linux/x86_64/setcontext.S
+++ b/sysdeps/unix/sysv/linux/x86_64/setcontext.S
@@ -44,7 +44,7 @@ ENTRY(__setcontext)
 	movl	$SIG_SETMASK, %edi
 	movl	$_NSIG8,%r10d
 	movl	$__NR_rt_sigprocmask, %eax
-	syscall
+	SYSCALL_INST
 	/* Pop the pointer into RDX. The choice is arbitrary, but
 	   leaving RDI and RSI available for use later can avoid
 	   shuffling values.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/sigaction.c b/sysdeps/unix/sysv/linux/x86_64/sigaction.c
index e09ae246fa..6098231ea7 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sigaction.c
+++ b/sysdeps/unix/sysv/linux/x86_64/sigaction.c
@@ -78,7 +78,7 @@ asm									\
    "	.type __" #name ",@function\n"					\
    "__" #name ":\n"							\
    "	movq $" #syscall ", %rax\n"					\
-   "	syscall\n"							\
+   SYSCALL_INST								\
    ".LEND_" #name ":\n"							\
    ".section .eh_frame,\"a\",@progbits\n"				\
    ".LSTARTFRAME_" #name ":\n"						\
diff --git a/sysdeps/unix/sysv/linux/x86_64/swapcontext.S b/sysdeps/unix/sysv/linux/x86_64/swapcontext.S
index 52c1216921..14e3b2f1fa 100644
--- a/sysdeps/unix/sysv/linux/x86_64/swapcontext.S
+++ b/sysdeps/unix/sysv/linux/x86_64/swapcontext.S
@@ -77,7 +77,7 @@ ENTRY(__swapcontext)
 	movl	$SIG_SETMASK, %edi
 	movl	$_NSIG8,%r10d
 	movl	$__NR_rt_sigprocmask, %eax
-	syscall
+	SYSCALL_INST
 	cmpq	$-4095, %rax		/* Check %rax for error.  */
 	jae	SYSCALL_ERROR_LABEL	/* Jump to error handler if error.  */
 
@@ -117,7 +117,7 @@ ENTRY(__swapcontext)
 	mov	%RSP_LP, %RSI_LP
 	movl	$ARCH_CET_STATUS, %edi
 	movl	$__NR_arch_prctl, %eax
-	syscall
+	SYSCALL_INST
 	testq	%rax, %rax
 	jz	L(continue_no_err)
 
diff --git a/sysdeps/unix/sysv/linux/x86_64/syscall.S b/sysdeps/unix/sysv/linux/x86_64/syscall.S
index ea2ff051cf..668aa10024 100644
--- a/sysdeps/unix/sysv/linux/x86_64/syscall.S
+++ b/sysdeps/unix/sysv/linux/x86_64/syscall.S
@@ -34,7 +34,7 @@ ENTRY (syscall)
 	movq %r8, %r10
 	movq %r9, %r8
 	movq 8(%rsp),%r9	/* arg6 is on the stack.  */
-	syscall			/* Do the system call.  */
+	SYSCALL_INST		/* Do the system call.  */
 	cmpq $-4095, %rax	/* Check %rax for error.  */
 	jae SYSCALL_ERROR_LABEL	/* Jump to error handler if error.  */
 	ret			/* Return to caller.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
index 0a3ddd37e1..4f1aab7209 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
@@ -26,6 +26,12 @@
 /* Defines RTLD_PRIVATE_ERRNO.  */
 #include <dl-sysdep.h>
 
+#ifdef __ASSEMBLER__
+# define SYSCALL_INST syscall
+#else
+# define SYSCALL_INST "syscall\n\t"
+#endif
+
 /* For Linux we can use the system call table in the header file
 	/usr/include/asm/unistd.h
    of the kernel.  But these symbols do not follow the SYS_* syntax
@@ -176,7 +182,7 @@
 # define DO_CALL(syscall_name, args)		\
     DOARGS_##args				\
     movl $SYS_ify (syscall_name), %eax;		\
-    syscall;
+    SYSCALL_INST;
 
 # define DOARGS_0 /* nothing */
 # define DOARGS_1 /* nothing */
@@ -240,7 +246,7 @@
 ({									\
     unsigned long int resultvar;					\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number)							\
     : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);			\
@@ -254,7 +260,7 @@
     TYPEFY (arg1, __arg1) = ARGIFY (arg1);			 	\
     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number), "r" (_a1)						\
     : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);			\
@@ -270,7 +276,7 @@
     register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;			\
     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number), "r" (_a1), "r" (_a2)				\
     : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);			\
@@ -288,7 +294,7 @@
     register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;			\
     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3)			\
     : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);			\
@@ -308,7 +314,7 @@
     register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;			\
     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4)		\
     : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);			\
@@ -330,7 +336,7 @@
     register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;			\
     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),		\
       "r" (_a5)								\
@@ -355,7 +361,7 @@
     register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;			\
     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),		\
       "r" (_a5), "r" (_a6)						\
diff --git a/sysdeps/unix/sysv/linux/x86_64/vfork.S b/sysdeps/unix/sysv/linux/x86_64/vfork.S
index 22be88d17a..1ff362f409 100644
--- a/sysdeps/unix/sysv/linux/x86_64/vfork.S
+++ b/sysdeps/unix/sysv/linux/x86_64/vfork.S
@@ -51,7 +51,7 @@ ENTRY (__vfork)
 
 	/* Stuff the syscall number in RAX and enter into the kernel.  */
 	movl	$SYS_ify (vfork), %eax
-	syscall
+	SYSCALL_INST
 
 #if !SHSTK_ENABLED
 	/* Push back the return PC.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/times.c b/sysdeps/unix/sysv/linux/x86_64/x32/times.c
index 1ea0b2e1cd..a79fb200ca 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/times.c
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/times.c
@@ -26,7 +26,7 @@
     TYPEFY (arg1, __arg1) = ARGIFY (arg1);			 	\
     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
     asm volatile (							\
-    "syscall\n\t"							\
+    SYSCALL_INST							\
     : "=a" (resultvar)							\
     : "0" (number), "r" (_a1)						\
     : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);			\
diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h
index e25430a928..1a4b4052a6 100644
--- a/sysdeps/x86_64/nptl/tls.h
+++ b/sysdeps/x86_64/nptl/tls.h
@@ -161,7 +161,7 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80,
      _head->self = _thrdescr;						      \
 									      \
      /* It is a simple syscall to set the %fs value for the thread.  */	      \
-     asm volatile ("syscall"						      \
+     asm volatile (SYSCALL_INST						      \
 		   : "=a" (_result)					      \
 		   : "0" ((unsigned long int) __NR_arch_prctl),		      \
 		     "D" ((unsigned long int) ARCH_SET_FS),		      \
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 09/11] x86-64: add nop instruction after syscall instrunction
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (7 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 08/11] x86-64: replace syscall instruction with SYSCALL_INST macro Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 10/11] x86-64: make the number of nops after syscall configurable Isaku Yamahata
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This patch replaces the syscall instruction with syscall + nops, plus
an annotation.
The impact on the traditional runtime is the extra nops and a list of
syscall instruction locations.

LibOS hooks system calls and redirects control to itself so that it can
handle them instead of the kernel.
The fallback is trap-and-emulate (e.g. via SIGSYS or SIGILL), but it's
slow.
As an optimization, the syscall instruction is replaced in some way. The
approach can vary among LibOSes.  The common challenges are
  a) identifying the syscall instruction and
  b) replacing syscall with an instruction sequence.
x86-64 instructions have variable length and the syscall instruction is 2
bytes long. A call/jump with a 4-byte relative offset, on the other hand,
takes 5 bytes, which makes in-place replacement difficult. (If a jump to
an 8-byte absolute address is wanted, even more space is needed.)

This patch creates a list of syscall instructions and adds nops
after each syscall instruction to leave enough room for binary editing
without fragile, complex tricks.
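
As an illustration of how a LibOS might consume that list, here is a
minimal sketch that walks the raw bytes of the annotation section. It
assumes the layout emitted by the SYSCALL_INST macro in this patch
(.balign 8, an 8-byte address, a 1-byte length), which gives a 16-byte
stride with no padding after the last record, and it assumes a
little-endian host with addresses already relocated. The names
syscall_site and parse_syscall_sites are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One record: `.quad 551b` followed by `.byte 552b - 551b`.  */
struct syscall_site
{
  uint64_t addr;   /* address of the syscall instruction */
  uint8_t len;     /* bytes available: syscall plus trailing nops */
};

enum { RECORD_BYTES = 9, RECORD_STRIDE = 16 };

/* Decode up to `max` records from the section contents.  Returns the
   number of records decoded.  */
size_t parse_syscall_sites(const uint8_t *sec, size_t sec_len,
                           struct syscall_site *out, size_t max)
{
  size_t n = 0;
  for (size_t off = 0; off + RECORD_BYTES <= sec_len && n < max;
       off += RECORD_STRIDE)
    {
      memcpy(&out[n].addr, sec + off, sizeof out[n].addr);
      out[n].len = sec[off + 8];
      n++;
    }
  return n;
}
```

How the LibOS locates the section (for example via section headers of the
loaded object) is left open here, as it is in the patch.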

The assumed instruction sequence to replace the syscall instruction
is as follows, but LibOSes can do whatever they want.
Notice that the Linux x86-64 syscall ABI is stricter than the normal
function call convention.
(%rcx and %r11 are clobbered, %rflags is preserved, and the redzone
can't be used.)
If we can relax it, those snippets can be optimized/shortened.
In fact, almost all callers of the syscall instruction tolerate
use of the redzone and a clobbered %rflags.
For now this sequence is chosen to minimize the impact on glibc.

syscall sequence:
> syscall
> nop; nop; ... (add enough room for binary editing)

replacing sequence:
>  leaq 1f(%rip), %rcx
>  jmp syscall_func
>  1:
>
> 48 8d 0d 06 00 00 00         leaq   0x6(%rip),%rcx
> e9 00 00 00 00               jmpq   0x79
>          R_X86_64_PC32       syscall_func-0x4
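
To make the byte budget concrete, the following sketch encodes the
replacing sequence above into a buffer: 7 bytes for leaq plus 5 bytes for
jmp rel32, i.e. 12 bytes, which is exactly syscall (2 bytes) plus 10 nops.
It is only an illustration under stated assumptions: the function name
encode_syscall_patch is hypothetical, the return address is assumed to be
the first byte after the slot, and writing the bytes into live code
(page permissions, other threads) is out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LEA_LEN = 7, JMP_LEN = 5, PATCH_LEN = LEA_LEN + JMP_LEN };

/* Store a 32-bit value in little-endian order.  */
static void put_le32(uint8_t *p, uint32_t v)
{
  for (int i = 0; i < 4; i++)
    p[i] = (uint8_t) (v >> (8 * i));
}

/* Encode `leaq ret(%rip), %rcx; jmp target` for a slot of `slot_len`
   bytes located at address `site`.  Leftover bytes are filled with nop.
   Returns 0 on success, -1 if the slot is too small or the target is
   out of rel32 range.  */
int encode_syscall_patch(uint8_t *out, size_t slot_len,
                         uint64_t site, uint64_t target)
{
  if (slot_len < PATCH_LEN)
    return -1;

  /* rel32 is relative to the end of the jmp instruction.  */
  int64_t rel = (int64_t) (target - (site + PATCH_LEN));
  if (rel < INT32_MIN || rel > INT32_MAX)
    return -1;

  out[0] = 0x48; out[1] = 0x8d; out[2] = 0x0d;      /* leaq disp32(%rip), %rcx */
  put_le32(out + 3, (uint32_t) (slot_len - LEA_LEN)); /* return to end of slot */
  out[7] = 0xe9;                                    /* jmp rel32 */
  put_le32(out + 8, (uint32_t) (int32_t) rel);
  memset(out + PATCH_LEN, 0x90, slot_len - PATCH_LEN);
  return 0;
}
```

With fewer than 10 nops after syscall the encoder refuses the slot, which
is the situation the nop padding is meant to avoid.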

The callee function looks something like this:
save %rflags, reserve the redzone, call the LibOS entry point,
restore the redzone, restore %rflags and jump back to the caller.
> syscall_func:
>         xchgq %r11, -8(%rsp)
>         pushfq
>         xchgq %r11, (%rsp)
>         subq $120, %rsp
>         pushq %r11
>         pushq %rcx
>         callq <LibOS entry point>
>         popq %rcx
>         popq %r11
>         addq $120, %rsp
>         xchgq %r11, (%rsp)
>         popfq
>         xchgq %r11, -8(%rsp)
>         jmpq *%rcx

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 sysdeps/unix/sysv/linux/x86_64/sysdep.h | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
index 4f1aab7209..d958c1ca7a 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
@@ -27,9 +27,28 @@
 #include <dl-sysdep.h>
 
 #ifdef __ASSEMBLER__
-# define SYSCALL_INST syscall
+.macro SYSCALL_INST
+    551:
+    syscall
+    nop;nop;nop;nop;nop;nop;nop;nop;nop;nop
+    552:
+    .pushsection .libos.instructions.syscall, "a"
+    .balign 8
+    .quad 551b
+    .byte 552b - 551b
+    .popsection
+.endm
 #else
-# define SYSCALL_INST "syscall\n\t"
+#define SYSCALL_INST                                        \
+    "551:\n\t"                                              \
+    "syscall\n\t"                                           \
+    "nop;nop;nop;nop;nop;nop;nop;nop;nop;nop\n\t"           \
+    "552:\n\t"                                              \
+    ".pushsection .libos.instructions.syscall, \"a\"\n\t"   \
+    ".balign 8\n\t"                                         \
+    ".quad 551b\n\t"                                        \
+    ".byte 552b-551b\n\t"                                   \
+    ".popsection\n\t"
 #endif
 
 /* For Linux we can use the system call table in the header file
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 10/11] x86-64: make the number of nops after syscall configurable
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (8 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 09/11] x86-64: add nop instruction after syscall instrunction Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:04 ` [RFC PATCH 11/11] benchtests: simple benchmark to measure nop effects Isaku Yamahata
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This is a tentative patch for convenience.
It makes the number of nops inserted after the syscall instruction
configurable.
Once consensus on the number of nops is reached,
this patch can be removed.
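
The C side of this patch relies on two-level stringification to turn the
nop list into an inline-asm string. A small standalone sketch of why the
extra level is needed follows; the three-nop NOP_REPEAT value and the
function names one_level and two_level are only for illustration.

```c
#include <assert.h>
#include <string.h>

#define NOP_REPEAT nop;nop;nop
#define STRINGIFY(name)   STRINGIFY_1(name)
#define STRINGIFY_1(name) #name

/* `#` is applied before the argument is macro-expanded, so a single
   level yields the macro's name rather than its value.  */
const char *one_level(void)
{
  return STRINGIFY_1(NOP_REPEAT);
}

/* The outer macro expands NOP_REPEAT first, then the inner one
   stringifies the expansion.  */
const char *two_level(void)
{
  return STRINGIFY(NOP_REPEAT);
}
```

Note that this only works if the configured value is a bare token
sequence; if it is itself defined as a string literal, the quotes become
part of the stringified result.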

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 configure.ac                            | 19 +++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/sysdep.h | 15 ++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/configure.ac b/configure.ac
index aa902787f0..150570dd54 100644
--- a/configure.ac
+++ b/configure.ac
@@ -480,6 +480,25 @@ AC_ARG_ENABLE([cet],
 	      [enable_cet=$enableval],
 	      [enable_cet=no])
 
+AC_ARG_ENABLE([libos-num-nops],
+              AC_HELP_STRING([--enable-libos-num-nops=NUM-NOPS],
+                             [specify the number of nops for syscall, x86-64 only;
+                             only meaningful when libos support is enabled.]),
+              [libos_num_nops=$enableval],
+              [libos_num_nops=''])
+if test -n "$libos_num_nops"; then
+   case "$libos_num_nops" in
+   *[!0-9]*)
+     AC_MSG_ERROR([--enable-libos-num-nops requires a number])
+     ;;
+   *)
+     ;;
+   esac
+   AC_DEFINE_UNQUOTED(ENABLE_LIBOS_NUM_NOPS, $libos_num_nops, [libos num nops])
+   libos_nops=$(printf 'nop;%.0s' $(seq $libos_num_nops))
+   AC_DEFINE_UNQUOTED(ENABLE_LIBOS_NOPS, $libos_nops, [libos nops])
+fi
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
index d958c1ca7a..8da5e4e154 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
@@ -26,11 +26,17 @@
 /* Defines RTLD_PRIVATE_ERRNO.  */
 #include <dl-sysdep.h>
 
-#ifdef __ASSEMBLER__
+#ifdef ENABLE_LIBOS_NOPS
+# define NOP_REPEAT ENABLE_LIBOS_NOPS
+#else
+# define NOP_REPEAT nop;nop;nop;nop;nop;nop;nop;nop;nop;nop
+#endif
+
+# ifdef __ASSEMBLER__
 .macro SYSCALL_INST
     551:
     syscall
-    nop;nop;nop;nop;nop;nop;nop;nop;nop;nop
+    NOP_REPEAT
     552:
     .pushsection .libos.instructions.syscall, "a"
     .balign 8
@@ -39,10 +45,13 @@
     .popsection
 .endm
 #else
+#define STRINGIFY(name)   STRINGIFY_1(name)
+#define STRINGIFY_1(name) #name
 #define SYSCALL_INST                                        \
     "551:\n\t"                                              \
     "syscall\n\t"                                           \
-    "nop;nop;nop;nop;nop;nop;nop;nop;nop;nop\n\t"           \
+    STRINGIFY(NOP_REPEAT)                                   \
+    "\n\t"                                                  \
     "552:\n\t"                                              \
     ".pushsection .libos.instructions.syscall, \"a\"\n\t"   \
     ".balign 8\n\t"                                         \
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 11/11] benchtests: simple benchmark to measure nop effects
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (9 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 10/11] x86-64: make the number of nops after syscall configurable Isaku Yamahata
@ 2019-09-11 21:04 ` Isaku Yamahata
  2019-09-11 21:35   ` Patrick McGehearty
  2019-09-12  0:10 ` [RFC PATCH 00/11] Library OS support Joseph Myers
  2019-09-17 13:19 ` Adhemerval Zanella
  12 siblings, 1 reply; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-11 21:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: isaku.yamahata, Isaku Yamahata

This is a simple benchmark to measure function/nop effects.
OS noise dominates the result even when the process is pinned to a CPU
and made a real-time process.

$ sudo chrt -r 99 taskset 1 ./a.out

Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
---
 benchtests/bench-nop.c | 128 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 benchtests/bench-nop.c

diff --git a/benchtests/bench-nop.c b/benchtests/bench-nop.c
new file mode 100644
index 0000000000..bb98b3d371
--- /dev/null
+++ b/benchtests/bench-nop.c
@@ -0,0 +1,128 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+
+static inline unsigned long long rdtscp(void)
+{
+  unsigned int aux;
+  unsigned long long now = __builtin_ia32_rdtscp (&aux);
+  return now;
+}
+
+//#define LOOP 10000000
+#define LOOP 1000000
+
+void func0(void)
+{
+  for (int i = 0; i < LOOP; i++)
+    {
+      unsigned long ret = syscall(SYS_gettid);
+    }
+}
+
+void func1(void)
+{
+  for (int i = 0; i < LOOP; i++)
+    {
+      unsigned long ret;
+      __asm__ volatile(
+		       "syscall\n"
+		       : "=a"(ret)
+		       : "0"(SYS_gettid));
+    }
+}
+
+void func2(void)
+{
+  for (int i = 0; i < LOOP; i++)
+    {
+      unsigned long ret;
+      __asm__ volatile(
+		       "syscall\n"
+		       "nop;nop;nop\n"
+		       : "=a"(ret)
+		       : "0"(SYS_gettid));
+    }
+}
+
+void func3(void)
+{
+  for (int i = 0; i < LOOP; i++)
+    {
+      unsigned long ret;
+      __asm__ volatile(
+		       "syscall\n"
+		       "nop;nop;nop;nop;nop;nop;nop;nop;nop;nop\n"
+		       : "=a"(ret)
+		       : "0"(SYS_gettid));
+    }
+}
+
+void func4(void)
+{
+  for (int i = 0; i < LOOP; i++)
+    {
+      unsigned long ret;
+      __asm__ volatile(
+		       "jmp 1f\n"
+		       "nop\n"
+		       "1:\n"
+		       "syscall\n"
+		       : "=a"(ret)
+		       : "0"(SYS_gettid));
+    }
+}
+
+void func5(void)
+{
+  for (int i = 0; i < LOOP; i++)
+    {
+      unsigned long ret;
+      __asm__ volatile(
+		       "jmp 1f\n"
+		       "nop;nop;nop;nop;nop;nop;nop;nop\n"
+		       "1:\n"
+		       "syscall\n"
+		       : "=a"(ret)
+		       : "0"(SYS_gettid));
+    }
+}
+
+
+unsigned long long measure(void (*f)(void))
+{
+  unsigned long long start = rdtscp();
+  (*f)();
+  unsigned long long end = rdtscp();
+  return end - start;
+}
+
+int main(int argc, char** argv)
+{
+  printf("measuring syscall func\n");
+  unsigned long long time0 = measure(&func0);
+
+  printf("measuring syscall instruction\n");
+  unsigned long long time1 = measure(&func1);
+
+  printf("measuring syscall + nop * 3\n");
+  unsigned long long time2 = measure(&func2);
+
+  printf("measuring syscall + nop * 10\n");
+  unsigned long long time3 = measure(&func3);
+
+  printf("measuring jmp + nop + syscall\n");
+  unsigned long long time4 = measure(&func4);
+
+  printf("measuring jmp + nop * 8 + syscall\n");
+  unsigned long long time5 = measure(&func5);
+
+  printf("\tfunc\tinst\tnop*3\tnop*10\tjmp+nop\tjmp+nop*8\n");
+  printf("ratio\t%3.2f\t%3.2f\t%3.2f\t%3.2f\t%3.2f\t%3.2f\n",
+	 time0 * 100.0/time1,
+	 time1 * 100.0/time1,
+	 time2 * 100.0/time1,
+	 time3 * 100.0/time1,
+	 time4 * 100.0/time1,
+	 time5 * 100.0/time1);
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 11/11] benchtests: simple benchmark to measure nop effects
  2019-09-11 21:04 ` [RFC PATCH 11/11] benchtests: simple benchmark to measure nop effects Isaku Yamahata
@ 2019-09-11 21:35   ` Patrick McGehearty
  0 siblings, 0 replies; 20+ messages in thread
From: Patrick McGehearty @ 2019-09-11 21:35 UTC (permalink / raw)
  To: libc-alpha

I believe this segment
+     time0 * 100.0/time1,
+     time1 * 100.0/time1,
+     time2 * 100.0/time1,
+     time3 * 100.0/time1,
+     time4 * 100.0/time1,
+     time5 * 100.0/time1);
needs to change to:
+     time0 * 100.0/time0,
+     time1 * 100.0/time1,
+     time2 * 100.0/time2,
+     time3 * 100.0/time3,
+     time4 * 100.0/time4,
+     time5 * 100.0/time5);

Also, I would recommend removing or delaying all printf statements
that occur before your timing measurements. My experience has been
when doing delicate timing experiments, printf can trigger async I/O
activity which adds significant noise to the measurements.

There may be other opportunities to refine the measurements,
including running each measurement experiment several times
and then reporting the min, max, median, and mean.
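
A rough sketch of that last suggestion, assuming nothing about the final
benchmark: collect all samples first, print nothing while timing, and
summarize afterwards. The names summarize and repeat_measure are
hypothetical, and clock_gettime is used here instead of rdtscp purely to
keep the sketch portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

struct summary
{
  uint64_t min, max, median;
  double mean;
};

static int cmp_u64(const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a, y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Sort the samples in place and compute min, max, median and mean.  */
struct summary summarize(uint64_t *samples, size_t n)
{
  assert(n > 0);
  qsort(samples, n, sizeof samples[0], cmp_u64);

  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += (double) samples[i];

  struct summary s;
  s.min = samples[0];
  s.max = samples[n - 1];
  s.median = (n % 2) ? samples[n / 2]
                     : samples[n / 2 - 1]
                       + (samples[n / 2] - samples[n / 2 - 1]) / 2;
  s.mean = sum / (double) n;
  return s;
}

static uint64_t now_ns(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/* Time f() n times; no I/O happens between the measurements.  */
void repeat_measure(void (*f)(void), uint64_t *samples, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint64_t start = now_ns();
      f();
      samples[i] = now_ns() - start;
    }
}
```

A caller would fill one sample array per variant with repeat_measure,
call summarize on each, and only then print the table.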

- patrick mcgehearty



* Re: [RFC PATCH 00/11] Library OS support
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (10 preceding siblings ...)
  2019-09-11 21:04 ` [RFC PATCH 11/11] benchtests: simple benchmark to measure nop effects Isaku Yamahata
@ 2019-09-12  0:10 ` Joseph Myers
  2019-09-12  1:13   ` Isaku Yamahata
  2019-09-17 13:19 ` Adhemerval Zanella
  12 siblings, 1 reply; 20+ messages in thread
From: Joseph Myers @ 2019-09-12  0:10 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: libc-alpha, isaku.yamahata

On Wed, 11 Sep 2019, Isaku Yamahata wrote:

> This patch is to add Library OS(LibOS in short) to glibc.
> This is the first version of patch series to support LibOS.

I don't see anything here about host triplets being used.  I'd expect 
x86_64-*-libos or similar (with consequent config.sub changes being 
submitted to GNU config.git) but there's nothing to indicate that, and the 
patch series is lacking documentation (NEWS, install.texi / regeneration 
of INSTALL, other .texi files if applicable).  I'd also expect any new OS 
to have appropriate additions to build-many-glibcs.py.  There are a great 
many complications specific to existing GNU/Linux ABIs that ought to be 
irrelevant in this case (compat support for old symbol versions, 
enable-kernel support for different minimum kernel versions, etc.).

Given the modification of generic files, you should verify that installed 
stripped shared libraries for e.g. x86_64-linux-gnu are byte-for-byte 
identical before and after the patch (or justify them not being so if they 
aren't identical - I'd expect justifications of the form "this file has 
line numbers in assertions that change").
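
As an illustration of the byte-for-byte check described above, here is a
minimal sketch in C. The function name files_identical is hypothetical; in
practice a tool such as cmp(1) run over the stripped libraries from the
"before" and "after" install trees would do the same job.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the two files have identical
   contents, 0 if they differ (including in length), and -1 if either
   file cannot be opened.  */
static int
files_identical (const char *path_a, const char *path_b)
{
  assert (path_a != NULL && path_b != NULL);

  FILE *a = fopen (path_a, "rb");
  FILE *b = fopen (path_b, "rb");
  int result = 1;

  if (a == NULL || b == NULL)
    result = -1;
  else
    {
      int ca, cb;
      do
        {
          /* Read one byte from each file; EOF on only one side means
             the lengths differ, which also counts as a mismatch.  */
          ca = fgetc (a);
          cb = fgetc (b);
          if (ca != cb)
            result = 0;
        }
      while (result == 1 && ca != EOF);
    }

  if (a != NULL)
    fclose (a);
  if (b != NULL)
    fclose (b);
  return result;
}
```

A result of 0 for some library is the point at which a justification (for
example, changed line numbers in assertions) would be expected.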

There should not be any __x86_64__ conditionals in generic files; the 
sysdeps structure should be used as appropriate instead.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC PATCH 07/11] malloc: make arena size configurable on startup
  2019-09-11 21:04 ` [RFC PATCH 07/11] malloc: make arena size configurable on startup Isaku Yamahata
@ 2019-09-12  1:03   ` DJ Delorie
  2019-09-12 18:43     ` Isaku Yamahata
  0 siblings, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2019-09-12  1:03 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: libc-alpha


I have no fundamental problem with tuning the heap size, but...

1. The heap size must always be a power of two; you need to enforce that
   in the tunable callback.

2. Do you have any benchmarks that show that these changes don't affect
   performance?  I don't mean differing size heaps, I mean changing a
   constant to a variable in the code.

3. I don't see the point of heap_max_specified.
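
For point 1, a minimal sketch of the power-of-two check might look like the
following. Both function names are hypothetical and the real glibc tunable
callback has a different signature; this only shows the validation logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of glibc: returns true when SIZE is a
   nonzero power of two.  A power of two has exactly one bit set, so
   clearing the lowest set bit with (size & (size - 1)) leaves zero.  */
static bool
heap_size_is_power_of_two (size_t size)
{
  return size != 0 && (size & (size - 1)) == 0;
}

/* Hypothetical stand-in for the body of a tunable callback: accept the
   requested value only when it is valid, otherwise keep the current
   setting.  Returns the heap size that ends up in effect.  */
static size_t
set_heap_size (size_t current, size_t requested)
{
  /* The value already in effect is assumed to be valid.  */
  assert (heap_size_is_power_of_two (current));
  return heap_size_is_power_of_two (requested) ? requested : current;
}
```

Silently keeping the old value is only one possible policy; rounding the
request to a power of two would be another.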

* Re: [RFC PATCH 00/11] Library OS support
  2019-09-12  0:10 ` [RFC PATCH 00/11] Library OS support Joseph Myers
@ 2019-09-12  1:13   ` Isaku Yamahata
  2019-09-16 20:47     ` Joseph Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-12  1:13 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Isaku Yamahata, libc-alpha, isaku.yamahata

Thanks for the feedback.


On Thu, Sep 12, 2019 at 12:10:32AM +0000,
Joseph Myers <joseph@codesourcery.com> wrote:

> On Wed, 11 Sep 2019, Isaku Yamahata wrote:
> 
> > This patch is to add Library OS(LibOS in short) to glibc.
> > This is the first version of patch series to support LibOS.
> 
> I don't see anything here about host triplets being used.  I'd expect 
> x86_64-*-libos or similar (with consequent config.sub changes being 
> submitted to GNU config.git) but there's nothing to indicate that, and the 
> patch series is lacking documentation (NEWS, install.texi / regeneration 
> of INSTALL, other .texi files if applicable).  I'd also expect any new OS 
> to have appropriate additions to build-many-glibcs.py.  There are a great 
> many complications specific to existing GNU/Linux ABIs that ought to be 
> irrelevant in this case (compat support for old symbol versions, 
> enable-kernel support for different minimum kernel versions, etc.).

Let me clarify my scope. Multiple LibOSes (typically) emulate the Linux
system call interface (with some changes). LibOSes can run unmodified
(dynamically linked) Linux executables (with a modified libc as a shared
library). For example, the invocation looks as follows.
native case: $ linux-executable
LibOS case:  $ libos_loader linux-executable

I'd like to have x86_64-*-linux support both native Linux and multiple LibOSes
with a single version of the binary.

If we have multiple versions, for example,
  x86_64-*-linux
  x86_64-*-linux_libosX
  x86_64-*-linux_libosY
  ...
it doesn't scale. It will cause maintenance hell.


> Given the modification of generic files, you should verify that installed 
> stripped shared libraries for e.g. x86_64-linux-gnu are byte-for-byte 
> identical before and after the patch (or justify them not being so if they 
> aren't identical - I'd expect justifications of the form "this file has 
> line numbers in assertions that change").
> 
> There should not be any __x86_64__ conditionals in generic files; the 
> sysdeps structure should be used as appropriate instead.

Sure. I'll address it in the next respin.

Thanks,
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

* Re: [RFC PATCH 07/11] malloc: make arena size configurable on startup
  2019-09-12  1:03   ` DJ Delorie
@ 2019-09-12 18:43     ` Isaku Yamahata
  0 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-12 18:43 UTC (permalink / raw)
  To: DJ Delorie; +Cc: Isaku Yamahata, libc-alpha

Thanks for the feedback.


On Wed, Sep 11, 2019 at 09:03:59PM -0400,
DJ Delorie <dj@redhat.com> wrote:

> 
> I have no fundamental problem with tuning the heap size, but...
> 
> 1. The heap size must always be a power of two; you need to enforce that
>    in the tunable callback.

Sure.


> 2. Do you have any benchmarks that show that these changes don't affect
>    performance?  I don't mean differing size heaps, I mean changing a
>    constant to a variable in the code.

Not yet. Let me try bench-malloc-simple.c and bench-malloc-thread.c.


> 3. I don't see the point of heap_max_specified.

OK. I'll fix it in the next spin.

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

* Re: [RFC PATCH 00/11] Library OS support
  2019-09-12  1:13   ` Isaku Yamahata
@ 2019-09-16 20:47     ` Joseph Myers
  2019-09-17 17:46       ` Isaku Yamahata
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph Myers @ 2019-09-16 20:47 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: libc-alpha, isaku.yamahata

On Wed, 11 Sep 2019, Isaku Yamahata wrote:

> If we have multiple versions, for example,
>   x86_64-*-linux
>   x86_64-*-linux_libosX
>   x86_64-*-linux_libosY
>   ...
> it doesn't scale. It will cause maintenance hell.

Multiple different LibOS host triplets would indeed be an issue.  My point 
is more like this: in various uses of QEMU it's often better to use 
virtual boards and devices that don't correspond to any real hardware but 
are convenient for emulation and for guest operating systems, rather than 
to use emulation of a particular piece of real hardware.  Similarly, if 
you don't constrain yourself to work with generic x86_64-*-linux-gnu 
libraries, you can make the syscall interface for LibOS into something 
that is designed to be convenient for library implementation on a wide 
range of possible host OSes, rather than being tied to all the 
peculiarities of the existing Linux kernel syscall ABI and the existing 
glibc ports.  Only one such interface should be needed, not one for each 
LibOS.

If however you continue with something that works with x86_64-*-linux-gnu 
rather than a different triplet, aiming for generic x86_64-*-linux-gnu 
libraries to work in a LibOS environment, my other point from the Cauldron 
discussion applies: this is adding new interfaces to x86_64-*-linux-gnu 
glibc and so there should be additions to the glibc testsuite that verify, 
in a normal x86_64-*-linux-gnu glibc build, that those interfaces are 
working as desired for LibOS purposes.  That probably means some kind of 
minimal LibOS loader, that passes syscalls through to the host operating 
system, should be included in the glibc testsuite - just as the 
test-in-container infrastructure can be seen as support for building and 
using a (very) minimal GNU/Linux distribution (complete with a local 
implementation of enough of /bin/sh to work for the glibc tests) for those 
tests that need to run in such a container environment.
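
To make the "minimal LibOS loader that passes syscalls through" idea
concrete, here is a rough sketch of what the pass-through side could look
like. The entry point name, its three-argument shape, and the call counter
are all hypothetical; nothing here reflects the interfaces in the patch
series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Number of requests seen so far.  A test could use this to confirm
   that calls really went through the LibOS path.  */
static unsigned long passthrough_count;

/* Hypothetical LibOS entry point: forwards the request unchanged to the
   host kernel via the syscall(2) wrapper.  Note that the wrapper returns
   -1 and sets errno on failure rather than the raw kernel return value;
   a real LibOS would have to define its own error convention.  */
static long
libos_passthrough_syscall (long nr, long a0, long a1, long a2)
{
  assert (nr >= 0);
  passthrough_count++;
  return syscall (nr, a0, a1, a2);
}
```

A test built on this could run an ordinary program through the loader and
check both the program's result and that the counter is nonzero.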

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC PATCH 00/11] Library OS support
  2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
                   ` (11 preceding siblings ...)
  2019-09-12  0:10 ` [RFC PATCH 00/11] Library OS support Joseph Myers
@ 2019-09-17 13:19 ` Adhemerval Zanella
  12 siblings, 0 replies; 20+ messages in thread
From: Adhemerval Zanella @ 2019-09-17 13:19 UTC (permalink / raw)
  To: libc-alpha



On 11/09/2019 18:03, Isaku Yamahata wrote:
> This patch is to add Library OS(LibOS in short) to glibc.
> This is the first version of patch series to support LibOS.

This patchset, along with the Cauldron presentation, gives me the impression
that it is really tuned to a very specific use case (x86_64 SGX) rather than
a more generic port. For instance, I am not really sure that ignoring
dynamic R_X86_64_NONE is really the best behaviour in the generic x86_64
ABI (in most cases it is an invalid relocation).

But the main problem, as we discussed briefly at GNU Cauldron, is that this
essentially brings back text relocations, albeit in a different form
(the libOS runtime would be the one that actually rewrites the text segment).
This lowers memory scalability and might pose some security issues
for generic cases (excluding SGX-like usage).  It might not be a problem
for SGX enclaves, but it most likely is for the generic case.
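
To illustrate the kind of text scanning such a runtime would have to do
before rewriting anything, here is a small sketch that checks how many nops
follow a syscall instruction. It assumes the padding consists of single-byte
nops (0x90), which matches the benchmark in patch 11 but may not match what
the series emits in glibc itself; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of single-byte nop (0x90) instructions that
   directly follow a syscall instruction (0x0f 0x05) starting at
   CODE[OFF], or -1 when no syscall instruction starts at that offset.  */
static long
nops_after_syscall (const unsigned char *code, size_t len, size_t off)
{
  assert (code != NULL);

  /* Need two bytes for the syscall opcode itself.  */
  if (len < 2 || off > len - 2)
    return -1;
  if (code[off] != 0x0f || code[off + 1] != 0x05)
    return -1;

  size_t i = off + 2;
  while (i < len && code[i] == 0x90)
    i++;
  return (long) (i - (off + 2));
}
```

If the runtime wanted to overwrite the site with a 5-byte call rel32 (an
assumption, not something stated in this thread), it would need the 2-byte
syscall plus at least 3 nops, so a result below 3 would mean the site cannot
be patched in place.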

Another issue is whether this would be another research-like project that
might be replaced/abandoned once a better hardware extension is available.
For instance, an architecture extension might provide a way for the guest
to configure the syscall mechanism so that, instead of changing to kernel
mode, it calls the {uni}kernel functions directly.

Also, some requirements such as the malloc arena size restriction could
really be implemented in the libOS runtime, since it would be the one
responsible for the sbrk/mmap syscalls (it can track them and return
-1/ENOMEM when a call cannot be done). Another piece of the patch that might
add even more complexity is '--enable-libos-num-nops': in my understanding,
it would be possible for libOS to fail to load an older glibc if it does not
have sufficient nops on the syscall markings (adding a compatibility layer
which, although not glibc specific, is another runtime issue).

All these points make me wonder whether it would be better to make libOS a
specific target and move all the required bits to system-specific
files.

* Re: [RFC PATCH 00/11] Library OS support
  2019-09-16 20:47     ` Joseph Myers
@ 2019-09-17 17:46       ` Isaku Yamahata
  0 siblings, 0 replies; 20+ messages in thread
From: Isaku Yamahata @ 2019-09-17 17:46 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Isaku Yamahata, libc-alpha, isaku.yamahata

Distro folks have their own opinions. Can anyone from a distro jump in?

Yes, it can be implemented either way.
If it's not tested, it's broken. Testing should be done with a minimal LibOS.
I understand your point to be:
if I go for a normal x86_64-*-linux-gnu, testing in the upstream CI is a must,
a strong requirement.
On the other hand, if I go for x86_64-*-linux-libos-gnu (or whatever we call
it), it is not.

Thanks,
Isaku Yamahata

On Mon, Sep 16, 2019 at 08:47:57PM +0000,
Joseph Myers <joseph@codesourcery.com> wrote:

> On Wed, 11 Sep 2019, Isaku Yamahata wrote:
> 
> > If we have multiple versions, for example,
> >   x86_64-*-linux
> >   x86_64-*-linux_libosX
> >   x86_64-*-linux_libosY
> >   ...
> > it doesn't scale. It will cause maintenance hell.
> 
> Multiple different LibOS host triplets would indeed be an issue.  My point 
> is more like this: in various uses of QEMU it's often better to use 
> virtual boards and devices that don't correspond to any real hardware but 
> are convenient for emulation and for guest operating systems, rather than 
> to use emulation of a particular piece of real hardware.  Similarly, if 
> you don't constrain yourself to work with generic x86_64-*-linux-gnu 
> libraries, you can make the syscall interface for LibOS into something 
> that is designed to be convenient for library implementation on a wide 
> range of possible host OSes, rather than being tied to all the 
> peculiarities of the existing Linux kernel syscall ABI and the existing 
> glibc ports.  Only one such interface should be needed, not one for each 
> LibOS.
> 
> If however you continue with something that works with x86_64-*-linux-gnu 
> rather than a different triplet, aiming for generic x86_64-*-linux-gnu 
> libraries to work in a LibOS environment, my other point from the Cauldron 
> discussion applies: this is adding new interfaces to x86_64-*-linux-gnu 
> glibc and so there should be additions to the glibc testsuite that verify, 
> in a normal x86_64-*-linux-gnu glibc build, that those interfaces are 
> working as desired for LibOS purposes.  That probably means some kind of 
> minimal LibOS loader, that passes syscalls through to the host operating 
> system, should be included in the glibc testsuite - just as the 
> test-in-container infrastructure can be seen as support for building and 
> using a (very) minimal GNU/Linux distribution (complete with a local 
> implementation of enough of /bin/sh to work for the glibc tests) for those 
> tests that need to run in such a container environment.
> 
> -- 
> Joseph S. Myers
> joseph@codesourcery.com

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

Thread overview: 20+ messages
2019-09-11 21:03 [RFC PATCH 00/11] Library OS support Isaku Yamahata
2019-09-11 21:03 ` [RFC PATCH 01/11] x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 02/11] elf: add macro to define note section for LibOS Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 03/11] elf: " Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 04/11] elf: add stub functions for LibOS support Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 05/11] elf: add hook, __libos_map_library to dl-open.c Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 06/11] elf/rtld: introduce runtime option to disable HP_TIMING_INLINE Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 07/11] malloc: make arena size configurable on startup Isaku Yamahata
2019-09-12  1:03   ` DJ Delorie
2019-09-12 18:43     ` Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 08/11] x86-64: replace syscall instruction with SYSCALL_INST macro Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 09/11] x86-64: add nop instruction after syscall instrunction Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 10/11] x86-64: make the number of nops after syscall configurable Isaku Yamahata
2019-09-11 21:04 ` [RFC PATCH 11/11] benchtests: simple benchmark to measure nop effects Isaku Yamahata
2019-09-11 21:35   ` Patrick McGehearty
2019-09-12  0:10 ` [RFC PATCH 00/11] Library OS support Joseph Myers
2019-09-12  1:13   ` Isaku Yamahata
2019-09-16 20:47     ` Joseph Myers
2019-09-17 17:46       ` Isaku Yamahata
2019-09-17 13:19 ` Adhemerval Zanella
