From: lourens@bearmetal.eu
To: ruby-core@ruby-lang.org
Subject: [ruby-core:88758] [Ruby trunk Misc#15007] Let all Init_xxx and extension APIs frequently called from init code paths be considered cold
Date: Thu, 30 Aug 2018 17:09:53 +0000 (UTC)
Message-ID: <redmine.journal-73809.20180830170950.2232a5dd84d65977@ruby-lang.org>
In-Reply-To: redmine.issue-15007.20180818231731@ruby-lang.org
Issue #15007 has been updated by methodmissing (Lourens Naudé).
normalperson (Eric Wong) wrote:
> lourens@bearmetal.eu wrote:
> > * `MADV_HUGEPAGE` - fallback option, implicit request: the kernel
> > will map it implicitly if not aligned to a hugepage boundary, or
> > right away if aligned to a hugepage boundary (I implemented it with
> > 2MB alignment for my system). Requires a kernel compiled
> > with THP support (most are) and THP enabled (likewise)
>
> Crap, I think that conflicts with the usage of
> PR_SET_THP_DISABLE (for CoW-friendliness) since r63253 in trunk.
> MAP_HUGETLB will still work, though.
Yes, that explains the behaviour I observed through smaps. Correct, `MAP_HUGETLB` works. I'll clean it up during the week and submit it as a new issue.
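The conflict hinges on the per-process "THP disable" flag, so one way to confirm it is to query the flag with `prctl(2)`. With the flag set, the kernel won't back the process's memory with transparent hugepages even where `MADV_HUGEPAGE` was requested. `MAP_HUGETLB` (hugetlbfs) is a separate mechanism and is unaffected.

Below is a minimal Linux-only sketch. The helper names are hypothetical. `disable_thp_for_process` only mimics the kind of call referred to for r63253; it is not trunk's actual code.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Linux-only. Returns a non-zero value if the per-process "THP disable"
   flag is set, 0 if it is clear, and -1 if the query fails or the
   PR_GET_THP_DISABLE constant is unavailable at build time. */
int thp_disabled_for_process(void)
{
#ifdef PR_GET_THP_DISABLE
    /* The flag is reported through prctl's return value. */
    return prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
#else
    return -1;
#endif
}

/* Hypothetical helper: sets the flag for the calling process (it is
   inherited across fork()). Returns 0 on success, -1 on failure. */
int disable_thp_for_process(void)
{
#ifdef PR_SET_THP_DISABLE
    return prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL);
#else
    return -1;
#endif
}
```

Calling `thp_disabled_for_process()` early in a process that later uses `madvise` shows whether the hint can take effect at all.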
----------------------------------------
Misc #15007: Let all Init_xxx and extension APIs frequently called from init code paths be considered cold
https://bugs.ruby-lang.org/issues/15007#change-73809
* Author: methodmissing (Lourens Naudé)
* Status: Open
* Priority: Normal
* Assignee: naruse (Yui NARUSE)
----------------------------------------
References GitHub PR https://github.com/ruby/ruby/pull/1934
### Why?
An incremental extraction from PR https://github.com/ruby/ruby/pull/1922, specifically addressing the feedback from Yui Naruse in https://github.com/ruby/ruby/pull/1922#issuecomment-413796710
The [Linux kernel](https://github.com/torvalds/linux/blob/ca04b3cca11acbaf904f707f2d9ca9654d7cc226/include/linux/compiler-gcc.h#L191-L206), [PHP 7](https://github.com/php/php-src/blob/2d71a28954a4f20709718ee7cb2b850d334c561c/Zend/zend_portability.h#L220) and other projects use the `hot` and `cold` function attributes to improve code layout.
I noticed that Ruby is heavily CPU frontend bound: instructions are not fed into the CPU pipelines as fast as they could be. As a result, even most micro benchmarks show a high CPI (cycles per instruction). This PR is part of a larger body of work I'd like to do on improving CPU frontend throughput. I can take a stab at formally writing up those ideas if there's interest from the community.
### Implementation
This PR focuses exclusively on flagging the `Init_xxx` functions to be optimized for size, both for the core classes and for those bundled in `ext`, as they're called only once at runtime.
The GCC-specific [cold](https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/Function-Attributes.html) function attribute works as follows (from the GCC docs):
```
The cold attribute is used to inform the compiler that a function is unlikely executed. The function is optimized for size rather than speed and on many targets it is placed into special subsection of the text section so all cold functions appears close together improving code locality of non-cold parts of program. The paths leading to call of cold functions within code are marked as unlikely by the branch prediction mechanism. It is thus useful to mark functions used to handle unlikely conditions, such as perror, as cold to improve optimization of hot functions that do call marked functions in rare occasions.
When profile feedback is available, via -fprofile-use, hot functions are automatically detected and this attribute is ignored.
```
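The `perror`-style use case in the GCC description can be sketched as a rarely taken reporting function marked `cold`, called from a loop. `EX_COLD`, `report_negative` and `sum_non_negative` are hypothetical names for illustration, not Ruby code. The macro expands to nothing on compilers that don't define `__GNUC__`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical macro: the cold attribute where GCC-compatible, else nothing. */
#if defined(__GNUC__)
#  define EX_COLD __attribute__((cold))
#else
#  define EX_COLD
#endif

static int ex_error_count;

/* Rarely executed error reporting: optimized for size; GCC typically
   places it in .text.unlikely. */
static EX_COLD void report_negative(size_t index, long value)
{
    ex_error_count++;
    fprintf(stderr, "warning: negative value %ld at index %zu\n", value, index);
}

/* Hot loop: because report_negative() is cold, the branch that calls it
   is treated as unlikely when laying out this function. */
long sum_non_negative(const long *values, size_t n)
{
    assert(values != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] < 0) {
            report_negative(i, values[i]);
            continue;
        }
        sum += values[i];
    }
    return sum;
}
```

At `-O2` with GCC, `report_negative` would be expected to land in `.text.unlikely`. Its call site would be laid out as the unlikely path. The exact layout depends on compiler version and flags.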
Declaring a function as `cold` at its definition gives us the following benefits:
* It's a no-op on platforms that do not support the attribute
* Cold functions are optimized for size, leaving a smaller footprint in the instruction cache
* As a result, CPU frontend throughput increases thanks to a lower instruction cache miss ratio and lower ITLB overhead - see [original chunky PR](https://user-images.githubusercontent.com/379/44204858-4c085100-a14c-11e8-86b8-d87fcb5e4985.png) vs [then trunk](https://user-images.githubusercontent.com/379/44204870-4f9bd800-a14c-11e8-9bee-14c8ad8d3a7d.png)
* This effect can be further amplified in future work with the `hot` attribute
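Being a no-op where the attribute is unsupported usually comes down to a feature-test macro. This minimal sketch assumes a compiler that may or may not provide `__has_attribute`; GCC 5+ and Clang do. It marks a one-time init function `cold` and an inner-loop helper `hot`.

`MAYBE_COLD`, `MAYBE_HOT`, `example_init` and `example_add` are hypothetical names. Ruby's own headers may spell this differently.

```c
#include <assert.h>

/* Feature-test the attributes; __has_attribute exists in GCC 5+ and Clang. */
#if defined(__has_attribute)
#  if __has_attribute(cold)
#    define MAYBE_COLD __attribute__((cold))
#  endif
#  if __has_attribute(hot)
#    define MAYBE_HOT __attribute__((hot))
#  endif
#endif

/* Fallback: expand to nothing, so the same source builds everywhere. */
#ifndef MAYBE_COLD
#  define MAYBE_COLD
#endif
#ifndef MAYBE_HOT
#  define MAYBE_HOT
#endif

static int init_calls;

/* Runs once at startup: optimize for size, keep it away from hot code. */
MAYBE_COLD void example_init(void)
{
    assert(init_calls == 0); /* init code is expected to run exactly once */
    init_calls++;
}

/* Runs in the inner loop: optimize for speed, group with other hot code. */
MAYBE_HOT int example_add(int a, int b)
{
    return a + b;
}
```

The nested `#if` keeps older compilers from ever evaluating `__has_attribute(...)`.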
#### Extension APIs flagged as cold
These are (and should typically only be) called during extension init, and are thus safe to optimize for size as well.
* `void rb_define_method_id(VALUE, ID, VALUE (*)(ANYARGS), int);`
* `void rb_undef(VALUE, ID);`
* `void rb_define_protected_method(VALUE, const char*, VALUE (*)(ANYARGS), int);`
* `void rb_define_private_method(VALUE, const char*, VALUE (*)(ANYARGS), int);`
* `void rb_define_singleton_method(VALUE, const char*, VALUE (*)(ANYARGS), int);`
* `void rb_define_alloc_func(VALUE, rb_alloc_func_t);`
* `void rb_undef_alloc_func(VALUE);`
* `VALUE rb_define_class(const char*, VALUE);`
* `VALUE rb_define_module(const char*);`
* `VALUE rb_define_class_under(VALUE, const char*, VALUE);`
* `VALUE rb_define_module_under(VALUE, const char*);`
* `void rb_define_variable(const char*, VALUE*);`
* `void rb_define_virtual_variable(const char*, VALUE (*)(ANYARGS), void (*)(ANYARGS));`
* `void rb_define_hooked_variable(const char*, VALUE*, VALUE (*)(ANYARGS), void (*)(ANYARGS));`
* `void rb_define_readonly_variable(const char*, const VALUE*);`
* `void rb_define_const(VALUE, const char*, VALUE);`
* `void rb_define_global_const(const char*, VALUE);`
* `void rb_define_method(VALUE, const char*, VALUE (*)(ANYARGS), int);`
* `void rb_define_module_function(VALUE, const char*, VALUE (*)(ANYARGS), int);`
* `void rb_define_global_function(const char*, VALUE (*)(ANYARGS), int);`
* `void rb_undef_method(VALUE, const char*);`
* `void rb_define_alias(VALUE, const char*, const char*);`
* `void rb_define_attr(VALUE, const char*, int, int);`
* `void rb_global_variable(VALUE*);`
* `void rb_gc_register_mark_object(VALUE);`
* `void rb_gc_register_address(VALUE*);`
* `void rb_gc_unregister_address(VALUE*);`
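The split described above can be illustrated with a self-contained toy registry: definition APIs run during init, while lookups run on every call. This is not Ruby's API. `VALUE`/`ID` are replaced by `long` and strings. `toy_define_method`, `toy_call` and `Init_toy` are hypothetical stand-ins for `rb_define_method`, method dispatch and an extension's `Init_xxx`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cold marker; Ruby's real headers use their own macro. */
#if defined(__GNUC__)
#  define TOY_COLD __attribute__((cold))
#else
#  define TOY_COLD
#endif

typedef long (*toy_method_fn)(long self);

struct toy_method {
    const char *name;
    toy_method_fn fn;
};

#define TOY_MAX_METHODS 16
static struct toy_method toy_methods[TOY_MAX_METHODS];
static int toy_method_count;

/* Registration runs once, from init code: a candidate for `cold`,
   analogous in spirit to rb_define_method. */
TOY_COLD void toy_define_method(const char *name, toy_method_fn fn)
{
    assert(toy_method_count < TOY_MAX_METHODS);
    toy_methods[toy_method_count].name = name;
    toy_methods[toy_method_count].fn = fn;
    toy_method_count++;
}

/* Lookup and dispatch run on every call, so they stay in regular .text. */
long toy_call(const char *name, long self)
{
    for (int i = 0; i < toy_method_count; i++) {
        if (strcmp(toy_methods[i].name, name) == 0)
            return toy_methods[i].fn(self);
    }
    return -1; /* hypothetical "method missing" sentinel */
}

static long toy_double(long self) { return self * 2; }
static long toy_succ(long self) { return self + 1; }

/* Analogous to an extension's Init_xxx entry point. */
TOY_COLD void Init_toy(void)
{
    toy_define_method("double", toy_double);
    toy_define_method("succ", toy_succ);
}
```

Only `Init_toy` and `toy_define_method` are marked. Their size optimization costs nothing at runtime because they never run again after startup.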
#### Text segment reductions
The changes are small (a `3144` byte reduction of the text segment) because this is incremental groundwork and an initial low-risk PR.
this branch:
```
lourens@CarbonX1:~/src/ruby/ruby$ size ruby
   text    data     bss     dec     hex filename
3462153   21056   71344 3554553  363cf9 ruby
```
trunk:
```
lourens@CarbonX1:~/src/ruby/trunk$ size ruby
   text    data     bss     dec     hex filename
3465297   21056   71344 3557697  364941 ruby
```
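For scale, the numbers from the two `size` runs can be checked directly. Below is a small sketch with the values copied from the output above; the helper names are hypothetical. The reduction comes out to 3144 bytes in both the `text` and `hex` columns, roughly 0.09% of trunk's text segment.

```c
#include <assert.h>

/* Values copied from the `size ruby` output above. */
#define TEXT_TRUNK  3465297UL
#define TEXT_BRANCH 3462153UL
#define HEX_TRUNK   0x364941UL /* hex column (text + data + bss), trunk */
#define HEX_BRANCH  0x363cf9UL /* hex column (text + data + bss), this branch */

/* Bytes saved in the text segment. */
unsigned long text_reduction(void)
{
    assert(TEXT_TRUNK >= TEXT_BRANCH);
    return TEXT_TRUNK - TEXT_BRANCH;
}

/* Same saving read off the hex column; data and bss are unchanged. */
unsigned long total_reduction(void)
{
    return HEX_TRUNK - HEX_BRANCH;
}

/* Saving relative to trunk's text segment, in percent. */
double text_reduction_percent(void)
{
    return 100.0 * (double)text_reduction() / (double)TEXT_TRUNK;
}
```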
Diffs for individual object files: https://www.diffchecker.com/T0GVzX1q
The default `.text.unlikely` section, which the init functions are moved to:
```
lourens@CarbonX1:~/src/ruby/ruby$ readelf -S vm.o
There are 34 section headers, starting at offset 0x2a04f8:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
000000000001c37f 0000000000000000 AX 0 0 16
[ 2] .rela.text RELA 0000000000000000 00114100
000000000000a7d0 0000000000000018 I 31 1 8
[ 3] .data PROGBITS 0000000000000000 0001c3c0
0000000000000030 0000000000000000 WA 0 0 16
[ 4] .bss NOBITS 0000000000000000 0001c400
00000000000002b0 0000000000000000 WA 0 0 32
[ 5] .rodata.str1.8 PROGBITS 0000000000000000 0001c400
0000000000000d6f 0000000000000001 AMS 0 0 8
[ 6] .text.unlikely PROGBITS 0000000000000000 0001d16f <<<<<<<<<<<<<<<
0000000000001aa9 0000000000000000 AX 0 0 1
```
The link map for `vm.o` (from `ld -M`), with the `.text.unlikely` input sections placed ahead of the regular `.text`:
```
lourens@CarbonX1:~/src/ruby/ruby$ ld -M vm.o
--- truncated ---
.text 0x0000000000400120 0x1de2f
*(.text.unlikely .text.*_unlikely .text.unlikely.*)
.text.unlikely
0x0000000000400120 0x1aa9 vm.o
0x000000000040038f rb_define_alloc_func
0x00000000004003bf rb_undef_alloc_func
0x00000000004003c5 Init_Method
0x0000000000400512 Init_vm_eval
0x00000000004007a1 Init_eval_method
0x0000000000400a54 rb_undef
0x0000000000400c1d Init_VM
0x000000000040185f Init_BareVM
0x0000000000401b16 Init_vm_objects
0x0000000000401b61 Init_top_self
*(.text.exit .text.exit.*)
*(.text.startup .text.startup.*)
*(.text.hot .text.hot.*)
*(.text .stub .text.* .gnu.linkonce.t.*)
*fill* 0x0000000000401bc9 0x7
.text 0x0000000000401bd0 0x1c37f vm.o
0x00000000004022f0 rb_f_notimplement
0x0000000000404780 rb_vm_ep_local_ep
0x00000000004047b0 rb_vm_frame_block_handler
0x00000000004047e0 rb_vm_cref_new_toplevel
0x0000000000404870 rb_vm_block_ep_update
0x0000000000404890 ruby_vm_special_exception_copy
0x0000000000406960 rb_ec_stack_overflow
0x00000000004069c0 rb_vm_push_frame
0x0000000000406b20 rb_vm_pop_frame
0x0000000000406b30 rb_error_arity
0x0000000000407180 rb_vm_frame_method_entry
0x00000000004075e0 rb_vm_rewrite_cref
0x00000000004076f0 rb_simple_iseq_p
0x0000000000407700 rb_vm_opt_struct_aref
0x0000000000407730 rb_vm_opt_struct_aset
0x0000000000407750 rb_clear_constant_cache
--- truncated ---
```
I also experimented with an `INITFUNC` macro that additionally places the `Init_xxx` functions into a `.text.init` section, as the [kernel does](https://linuxgazette.net/157/amurray.html). That would allow a possible future optimization of stripping out ELF sections for setup/init-specific functions. I don't think that makes sense for now; it's possibly only interesting for mruby or embedded targets.
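Such an `INITFUNC` macro could look like the minimal sketch below, assuming a GCC-compatible compiler targeting ELF. The macro name comes from the paragraph above. Its definition, the `.text.init` section name and `Init_example` are illustrative assumptions.

Unlike the kernel's `__init`, nothing here discards the section after startup; it only groups the code. Running `readelf -S` on the resulting object file should list a separate `.text.init` section.

```c
#include <assert.h>

/* Hypothetical INITFUNC: mark as cold and place in a dedicated section.
   Only applied for GCC-compatible compilers on ELF targets, where arbitrary
   section names like this are accepted; elsewhere it expands to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#  define INITFUNC __attribute__((cold, section(".text.init")))
#else
#  define INITFUNC
#endif

static int example_initialized;

/* Stand-in for an Init_xxx function: runs once during startup. */
INITFUNC void Init_example(void)
{
    assert(!example_initialized);
    example_initialized = 1;
}
```

An explicit `section` attribute takes precedence over the default `.text.unlikely` placement. The `cold` attribute then only affects optimization for size and call-site branch hints.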
### Possible next units of work
#### Cold code specific
* Incrementally PR corner case error handling functions such as `rb_bug` from https://github.com/ruby/ruby/pull/1922
* Ditto for generic error handling functions (`rb_raise` and friends) from https://github.com/ruby/ruby/pull/1922
* Class specific error handling functions (load errors, encoding errors in the IO module, sys errors etc.) from https://github.com/ruby/ruby/pull/1922
* GCC 5+ also supports `cold` [labels](https://gcc.gnu.org/onlinedocs/gcc/Label-Attributes.html), which I experimented with in the bloated https://github.com/ruby/ruby/pull/1922
#### TLB (translation lookaside buffer) specific
* Further ITLB overhead investigation
* Ruby binaries built with `-O3` and debug symbols come in at just short of 18MB, or roughly 9 hugepages (2MB each) on Linux. PHP core developers were able to squeeze out a few percent by remapping code to hugepages on supported systems: http://developers-club.com/posts/270685/ . Implementation [here](https://github.com/php/php-src/blob/fb0389b1010de5a6459bcf286409423f69e74aaf/ext/opcache/ZendAccelerator.c#L2645-L2750)
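The building block of that approach is a 2MB-aligned mapping plus an `MADV_HUGEPAGE` hint. The sketch below covers only anonymous memory. It is not the PHP implementation, which additionally copies the existing text segment and remaps it in place.

The 2MB hugepage size (the x86-64 default) and the names `EX_HUGEPAGE_SIZE` and `map_hugepage_aligned` are assumptions. As discussed at the top of this message, the hint is silently ineffective when THP is disabled system-wide or via `PR_SET_THP_DISABLE`.

```c
#ifndef _GNU_SOURCE
#  define _GNU_SOURCE /* MAP_ANONYMOUS and MADV_HUGEPAGE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define EX_HUGEPAGE_SIZE (2UL * 1024 * 1024) /* assumed 2MB hugepages */

/* Maps `size` bytes (a multiple of EX_HUGEPAGE_SIZE) at a 2MB-aligned
   address and asks the kernel to back it with transparent hugepages.
   Returns NULL if the mapping fails. */
void *map_hugepage_aligned(size_t size)
{
    assert(size > 0 && size % EX_HUGEPAGE_SIZE == 0);

    /* Over-allocate so an aligned window of `size` bytes always fits. */
    size_t span = size + EX_HUGEPAGE_SIZE;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + EX_HUGEPAGE_SIZE - 1)
                        & ~(uintptr_t)(EX_HUGEPAGE_SIZE - 1);
    char *start = (char *)aligned;

    /* Unmap the unused head and tail, keeping only the aligned window. */
    size_t head = (size_t)(start - raw);
    if (head > 0)
        munmap(raw, head);
    size_t tail = span - head - size;
    if (tail > 0)
        munmap(start + size, tail);

#ifdef MADV_HUGEPAGE
    /* Best effort: fails with EINVAL on kernels built without THP. */
    (void)madvise(start, size, MADV_HUGEPAGE);
#endif
    return start;
}
```

Whether hugepages were actually used can be checked through the `AnonHugePages` field for the mapping in `/proc/self/smaps`, the same place the behaviour above was observed.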
#### Bytecode specific
* The [Intel Tracing Task API](https://software.intel.com/en-us/vtune-amplifier-help-task-api) is well suited to the instruction sequences YARV generates. It can help to better infer per-instruction CPU utilization and to identify stalls (frontend, backend, branches etc.) to drive further work.
--
https://bugs.ruby-lang.org/
Thread overview: 23+ messages
2018-08-18 23:17 ` [ruby-core:88548] [Ruby trunk Misc#15007] Let all Init_xxx and extension APIs frequently called from init code paths be considered cold lourens
2018-08-20 7:28 ` [ruby-core:88566] " Eric Wong
2018-08-19 0:23 ` [ruby-core:88550] " eregontp
2018-08-19 0:38 ` [ruby-core:88551] " lourens
2018-08-20 1:19 ` [ruby-core:88559] " shyouhei
2018-08-20 16:45 ` [ruby-core:88574] " lourens
2018-08-23 8:20 ` [ruby-core:88615] " ko1
2018-08-23 14:16 ` [ruby-core:88618] " hanmac
2018-08-23 17:57 ` [ruby-core:88620] " lourens
2018-08-23 18:01 ` [ruby-core:88621] " lourens
2018-08-25 17:40 ` [ruby-core:88646] " naruse
2018-08-26 23:49 ` [ruby-core:88655] " lourens
2018-08-30 10:19 ` [ruby-core:88748] " lourens
2018-08-30 10:46 ` [ruby-core:88749] " Eric Wong
2018-08-30 17:09 ` lourens [this message]
2018-11-03 23:52 ` [ruby-core:89703] " ko1
2018-11-04 11:01 ` [ruby-core:89706] " lourens
2018-11-06 7:45 ` [ruby-core:89721] " ko1
2018-11-22 0:46 ` [ruby-core:89937] " lourens
2018-11-28 12:56 ` [ruby-core:90119] " lourens
2018-12-03 22:27 ` [ruby-core:90272] " lourens
2018-12-04 1:58 ` [ruby-core:90273] " ko1
2018-12-06 11:05 ` [ruby-core:90338] " naruse