From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 078D71F4B4; Mon, 1 Feb 2021 09:07:17 +0000 (UTC) Date: Sun, 31 Jan 2021 23:07:16 -1000 From: Eric Wong To: meta@public-inbox.org Subject: Perl debug patches used to track down source of segfault Message-ID: <20210201090716.GA3388@dcvr> References: <20210201082833.3293-1-e@80x24.org> <20210201082833.3293-19-e@80x24.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="0OAP2g/MAC+5xKAE" Content-Disposition: inline In-Reply-To: <20210201082833.3293-19-e@80x24.org> List-Id: --0OAP2g/MAC+5xKAE Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Attached are two patches against the Debian-packaged perl 5.28.1-6+deb10u1 which I used for tracking down the attempt to access @DB::args of PublicInbox::Listener::event_step as the source of the segfault. I don't know Perl internals very well, and I was never an advanced gdb user when I hacked C. While most segfaults I saw did not emit any errors, maybe 1-2 of them pointed out something from Carp (the error reporting and backtrace module of Perl); so I wasn't sure if there were multiple sources of segfaults. I initially thought it was something warning during the destruction phase because most of the code isn't doing anything out-of-the-ordinary aside from short-lived workers. I then ran lei with debugperl(1) (deb: perl-debug) instead of the normal "perl" on my system. That got me to the assertion failure in sv.c::Perl_sv_clear at assert(SvTYPE(sv) != (svtype)SVTYPEMASK); So I made the attached patch to sv.c to call Perl_op_dump (which I learned about from the perlhacktips(1) manpage). This patch to sv.c got me a dump which put me around lines 340-350 of Carp.pm. Poking into Carp.pm brought me to this comment around lines 330: # Guard our serialization of the stack from stack refcounting bugs # NOTE this is NOT a complete solution, we cannot 100% guard against # these bugs. However in many cases Perl *is* capable of detecting # them and throws an error when it does. Unfortunately serializing # the arguments on the stack is a perfect way of finding these bugs, # even when they would not affect normal program flow that did not # poke around inside the stack. Inside of Carp.pm it makes little # sense reporting these bugs, as Carp's job is to report the callers # errors, not the ones it might happen to tickle while doing so. # See: https://rt.perl.org/Public/Bug/Display.html?id=131046 # and: https://rt.perl.org/Public/Bug/Display.html?id=52610 # for more details and discussion. - Yves The conundrum is Carp itself is generating backtraces, so getting a Perl-level backtrace becomes tricky... I decided to edit Carp.pm to store the caller-supplied error message in a global variable, and set the G_ERR environment variable before where it would occasionally segfault. Once the environment variable is set, printing fields of the global **environ array from gdb on the core dump where G_ERR is stored would get me the $sub_name and offending error message. It turned out the error was the confess call in PublicInbox::Eml::body_str which was copied over from Email::MIME, and $sub_name was PublicInbox::Listener::event_step. That led me to realize $DescriptorMap{$fd} was no longer referenced clobbered and to this workaround. I didn't want to spend time compiling, relinking and install Perl again (and didn't want to do it the first time), but I could've printed the G_ERR environment where I was doing Perl_op_dump, too. --0OAP2g/MAC+5xKAE Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="0001-sv.c-run-Perl_op_dump-before-failing-assertion.patch" >From 03712da7eb875b18879662079279241cd4fee2d0 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Mon, 1 Feb 2021 05:42:42 +0000 Subject: [PATCH 1/2] sv.c: run Perl_op_dump before failing assertion The debugperl(1) binary packaged by Debian alerted me to an assertion in Perl_sv_clear failing at: assert(SvTYPE(sv) != (svtype)SVTYPEMASK); To get more information about what Perl code is executing, we can run Perl_op_dump on PL_op before triggering the assertion. --- sv.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sv.c b/sv.c index 07865bb..2c950e0 100644 --- a/sv.c +++ b/sv.c @@ -6510,6 +6510,9 @@ Perl_sv_clear(pTHX_ SV *const orig_sv) type = SvTYPE(sv); assert(SvREFCNT(sv) == 0); + if ((SvTYPE(sv) == (svtype)SVTYPEMASK)) { + Perl_op_dump(aTHX_ PL_op); + } assert(SvTYPE(sv) != (svtype)SVTYPEMASK); if (type <= SVt_IV) { --0OAP2g/MAC+5xKAE Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="0002-carp-set-G_ERR-in-environ-before-accessing-DB-args.patch" >From ef7a02def4468ac7301c120ece586caea7351c4d Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Mon, 1 Feb 2021 05:34:12 +0000 Subject: [PATCH 2/2] carp: set G_ERR in environ before accessing @DB::args In case accessing @DB::args causes a segfault and core dump; one can open the core dump with gdb(1) run "p environ[$IDX]" to figure out what message and sub_name is tickling Perl5's stack-not-refcounted behavior and causing a segfault. ($IDX is typically the last environment variable index in the environ(7) array, but it could be another number) --- dist/Carp/lib/Carp.pm | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/dist/Carp/lib/Carp.pm b/dist/Carp/lib/Carp.pm index 109b7fe..c87e373 100644 --- a/dist/Carp/lib/Carp.pm +++ b/dist/Carp/lib/Carp.pm @@ -302,6 +302,8 @@ BEGIN { } } +our $g_err; + sub caller_info { my $i = shift(@_) + 1; my %call_info; @@ -327,6 +329,7 @@ sub caller_info { my $sub_name = Carp::get_subname( \%call_info ); if ( $call_info{has_args} ) { +$ENV{G_ERR} = "$sub_name $g_err"; # Guard our serialization of the stack from stack refcounting bugs # NOTE this is NOT a complete solution, we cannot 100% guard against # these bugs. However in many cases Perl *is* capable of detecting @@ -594,6 +597,7 @@ sub ret_backtrace { $tid_msg = " thread $tid" if $tid; } +$g_err = $err; my %i = caller_info($i); $mess = "$err at $i{file} line $i{line}$tid_msg"; if( $. ) { @@ -635,6 +639,7 @@ sub ret_summary { my $tid = threads->tid; $tid_msg = " thread $tid" if $tid; } +$g_err = $err; my %i = caller_info($i); return "$err at $i{file} line $i{line}$tid_msg\.\n"; --0OAP2g/MAC+5xKAE--