From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH] doc: add technical/whyperl
Date: Tue, 7 Apr 2020 04:49:40 -0500 [thread overview]
Message-ID: <20200407094940.14962-1-e@yhbt.net> (raw)
Some people don't like Perl; but it exists, there's no
avoiding it with everything that depends on it. And
nearly all code still works unmodified after 20 years.
---
Documentation/technical/whyperl.txt | 171 ++++++++++++++++++++++++++++
1 file changed, 171 insertions(+)
create mode 100644 Documentation/technical/whyperl.txt
diff --git a/Documentation/technical/whyperl.txt b/Documentation/technical/whyperl.txt
new file mode 100644
index 00000000..b0a0d16b
--- /dev/null
+++ b/Documentation/technical/whyperl.txt
@@ -0,0 +1,171 @@
+why public-inbox is currently implemented in Perl 5
+---------------------------------------------------
+
+While Perl has many detractors and there's a lot not to like
+about Perl, we use it anyways because it offers benefits not
+(yet) available from other languages.
+
+This document is somewhat inspired by https://sqlite.org/whyc.html
+
+Other languages and runtimes may eventually be a possibility
+for us, and this document can serve as our requirements list
+for possible replacements.
+
+As always, comments and corrections and additions welcome at
+<meta@public-inbox.org>. We're not Perl experts, either.
+
+Good Things
+-----------
+
+* Availability
+
+ Perl 5 is installed on many, if not most GNU/Linux and
+ BSD-based servers and workstations. It is likely the most
+ widely-installed programming environment that offers a
+ significant amount of POSIX functionality. Users won't
+ have to waste bandwidth or space with giant toolchains or
+ architecture-specific binaries.
+
+ Furthermore, Perl documentation is typically installed as
+ manpages, allowing users to quickly access and learn it
+ offline.
+
+* Scripted, always editable by the end user
+
+ Users cannot lose access to the source code. Code written
+ entirely in any scripting language automatically satisfies
+ the GPL-2.0, making it easier to satisfy the AGPL-3.0.
+
+ Use of a scripting language improves auditability for
+ malicious changes. It also reduces storage and bandwidth
+ requirements for distributors, as the same scripts can be
+ shared across multiple OSes and architectures.
+
+ Perl's availability and the low barrier to entry of
+ scripting ensures it's easy for users to exercise their
+ software freedom.
+
+* Predictable performance
+
+ While Perl is neither fast or memory-efficient, its
+ performance and memory use are predictable and does not
+ require GC tuning by the user.
+
+ public-inbox is developed for (and mostly on) old
+ hardware. Perl was fast enough to power the web of the
+ late 1990s, and any cheap VPS today has more than enough
+ RAM and CPU for handling plain-text email.
+
+ Low hardware requirements increases the reach of our software
+ to more users, improving centralization resistance.
+
+* Compatibility
+
+ Unlike similarly powerful scripting languages, there is no
+ forced migration to a major new version. From 2000-2020,
+ Perl had fewer breaking changes than Python or Ruby; we
+ expect that trend to continue given the inertia of Perl 5.
+
+* Built for text processing
+
+ Our focus is plain-text mail, and Perl has many built-ins
+ optimized for text processing. It also has good support
+ for UTF-8 and legacy encodings found in old mail archives.
+
+* Integration with distros and non-Perl libraries
+
+ Perl modules and bindings to common libraries such as
+ SQLite and Xapian are already distributed by many
+ GNU/Linux distros and BSD ports.
+
+ There should be no need to rely on language-specific
+ package managers such as cpan(1), those systems increase
+ the learning curve for users and systems administrators.
+
+* Compactness and terseness
+
+ Less code generally means less bugs. We try to avoid the
+ "line noise" stereotype of some Perl codebases, yet still
+ manage to write less code than one would with
+ non-scripting languages.
+
+* Performance ceiling and escape hatch
+
+ With optional Inline::C, we can be "as fast as C" in some
+ cases. Inline::C is widely-packaged by distros and it
+ gives us an escape hatch for dealing with missing bindings
+ or performance problems should they arise. Inline::C use
+ (as opposed to XS) also preserves the software freedom and
+ auditability benefits to all users.
+
+ Unfortunately, most C toolchains are big; so Inline::C
+ will always be optional for users who cannot afford the
+ bandwidth or space.
+
+
+Bad Things
+----------
+
+* Slow startup time. Tokenization, parsing, and compilation of
+ pure Perl is not cached. Inline::C does cache its results,
+ however.
+
+ We work around slow startup times in tests by preloading
+ code, similar to how mod_perl works for CGI.
+
+* High space overhead and poor locality of small data
+ structures, including the optree. This may not be fixable
+ in Perl itself given compatibility requirements of the C API.
+
+ These problems are exacerbated on modern 64-bit platforms,
+ though the Linux x32 ABI offers promise.
+
+* Lack of vectored I/O support (writev, sendmmsg, etc. syscalls)
+ and "newer" POSIX functions in general. APIs end up being
+ slurpy, favoring large buffers and memory copies for
+ concatenation rather than rope (aka "cord") structures.
+
+* While mmap(2) is available via PerlIO::mmap, string ops
+ (m//, substr(), index(), etc.) still require memory copies
+ into userspace, negating a benefit of zero-copy.
+
+* The XS/C API make it difficult to improve internals while
+ preserving compatibility.
+
+* Lack of optional type checking. This may be a blessing in
+ disguise, though, as it encourages us to simplify our data
+ models and lowers cognitive overhead.
+
+* SMP support is mostly limited to fork(), since many
+ libraries (including much of the standard library) are not
+ thread-safe. Even with threads.pm, sharing data between
+ interpreters within the same process is inefficient due to
+ the lack of lock-free and wait-free data structures from
+ projects such as Userspace RCU.
+
+* Process spawning speed degrades as memory use increases.
+ We work around this optionally via Inline::C and vfork(2),
+ since Perl lacks an approximation of posix_spawn(3).
+
+ We also use `undef' and `delete' ops to free large buffers
+ as soon as we're done using them to save memory.
+
+
+Red herrings to ignore when evaluating other runtimes
+-----------------------------------------------------
+
+These don't discount a language or runtime from being
+being used, they're just not interesting.
+
+* Lightweight threading
+
+ While lightweight threading implementations are
+ convenient, they tend to be significantly heavier than a
+ pure event-loop systems (or multi-threaded event-loop
+ systems)
+
+ Lightweight threading implementations have stack overhead
+ and growth typically measured in kilobytes. The userspace
+ state overhead of event-based systems is an order of
+ magnitude less, and a sunk cost regardless of concurrency
+ model.
next reply other threads:[~2020-04-07 9:49 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-07 9:49 Eric Wong [this message]
2020-04-08 0:41 ` [PATCH] doc: add technical/whyperl Kyle Meyer
2020-04-08 22:26 ` Eric Wong
2020-04-09 1:14 ` Kyle Meyer
[not found] ` <87r1wyjlop.fsf@vuxu.org>
2020-04-08 22:17 ` Eric Wong
2020-04-08 22:37 ` Leah Neukirchen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://public-inbox.org/README
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200407094940.14962-1-e@yhbt.net \
--to=e@yhbt.net \
--cc=meta@public-inbox.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).