user/dev discussion of public-inbox itself
* [PATCH] doc: add technical/whyperl
From: Eric Wong @ 2020-04-07  9:49 UTC
  To: meta

Some people don't like Perl; but it exists, there's no
avoiding it with everything that depends on it.  And
nearly all code still works unmodified after 20 years.
---
 Documentation/technical/whyperl.txt | 171 ++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)
 create mode 100644 Documentation/technical/whyperl.txt

diff --git a/Documentation/technical/whyperl.txt b/Documentation/technical/whyperl.txt
new file mode 100644
index 00000000..b0a0d16b
--- /dev/null
+++ b/Documentation/technical/whyperl.txt
@@ -0,0 +1,171 @@
+why public-inbox is currently implemented in Perl 5
+---------------------------------------------------
+
+While Perl has many detractors and there's a lot not to like
+about Perl, we use it anyways because it offers benefits not
+(yet) available from other languages.
+
+This document is somewhat inspired by https://sqlite.org/whyc.html
+
+Other languages and runtimes may eventually be a possibility
+for us, and this document can serve as our requirements list
+for possible replacements.
+
+As always, comments and corrections and additions welcome at
+<meta@public-inbox.org>.  We're not Perl experts, either.
+
+Good Things
+-----------
+
+* Availability
+
+  Perl 5 is installed on many, if not most, GNU/Linux and
+  BSD-based servers and workstations.  It is likely the most
+  widely-installed programming environment that offers a
+  significant amount of POSIX functionality.  Users won't
+  have to waste bandwidth or space with giant toolchains or
+  architecture-specific binaries.
+
+  Furthermore, Perl documentation is typically installed as
+  manpages, allowing users to quickly access and learn it
+  offline.
+
+* Scripted, always editable by the end user
+
+  Users cannot lose access to the source code.  Code written
+  entirely in any scripting language automatically satisfies
+  the GPL-2.0, making it easier to satisfy the AGPL-3.0.
+
+  Use of a scripting language improves auditability for
+  malicious changes.  It also reduces storage and bandwidth
+  requirements for distributors, as the same scripts can be
+  shared across multiple OSes and architectures.
+
+  Perl's availability and the low barrier to entry of
+  scripting ensure it's easy for users to exercise their
+  software freedom.
+
+* Predictable performance
+
+  While Perl is neither fast or memory-efficient, its
+  performance and memory use are predictable and does not
+  require GC tuning by the user.
+
+  public-inbox is developed for (and mostly on) old
+  hardware.  Perl was fast enough to power the web of the
+  late 1990s, and any cheap VPS today has more than enough
+  RAM and CPU for handling plain-text email.
+
+  Low hardware requirements increases the reach of our software
+  to more users, improving centralization resistance.
+
+* Compatibility
+
+  Unlike similarly powerful scripting languages, there is no
+  forced migration to a major new version.  From 2000-2020,
+  Perl had fewer breaking changes than Python or Ruby; we
+  expect that trend to continue given the inertia of Perl 5.
+
+* Built for text processing
+
+  Our focus is plain-text mail, and Perl has many built-ins
+  optimized for text processing.  It also has good support
+  for UTF-8 and legacy encodings found in old mail archives.
+
+* Integration with distros and non-Perl libraries
+
+  Perl modules and bindings to common libraries such as
+  SQLite and Xapian are already distributed by many
+  GNU/Linux distros and BSD ports.
+
+  There should be no need to rely on language-specific
+  package managers such as cpan(1), those systems increase
+  the learning curve for users and systems administrators.
+
+* Compactness and terseness
+
+  Less code generally means less bugs.  We try to avoid the
+  "line noise" stereotype of some Perl codebases, yet still
+  manage to write less code than one would with
+  non-scripting languages.
+
+* Performance ceiling and escape hatch
+
+  With optional Inline::C, we can be "as fast as C" in some
+  cases.  Inline::C is widely-packaged by distros and it
+  gives us an escape hatch for dealing with missing bindings
+  or performance problems should they arise.  Inline::C use
+  (as opposed to XS) also preserves the software freedom and
+  auditability benefits to all users.
+
+  Unfortunately, most C toolchains are big; so Inline::C
+  will always be optional for users who cannot afford the
+  bandwidth or space.
+
+
+Bad Things
+----------
+
+* Slow startup time.  Tokenization, parsing, and compilation of
+  pure Perl is not cached.  Inline::C does cache its results,
+  however.
+
+  We work around slow startup times in tests by preloading
+  code, similar to how mod_perl works for CGI.
+
+* High space overhead and poor locality of small data
+  structures, including the optree.  This may not be fixable
+  in Perl itself given compatibility requirements of the C API.
+
+  These problems are exacerbated on modern 64-bit platforms,
+  though the Linux x32 ABI offers promise.
+
+* Lack of vectored I/O support (writev, sendmmsg, etc. syscalls)
+  and "newer" POSIX functions in general.  APIs end up being
+  slurpy, favoring large buffers and memory copies for
+  concatenation rather than rope (aka "cord") structures.
+
+* While mmap(2) is available via PerlIO::mmap, string ops
+  (m//, substr(), index(), etc.) still require memory copies
+  into userspace, negating a benefit of zero-copy.
+
+* The XS/C API makes it difficult to improve internals while
+  preserving compatibility.
+
+* Lack of optional type checking.  This may be a blessing in
+  disguise, though, as it encourages us to simplify our data
+  models and lowers cognitive overhead.
+
+* SMP support is mostly limited to fork(), since many
+  libraries (including much of the standard library) are not
+  thread-safe.  Even with threads.pm, sharing data between
+  interpreters within the same process is inefficient due to
+  the lack of lock-free and wait-free data structures from
+  projects such as Userspace RCU.
+
+* Process spawning speed degrades as memory use increases.
+  We work around this optionally via Inline::C and vfork(2),
+  since Perl lacks an approximation of posix_spawn(3).
+
+  We also use `undef' and `delete' ops to free large buffers
+  as soon as we're done using them to save memory.
+
+
+Red herrings to ignore when evaluating other runtimes
+-----------------------------------------------------
+
+These don't discount a language or runtime from being
+used; they're just not interesting.
+
+* Lightweight threading
+
+  While lightweight threading implementations are
+  convenient, they tend to be significantly heavier than a
+  pure event-loop systems (or multi-threaded event-loop
+  systems).
+
+  Lightweight threading implementations have stack overhead
+  and growth typically measured in kilobytes.  The userspace
+  state overhead of event-based systems is an order of
+  magnitude less, and a sunk cost regardless of concurrency
+  model.

* Re: [PATCH] doc: add technical/whyperl
From: Kyle Meyer @ 2020-04-08  0:41 UTC
  To: Eric Wong; +Cc: meta

No substantial comments, just some typos spotted while reading through.

Eric Wong <e@yhbt.net> writes:

> +As always, comments and corrections and additions welcome at

s/welcome/are welcome/ ?

> +* Predictable performance
> +
> +  While Perl is neither fast or memory-efficient, its

s/or/nor/

> +  performance and memory use are predictable and does not

s/does/do/

> +  require GC tuning by the user.
> +
> +  public-inbox is developed for (and mostly on) old
> +  hardware.  Perl was fast enough to power the web of the
> +  late 1990s, and any cheap VPS today has more than enough
> +  RAM and CPU for handling plain-text email.
> +
> +  Low hardware requirements increases the reach of our software

s/increases/increase/

> +* Integration with distros and non-Perl libraries
> +
> +  Perl modules and bindings to common libraries such as
> +  SQLite and Xapian are already distributed by many
> +  GNU/Linux distros and BSD ports.
> +
> +  There should be no need to rely on language-specific
> +  package managers such as cpan(1), those systems increase

s/, those/.  Those/

> +* Compactness and terseness
> +
> +  Less code generally means less bugs.  We try to avoid the

s/less bugs/fewer bugs/

> +* Lightweight threading
> +
> +  While lightweight threading implementations are
> +  convenient, they tend to be significantly heavier than a
> +  pure event-loop systems (or multi-threaded event-loop

s/a pure/pure/

* Re: [PATCH] doc: add technical/whyperl
From: Eric Wong @ 2020-04-08 22:17 UTC
  To: Leah Neukirchen; +Cc: meta

Leah Neukirchen <leah@vuxu.org> wrote:

Did you forget reply-all?  Re-adding Cc: meta@public-inbox.org
since you've already posted there in the past.

> Eric Wong <e@yhbt.net> writes:
> 
> > Some people don't like Perl; but it exists, there's no
> > avoiding it with everything that depends on it.  And
> > nearly all code still works unmodified after 20 years.
> 
> > +  Furthermore, Perl documentation is typically installed as
> > +  manpages, allowing users to quickly access and learn it
> > +  offline.
> 
> I absolutely don't mind using Perl, but I cannot say this is true.
> I tried to learn Perl from the man pages, and things got sooo much
> clearer once I picked up the Camel book.  The man pages are good for
> reference, but not for learning from scratch.

Interesting, thanks for the comment.  I'll reword it, how about:

	Furthermore, Perl documentation is typically installed
	locally as manpages, allowing users to quickly refer
	to documentation as needed.

?

Fwiw, I guess my own learning experiences are atypical.  I've
never picked up a book on Perl or any language/skill I'm
familiar with.  Mainly source diving, manpages, lurking on
mailing lists, archives, newsgroups, etc. along with lots of
trial-and-error.

It's all very scattered (like my brain :x), and I suppose it's
reflected in how public-inbox works and gets developed;
including my aversion to centralization and unnecessary
structure in things such as trackers with labels/priorities.

* Re: [PATCH] doc: add technical/whyperl
From: Eric Wong @ 2020-04-08 22:26 UTC
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> No substantial comments, just some typos spotted while reading through.

Thanks :>

> Eric Wong <e@yhbt.net> writes:
> 
> > +As always, comments and corrections and additions welcome at
> 
> s/welcome/are welcome/ ?

Umm... I guess?  Would omitting "are" only be valid
if there were a single item?

	As always, $FOO welcome at

Upon a second read of the original, having "and" twice doesn't
read well to me.  So final form should probably be:

	As always, comments, corrections and additions are welcome at

> > +* Predictable performance
> > +
> > +  While Perl is neither fast or memory-efficient, its
> 
> s/or/nor/

I had to look that up, but yes, "nor" belongs there.

> > +  performance and memory use are predictable and does not
> 
> s/does/do/
> 
> > +  require GC tuning by the user.

I think "does" reads better, there, but the sentence runs on
for too long.  I don't think the GC part needs to be there(*)
Perhaps just:

	While Perl is neither fast nor memory-efficient, its
	performance and memory use are predictable.

(*) I think memory management requires a standalone document.

> > +  Low hardware requirements increases the reach of our software
> 
> s/increases/increase/

Yes.  I think my original wording was more about a continual
process (which it is):

	Lowering hardware requirements increases

But I edited it since requirements have always been fairly low.

> > +  Perl modules and bindings to common libraries such as
> > +  SQLite and Xapian are already distributed by many
> > +  GNU/Linux distros and BSD ports.
> > +
> > +  There should be no need to rely on language-specific
> > +  package managers such as cpan(1), those systems increase
> 
> s/, those/.  Those/

I was trying to decide between them.  I thought a separate
sentence would make it less obvious "those" was referring to
package managers.  In retrospect there's no ambiguity since
the entire paragraph is about language-specific package
managers.

> > +* Compactness and terseness
> > +
> > +  Less code generally means less bugs.  We try to avoid the
> 
> s/less bugs/fewer bugs/

I was thinking repeating "less" might be more memorable.
As in: "mo money, mo problems"; "less code, less bugs" :>
But yeah, "fewer" is probably better in a full sentence.

> > +* Lightweight threading
> > +
> > +  While lightweight threading implementations are
> > +  convenient, they tend to be significantly heavier than a
> > +  pure event-loop systems (or multi-threaded event-loop
> 
> s/a pure/pure/

Oops, yes.

* Re: [PATCH] doc: add technical/whyperl
From: Leah Neukirchen @ 2020-04-08 22:37 UTC
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> Leah Neukirchen <leah@vuxu.org> wrote:
>
> Did you forget reply-all?  Re-adding Cc: meta@public-inbox.org
> since you've already posted there in the past.

It was more meant as an off-side remark, but we can also do this on the
list.

>> Eric Wong <e@yhbt.net> writes:
>> 
>> > Some people don't like Perl; but it exists, there's no
>> > avoiding it with everything that depends on it.  And
>> > nearly all code still works unmodified after 20 years.
>> 
>> > +  Furthermore, Perl documentation is typically installed as
>> > +  manpages, allowing users to quickly access and learn it
>> > +  offline.
>> 
>> I absolutely don't mind using Perl, but I cannot say this is true.
>> I tried to learn Perl from the man pages, and things got sooo much
>> clearer once I picked up the Camel book.  The man pages are good for
>> reference, but not for learning from scratch.
>
> Interesting, thanks for the comment.  I'll reword it, how about:
>
> 	Furthermore, Perl documentation is typically installed
> 	locally as manpages, allowing users to quickly refer
> 	to documentation as needed.
>
> ?

This sounds better to me.

> Fwiw, I guess my own learning experiences are atypical.  I've
> never picked up a book on Perl or any language/skill I'm
> familiar with.  Mainly source diving, manpages, lurking on
> mailing lists, archives, newsgroups, etc. along with lots of
> trial-and-error.

So I just looked into perlintro and the word context appears like
three times and is not explained.  I think picking up some core
concepts is super hard from the included docs.

cu,
-- 
Leah Neukirchen  <leah@vuxu.org>  https://leahneukirchen.org

* Re: [PATCH] doc: add technical/whyperl
From: Kyle Meyer @ 2020-04-09  1:14 UTC
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> Kyle Meyer <kyle@kyleam.com> wrote:
>
>> Eric Wong <e@yhbt.net> writes:
>> 
>> > +As always, comments and corrections and additions welcome at
>> 
>> s/welcome/are welcome/ ?
>
> Umm... I guess?  Would omitting "are" only be valid
> if there were a single item?
>
> 	As always, $FOO welcome at

I guess dropping an "is" in the singular case sounds a little better to
my ears than dropping "are" in the plural case.  I think the original
was fine, though.  I stumbled on it for whatever reason and would write
"are" there myself, but it probably wasn't worth me noting, even if I
hid behind a "?" :>

> Upon a second read of the original, having "and" twice doesn't
> read well to me.  So final form should probably be:
>
> 	As always, comments, corrections and additions are welcome at

Yes, I too would prefer dropping the first "and".

>> > +  performance and memory use are predictable and does not
>> 
>> s/does/do/
>> 
>> > +  require GC tuning by the user.
>
> I think "does" reads better, there, but the sentence runs on
> for too long.  I don't think the GC part needs to be there(*)

Hmm, okay.  I think I misidentified what the intended subject was there.
I took the subject as "performance and memory use", in which case there is
a tense mismatch (and it sticks out because "are" is used in the
previous sentence).  And it doesn't matter because it's getting cut :)

Thanks for writing up this rationale.
