user/dev discussion of public-inbox itself
* IMAP server notes, maybe JMAP?
@ 2020-06-09 11:34 Eric Wong
  2020-06-15  6:21 ` Parse::RecDescent dependency (was: IMAP server notes, maybe JMAP?) Eric Wong
  0 siblings, 1 reply; 2+ messages in thread
From: Eric Wong @ 2020-06-09 11:34 UTC
  To: meta

OK, so I almost have something that won't kill clients or
trigger OOMs on the server.  I think I'll have to implement MSNs
(message sequence numbers) properly, and cheaply, for some
clients.

I know there's also interest in getting search usable via an
HTTP(S) API, so maybe JMAP[1] is worth looking into since it
seems like an easier-to-implement take on IMAP; and both
have search.  We already use JSON for manifest.js.gz...

[1] https://lwn.net/Articles/680057/ - (a look at JMAP).

GraphQL has also come up privately, but at a glance it seems to
suffer from some of the same problems as IMAP in offering
excessive granularity and cache-unfriendliness.  But anyways,
I'm close to having a pretty good read-only IMAP server.  JMAP
cannot be any harder than IMAP, right? :)

Anyways, the IMAP server is based on the existing NNTP server
code, but it also provides a fresh place to test new ideas for
improving scalability and performance:

* It stays fair to other clients when blob storage is slow,
  and I've gotten to be fairly happy with the API.  Slow
  Xapian storage isn't accounted for, yet, but will be.
  Aggressive pipelining in mbsync(1) helped me get async
  bugs ironed out.

* LIST wildcard matching/globbing is tested with 100K inboxes;
  other pieces still need work.

* Incremental config reloading on SIGHUP works, but
  could be more granular.

* IMAP IDLE supported for EXISTS (but not EXPUNGE for spam).

* Inotify (or EVFILT_VNODE) used for IMAP IDLE notifications.
  More inotify/EVFILT_VNODE usage to come for manifest.js.gz
  updates, git-cat-file restarts, DB reopens, etc.  I've used
  inotify a bunch in the past, but not in Perl.  This is my
  first time with EVFILT_VNODE, but inotify seems more capable.

* --reindex is recommended for the RFC822.SIZE fetch
  attribute to account for CRLF conversion.  We've been
  wrong about NNTP :bytes reporting for years, too, but
  IMAP clients (at least Mail::IMAPClient) actually
  complain.

* Common cases of Xapian search work, but also require
  --reindex.  Reindex is doable while serving, results just
  won't show until reindexed.

* COMPRESS, STARTTLS, and TLS are all inherited from -nntpd.

* PublicInbox::Eml can descend into message/* subparts
  properly, which makes it possible to implement stuff
  like BODYSTRUCTURE correctly.  And who uses BODYSTRUCTURE?
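
To make the LIST wildcard item concrete, here is a minimal sketch in
Python (not the Perl code in the server). It relies only on the two
wildcards IMAP defines for LIST: "*" matches any characters, including
the hierarchy delimiter, while "%" stops at the delimiter. The function
names, the "." delimiter, and the mailbox names in the usage note are
assumptions made for illustration.

```python
import re


def list_pattern_to_regex(pattern, delimiter="."):
    """Translate an IMAP LIST mailbox pattern into a compiled regex.

    "*" matches zero or more of any character.
    "%" matches zero or more characters, but never the delimiter.
    Everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append("[^" + re.escape(delimiter) + "]*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def list_mailboxes(pattern, mailboxes, delimiter="."):
    """Return the mailbox names fully matched by the LIST pattern."""
    # Compile once per LIST command, then scan; with very many
    # mailboxes the per-name work is a single regex match.
    rx = list_pattern_to_regex(pattern, delimiter)
    return [name for name in mailboxes if rx.fullmatch(name)]
```

With hypothetical names such as "inbox.comp.mail.meta.0" and
"inbox.test", the pattern "inbox.%" would return only "inbox.test",
while "inbox.comp.mail.*" would return every slice under that prefix.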

And a note:

* Don't create 100K public-inboxes in the worktree itself;
  MakeMaker looks inside all directories when doing
  "perl Makefile.PL" :x

TODO:

* Queries involving OR, NOT, and parentheses don't work, yet,
  since Xapian's default query parser works differently than
  the prefix (Polish) notation of IMAP.

* IMAP extensions are worth looking into, especially the
  ones around search, some of which overlap with JMAP.

* There's also thread-related stuff, which may let us
  implement "Search based on data in follow-ups"
  https://public-inbox.org/meta/20200526191745.34vynfasnf3amyjq@chatter.i7.local/

* An "All Mail" mailbox could be cool for IMAP/JMAP search
  (it's planned for HTTP, anyways)

I hit numerous bugs in 3rd-party libraries while working
on the server, all of them involving compression:

* Python imaplib2 (via offlineimap)

  - compress timeout: https://bugs.debian.org/961713

* Mail::IMAPClient

  - compress reference cycle:
    https://rt.cpan.org/Ticket/Display.html?id=132654

  - compress read starvation:
    https://rt.cpan.org/Ticket/Display.html?id=132720

* Compress::Raw::Zlib

  - inflate appending to OOK scalars
    https://rt.cpan.org/Ticket/Display.html?id=132734
    mbsync(1) helped me expose this bug
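
For readers unfamiliar with the code path these bugs sit in, the
following is a rough sketch of streaming compression as COMPRESS=DEFLATE
(RFC 4978) uses it: a raw deflate stream with no zlib header, flushed
after each write so the peer can decode what has been sent so far. It
uses Python's zlib module rather than Compress::Raw::Zlib, does not
reproduce any of the bugs listed above, and the class name is made up
for illustration.

```python
import zlib


class DeflateStream:
    """Both directions of one hypothetical COMPRESS=DEFLATE connection."""

    def __init__(self):
        # Negative wbits selects raw deflate (no zlib header/trailer).
        self._deflate = zlib.compressobj(
            zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
        self._inflate = zlib.decompressobj(-zlib.MAX_WBITS)

    def compress(self, data):
        """Compress one write; the sync flush makes it decodable at once."""
        return (self._deflate.compress(data)
                + self._deflate.flush(zlib.Z_SYNC_FLUSH))

    def feed(self, chunk):
        """Inflate whatever arrived; may return b"" for a partial block."""
        return self._inflate.decompress(chunk)
```

The reader side has to cope with input arriving in arbitrary chunk
sizes and with feed() returning nothing until enough bytes are in,
which is where timeouts and read starvation tend to show up in clients.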

Both Mail::IMAPClient and Mail::IMAPTalk do weird sleeps with
select(2) and SSL; maybe as a holdover from the days when
IO::Socket::SSL lacked SSL_WANT_READ/SSL_WANT_WRITE?


Headaches when dealing with MUAs:

* Inboxes are split into 50K slices to avoid overloading MUAs.

  The word "slice" was not previously used in our codebase.
  Conceptually, it's like v2 "epochs".  Epochs deal with
  the limitations of git clients; slices deal with the
  limitations of IMAP clients.

* HEADER.FIELDS retrieval still requires taking the whole
  blob from git.  Clients can request any header(s) they
  wish, unlike NNTP, where the server defines the overview.
  I've managed to speed this up significantly with a little
  pure-Perl opcode compiler, though :)

* MSNs (message sequence numbers) seem required to get decent
  performance from mutt and maybe other MUAs.  Showing dummy
  messages for removed spam is poor UX, too...
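
A small sketch of how slices and MSNs relate, in Python rather than
the server's Perl. Only the 50K slice size comes from the notes above;
the layout (slice 0 holding UIDs 1..50000, slice 1 the next 50000, and
so on) and all function names are assumptions for illustration. An MSN
is simply the 1-based position of a message among the UIDs that still
exist in the mailbox, so removed spam leaves gaps in UIDs but not in
MSNs.

```python
from bisect import bisect_left

SLICE_SIZE = 50_000  # from the notes: inboxes are split into 50K slices


def slice_for_uid(uid):
    """0-based slice index for a 1-based UID (assumed layout)."""
    return (uid - 1) // SLICE_SIZE


def uid_range(slice_idx):
    """Inclusive (first, last) UID a slice may hold (assumed layout)."""
    first = slice_idx * SLICE_SIZE + 1
    return first, first + SLICE_SIZE - 1


def uid_to_msn(sorted_uids, uid):
    """1-based MSN of uid among the existing UIDs, or None if absent."""
    i = bisect_left(sorted_uids, uid)
    if i < len(sorted_uids) and sorted_uids[i] == uid:
        return i + 1
    return None


def msn_to_uid(sorted_uids, msn):
    """UID at the given 1-based MSN, or None if out of range."""
    if 1 <= msn <= len(sorted_uids):
        return sorted_uids[msn - 1]
    return None
```

The catch is that this needs a sorted list of existing UIDs per
connection (at most 50K entries per slice under the assumption above),
which is the memory cost that makes doing MSNs "cheaply" the hard part.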


* Parse::RecDescent dependency (was: IMAP server notes, maybe JMAP?)
  2020-06-09 11:34 IMAP server notes, maybe JMAP? Eric Wong
@ 2020-06-15  6:21 ` Eric Wong
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Wong @ 2020-06-15  6:21 UTC
  To: meta

Eric Wong <e@yhbt.net> wrote:
> I know there's also interest in getting search usable via an
> HTTP(S) API, so maybe JMAP[1] is worth looking into since it
> seems like an easier-to-implement take on IMAP; and both
> have search.  We already use JSON for manifest.js.gz...
> 
> [1] https://lwn.net/Articles/680057/ - (a look at JMAP).
> 
> GraphQL has also come up privately, but at a glance it seems to
> suffer from some of the same problems as IMAP in offering
> excessive granularity and cache-unfriendliness.  But anyways,
> I'm close to having a pretty good read-only IMAP server.  JMAP
> cannot be any harder than IMAP, right? :)

<snip>

> TODO:
> 
> * Queries involving OR, NOT, and parentheses don't work, yet,
>   since Xapian's default query parser works differently than
>   the prefix (Polish) notation of IMAP.

So I've been giving Parse::RecDescent a try and it seems like an
acceptable dependency for IMAP (and possibly other) search query
parsing if we need it.  It's widely packaged by distros and has
many dependents, including Inline::C and Mail::IMAPClient, so
it's likely already installed.  I've been aware of it since
~2004 but it's my first time actually writing code to use it.
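
To show the shape of the parsing problem, here is a hand-written
recursive-descent sketch in Python. It is not Parse::RecDescent and
not the code in the server: it handles only a tiny, simplified subset
of IMAP SEARCH (OR, NOT, parenthesized lists, and a few string keys)
and rewrites the prefix notation as an infix string. The field
prefixes in PREFIXES and the exact output format are assumptions for
illustration; what a given Xapian query parser setup would accept is a
separate question.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare atoms.
TOKEN_RE = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()]+')

# Hypothetical mapping from IMAP search keys to query prefixes.
PREFIXES = {"FROM": "f", "SUBJECT": "s", "BODY": "b", "TO": "t", "CC": "c"}


class SearchParser:
    def __init__(self, text):
        self.toks = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of search keys")
        self.pos += 1
        return tok

    def key(self):
        """Parse one search key; OR takes two keys, NOT takes one."""
        tok = self.take()
        up = tok.upper()
        if tok == "(":
            keys = []
            while self.peek() != ")":
                keys.append(self.key())  # raises if ")" never arrives
            self.take()  # consume ")"
            if not keys:
                raise ValueError("empty parenthesized list")
            if len(keys) == 1:
                return keys[0]
            return "(" + " AND ".join(keys) + ")"
        if up == "OR":
            left = self.key()
            right = self.key()
            return "(" + left + " OR " + right + ")"
        if up == "NOT":
            return "(NOT " + self.key() + ")"
        if up in PREFIXES:
            return PREFIXES[up] + ":" + self.take()
        raise ValueError("unsupported search key: " + tok)

    def parse(self):
        """Top-level keys are implicitly ANDed together."""
        keys = []
        while self.peek() is not None:
            keys.append(self.key())
        return " AND ".join(keys)


def imap_search_to_infix(text):
    return SearchParser(text).parse()
```

Each grammar rule becomes one branch of key(), and the recursion is
what lets OR and NOT nest to any depth, which is the part a flat,
infix-oriented query parser doesn't give us for free.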

I also took a look at Regexp::Grammars, but it's unsupported on
Perl 5.18.[0-3] (I'm not sure if any distros package those versions)
and has fewer users.

Perl 5.10 regexps already support recursive descent with
(?<name>...) and (?&name), but it seems like a pain to use.
(?{ code }) behavior isn't very stable, either, and IIRC it's
a somewhat common source of bugs in Perl itself.

I've also heard good things about Marpa, but it's not
packaged for CentOS 7.x and FreeBSD is missing the latest
release (or Debian is missing the stable release :x)



Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git