user/dev discussion of public-inbox itself
* [RFC] make external urls user configurable
@ 2019-06-04 22:41 Ali Alnubani
  2019-06-05  0:58 ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Ali Alnubani @ 2019-06-04 22:41 UTC
  To: meta

The configuration variable publicinbox.exturls holds a
comma-delimited list of external URLs to suggest when a
thread isn't found.
It replaces the default hard-coded list in the module
'PublicInbox::ExtMsg' (lib/PublicInbox/ExtMsg.pm:17).

An example configuration:

[publicinbox]
	exturls=//marc.info/?i=%s,\
	//www.mail-archive.com/search?l=mid&q=%s,\
	nntp://news.gmane.org/%s,\
	https://lists.debian.org/msgid-search/%s,\
	//docs.FreeBSD.org/cgi/mid.cgi?db=mid&id=%s,\
	https://www.w3.org/mid/%s,\
	http://www.postgresql.org/message-id/%s,\
	; The following entry has to be a single line
	https://lists.debconf.org/cgi-lurker/keyword.cgi?doc-url=/lurker&
	format=en.html&query=id:%s

---
We started using public-inbox for dpdk.org (http://inbox.dpdk.org/dev/),
and most of our mailing lists aren't archived by these external
websites.

This still needs further improvements.

 lib/PublicInbox/ExtMsg.pm | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm
index d07d5a7..b1f6528 100644
--- a/lib/PublicInbox/ExtMsg.pm
+++ b/lib/PublicInbox/ExtMsg.pm
@@ -13,20 +13,6 @@ use PublicInbox::MID qw/mid2path/;
 use PublicInbox::WwwStream;
 our $MIN_PARTIAL_LEN = 16;
 
-# TODO: user-configurable
-our @EXT_URL = map { ascii_html($_) } (
-	# leading "//" denotes protocol-relative (http:// or https://)
-	'//marc.info/?i=%s',
-	'//www.mail-archive.com/search?l=mid&q=%s',
-	'nntp://news.gmane.org/%s',
-	'https://lists.debian.org/msgid-search/%s',
-	'//docs.FreeBSD.org/cgi/mid.cgi?db=mid&id=%s',
-	'https://www.w3.org/mid/%s',
-	'http://www.postgresql.org/message-id/%s',
-	'https://lists.debconf.org/cgi-lurker/keyword.cgi?'.
-		'doc-url=/lurker&format=en.html&query=id:%s'
-);
-
 sub PARTIAL_MAX () { 100 }
 
 sub search_partial ($$) {
@@ -166,7 +152,9 @@ sub ext_urls {
 	if (@EXT_URL && index($mid, '@') >= 0) {
 		my $env = $ctx->{env};
 		my $e = "\nPerhaps try an external site:\n\n";
-		foreach my $url (@EXT_URL) {
+		my @exturls = grep { /\S/ } map { ascii_html($_) } (
+			split(/[\s,]+/, $ctx->{www}->{pi_config}->{'publicinbox.exturls'}));
+		foreach my $url (@exturls) {
 			my $u = PublicInbox::Hval::prurl($env, $url);
 			my $r = sprintf($u, $href);
 			my $t = sprintf($u, $html);
-- 
2.21.0


* Re: [RFC] make external urls user configurable
  2019-06-04 22:41 [RFC] make external urls user configurable Ali Alnubani
@ 2019-06-05  0:58 ` Eric Wong
  2019-06-09  7:16   ` Ali Alnubani
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2019-06-05  0:58 UTC
  To: Ali Alnubani; +Cc: meta

Ali Alnubani <alialnu@mellanox.com> wrote:
> We started using public-inbox for dpdk.org (http://inbox.dpdk.org/dev/),

Cool!  I'm actually subscribed to some DPDK lists from a
previous life and haven't checked those inboxes in a while :x

> and most of our mailing lists aren't archived by these external
> websites.

But I figure there'll be some amount of cross-posting and
references across projects.  I'm planning to expand the
Linkify component to be able to URL-ify <$MESSAGE_ID>-looking
things (some newsreaders already do that).

And I really want to encourage more cross-communication
between projects via email :)

Case in point: you could discuss this configuration/patch
with DPDK colleagues and link to this project's inbox;
or something Linux-related with LKML; or FreeBSD-specific
stuff with FreeBSD folks.

Eventually, it would be nice to have some sort of decentralized
Message-ID lookup service which works across many archives w/o a
central point of failure or carrying around a giant list of
inboxes.

> This still needs further improvements.

Yup :>

>  lib/PublicInbox/ExtMsg.pm | 18 +++---------------
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm
> index d07d5a7..b1f6528 100644
> --- a/lib/PublicInbox/ExtMsg.pm
> +++ b/lib/PublicInbox/ExtMsg.pm
> @@ -13,20 +13,6 @@ use PublicInbox::MID qw/mid2path/;
>  use PublicInbox::WwwStream;
>  our $MIN_PARTIAL_LEN = 16;
>  
> -# TODO: user-configurable
> -our @EXT_URL = map { ascii_html($_) } (
> -	# leading "//" denotes protocol-relative (http:// or https://)
> -	'//marc.info/?i=%s',
> -	'//www.mail-archive.com/search?l=mid&q=%s',
> -	'nntp://news.gmane.org/%s',
> -	'https://lists.debian.org/msgid-search/%s',
> -	'//docs.FreeBSD.org/cgi/mid.cgi?db=mid&id=%s',
> -	'https://www.w3.org/mid/%s',
> -	'http://www.postgresql.org/message-id/%s',
> -	'https://lists.debconf.org/cgi-lurker/keyword.cgi?'.
> -		'doc-url=/lurker&format=en.html&query=id:%s'
> -);
> -

The default needs to remain; and may be further expanded.

Instead of forcing existing users to reconfigure, I suggest
allowing a "-" prefix to remove unwanted exturls.  git-config
also allows multi-value keys, so no need for '\' continuations.

How about allowing "= +$URL" to prepend to the default list,
and "= -$URL" to remove from the list?

[publicinbox]
	exturls = -//marc.info/?i=%s
	exturls = -//www.mail-archive.com/search?l=mid&q=%s
	...

	exturls = +//mid.dpdk.org/%s

Maybe exturls could accept an empty value at the top to clobber
all previous values, but I'm not a fan of supporting this:

	exturls =

I also think it should be possible to configure these overrides
on a per-inbox basis (but global overrides would still be respected).
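
A rough sketch of how the "+"/"-" handling described above might
look, assuming the multi-value "exturls" entries arrive as a plain
list in config order.  apply_exturls() and @DEFAULT_EXT_URL are
hypothetical names for illustration only and are not part of
public-inbox; the real defaults live in PublicInbox::ExtMsg.

```perl
use strict;
use warnings;

# Hypothetical, shortened copy of the defaults for this sketch.
my @DEFAULT_EXT_URL = (
	'//marc.info/?i=%s',
	'//www.mail-archive.com/search?l=mid&q=%s',
	'nntp://news.gmane.org/%s',
);

# Hypothetical helper: start from $defaults and apply each raw
# "exturls" value in order.
sub apply_exturls {
	my ($defaults, @values) = @_;
	my @urls = @$defaults;
	foreach my $v (@values) {
		if ($v =~ s/\A-//) {
			# "-URL" removes URL from the list
			@urls = grep { $_ ne $v } @urls;
		} elsif ($v =~ s/\A\+//) {
			# "+URL" prepends URL (dropping any duplicate)
			@urls = ($v, grep { $_ ne $v } @urls);
		}
		# values without a prefix are ignored in this sketch
	}
	@urls;
}

# Global overrides first ...
my @global = apply_exturls(\@DEFAULT_EXT_URL,
	'-//marc.info/?i=%s', '+//mid.dpdk.org/%s');

# ... then per-inbox overrides applied on top of the global result.
my @inbox = apply_exturls(\@global, '-nntp://news.gmane.org/%s');

print "$_\n" for @inbox;
```

Feeding the global result into a second call is one way to keep
global overrides respected while still allowing per-inbox tweaks;
other orderings are possible.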

>  sub search_partial ($$) {
> @@ -166,7 +152,9 @@ sub ext_urls {
>  	if (@EXT_URL && index($mid, '@') >= 0) {
>  		my $env = $ctx->{env};
>  		my $e = "\nPerhaps try an external site:\n\n";
> -		foreach my $url (@EXT_URL) {
> +		my @exturls = grep { /\S/ } map { ascii_html($_) } (
> +			split(/[\s,]+/, $ctx->{www}->{pi_config}->{'publicinbox.exturls'}));
> +		foreach my $url (@exturls) {
>  			my $u = PublicInbox::Hval::prurl($env, $url);
>  			my $r = sprintf($u, $href);
>  			my $t = sprintf($u, $html);

Line is too long (should be <80 columns).  I'm sensitive to this
since I need giant fonts and can't fit much more on screen; and
my vision will only get worse as I age :<
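
One possible way to keep the split under 80 columns, sketched with
a stand-in $ctx hash whose layout mirrors the patch above.  The
ascii_html() call from the patch is left out here so the snippet
runs without PublicInbox::Hval; $cfg and $val are just illustrative
names.

```perl
use strict;
use warnings;

# Stand-in for the request context (assumption for this sketch).
my $ctx = { www => { pi_config => {
	'publicinbox.exturls' => '//marc.info/?i=%s, //mid.dpdk.org/%s',
} } };

# Short intermediate variables keep every line well under 80
# columns; "// ''" avoids an undef warning when the key is unset.
my $cfg = $ctx->{www}->{pi_config};
my $val = $cfg->{'publicinbox.exturls'} // '';
my @exturls = grep { /\S/ } split(/[\s,]+/, $val);

print "$_\n" for @exturls;
```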

Anyways, thanks for taking a look at this and looking forward
to future revisions :>

* RE: [RFC] make external urls user configurable
  2019-06-05  0:58 ` Eric Wong
@ 2019-06-09  7:16   ` Ali Alnubani
  0 siblings, 0 replies; 3+ messages in thread
From: Ali Alnubani @ 2019-06-09  7:16 UTC
  To: Eric Wong; +Cc: meta

> -----Original Message-----
> From: Eric Wong <e@80x24.org>
> Sent: Wednesday, June 5, 2019 3:59 AM
> To: Ali Alnubani <alialnu@mellanox.com>
> Cc: meta@public-inbox.org
> Subject: Re: [RFC] make external urls user configurable
> 
> Ali Alnubani <alialnu@mellanox.com> wrote:
> > We started using public-inbox for dpdk.org
> >
<removed>
> 
> Cool!  I'm actually subscribed to some DPDK lists from a previous life and
> haven't checked those inboxes in a while :x
> 
> > and most of our mailing lists aren't archived by these external
> > websites.
> 
> But I figure there'll be some amount of cross-posting and references across
> projects.  I'm planning to expand the Linkify component to be able to URL-ify
> <$MESSAGE_ID>-looking things (some newsreaders already do that).
> 
> And I really want to encourage more cross-communication between projects
> via email :)
> 
> Case in point: you could discuss this configuration/patch with DPDK
> colleagues and link to the inbox to this project; or something Linux-related
> with LKML; or FreeBSD-specific stuff with FreeBSD folks.
> 
> Eventually, it would be nice to have some sort of decentralized Message-ID
> lookup service which works across many archives w/o a central point of
> failure or carrying around a giant list of inboxes.
> 

Agreed. I will discuss it with the maintainer.

> > This still needs further improvements.
> 
> Yup :>
> 
> >  lib/PublicInbox/ExtMsg.pm | 18 +++---------------
> >  1 file changed, 3 insertions(+), 15 deletions(-)
> >
> > diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm
> > index d07d5a7..b1f6528 100644
> > --- a/lib/PublicInbox/ExtMsg.pm
> > +++ b/lib/PublicInbox/ExtMsg.pm
> > @@ -13,20 +13,6 @@ use PublicInbox::MID qw/mid2path/;
> >  use PublicInbox::WwwStream;
> >  our $MIN_PARTIAL_LEN = 16;
> >
<removed>
> 
> The default needs to remain; and may be further expanded.
> 
> Instead of forcing existing users to reconfigure, I suggest allowing a "-" prefix
> to remove unwanted exturls.  git-config also allows multi-value keys, so no
> need for '\' continuations.
> 
> How about allowing "= +$URL" to prepend to the default list, and "= -$URL"
> to remove from the list.
> 
> [publicinbox]
> 	exturls = -//marc.info/?i=%s
> 	exturls = -
<removed>
> 	...
> 
> 	exturls = +//mid.dpdk.org/%s
> 
> Maybe exturls could accept an empty value at the top to clobber all previous
> values, but I'm not a fan of supporting this:
> 
> 	exturls =
> 
> I also think it should be possible to configure these overrides on a per-inbox
> basis (but global overrides would still be respected)
> 
> >  sub search_partial ($$) {
> > @@ -166,7 +152,9 @@ sub ext_urls {
> >  	if (@EXT_URL && index($mid, '@') >= 0) {
> >  		my $env = $ctx->{env};
> >  		my $e = "\nPerhaps try an external site:\n\n";
> > -		foreach my $url (@EXT_URL) {
> > +		my @exturls = grep { /\S/ } map { ascii_html($_) } (
> > +			split(/[\s,]+/, $ctx->{www}->{pi_config}->{'publicinbox.exturls'}));
> > +		foreach my $url (@exturls) {
> >  			my $u = PublicInbox::Hval::prurl($env, $url);
> >  			my $r = sprintf($u, $href);
> >  			my $t = sprintf($u, $html);
> 
> Line is too long (should be <80 columns).  I'm sensitive to this since I need
> giant fonts and can't fit much more on screen; and my vision will only get
> worse as I age :<

Thanks for the suggestions. I will update the patch and send a v2.

> 
> Anyways, thanks for taking a look at this and looking forward to future
> revisions :>

Thank you for taking the time to review this; I really appreciate it.
-Ali
