user/dev discussion of public-inbox itself
* [PATCH] import: drop '<' and '>' characters in addresses
  2020-02-25  9:28  0% ` weird From: lines [was: Two small issues when importing old archives] Eric Wong
@ 2020-02-26 10:21  0%   ` Eric Wong
From: Eric Wong @ 2020-02-26 10:21 UTC
  To: Leah Neukirchen; +Cc: meta

Eric Wong <e@yhbt.net> wrote:
> Leah Neukirchen <leah@vuxu.org> wrote:
> > 2) Weird From: lines crash the whole import
> > 
> > From: "=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de
> > 
> > This funny line broke import_maildir:
> > 
> > fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet <"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de> 1101853296 +0100
> > fast-import: dumping crash report to /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
> > EOF from fast-import:  at /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.
> > 
> > I fixed it manually.  (But I think it's actually a valid mail address,
> > even in this botched state.)  I'm not sure what added the ">", it's
> > not in the original mail.
> > 
> > (I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)
> 
> Gah, this looks like it's because Email::Address::XS leaves a
> "<" in the name...   Perhaps Import should delete all [<>]
> characters unconditionally? (or swap in appropriate Unicode
> homographs and assume users have the necessary glyphs...)

We already do `$name =~ tr/<>//d', so I think doing the same
with `$email' is appropriate for fast-import.  The "correct"
address featuring '<' will still be indexed in Xapian, at least.
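
For reference, git-fast-import uses '<' and '>' to delimit the email
in an ident line and rejects either character (and LF) inside the name
or the email, which is why the quoted '<' is fatal.  A minimal Perl
sketch of the idea, assuming a hypothetical fast_import_ident() helper
that is not part of PublicInbox::Import:

```perl
use strict;
use warnings;

# Hypothetical helper, NOT PublicInbox::Import's API: format the
# "name <email> time tz" ident that git-fast-import expects.
# fast-import forbids '<', '>' and LF inside both the name and email.
sub fast_import_ident {
	my ($name, $email, $time, $tz) = @_;
	for ($name, $email) {	# aliases the two lexical copies
		tr/<>//d;	# same deletion the patch applies to $email
		tr/\n//d;	# LF would end the fast-import command early
	}
	return "$name <$email> $time $tz";
}

my $odd = '"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@example.de';
print fast_import_ident('=?iso-8859-1?Q?Jochen_K=FCpper?= usenet',
			$odd, 1101853296, '+0100'), "\n";
```

The resulting ident contains exactly one '<' and one '>', so
fast-import can find the email delimiters again.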

-------------8<-------------
Subject: [PATCH] import: drop '<' and '>' characters in addresses

Some strange "From:" lines will cause Email::Address::XS to
leave '<' (and presumably '>') in the address, which
git-fast-import won't accept even if quoted.  Work around this
problem by deleting '<' and '>' the same way we delete them for
the ident name.

Reported-by: Leah Neukirchen <leah@vuxu.org>
Link: https://public-inbox.org/meta/87h7zfemur.fsf@vuxu.org/
---
 lib/PublicInbox/Import.pm | 4 ++++
 t/import.t                | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index d8dc49b8..68dc0c7e 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -293,6 +293,10 @@ sub extract_cmt_info ($) {
 		}
 	}
 	if (defined $email) {
+		# Email::Address::XS may leave quoted '<' in addresses,
+		# which git-fast-import doesn't like
+		$email =~ tr/<>//d;
+
 		# quiet down wide character warnings with utf8::encode
 		utf8::encode($email);
 	} else {
diff --git a/t/import.t b/t/import.t
index e71dd714..b88d308e 100644
--- a/t/import.t
+++ b/t/import.t
@@ -55,6 +55,8 @@ $im->done;
 my @revs = $git->qx(qw(rev-list HEAD));
 is(scalar @revs, 1, 'one revision created');
 
+my $odd = '"=?iso-8859-1?Q?J_K=FCpper?= <usenet"@example.de';
+$mime->header_set('From', $odd);
 $mime->header_set('Message-ID', '<b@example.com>');
 $mime->header_set('Subject', 'msg2');
 like($im->add($mime, sub { $mime }), qr/\A:\d+\z/, 'added 2nd message');

* weird From: lines [was: Two small issues when importing old archives]
  2020-02-24 20:45  5% Two small issues when importing old archives Leah Neukirchen
@ 2020-02-25  9:28  0% ` Eric Wong
  2020-02-26 10:21  0%   ` [PATCH] import: drop '<' and '>' characters in addresses Eric Wong
From: Eric Wong @ 2020-02-25  9:28 UTC
  To: Leah Neukirchen; +Cc: meta

Leah Neukirchen <leah@vuxu.org> wrote:
> 2) Weird From: lines crash the whole import
> 
> From: "=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de
> 
> This funny line broke import_maildir:
> 
> fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet <"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de> 1101853296 +0100
> fast-import: dumping crash report to /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
> EOF from fast-import:  at /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.
> 
> I fixed it manually.  (But I think it's actually a valid mail address,
> even in this botched state.)  I'm not sure what added the ">", it's
> not in the original mail.
> 
> (I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)

Gah, this looks like it's because Email::Address::XS leaves a
"<" in the name...   Perhaps Import should delete all [<>]
characters unconditionally? (or swap in appropriate Unicode
homographs and assume users have the necessary glyphs...)

---------8<----------
Subject: [RFC] t/address.t: dump failing case

"PublicInbox::Address" (w/o "PP") is Email::Address::XS 1.04
from Debian 10:

PublicInbox::Address names: $VAR1 = [
          '=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet'
        ];
PublicInbox::Address emails: $VAR1 = [
          '"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@example.de'
        ];
PublicInbox::AddressPP names: $VAR1 = [
          '=?iso-8859-1?Q?Jochen_K=FCpper?='
        ];
PublicInbox::AddressPP emails: $VAR1 = [
          'usenet"@example.de'
        ];
---
 t/address.t | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/address.t b/t/address.t
index 6f4bff6c..8c39f04b 100644
--- a/t/address.t
+++ b/t/address.t
@@ -14,6 +14,11 @@ sub test_pkg {
 		[$emails->('User <e@example.com>, e@example.org')],
 		'address extraction works as expected');
 
+	my $odd = '"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@example.de';
+	use Data::Dumper;
+	diag "$pkg names: " . Dumper([$names->($odd)]);
+	diag "$pkg emails: " . Dumper([$emails->($odd)]);
+
 	is_deeply(['user@example.com'],
 		[$emails->('<user@example.com (Comment)>')],
 		'comment after domain accepted before >');

* Two small issues when importing old archives
@ 2020-02-24 20:45  5% Leah Neukirchen
  2020-02-25  9:28  0% ` weird From: lines [was: Two small issues when importing old archives] Eric Wong
From: Leah Neukirchen @ 2020-02-24 20:45 UTC
  To: meta

Hi,

I've recently imported some sizable archives (~100k messages) of old
mailing lists and noticed some slight inconveniences:

1) RFC5322/822 invalid Date: headers should be parsed more gracefully

Some old mails had Date: headers without time zones, e.g.
Date: Sat, 27 Sep 1997 10:02:32

This results in public-inbox assuming the current date for the message.
But this assumption makes no sense (literally every other guess
would be more likely), and it also results in these messages showing
up on the first page of the archive.  Furthermore, sorting is then not
stable: pressing F5 makes the threads jump around.  I'd recommend
falling back to +0000 instead.
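
The suggested +0000 fallback could be sketched in Perl with the core
Time::Local module.  This is a hypothetical helper
(date_with_utc_fallback is not a public-inbox function), it only
recognizes numeric zones such as +0100, and it is not how public-inbox
actually parses Date: headers:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
	Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: parse "[Day, ]DD Mon YYYY HH:MM:SS [+-HHMM]" and
# return (epoch seconds, zone).  A missing zone is treated as +0000
# instead of being replaced by the current time.  Named zones (GMT,
# EST, ...) are not recognized and fall through to +0000 here;
# unparseable input returns an empty list.
sub date_with_utc_fallback {
	my ($hdr) = @_;
	my ($d, $mon, $y, $H, $M, $S, $zone) = $hdr =~
		/(\d{1,2})\s+([A-Z][a-z]{2})\s+(\d{4})\s+(\d\d):(\d\d):(\d\d)(?:\s+([+-]\d{4}))?/
		or return;
	my $m = $MON{$mon};
	return unless defined $m;
	$zone //= '+0000';	# the fallback for zone-less dates
	my ($sign, $zh, $zm) = $zone =~ /\A([+-])(\d\d)(\d\d)\z/;
	my $off = ($zh * 3600 + $zm * 60) * ($sign eq '-' ? -1 : 1);
	# timegm() takes a 0-based month and a 4-digit year
	return (timegm($S, $M, $H, $d, $m, $y) - $off, $zone);
}

my ($t, $tz) = date_with_utc_fallback('Sat, 27 Sep 1997 10:02:32');
print "$t $tz\n";	# 875354552 +0000
```

Because the result depends only on the header, the timestamp stays the
same on every page load, so thread ordering would no longer shift.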

2) Weird From: lines crash the whole import

From: "=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de

This funny line broke import_maildir:

fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet <"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de> 1101853296 +0100
fast-import: dumping crash report to /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
EOF from fast-import:  at /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.

I fixed it manually.  (But I think it's actually a valid mail address,
even in this botched state.)  I'm not sure what added the ">", it's
not in the original mail.

(I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)

thx,
-- 
Leah Neukirchen  <leah@vuxu.org>  https://leahneukirchen.org/

* [ANNOUNCE] public-inbox 1.3.0
@ 2020-02-10  5:52 23% Eric Wong
From: Eric Wong @ 2020-02-10  5:52 UTC
  To: meta

Many internal improvements to improve the developer experience,
long-term maintainability, ease-of-installation and compatibility.
There are also several bugfixes.

Some of the internal improvements involve avoiding Perl startup
time in tests.  "make check" now runs about 50% faster than
before, and the new "make check-run" can be around 30% faster
than "make check" after being primed by "make check".

Most closures (anonymous subroutines) are purged from the
-nntpd, -httpd and WWW code paths to make checking for memory
leaks easier.

* documentation now builds on BSD make

* Date::Parse (TimeDate CPAN distribution) is now optional, allowing
  installation from OpenBSD systems via "pkg".

* the work-in-progress Xapian.pm SWIG bindings are now supported
  in addition to the traditional Search::Xapian XS bindings.
  Only the SWIG bindings are packaged for OpenBSD.

* Plack is optional for users who wish to avoid web-related components

* Filesys::Notify::Simple is optional for non-watch users
  (but Plack will still pull it in)

* improved internal error checking and reporting in numerous places

* fixed Perl 5.10.1 compatibility (tested with Devel::PatchPerl)

* IPC::Run and XML::Feed are no longer used in tests,
  though XML::TreePP becomes an optional test dependency.

* Email::Address::XS is used if available (newer Email::MIME
  requires it); it should handle more corner cases.

* PublicInbox::WWW:
  - "nested" search results page now shows relevancy percentages
  - many solver bugs fixed
  - solver works on "-U0" patches using "git apply --unidiff-zero"
  - solver now compatible with git < v1.8.5 (but >= v1.8.0)
  - raw HTML no longer shown inline in multipart/alternative messages
    (v1.2.0 regression)
  - reduced memory usage for displaying multipart messages
  - static file responses support Last-Modified/If-Modified-Since
  - avoid trailing underlines in diffstat linkification
  - more consistent handling of messages without Subjects

* public-inbox-httpd / public-inbox-nntpd:
  - MSG_MORE used consistently in long responses
  - fixed IO::KQueue usage on *BSDs
  - listen sockets are closed immediately on graceful shutdown
  - missed signals avoided with signalfd or EVFILT_SIGNAL
  - Linux x32 ABI support

* public-inbox-nntpd:
  - Y2020 workaround for Time::Local

* public-inbox-watch:
  - avoid memory leak from cyclic reference on SIGHUP
  - fix documentation of publicinboxwatch.watchspam

* public-inbox-convert:
  - avoid article number jumps when converting indexed v1 inboxes

* public-inbox-compact / public-inbox-xcpdb:
  - concurrent invocations of -compact and -xcpdb commands,
    not just -mda, -watch, -learn, -purge

* examples/unsubscribe.milter:
  - support unique mailto: unsubscribe

Release tarball available for download at:

https://public-inbox.org/public-inbox.git/snapshot/public-inbox-1.3.0.tar.gz

Please report bugs via plain-text mail to: meta@public-inbox.org

See archives at https://public-inbox.org/meta/ for all history.
See https://public-inbox.org/TODO for what the future holds.

* [PATCH] doc: more 1.3.0 release notes updates
@ 2020-01-31 23:45  7% Eric Wong
From: Eric Wong @ 2020-01-31 23:45 UTC
  To: meta

Some updates with recent bugfixes and a few wording/formatting
improvements.
---
 I'm thinking it's time for a release soon, before new features
 creep in...

 Documentation/RelNotes/v1.3.0.eml | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/Documentation/RelNotes/v1.3.0.eml b/Documentation/RelNotes/v1.3.0.eml
index 9000ccaf..cbf7438b 100644
--- a/Documentation/RelNotes/v1.3.0.eml
+++ b/Documentation/RelNotes/v1.3.0.eml
@@ -3,8 +3,9 @@ To: meta@public-inbox.org
 Subject: [WIP] public-inbox 1.3.0
 Content-Type: text/plain; charset=utf-8
 
-Many internal improvements to improve the developer experience
-and long-term maintainability.
+Many internal improvements to improve the developer experience,
+long-term maintainability, ease-of-installation and compatibility.
+There are also several bugfixes.
 
 Some of the internal improvements involve avoiding Perl startup
 time in tests.  "make check" now runs about 50% faster than
@@ -27,16 +28,18 @@ leaks easier.
 * Plack is optional for users who wish to avoid web-related components
 
 * Filesys::Notify::Simple is optional for non-watch users
-  (but Plack will pull it in)
+  (but Plack will still pull it in)
 
 * improved internal error checking and reporting in numerous places
 
+* fixed Perl 5.10.1 compatibility (tested with Devel::PatchPerl)
+
 * IPC::Run is no longer used in tests
 
 * Email::Address::XS used if available (newer Email::MIME
   requires it), it should handle more corner cases.
 
-* PublicInbox::WWW
+* PublicInbox::WWW:
   - "nested" search results page now shows relevancy percentages
   - many solver bugs fixed
   - solver works on "-U0" patches using "git apply --unidiff-zero"
@@ -46,6 +49,7 @@ leaks easier.
   - reduced memory usage for displaying multipart messages
   - static file responses support Last-Modified/If-Modified-Since
   - avoid trailing underlines in diffstat linkification
+  - more consistent handling of messages without Subjects
 
 * public-inbox-httpd / public-inbox-nntpd:
   - MSG_MORE used consistently in long responses
@@ -54,15 +58,27 @@ leaks easier.
   - missed signals avoided with signalfd or EVFILT_SIGNAL
 
 * public-inbox-nntpd:
-  Y2020 workaround for Time::Local
+  - Y2020 workaround for Time::Local
 
-* public-inbox-watch
+* public-inbox-watch:
   - avoid memory leak from cyclic reference on SIGHUP
   - fix documentation of publicinboxwatch.watchspam
 
+* public-inbox-convert:
+  - avoid article number jumps when converting indexed v1 inboxes
+
+* public-inbox-compact / public-inbox-xcpdb:
+  - concurrent invocations of -compact and -xcpdb commands,
+    not just -mda, -watch, -learn, -purge
+
+* examples/unsubscribe.milter:
+  - support unique mailto: unsubscribe
+
 Release tarballs will be available for download at
 
 	https://public-inbox.org/public-inbox.git
 
+Please report bugs via plain-text mail to: meta@public-inbox.org
+
 See archives at https://public-inbox.org/meta/ for all history.
 See https://public-inbox.org/TODO for what the future holds.

* [PATCH 3/6] doc: release notes: set Date for 1.2.0, start 1.3.0
  @ 2020-01-01  9:57  7% ` Eric Wong
From: Eric Wong @ 2020-01-01  9:57 UTC
  To: meta

Seems like a lot's happened since 1.2, but it's mostly
internal stuff...
---
 Documentation/RelNotes/v1.2.0.eml |  9 ++++++
 Documentation/RelNotes/v1.3.0.eml | 50 +++++++++++++++++++++++++++++++
 MANIFEST                          |  1 +
 3 files changed, 60 insertions(+)
 create mode 100644 Documentation/RelNotes/v1.3.0.eml

diff --git a/Documentation/RelNotes/v1.2.0.eml b/Documentation/RelNotes/v1.2.0.eml
index 2eeb0de0..d8b8d2b6 100644
--- a/Documentation/RelNotes/v1.2.0.eml
+++ b/Documentation/RelNotes/v1.2.0.eml
@@ -1,3 +1,5 @@
+From e@80x24.org Sun Nov  3 03:12:41 2019
+Date: Sun, 3 Nov 2019 03:12:41 +0000
 From: Eric Wong <e@80x24.org>
 To: meta@public-inbox.org
 Subject: [ANNOUNCE] public-inbox 1.2.0
@@ -73,5 +75,12 @@ for their sponsorship and support over the past two years.
 
 https://public-inbox.org/releases/public-inbox-1.2.0.tar.gz
 
+SHA256: dabc735a5cfe396f457ac721559de26ae38abbaaa74612eb786e9e2e1ca94269
+
+  Chances are: You don't know me and never will.  Everybody else
+  can verify the tarball and sign a reply saying they've
+  verified it, instead.  The more who do this, the better, but
+  don't trust the BOFH :P
+
 See archives at https://public-inbox.org/meta/ for all history.
 See https://public-inbox.org/TODO for what the future holds.
diff --git a/Documentation/RelNotes/v1.3.0.eml b/Documentation/RelNotes/v1.3.0.eml
new file mode 100644
index 00000000..11806ccd
--- /dev/null
+++ b/Documentation/RelNotes/v1.3.0.eml
@@ -0,0 +1,50 @@
+From: Eric Wong <e@80x24.org>
+To: meta@public-inbox.org
+Subject: [WIP] public-inbox 1.3.0
+Content-Type: text/plain; charset=utf-8
+
+Many internal improvements to improve the developer experience
+and long-term maintainability.
+
+Many of the internal improvements focused on being able to avoid
+Perl startup time in tests.  "make check" now runs about 50%
+faster than before, and the new "make check-run" can be around
+30% faster after being primed by "make check".
+
+Most closures (anonymous subroutines) are purged from the
+-nntpd, -httpd and WWW code paths to make checking for memory
+leaks easier.
+
+* documentation now builds on BSD make
+
+* Date::Parse (TimeDate CPAN distribution) is now optional, allowing
+  installation from OpenBSD systems via "pkg".
+
+* the work-in-progress Xapian.pm SWIG bindings are now supported
+  in addition to the traditional Search::Xapian XS bindings.
+  Only SWIG bindings are packaged for OpenBSD.
+
+* IPC::Run is no longer used in tests
+
+* improved internal error checking and reporting in numerous places
+  
+* PublicInbox::WWW
+  - "nested" search results page now shows relevancy percentages
+  - solver works on "-U0" patches using "git apply --unidiff-zero"
+  - raw HTML no longer shown inline in multipart/alternative messages
+    (v1.2.0 regression)
+
+* public-inbox-httpd / public-inbox-nntpd:
+  - MSG_MORE used consistently in long responses
+  - fixed IO::KQueue usage on *BSDs
+
+* public-inbox-watch
+  - avoid memory leak from cyclic reference on SIGHUP
+  - fix documentation of publicinboxwatch.watchspam
+
+Release tarballs will be available for download at
+
+	https://public-inbox.org/public-inbox.git
+
+See archives at https://public-inbox.org/meta/ for all history.
+See https://public-inbox.org/TODO for what the future holds.
diff --git a/MANIFEST b/MANIFEST
index f649bbef..59716adf 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -6,6 +6,7 @@ Documentation/.gitignore
 Documentation/RelNotes/v1.0.0.eml
 Documentation/RelNotes/v1.1.0-pre1.eml
 Documentation/RelNotes/v1.2.0.eml
+Documentation/RelNotes/v1.3.0.eml
 Documentation/dc-dlvr-spam-flow.txt
 Documentation/design_notes.txt
 Documentation/design_www.txt

2020-01-01  9:57     [PATCH 0/6] doc updates and such Eric Wong
2020-01-01  9:57  7% ` [PATCH 3/6] doc: release notes: set Date for 1.2.0, start 1.3.0 Eric Wong
2020-01-31 23:45  7% [PATCH] doc: more 1.3.0 release notes updates Eric Wong
2020-02-10  5:52 23% [ANNOUNCE] public-inbox 1.3.0 Eric Wong
2020-02-24 20:45  5% Two small issues when importing old archives Leah Neukirchen
2020-02-25  9:28  0% ` weird From: lines [was: Two small issues when importing old archives] Eric Wong
2020-02-26 10:21  0%   ` [PATCH] import: drop '<' and '>' characters in addresses Eric Wong

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git