From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <e@80x24.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN:  
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 4C7901F461;
	Thu, 27 Jun 2019 19:33:32 +0000 (UTC)
Date: Thu, 27 Jun 2019 19:33:32 +0000
From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: Re: RFC: marking spam via refs/notes/spam to hide it
Message-ID: <20190627193332.gjzwkuiotp6fgmcf@whir>
References: <20190627184251.GC14570@chatter.i7.local>
 <20190627185236.2mwuoygclytf5m7x@whir>
 <20190627185723.GE14570@chatter.i7.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20190627185723.GE14570@chatter.i7.local>
List-Id: <meta.public-inbox.org>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jun 27, 2019 at 06:52:36PM +0000, Eric Wong wrote:
> > > I'm reluctant to delete spam because it rebases the repository -- for
> > > large ones this can cause excessive downloads to mirrors. A thought
> > > occurred to me -- would it make sense to just hide spam from the
> > > frontend? E.g.:
> > > 
> > > public-inbox-hide linux-kernel message@id
> > > 
> > > This would do the following:
> > > 
> > > - remove that message from search databases
> > > - attach a refs/notes/spam git-note to that commit
> > > - tell public-inbox-init/reindex to ignore this commit in the future
> > 
> > Aside from the git note, public-inbox-learn already does that:
> > 
> >   public-inbox-learn spam </path/to/message
> > 
> >   (scans everything in ~/.public-inbox/config since spam is
> >   frequently cross-posted)
> 
> Ah, that shows how carefully I read docs, I guess. :) Is it possible to just
> specify a message-id, so that there's no extra step to dump the spam message
> into a file?

Not with just a Message-ID arg.  That would be dangerous:
somebody malicious could get you to remove a legit message by
sending a spam message which reuses the Message-ID of that legit
message.  I'd definitely want to verify that a message is the one
I want to remove, first.

In theory, you could: "curl $URL_MESSAGE_ID/raw | public-inbox-learn spam";
but that's still dangerous because there are/were legit bots
(and IIRC, old git-send-email) which reused Message-IDs, too.
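
A rough sketch of that "verify first" step, for illustration only.  It
assumes the raw message has already been fetched (e.g. by the curl
command above) and that public-inbox-learn reads the message on stdin,
as in the invocation quoted earlier.  The helper names summarize() and
learn_spam_if_confirmed() are made up for this example and are not part
of public-inbox:

```python
import email
import email.policy
import subprocess

# Headers a human would likely want to eyeball before training as spam.
REVIEW_HEADERS = ("From", "Subject", "Date", "Message-ID")


def summarize(raw: bytes) -> dict:
    """Return the review headers of a raw RFC 5322 message as strings."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    return {name: str(msg.get(name, "")) for name in REVIEW_HEADERS}


def learn_spam_if_confirmed(raw: bytes, confirm=input, run=subprocess.run) -> bool:
    """Print a summary and train as spam only after an explicit "yes".

    confirm and run are injectable so the prompt and the external
    command can be replaced (hypothetical hooks, for testing).
    """
    for name, value in summarize(raw).items():
        print(f"{name}: {value}")
    answer = confirm("Train as spam? Type 'yes' to proceed: ")
    if answer.strip().lower() != "yes":
        return False
    # Same command as in the thread; the raw message goes in on stdin.
    run(["public-inbox-learn", "spam"], input=raw, check=True)
    return True
```

Here raw would be whatever bytes the fetch returned, for example
sys.stdin.buffer.read() at the end of a pipe.  The prompt does nothing
about the Message-ID reuse problem by itself; it only puts a human look
at From/Subject/Date between the fetch and the removal.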