user/dev discussion of public-inbox itself
* RFC: marking spam via refs/notes/spam to hide it
@ 2019-06-27 18:42 Konstantin Ryabitsev
  2019-06-27 18:52 ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-27 18:42 UTC
  To: meta

Greetings:

I'm reluctant to delete spam because it rebases the repository -- for 
large ones this can cause excessive downloads to mirrors. A thought 
occurred to me -- would it make sense to just hide spam from the 
frontend? E.g.:

public-inbox-hide linux-kernel message@id

This would do the following:

- remove that message from search databases
- attach a refs/notes/spam git-note to that commit
- tell public-inbox-init/reindex to ignore this commit in the future

Seems like it would be easy to do and would give a way to remove spam 
without needing to edit git history.

-K

* Re: RFC: marking spam via refs/notes/spam to hide it
  2019-06-27 18:42 RFC: marking spam via refs/notes/spam to hide it Konstantin Ryabitsev
@ 2019-06-27 18:52 ` Eric Wong
  2019-06-27 18:57   ` Konstantin Ryabitsev
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2019-06-27 18:52 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Greetings:
> 
> I'm reluctant to delete spam because it rebases the repository -- for large
> ones this can cause excessive downloads to mirrors. A thought occurred to me
> -- would it make sense to just hide spam from the frontend? E.g.:
> 
> public-inbox-hide linux-kernel message@id
> 
> This would do the following:
> 
> - remove that message from search databases
> - attach a refs/notes/spam git-note to that commit
> - tell public-inbox-init/reindex to ignore this commit in the future

Aside from the git note, public-inbox-learn already does that:

   public-inbox-learn spam </path/to/message

   (scans everything in ~/.public-inbox/config since spam is
   frequently cross-posted)

I've been using it since the earliest days of the project and
frequently need it for the git@vger mirror.

It's also wired into -watch via watchspam (but the sa-learn step
to train spamassassin is broken atm).

> Seems like it would be easy to do and would give a way to remove spam
> without needing to edit git history.

It appends to git history, v2 changes the 'm' file to a 'd'
file with the corresponding blob; v1 removes the file from the
tree.  It doesn't add blobs to git history, but there'll be
new tree and commit objects.  There's no rebasing at all.

public-inbox-index has always handled unindexing it in mirrors,
too.

* Re: RFC: marking spam via refs/notes/spam to hide it
  2019-06-27 18:52 ` Eric Wong
@ 2019-06-27 18:57   ` Konstantin Ryabitsev
  2019-06-27 19:33     ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-27 18:57 UTC
  To: Eric Wong; +Cc: meta

On Thu, Jun 27, 2019 at 06:52:36PM +0000, Eric Wong wrote:
>> I'm reluctant to delete spam because it rebases the repository -- for large
>> ones this can cause excessive downloads to mirrors. A thought occurred to me
>> -- would it make sense to just hide spam from the frontend? E.g.:
>>
>> public-inbox-hide linux-kernel message@id
>>
>> This would do the following:
>>
>> - remove that message from search databases
>> - attach a refs/notes/spam git-note to that commit
>> - tell public-inbox-init/reindex to ignore this commit in the future
>
>Aside from the git note, public-inbox-learn already does that:
>
>   public-inbox-learn spam </path/to/message
>
>   (scans everything in ~/.public-inbox/config since spam is
>   frequently cross-posted)

Ah, that shows how carefully I read docs, I guess. :) Is it possible to 
just specify a message-id, so that there's no extra step to dump the 
spam message into a file?

>> Seems like it would be easy to do and would give a way to remove spam
>> without needing to edit git history.
>
>It appends to git history, v2 changes the 'm' file to a 'd'
>file with the corresponding blob; v1 removes the file from the
>tree.  It doesn't add blobs to git history, but there'll be
>new tree and commit objects.  There's no rebasing at all.
>
>public-inbox-index has always handled unindexing it in mirrors,
>too.

Right on, thanks! This is certainly along the same lines that I was 
thinking.

-K

* Re: RFC: marking spam via refs/notes/spam to hide it
  2019-06-27 18:57   ` Konstantin Ryabitsev
@ 2019-06-27 19:33     ` Eric Wong
  2019-06-27 19:45       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2019-06-27 19:33 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jun 27, 2019 at 06:52:36PM +0000, Eric Wong wrote:
> > > I'm reluctant to delete spam because it rebases the repository -- for large
> > > ones this can cause excessive downloads to mirrors. A thought occurred to me
> > > -- would it make sense to just hide spam from the frontend? E.g.:
> > > 
> > > public-inbox-hide linux-kernel message@id
> > > 
> > > This would do the following:
> > > 
> > > - remove that message from search databases
> > > - attach a refs/notes/spam git-note to that commit
> > > - tell public-inbox-init/reindex to ignore this commit in the future
> > 
> > Aside from the git note, public-inbox-learn already does that:
> > 
> >   public-inbox-learn spam </path/to/message
> > 
> >   (scans everything in ~/.public-inbox/config since spam is
> >   frequently cross-posted)
> 
> Ah, that shows how carefully I read docs, I guess. :) Is it possible to just
> specify a message-id, so that there's no extra step to dump the spam message
> into a file?

Not exactly with the Message-ID arg.  It would be dangerous if
somebody malicious wanted to get you to remove a legit message
by sending a spam message which reuses a Message-ID of a legit
message.  I'd definitely want to verify a message is what I'd
want to remove, first.

In theory, you could: "curl $URL_MESSAGE_ID/raw | public-inbox-learn spam";
but that's still dangerous because there are/were legit bots
(and IIRC, old git-send-email) which reused Message-IDs, too.
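
One way to keep a manual check in that pipeline is to save the raw
message, read it, and only then feed it to public-inbox-learn.  A rough
sketch follows; confirm_and_learn and the /tmp path are made-up names
for illustration, and only the "public-inbox-learn spam <file" call is
taken from this thread.

```shell
# confirm_and_learn FILE
# Hypothetical helper: show FILE, ask for confirmation, and only then
# pass it to public-inbox-learn on stdin.
confirm_and_learn() {
    msg=$1
    [ -r "$msg" ] || { echo "cannot read $msg" >&2; return 1; }

    # Let a human look at the whole message first.
    ${PAGER:-less} "$msg"

    printf 'Mark this message as spam? [y/N] ' >&2
    read -r answer
    case $answer in
        [yY]) public-inbox-learn spam <"$msg" ;;
        *)    echo 'not marked' ;;
    esac
}

# Example, with URL_MESSAGE_ID as in the message above:
#   curl -fsS "$URL_MESSAGE_ID/raw" -o /tmp/suspect.eml &&
#       confirm_and_learn /tmp/suspect.eml
```

This does not solve the reused Message-ID problem; it only makes sure
somebody reads what was actually downloaded before it is removed.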

* Re: RFC: marking spam via refs/notes/spam to hide it
  2019-06-27 19:33     ` Eric Wong
@ 2019-06-27 19:45       ` Konstantin Ryabitsev
  2019-06-27 19:50         ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-27 19:45 UTC
  To: Eric Wong; +Cc: meta

On Thu, Jun 27, 2019 at 07:33:32PM +0000, Eric Wong wrote:
>> > Aside from the git note, public-inbox-learn already does that:
>> >
>> >   public-inbox-learn spam </path/to/message
>> >
>> >   (scans everything in ~/.public-inbox/config since spam is
>> >   frequently cross-posted)
>>
>> Ah, that shows how carefully I read docs, I guess. :) Is it possible to just
>> specify a message-id, so that there's no extra step to dump the spam message
>> into a file?
>
>Not exactly with the Message-ID arg.  It would be dangerous if
>somebody malicious wanted to get you to remove a legit message
>by sending a spam message which reuses a Message-ID of a legit
>message.  I'd definitely want to verify a message is what I'd
>want to remove, first.

This makes sense, thanks. I tried it out and it works to remove spam 
from the frontend, but spamc step seems to fail with a somewhat 
incongruous error code:

spamc failed with: 18944

Any pointers where I should look to figure out which part is failing?

-K

* Re: RFC: marking spam via refs/notes/spam to hide it
  2019-06-27 19:45       ` Konstantin Ryabitsev
@ 2019-06-27 19:50         ` Eric Wong
  2019-06-27 20:18           ` Konstantin Ryabitsev
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2019-06-27 19:50 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jun 27, 2019 at 07:33:32PM +0000, Eric Wong wrote:
> > > > Aside from the git note, public-inbox-learn already does that:
> > > >
> > > >   public-inbox-learn spam </path/to/message
> > > >
> > > >   (scans everything in ~/.public-inbox/config since spam is
> > > >   frequently cross-posted)
> > > 
> > > Ah, that shows how carefully I read docs, I guess. :) Is it possible to just
> > > specify a message-id, so that there's no extra step to dump the spam message
> > > into a file?
> > 
> > Not exactly with the Message-ID arg.  It would be dangerous if
> > somebody malicious wanted to get you to remove a legit message
> > by sending a spam message which reuses a Message-ID of a legit
> > message.  I'd definitely want to verify a message is what I'd
> > want to remove, first.
> 
> This makes sense, thanks. I tried it out and it works to remove spam from
> the frontend, but spamc step seems to fail with a somewhat incongruous error
> code:
> 
> spamc failed with: 18944

Oops, might be $? in Perl needs to be >> 8 to get the exit code.
That gives 74, which spamc(1) says is EX_IOERR
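
The arithmetic can be checked from a shell.  This is only a sketch:
decode_wait_status is a made-up name, and it assumes the usual Unix
wait-status layout that Perl's $? follows (exit code in the high byte,
terminating signal in the low seven bits).

```shell
# decode_wait_status STATUS
# Hypothetical helper: split a raw 16-bit wait status (as stored in
# Perl's $?) into the exit code and the terminating signal number.
decode_wait_status() {
    status=$1
    exit_code=$(( (status >> 8) & 255 ))   # high byte: exit code
    signal=$(( status & 127 ))             # low 7 bits: signal, 0 if none
    echo "exit=$exit_code signal=$signal"
}

decode_wait_status 18944    # prints "exit=74 signal=0"
```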

> Any pointers where I should look to figure out which part is failing?

Anything in syslog?  You can also run spamc or sa-learn on
the message directly.

* Re: RFC: marking spam via refs/notes/spam to hide it
  2019-06-27 19:50         ` Eric Wong
@ 2019-06-27 20:18           ` Konstantin Ryabitsev
  0 siblings, 0 replies; 7+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-27 20:18 UTC
  To: Eric Wong; +Cc: meta

On Thu, Jun 27, 2019 at 07:50:11PM +0000, Eric Wong wrote:
>> This makes sense, thanks. I tried it out and it works to remove spam from
>> the frontend, but spamc step seems to fail with a somewhat incongruous error
>> code:
>>
>> spamc failed with: 18944
>
>Oops, might be $? in Perl needs to be >> 8 to get the exit code.
>That gives 74, which spamc(1) says is EX_IOERR

That's indeed 74, and poking there led me to discover that spamd needed 
--allow-tell to permit this sort of thing.
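
For anyone hitting the same error, the learn step can be replayed by
hand along these lines.  This is a sketch under assumptions: check_learn
and the SPAMC override are made-up names, and treating exit code 74 as
a hint about --allow-tell is based only on what happened in this thread.

```shell
# check_learn FILE
# Hypothetical helper: ask spamd (via spamc) to learn FILE as spam and
# report the exit code.  SPAMC may point at a different spamc binary.
check_learn() {
    msg=$1
    # -L spam requests learning; spamd refuses it unless it was started
    # with --allow-tell.
    ${SPAMC:-spamc} -L spam <"$msg" >/dev/null
    rc=$?    # in the shell, $? is already the exit code (no >> 8 needed)
    case $rc in
        74) echo "spamc exit $rc (EX_IOERR): was spamd started with --allow-tell?" ;;
        *)  echo "spamc exit $rc" ;;
    esac
}
```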

All's good now, thanks!

-K

Archives are clonable:
	git clone --mirror http://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox