user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: Rob Herring <robh@kernel.org>
Cc: meta@public-inbox.org
Subject: Re: lei missing mails
Date: Wed, 29 Jun 2022 17:27:42 +0000
Message-ID: <20220629172742.M978900@dcvr>
In-Reply-To: <CAL_Jsq+W4KxRjR+r7ydGj_4T-oxcD370EqUp0KrBngMZnpDTgA@mail.gmail.com>

Rob Herring <robh@kernel.org> wrote:
> On Wed, Jun 29, 2022 at 10:30 AM Eric Wong <e@80x24.org> wrote:
> >
> > Rob Herring <robh@kernel.org> wrote:
> > > Hi,
> > >
> > > I'm using lei with lore where I have 2 queries which overlap. Really,
> > > one is a subset of the other. On those overlapping threads, I'm
> > > finding that sometimes new messages are written to one mailbox and not
> > > the other. (The messages may sometimes be missing from all mailboxes
> > > too; I'm not certain.) Using --remote-fudge-time
> > > to force refetching seems to get the missing mails. I haven't found
> > > anything strange in timestamps of the missing mails, but otherwise am
> > > not sure how to debug this further. The queries are retrieving full
> > > threads and the missing mails are in the threads, but not direct
> > > matches to the queries. I realize that's not a lot of detail to go on.
> > > Suggestions on debugging this further?
> >
> > Is this with 1.8 or 1.7?
> 
> Commit 68b53c888911 actually. So post 1.8.

OK, thanks for that info.

> > I forgot to note in the release notes, but there were some
> > SQLite usage-related fixes which could avoid missing messages.
> >
> > You'll need "lei daemon-kill" after upgrading to 1.8 to ensure
> > the new code is running.
> 
> It's possible I haven't done that since updating though I do vaguely
> recall seeing something about needing to do that. Is there any way to
> tell before I restart it?

Not really, but it's pretty cheap to restart (assuming there are no
long-running jobs).

> > What might be interesting is to use the URLs lei prints and
> > compare the results w/o lei.
> >
> > I'll have to double-check if overlapping affects things, but it
> > shouldn't, since the dedupe logic is per-output.
> >
> > Is this exclusively with HTTPS endpoints and writing to Maildirs
> > (or something else?)
> 
> Yes. It's querying lore and writing to a maildir. Here's one of the queries:
> 
> [lei]
>         q = (dfn:drivers OR dfn:arch OR dfn:Documentation/* OR dfn:include OR dfn:scripts) AND \
>          f:robh@kernel.org AND rt:6.month.ago..
> [lei "q"]
>         include = https://lore.kernel.org/all/
>         external = 1
>         local = 1
>         remote = 1
>         threads = 1
>         dedupe = mid
>         output = maildir:/home/rob/Mail/my-patches

Fwiw, dedupe based on mid could be vulnerable to spoofing, which
is why `content' is the default.  But yes, in the past, I've
noticed some messages to meta@public-inbox.org not showing up,
though not recently (I guess lack of activity here is a culprit :x)
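
To illustrate the spoofing concern, here is a minimal sketch in Python.
It is not lei's actual dedupe code: `dedupe', `mid_key' and `content_key'
are hypothetical names, and the hash over a few headers plus the body is
only an assumed stand-in for a real content hash.  It shows that a
forged message reusing a Message-ID can shadow the genuine one when
deduplicating on mid alone:

```python
import hashlib
from email import message_from_string


def dedupe(messages, key):
    """Keep the first message seen for each key, in arrival order."""
    seen = set()
    kept = []
    for raw in messages:
        k = key(message_from_string(raw))
        if k not in seen:
            seen.add(k)
            kept.append(raw)
    return kept


def mid_key(msg):
    # Dedupe on the Message-ID header alone.
    return (msg.get("Message-ID") or "").strip()


def content_key(msg):
    # Hypothetical content hash: selected headers plus the body.
    # An illustration only, not the hash public-inbox uses.
    h = hashlib.sha256()
    for name in ("From", "Subject", "Date", "Message-ID"):
        h.update((msg.get(name) or "").encode())
        h.update(b"\0")
    h.update(msg.get_payload().encode())
    return h.hexdigest()


genuine = (
    "From: a@example.com\n"
    "Subject: [PATCH] fix\n"
    "Message-ID: <1@example.com>\n"
    "\n"
    "real patch\n"
)
spoofed = (
    "From: evil@example.com\n"
    "Subject: [PATCH] fix\n"
    "Message-ID: <1@example.com>\n"
    "\n"
    "something else\n"
)

# The forged copy arrives first in this example.
by_mid = dedupe([spoofed, genuine], mid_key)
by_content = dedupe([spoofed, genuine], content_key)
```

With mid-based dedupe only the forged copy survives here, while the
content-based key keeps both messages.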

I also just noticed an inotify-related bug deadlocking the whole
lei-daemon while looking into this :<

> > > It might be helpful if lei could print out message-ids of messages
> > > written to mailboxes.
> >
> > That could get very noisy, especially as mailboxes are written
> > in parallel.
> 
> Verbose mode already is. Maybe specifying what info you want to be
> verbose would help. The network side is mostly uninteresting in this
> case for example.

Yes, I've been struggling with the verbosity, too; and many
other things :<

> Is there any tool to list new messages in a maildir? I could do that
> before and after. I've tried clearing the new flag in mutt between
> runs, but that's not really ideal.

I suppose `ls'.  There are likely other tools more suited for Maildirs
but I'm not familiar with them off the top of my head.
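
For a before/after comparison, something along these lines might work.
It is only a sketch using Python's standard `mailbox' module, not an
existing lei command; `maildir_mids' and `missing_from' are hypothetical
helper names:

```python
import mailbox


def maildir_mids(path):
    """Return the set of Message-ID values found in a Maildir (new/ and cur/)."""
    box = mailbox.Maildir(path, create=False)
    mids = set()
    for msg in box:
        mid = msg.get("Message-ID")
        if mid:
            mids.add(mid.strip())
    return mids


def missing_from(subset_dir, superset_dir):
    """List Message-IDs present in subset_dir but absent from superset_dir."""
    return sorted(maildir_mids(subset_dir) - maildir_mids(superset_dir))
```

Calling maildir_mids("/home/rob/Mail/my-patches") before and after a run
and subtracting the two sets would list what was newly written.  For the
overlapping queries, missing_from() on the narrower and the broader
Maildir would show threads messages that only reached one of them.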

Maybe lei could grow yet another command.

