user/dev discussion of public-inbox itself
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: public-inbox + mlmmj best practices?
Date: Mon, 28 Dec 2020 11:22:18 -0500
Message-ID: <20201228162218.zcnqxkgwa2i3nt66@chatter.i7.local>
In-Reply-To: <20201222062808.GA4522@dcvr>

On Tue, Dec 22, 2020 at 06:28:08AM +0000, Eric Wong wrote:
> Eric Wong <e@80x24.org> wrote:
> > 
> > There's scripts/ssoma-replay which was v1-only and dependent on
> > ssoma.  I've been meaning to convert into something that reads
> > NNTP so it's not locked into public-inbox.  Maybe it could be
> > part of `lei', too, for piping to arbitrary commands, dunno...

I wrote grok-pi-piper a while back for the purpose of piping from git to
patchwork.kernel.org. It's not complete yet, because we currently do not
handle situations with rewritten history, but it's been working well enough. I
have a write-up here:

https://people.kernel.org/monsieuricon/subscribing-to-lore-lists-with-grokmirror

What is the sanest way to recognize and handle history rewrites? Right now, we
only keep track of the latest tip hash. On each subsequent run, we iterate over
all commits between the recorded hash and the newest tip. My current thoughts
are:

- in addition to the latest tip hash, keep track of author, authordate and
  message-id of the last processed message
- if we no longer find the tracked hash in the repo, use author+authordate to
  find the new hash of the latest message we processed, and verify with
  message-id
- if we cannot find the exact match (i.e. our latest processed message is gone
  from history), find the first commit that happens before our recorded
  authordate and use that as the "latest processed" jump-off point

This should do the right thing in most situations except for when the message
that was deleted from history was sent with a bogus Date: header with a date
in the future. In this case, we can miss valid messages in the queue.
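
A minimal sketch of the three fallback steps listed above, in Python. This is
not grok-pi-piper's actual code: the Commit record, the find_resume_index()
helper and the in-memory commit list (ordered oldest to newest) are all
hypothetical stand-ins for whatever is read out of the git repo.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commit:
    # Hypothetical record; a real tool would fill this from git metadata.
    hash: str
    author: str
    authordate: int  # seconds since the epoch
    message_id: str


def find_resume_index(commits: List[Commit], last: Commit) -> int:
    """Return the index of the "latest processed" commit, or -1 for none.

    `commits` is ordered oldest to newest; everything after the returned
    index still needs to be piped.
    """
    # Normal case: the recorded tip hash is still in history.
    for i, c in enumerate(commits):
        if c.hash == last.hash:
            return i
    # History was rewritten: look up by author+authordate and verify
    # the candidate with the message-id.
    for i, c in enumerate(commits):
        if ((c.author, c.authordate) == (last.author, last.authordate)
                and c.message_id == last.message_id):
            return i
    # The message itself is gone: walk back from the tip to the first
    # commit dated before the recorded authordate.
    for i in range(len(commits) - 1, -1, -1):
        if commits[i].authordate < last.authordate:
            return i
    return -1


# Example: the last processed message ("bob") was removed by a rewrite.
last_seen = Commit("b2", "bob", 200, "<2@example>")
rewritten = [
    Commit("a9", "alice", 100, "<1@example>"),
    Commit("c9", "carol", 300, "<3@example>"),
]
resume = find_resume_index(rewritten, last_seen)
pending = rewritten[resume + 1:]  # only carol's message is left to pipe
```

With a recorded authordate far in the future, the last loop lands on the
current tip and `pending` comes back empty, which is the failure case
described above.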

Any suggestions on how this can be improved?

-K


Thread overview: 9+ messages
2020-12-21 21:20 public-inbox + mlmmj best practices? Konstantin Ryabitsev
2020-12-21 21:39 ` Eric Wong
2020-12-22  6:28   ` Eric Wong
2020-12-28 16:22     ` Konstantin Ryabitsev [this message]
2020-12-28 21:31       ` Eric Wong
2021-01-04 20:12         ` Konstantin Ryabitsev
2021-01-05  1:06           ` Eric Wong
2021-01-05  1:29             ` [PATCH] v2writable: exact discontiguous history handling Eric Wong
2021-01-09 22:21               ` Eric Wong
