Date: Mon, 28 Dec 2020 11:22:18 -0500
From: Konstantin Ryabitsev
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: public-inbox + mlmmj best practices?
Message-ID: <20201228162218.zcnqxkgwa2i3nt66@chatter.i7.local>
References: <20201221212032.syunaxzrvcqcrose@chatter.i7.local> <20201221213914.GA9374@dcvr> <20201222062808.GA4522@dcvr>
In-Reply-To: <20201222062808.GA4522@dcvr>

On Tue, Dec 22, 2020 at 06:28:08AM +0000, Eric Wong wrote:
> Eric Wong wrote:
> >
> > There's scripts/ssoma-replay which was v1-only and dependent on
> > ssoma.  I've been meaning to convert into something that reads
> > NNTP so it's not locked into public-inbox.  Maybe it could be
> > part of `lei', too, for piping to arbitrary commands, dunno...

I wrote grok-pi-piper a while back for the purpose of piping from git to
patchwork.kernel.org. It's not complete yet, because we currently do not
handle situations with rewritten history, but it's been working well
enough. I have a write-up here:
https://people.kernel.org/monsieuricon/subscribing-to-lore-lists-with-grokmirror

What is the sanest way to recognize and handle history rewrites? Right
now, we just keep track of the latest tip hash. On each subsequent run,
we just iterate all commits between the recorded hash and the newest tip.

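For reference, the "recorded hash to newest tip" walk amounts to roughly
the following. This is only a minimal sketch, not the actual
grok-pi-piper code: the helper names `rev_list_args` and `new_commits`
are made up for illustration, and it assumes a `git` binary on the PATH.

```python
import subprocess


def rev_list_args(last_seen, tip="HEAD"):
    """Build git arguments listing commits after last_seen up to tip, oldest first.

    Hypothetical helper for illustration only.
    """
    return ["rev-list", "--reverse", f"{last_seen}..{tip}"]


def new_commits(git_dir, last_seen, tip="HEAD"):
    """Return the hashes of commits that appeared since last_seen.

    If history was rewritten and last_seen no longer exists in the
    repository, git exits non-zero and CalledProcessError is raised.
    """
    result = subprocess.run(
        ["git", "--git-dir", git_dir, *rev_list_args(last_seen, tip)],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()
```

With only the hash recorded, a rewrite leaves nothing to fall back on:
the range cannot be resolved, and the only safe options are to stop or
to replay everything from the start.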
My current thoughts are:

- in addition to the latest tip hash, keep track of author, authordate
  and message-id of the last processed message
- if we no longer find the tracked hash in the repo, use
  author+authordate to find the new hash of the latest message we
  processed, and verify with message-id
- if we cannot find the exact match (i.e. our latest processed message
  is gone from history), find the first commit that happens before our
  recorded authordate and use that as the "latest processed" jump-off
  point

This should do the right thing in most situations except for when the
message that was deleted from history was sent with a bogus Date: header
with a date in the future. In this case, we can miss valid messages in
the queue.

Any suggestions on how this can be improved?

-K
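
The three fallback steps listed above could be sketched along these
lines. It is a rough illustration over an in-memory, oldest-first list of
commits rather than a real repository; the `Commit` record and
`resume_index` function are hypothetical names, and authordate is
assumed to be stored as seconds since the epoch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Commit:
    """Hypothetical record of the fields tracked per processed message."""
    sha: str
    author: str
    authordate: int  # assumed: seconds since the epoch
    msgid: str


def resume_index(history: Sequence[Commit], last: Commit) -> int:
    """Return the index in history (oldest first) to resume after.

    Returns -1 when nothing usable is found, meaning "start from the
    beginning".
    """
    # Step 1: the tracked hash is still in the repository.
    for i, commit in enumerate(history):
        if commit.sha == last.sha:
            return i

    # Step 2: same message under a new hash; match on author and
    # authordate, then verify with the message-id.
    for i, commit in enumerate(history):
        same_author = (commit.author, commit.authordate) == (last.author, last.authordate)
        if same_author and commit.msgid == last.msgid:
            return i

    # Step 3: the message is gone; walk back from the tip and take the
    # first commit whose authordate is before the recorded one.
    for i in range(len(history) - 1, -1, -1):
        if history[i].authordate < last.authordate:
            return i

    return -1
```

The weak spot shows up in step 3: if the deleted message carried a Date:
far in the future, every remaining commit has an earlier authordate, so
the walk stops at the tip and anything queued after the deleted message
is skipped.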