From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong
Cc: meta@public-inbox.org
References: <87k1q1bky6.fsf@xmission.com>
	<20180712014715.dn5aouayoa3uejp4@dcvr>
	<87k1q07dyc.fsf@xmission.com>
	<20180712230946.mqv3yjw4aabf7xrf@dcvr.yhbt.net>
	<878t6f1ch7.fsf@xmission.com>
	<87h8l2ykb4.fsf@xmission.com>
	<20180713222200.GB27845@dcvr>
Date: Sat, 14 Jul 2018 14:01:58 -0500
In-Reply-To: <20180713222200.GB27845@dcvr> (Eric Wong's message of "Fri, 13 Jul 2018 22:22:00 +0000")
Message-ID: <87a7qtwsih.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: msgmap serial number regeneration [was: Q: V2 format]

Eric Wong writes:

> "Eric W. Biederman" wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> > Eric Wong writes:
>> >> "Eric W. Biederman" wrote:
>> >>>
>> >>> Because of the parallelism in V2 I have noticed messages numbered
>> >>> in an order that does not correspond to their commit order.  So the
>> >>> SQLite database isn't as recoverable as it might be.  Especially as the
>> >>> parallelism introduces an element of non-determinism.
>> >>
>> >> *puzzled* were you able to reproduce that?  The serial number
>> >> generation + threading happens in the main process and the
>> >> parallelism is limited to Xapian text indexing.  -index
>> >> generates serial numbers by walking backwards with v2, and
>> >> complains on unexpected results.
>>
>> Digging into this I have found consistently non-reproducible numbering,
>> because of deleted files.
>> Apparently in both V1 and V2 a worst-case estimate is made of the
>> total numbers that are going to be needed and numbers are assigned
>> backwards from there.
>>
>> A fresh indexing of the git mailing list archive on v1 gives me numbers
>> starting with 360 and on v2 numbers starting with 355.  Which
>> corresponds with the number of deleted messages.
>>
>> I am still looking to see if there are any other weird things here.
>
> Ah, yes, you're correct, deletes don't get accounted for when
> regenerating.  Oh well.  I guess it was correct to document msgmap
> as something important to back up and not break for instances of
> particular servers. (emphasis on "particular servers")
>
> So I think you'd need to walk revision history twice to account
> for deleted messages...
>
> Across different machines, it should not matter to preserve
> serials.

I believe we can modify the msg number assignment to assign numbers to
deletes as well as adds.  Short of the same Message-ID coming up twice,
that should be enough for the current backwards loop to assign message
numbers reliably.  And even a Message-ID coming up twice can be handled.

>> I definitely do not like not being able to reconstruct message numbers
>> from a backup.
>
> For v2, I see serial numbers are an internal optimization which
> happens to map to NNTP.
>
> If the git repo is cloned and the cloner sets up a different
> server, it'll have a different address and clients won't know to
> deduplicate them anyways.  I suppose it makes the load-balanced
> case a little more complex to sync(*)

But consider the case where the server hardware fails, which is the
case I am dealing with at the moment: I can stand up a new server with
the same IP address, and clients will expect the same message numbers.

Further, if we can make everything but the git repository non-essential,
it yields more flexibility for changing and optimizing things in the
future.
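
To make the numbering proposal above concrete, here is a toy model in
Python.  It is only a sketch under assumptions, not the public-inbox
code: the history list, number_forward, number_backward and the
count_deletes flag are all made up for illustration, and I am assuming
the worst-case estimate is simply one number per commit.

```python
def number_forward(history, count_deletes=False):
    """Number messages as a live server might, in commit order (toy model)."""
    serial = 0
    numbers = {}
    for op, mid in history:
        if op == "add":
            serial += 1
            numbers[mid] = serial
        else:  # "del": the message goes away
            numbers.pop(mid, None)
            if count_deletes:
                serial += 1  # proposal: a delete consumes a number too
    return numbers


def number_backward(history, count_deletes=False):
    """Regenerate numbers from history alone, walking newest to oldest."""
    serial = len(history)  # assumed worst-case estimate: one number per commit
    deleted = set()        # Message-IDs whose delete we have already walked past
    numbers = {}
    for op, mid in reversed(history):
        if op == "del":
            deleted.add(mid)
            live = False
        elif mid in deleted:
            deleted.discard(mid)  # this add was deleted later, so drop it
            live = False
        else:
            numbers[mid] = serial
            live = True
        if live or count_deletes:
            serial -= 1
    return numbers


# Hypothetical history: message "a" is added and later deleted.
history = [("add", "a"), ("add", "b"), ("del", "a"), ("add", "c"), ("add", "d")]

skip_fwd = number_forward(history)
skip_bwd = number_backward(history)
count_fwd = number_forward(history, count_deletes=True)
count_bwd = number_backward(history, count_deletes=True)
```

With this sample history the forward and backward walks disagree while
deletes are skipped, and agree once every commit consumes a number, so
in this model the numbers can be rebuilt from the git history alone.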

> (*) But optimizing for load-balanced instances isn't ideal,
>     I'd rather see more independently-run servers than giant
>     load-balanced instances which everybody relies on.

True.  At this point I am just optimizing for the operational
simplicity of my own independently-run server.

Eric