user/dev discussion of public-inbox itself
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: msgmap serial number regeneration [was: Q: V2 format]
Date: Sat, 14 Jul 2018 14:01:58 -0500
Message-ID: <87a7qtwsih.fsf@xmission.com>
In-Reply-To: <20180713222200.GB27845@dcvr> (Eric Wong's message of "Fri, 13 Jul 2018 22:22:00 +0000")

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> > Eric Wong <e@80x24.org> writes:
>> >> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> >>> 
>> >>> Because of the parallelism in V2 I have noticed messages numbered
>> >>> in an order that does not correspond to their commit order.  So the
>> >>> SQLite database isn't as recoverable as it might be.  Especially as the
>> >>> parallelism introduces an element of non-determinism.
>> >>
>> >> *puzzled* were you able to reproduce that?  The serial number
>> >> generation + threading happens in the main process and the
>> >> parallelism is limited to Xapian text indexing.  -index
>> >> generates serial numbers by walking backwards with v2, and
>> >> complains on unexpected results.
>> 
>> Digging into this I have found consistently non-reproducible numbering,
>> because of deleted files.  Apparently in both V1 and V2 a worst-case
>> estimate is made of the total numbers that are going to be needed and
>> numbers are assigned backwards from there.
>> 
>> A fresh indexing of the git mailing list archive on v1 gives me numbers
>> starting with 360 and on v2 numbers starting with 355, which
>> corresponds with the number of deleted messages.
>> 
>> I am still looking to see if there are any other weird things here.
>
> Ah, yes, you're correct: deletes don't get accounted for when
> regenerating.  Oh well.  I guess it was correct to document msgmap
> as something important to backup and not break for instances of
> particular servers.  (emphasis on "particular servers")
>
> So I think you'd need to walk revision history twice to account
> for deleted messages...
>
> Across different machines, it should not matter to preserve
> serials.

I believe we can modify the msg number assignment to assign numbers to
deletes as well as adds.  Short of the same Message-ID coming up twice,
that should be enough for the current backwards loop to assign message
numbers reliably.  And even a Message-ID coming up twice can be handled.
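
To make the idea above concrete, here is a minimal sketch in Python.  It
is a toy model only, not public-inbox code: the history format, the
function names, and the `count_deleted` flag are all hypothetical, and it
reads "assign numbers to deletes" as "a deleted message still consumes
its serial during the backwards walk".  It compares the serials handed
out by live (forward) indexing with those produced by a backwards
regeneration that counts down from a worst-case estimate.

```python
# Toy model only: hypothetical names and data, not public-inbox's implementation.

def forward_serials(history):
    """Live indexing: every add takes the next serial; a delete drops the mapping."""
    serials, next_num = {}, 1
    for op, mid in history:
        if op == "add":
            serials[mid] = next_num
            next_num += 1
        else:  # "delete"
            serials.pop(mid, None)
    return serials


def backward_serials(history, count_deleted):
    """Regeneration: start from a worst-case estimate and walk history backwards."""
    num = sum(1 for op, _ in history if op == "add")  # worst-case estimate
    serials, deleted = {}, set()
    for op, mid in reversed(history):
        if op == "delete":
            deleted.add(mid)           # the matching add is older, seen later
        elif mid in deleted:
            deleted.discard(mid)       # this add was undone by a later delete
            if count_deleted:
                num -= 1               # assumed fix: still consume its serial
        else:
            serials[mid] = num
            num -= 1
    return serials


# "b" is added and later deleted; "d" is deleted and then added again.
history = [
    ("add", "a"), ("add", "b"), ("delete", "b"), ("add", "c"),
    ("add", "d"), ("delete", "d"), ("add", "d"),
]

live = forward_serials(history)
regenerated_old = backward_serials(history, count_deleted=False)
regenerated_new = backward_serials(history, count_deleted=True)
```

In this model the regeneration that skips deleted messages shifts the
surviving serials upward by the number of deletions that precede them,
while the variant that consumes a serial for each deleted message
reproduces the live numbering, including for a Message-ID that is
deleted and later added again.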

>> I definitely do not like not being able to reconstruct message numbers
>> from a backup.
>
> For v2, I see serial numbers are an internal optimization which
> happens to map to NNTP.
>
> If the git repo is cloned and the cloner sets up a different
> server, it'll have a different address and clients won't know to
> deduplicate them anyways.  I suppose it makes the load-balanced
> case a little more complex to sync(*)

But consider the case where the server hardware fails.  In the case I am
dealing with at the moment, I can stand up a new server with the same IP
address.

Further if we can make everything but the git repository non-essential
it yields more flexibility for changing and optimizing things in the
future.

> (*) But optimizing for load-balanced instances isn't ideal,
>     I'd rather see more independently-run servers than giant
>     load-balanced instances which everybody relies on.

True.

At this point I am just optimizing for the operational simplicity
of my own independently-run server.

Eric



Thread overview: 21+ messages
2018-07-11 20:01 Q: V2 format Eric W. Biederman
2018-07-11 21:18 ` Konstantin Ryabitsev
2018-07-11 21:41   ` Eric W. Biederman
2018-07-12  1:47 ` Eric Wong
2018-07-12 13:58   ` Eric W. Biederman
2018-07-12 23:09     ` Eric Wong
2018-07-13 13:39       ` Eric W. Biederman
2018-07-13 20:03         ` Eric W. Biederman
2018-07-13 22:22           ` msgmap serial number regeneration [was: Q: V2 format] Eric Wong
2018-07-14 19:01             ` Eric W. Biederman [this message]
2018-07-15  3:18               ` Eric Wong
2018-07-16 15:20                 ` Eric W. Biederman
2018-07-13 22:02         ` bug: v2 deletes on incremental fetch " Eric Wong
2018-07-13 22:51           ` Eric W. Biederman
2018-07-14  0:46           ` [PATCH] v2writable: unindex deleted messages after incremental fetch Eric Wong
2018-07-13 23:07         ` IMAP server [was: Q: V2 format] Eric Wong
2018-07-13 23:12           ` Eric W. Biederman
2018-09-28 20:10           ` Johannes Berg
2018-09-28 21:01             ` Eric W. Biederman
2018-10-01  7:46               ` Johannes Berg
2018-10-01  8:51                 ` Eric W. Biederman

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
