user/dev discussion of public-inbox itself
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong <e@80x24.org>
Cc: <meta@public-inbox.org>
Subject: Q: V2 format
Date: Wed, 11 Jul 2018 15:01:53 -0500
Message-ID: <87k1q1bky6.fsf@xmission.com>


I have been digging through the code so that I can understand the v2
format.  I have some ideas on how things might be improved, and some
questions to check my understanding.

V1 supported the concept of messages being added to and deleted from
the git repository, all while keeping a full history of everything that
went on.  The V2 code appears to use the name 'm' for added and 'd' for
deleted, but the public-inbox-index code appears to expect deletes to
happen by way of an altered history that totally purges the commits,
and does not process the 'd' entries.

What is the thinking about deleted entries, and for v2 what is the
preferred way to delete mail from a public inbox git repository and why?
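
To make the question concrete, here is a minimal sketch of what
processing the 'd' entries could look like when replaying history.  It
is not taken from public-inbox: the event tuples, the blob ids and the
function name are all hypothetical, and it assumes a 'd' entry resolves
to the same blob id as the message it removes.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event: (name touched by the commit, blob id).
# 'm' = message added, 'd' = message deleted, as named in the V2 code.
Event = Tuple[str, str]


def live_blobs(history: Iterable[Event]) -> Set[str]:
    """Replay commits oldest-first and return the blobs still present."""
    live: Set[str] = set()
    for name, blob in history:
        if name == "m":
            live.add(blob)
        elif name == "d":
            # Assumption: the 'd' entry points at the same blob id
            # as the message being removed.
            live.discard(blob)
    return live


# Made-up history: two adds, one delete, one more add.
history = [("m", "aaa1"), ("m", "bbb2"), ("d", "aaa1"), ("m", "ccc3")]
print(sorted(live_blobs(history)))
```

With this approach the full history stays in git, and an indexer only
needs to honor the 'd' entries to end up with the right set of messages.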



Size.  Reading the history of the public inbox meta mailing list and
playing around, I discovered that I can shave about 100M off the V2
size of the git public inbox git repository by pushing all of the
messages into a single commit.  Not great for day-to-day operation,
but if rebases are part of the plan, and old archives part of the
challenge, I see quite a lot of potential for old archives to be reduced
to a git repository with a single commit.


Names.  Is there a good reason not to use message numbers as the names
in the git repositories?  (Other than the cost to change the code?)  That
would remove the need to treat the sqlite msgmap database as precious,
and it would make it easier to recover if an nntp server goes away.  In
V2 format the git mailing list git repository is only about 2M larger if
each message has its msg number as its name.  Plus the git log
is easier to read as messages are all + or -.
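
As an illustration of why numbered names would make msgmap recoverable,
the sketch below rebuilds a number to Message-ID mapping from nothing
but file names and message contents.  The `tree` dictionary is a
hypothetical stand-in for a git tree, and `rebuild_msgmap` is a made-up
name, not a public-inbox function.

```python
from email import message_from_bytes
from typing import Dict, Mapping


def rebuild_msgmap(tree: Mapping[str, bytes]) -> Dict[int, str]:
    """Map message number -> Message-ID from file names and contents.

    `tree` is a hypothetical stand-in for a git tree:
    file name -> raw message bytes.
    """
    msgmap: Dict[int, str] = {}
    for name, raw in tree.items():
        if not name.isdigit():
            continue  # skip anything not named by a message number
        mid = message_from_bytes(raw)["Message-ID"]
        if mid is not None:
            # Store the id without the surrounding angle brackets.
            msgmap[int(name)] = mid.strip().strip("<>")
    return msgmap


# Made-up messages, named by their message numbers.
tree = {
    "1": b"Message-ID: <a@example.com>\n\nfirst\n",
    "2": b"Message-ID: <b@example.com>\n\nsecond\n",
}
print(rebuild_msgmap(tree))
```

The point is only that the numbering would live in git itself, so the
sqlite database becomes a cache that can be regenerated.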


Xapian.  Can the Xapian database be made optional in V2?  I absolutely
think a quick search for terms and other things is very valuable, so I
would never suggest giving up Xapian.  On the other hand, on my personal
laptop the Xapian database for lkml takes ages and ages to build, and it
pushes the system into swap, which is all around unpleasant.  That
seems to eat into the distributed nature of the goal of public inbox.

I have tried to see what could be done that might shrink the size of
the Xapian database.  The only thing I could think of is perhaps
sharding the Xapian database by time/msgnum ranges.  That would allow
the old Xapian databases to be compacted and forgotten about, and I
think it would allow less wastage in the current Xapian database as it
would be smaller, so wasting 50% space (or whatever the btrees waste)
would be less of an issue.  And as smaller databases are faster, I think
that would in general be a help.
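
A rough sketch of the msgnum-range idea follows.  It makes no Xapian
calls and only shows the bookkeeping: the shard span of 100,000
messages and both function names are assumptions chosen for
illustration, and it assumes message numbers only ever increase.

```python
SHARD_SPAN = 100_000  # assumed messages per shard; purely illustrative


def shard_for(msgnum: int, span: int = SHARD_SPAN) -> int:
    """Return the index of the shard that holds message `msgnum`."""
    if msgnum < 1:
        raise ValueError("message numbers start at 1")
    return (msgnum - 1) // span


def frozen_shards(max_msgnum: int, span: int = SHARD_SPAN) -> range:
    """Shards below the one holding the newest message.

    If message numbers only grow, these shards receive no new
    messages and could be compacted once.
    """
    return range(shard_for(max_msgnum, span))


print(shard_for(1), shard_for(100_000), shard_for(100_001))
print(list(frozen_shards(250_000)))
```

Only the newest shard would keep taking writes, so the space wasted by
a growing database would be bounded by one shard rather than the whole
archive.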

Time permitting I am willing to do some of this work so that
public-inbox works well for me.  I want to see what your vision is for
the code before I start anything.

Eric

Thread overview: 21+ messages
2018-07-11 20:01 Eric W. Biederman [this message]
2018-07-11 21:18 ` Q: V2 format Konstantin Ryabitsev
2018-07-11 21:41   ` Eric W. Biederman
2018-07-12  1:47 ` Eric Wong
2018-07-12 13:58   ` Eric W. Biederman
2018-07-12 23:09     ` Eric Wong
2018-07-13 13:39       ` Eric W. Biederman
2018-07-13 20:03         ` Eric W. Biederman
2018-07-13 22:22           ` msgmap serial number regeneration [was: Q: V2 format] Eric Wong
2018-07-14 19:01             ` Eric W. Biederman
2018-07-15  3:18               ` Eric Wong
2018-07-16 15:20                 ` Eric W. Biederman
2018-07-13 22:02         ` bug: v2 deletes on incremental fetch " Eric Wong
2018-07-13 22:51           ` Eric W. Biederman
2018-07-14  0:46           ` [PATCH] v2writable: unindex deleted messages after incremental fetch Eric Wong
2018-07-13 23:07         ` IMAP server [was: Q: V2 format] Eric Wong
2018-07-13 23:12           ` Eric W. Biederman
2018-09-28 20:10           ` Johannes Berg
2018-09-28 21:01             ` Eric W. Biederman
2018-10-01  7:46               ` Johannes Berg
2018-10-01  8:51                 ` Eric W. Biederman
