user/dev discussion of public-inbox itself
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: internal format
Date: Thu, 15 Mar 2018 14:49:35 -0400
Message-ID: <jwvzi39xkj6.fsf-monnier+Inbox@gnu.org>
In-Reply-To: <20180315164012.GA20246@whir> (Eric Wong's message of "Thu, 15 Mar 2018 16:40:12 +0000")

> v1 or v2?  Some of the reasoning for v2 was here:
>   https://public-inbox.org/meta/20180209205140.GA11047@dcvr/

IIUC, the issues you consider important are:

- Size
- Time to perform "git rev-list --objects --all"
- Flexibility, e.g. to be able to remove messages.

For size your benchmarks seem to indicate that as long as it's kept
inside Git, the choice of format doesn't actually affect it
significantly (and this matches my expectations).
It's probably possible to improve on it with enough effort
(e.g. by storing attachments separately, or by splitting large messages
into chunks like `bup` does), but I doubt it's worth it
(especially if you assume that the mailing-list imposes a limit on
message size).

For timing, I'm curious why you only consider
"git rev-list --objects --all".  Which operation does this correspond
to in public-inbox, and is that really the only one that is
performance-sensitive?
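
One way to answer the timing question empirically is to time a few
candidate Git invocations against the same repository.  The following is
only a minimal sketch: `time_command` and `CANDIDATES` are hypothetical
names, and the second candidate is an assumption added for comparison
(which operations public-inbox actually depends on is precisely the open
question).

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Run argv to completion, discard its output, return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Candidate operations to compare.  Only the first one comes from the
# discussion; the second is an illustrative assumption.
CANDIDATES = [
    ["git", "rev-list", "--objects", "--all"],
    ["git", "rev-list", "--count", "--all"],
]

# Usage, assuming `repo` is the path of an inbox repository:
#   for argv in CANDIDATES:
#       print(" ".join(argv), "%.3fs" % time_command(argv, cwd=repo))
```

Running each candidate several times and comparing warm-cache numbers
would give a fairer picture than a single run.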

> As for git itself: reliability, ease-of-replication, storage
> efficiency.

Yes, that part I totally understand (it's the same reason I used Git in
BuGit, https://gitlab.com/monnier/bugit).  Part of my question was related
to the fact that in BuGit I store the messages in the commit objects
rather than in files (which trivially gives me conflict-free merges as
well as "discussion threads").  So I was wondering if it would make sense
for public-inbox to do the same, but since I don't really know which
operations are frequent/important I really have no idea.
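
To make "messages in the commit object" concrete, here is a rough sketch
of what such a commit looks like at the object level.  It only computes
object ids the way Git does (SHA-1 over "<type> <size>\0<body>") and
writes nothing to disk.  Pointing every commit at the empty tree, and the
names `git_object_id`, `commit_body` and the placeholder identity, are
assumptions made for illustration, not a description of BuGit's or
public-inbox's actual layout.

```python
import hashlib


def git_object_id(kind, body):
    """SHA-1 object id Git assigns to an object of type `kind` with raw `body`."""
    header = b"%s %d\x00" % (kind.encode("ascii"), len(body))
    return hashlib.sha1(header + body).hexdigest()


# Id of the empty tree: a commit that carries its payload in the commit
# message can point at this tree and needs no files at all.
EMPTY_TREE = git_object_id("tree", b"")


def commit_body(message, parents=(), ident="A U Thor <author@example.com> 0 +0000"):
    """Raw commit object whose message *is* the email (illustrative layout)."""
    lines = ["tree " + EMPTY_TREE]
    lines += ["parent " + p for p in parents]
    lines += ["author " + ident, "committer " + ident, "", message]
    return "\n".join(lines).encode("utf-8")


# A message and a reply: the parent link doubles as the thread link.
email = "Subject: internal format\n\nbody of the message\n"
root = git_object_id("commit", commit_body(email))
reply = git_object_id(
    "commit", commit_body("Subject: Re: internal format\n\nreply\n", [root])
)
```

Since no file is ever touched, two such histories cannot conflict on
content; merging them only adds a commit with two parents.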

One thing that strikes me is that you don't seem to use Git's
"decentralization": IIUC public-inbox always assumes one of the
repositories is the "master" and others are mirrors (or mirrors of
mirrors), so you get efficient "fast-forward" updates, but you
don't do "merges".

This probably means that keeping the email messages in commit objects
wouldn't bring any benefits.

Also this means that public-inbox could freely rewrite history (which
you'll need in order to really expunge messages) and just use
"forced updates" in mirrors.
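
The difference between the two kinds of mirror update can be sketched on
a toy commit graph: an update is a fast-forward when the mirror's old tip
is an ancestor of the new tip, and needs a forced update otherwise.  The
graph and the names `ancestors` and `classify_update` are made up for
illustration; this is not how Git or public-inbox implements the check.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit` (inclusive) via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def classify_update(old_tip, new_tip, parents):
    """Return 'fast-forward' if old_tip is an ancestor of new_tip, else 'forced'."""
    return "fast-forward" if old_tip in ancestors(new_tip, parents) else "forced"


# Toy history: a <- b <- c.  The master then rewrites b as b2 (say, to
# expunge a message), which yields the new chain a <- b2 <- c2.
parents = {"a": [], "b": ["a"], "c": ["b"], "b2": ["a"], "c2": ["b2"]}

normal = classify_update("b", "c", parents)      # mirror at b, master at c
rewritten = classify_update("c", "c2", parents)  # mirror at c, master at c2
```

Here `normal` is a fast-forward, while `rewritten` is not, because the
old tip `c` is no longer reachable from `c2`.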

Now I'm left wondering what it would mean for something like
public-inbox to support merging.


        Stefan


Thread overview: 12+ messages
2018-03-05  0:54 Relationship between public-inbox and ssoma? Nicolás Ojeda Bär
2018-03-05  2:07 ` Eric Wong
2018-03-05 11:45   ` Nicolás Ojeda Bär
2018-03-05 17:50     ` Eric Wong
2018-03-05 18:06       ` Nicolás Ojeda Bär
2018-03-19  7:43         ` watch performance [was: Relationship between public-inbox and ssoma?] Eric Wong
2018-03-15 15:30   ` internal format (was: Relationship between public-inbox and ssoma?) Stefan Monnier
2018-03-15 16:40     ` Eric Wong
2018-03-15 18:49       ` Stefan Monnier [this message]
2018-03-15 20:14         ` internal format Eric Wong
2018-03-15 21:05           ` Stefan Monnier
2018-03-15 21:21             ` Eric Wong
