user/dev discussion of public-inbox itself
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: internal format
Date: Thu, 15 Mar 2018 17:05:03 -0400
Message-ID: <jwvin9xqct2.fsf-monnier+Inbox@gnu.org>
In-Reply-To: <20180315201420.GA30804@whir> (Eric Wong's message of "Thu, 15 Mar 2018 20:14:20 +0000")

>> For timing, I'm curious why you only consider
>> "git rev-list --objects --all".  Which operation does this correspond
>> to in public-inbox and is that really the only one that is
>> performance-sensitive?
> That traverses the object graph (same walk used for repacking
> where bitmaps don't help).

Yes, I understand what it does in Git, but I wonder why a full traversal
of the graph is the only/main operation you care about.

Hmm... I guess your other operations are:
- lookup by message-id (which is made efficient because you index files
  by the message-id).
- everything else is done by keeping another index (from NNTP article
  number to message-id (or to blob?)), as in the case of Xapian.

Actually, if you directly index the blobs, you don't really need to
index your file by message-id (you could keep the index from message-id
to blobs external, just as is done for Xapian, right?).
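To make the idea concrete, here is a minimal sketch of such an external index, mapping a Message-ID to a blob ID. Everything in it is an assumption for illustration only: the `msgmap` table, the `index_message` and `lookup` helpers, and the use of SQLite are hypothetical and are not how public-inbox stores anything.

```python
import hashlib
import sqlite3
from email import message_from_bytes


def index_message(db: sqlite3.Connection, raw: bytes) -> str:
    """Record Message-ID -> blob ID for one raw message (hypothetical helper)."""
    mid = message_from_bytes(raw)["Message-ID"].strip().strip("<>")
    # Git names a blob by the SHA-1 of "blob <size>\0" followed by its content.
    oid = hashlib.sha1(b"blob %d\0" % len(raw) + raw).hexdigest()
    db.execute(
        "INSERT OR REPLACE INTO msgmap (mid, oid) VALUES (?, ?)", (mid, oid)
    )
    return oid


def lookup(db: sqlite3.Connection, mid: str):
    """Return the blob ID stored for a Message-ID, or None if unknown."""
    row = db.execute("SELECT oid FROM msgmap WHERE mid = ?", (mid,)).fetchone()
    return row[0] if row else None


# The index lives outside the git tree, so file names inside the repository
# no longer need to encode the Message-ID.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE msgmap (mid TEXT PRIMARY KEY, oid TEXT NOT NULL)")

raw = b"Message-ID: <example@localhost>\n\nhello\n"
oid = index_message(db, raw)
print(lookup(db, "example@localhost") == oid)
```

With an index like this, a lookup by Message-ID becomes a single keyed query followed by a direct blob read, with no tree walk.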

> We currently store blob SHA-1s in Xapian to avoid tree lookups
> in git.  Having a history rewrite can break an entire chain of
> unrelated messages if we store commit SHA-1 in Xapian instead of
> blobs.

Ah, indeed, keeping them as files means that the file's own SHA won't
change when you rewrite history so it makes it much easier to rewrite
history if you rely on this (also probably a lot more efficient within
Git).
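A small sketch of why that holds: Git derives an object's ID from its type, size and content only, so a blob ID ignores history, while a commit ID covers its parent and therefore changes on a rewrite. The `git_object_id` and `commit_body` helpers, the author identity and the placeholder parent IDs below are made up for the illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID as Git computes it: header "<type> <size>\\0" + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    """Build a minimal commit object body (hypothetical identity and date)."""
    who = "A U Thor <author@example.com> 1521147903 -0400"
    return (
        f"tree {tree}\nparent {parent}\n"
        f"author {who}\ncommitter {who}\n\n{message}\n"
    ).encode()


# ID of the empty tree, used here only as a stand-in tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

# The blob ID depends on the message bytes alone.
blob_id = git_object_id("blob", b"hello\n")

# Same tree and message, different (placeholder) parents: the commit IDs
# differ, which is what a history rewrite does to every later commit.
c1 = git_object_id("commit", commit_body(EMPTY_TREE, "1" * 40, "add message"))
c2 = git_object_id("commit", commit_body(EMPTY_TREE, "2" * 40, "add message"))

print(blob_id)   # ce013625030ba8dba906f756967f9e9ca394464a
print(c1 != c2)  # True
```

An index keyed on blob IDs therefore survives a rewrite untouched, whereas one keyed on commit IDs would have to be rebuilt from the rewritten point onward.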

>> Now I'm left wondering what it would mean for something like
>> public-inbox to support merging.
> I consider it a waste of effort to maintain an authoritative
> commit history when archiving mail.

Indeed, as long as we're left wondering what good it would do to be able
to merge, we're left with its downsides.


        Stefan

Thread overview: 12+ messages
2018-03-05  0:54 Relationship between public-inbox and ssoma? Nicolás Ojeda Bär
2018-03-05  2:07 ` Eric Wong
2018-03-05 11:45   ` Nicolás Ojeda Bär
2018-03-05 17:50     ` Eric Wong
2018-03-05 18:06       ` Nicolás Ojeda Bär
2018-03-19  7:43         ` watch performance [was: Relationship between public-inbox and ssoma?] Eric Wong
2018-03-15 15:30   ` internal format (was: Relationship between public-inbox and ssoma?) Stefan Monnier
2018-03-15 16:40     ` Eric Wong
2018-03-15 18:49       ` internal format Stefan Monnier
2018-03-15 20:14         ` Eric Wong
2018-03-15 21:05           ` Stefan Monnier [this message]
2018-03-15 21:21             ` Eric Wong
