user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Cc: meta@public-inbox.org
Subject: Re: internal format
Date: Thu, 15 Mar 2018 21:21:44 +0000
Message-ID: <20180315212144.GA3032@whir>
In-Reply-To: <jwvin9xqct2.fsf-monnier+Inbox@gnu.org>

Stefan Monnier <monnier@IRO.UMontreal.CA> wrote:
> >> For timing, I'm curious why you only consider
> >> "git rev-list --objects --all".  Which operation does this correspond
> >> to in public-inbox and is that really the only one that is
> >> performance-sensitive?
> > That traverses the object graph (same walk used for repacking
> > where bitmaps don't help).
> 
> Yes, I understand what it does in Git, but I wonder why a full traversal
> of the graph is the only/main operation you care about.
> 
> Hmm... I guess your other operations are:
> - lookup by message-id (which is made efficient because you index files
>   by the message-id).
> - everything else is done by keeping another index (from NNTP article
>   number to message-id (or to blob?)), as in the case of Xapian.
> 
> Actually, if you directly index the blobs, you don't really need to
> index your file by message-id (you could keep the index from message-id
> to blobs external, just as is done for Xapian, right?).
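A minimal sketch of the two external indexes described above, assuming plain in-memory dicts stand in for Xapian. The class and method names (`ExternalIndex`, `add`, `blob_for_mid`, `blob_for_article`) are hypothetical and not public-inbox's API. The point is that once the index yields a blob OID, reading the message needs no tree lookup at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExternalIndex:
    """Hypothetical stand-in for an external index such as Xapian."""

    # NNTP article number -> Message-ID
    num_to_mid: dict[int, str] = field(default_factory=dict)
    # Message-ID -> OID of the git blob holding the raw message
    mid_to_blob: dict[str, str] = field(default_factory=dict)
    next_num: int = 1

    def add(self, mid: str, blob_oid: str) -> int:
        """Record an imported message and return its new article number."""
        num = self.next_num
        self.next_num += 1
        self.num_to_mid[num] = mid
        self.mid_to_blob[mid] = blob_oid
        return num

    def blob_for_mid(self, mid: str) -> str | None:
        # Direct lookup: no git tree is consulted.
        return self.mid_to_blob.get(mid)

    def blob_for_article(self, num: int) -> str | None:
        # Article number -> Message-ID -> blob OID.
        mid = self.num_to_mid.get(num)
        return None if mid is None else self.mid_to_blob.get(mid)
```

With a real repository, the stored OID could be handed straight to `git cat-file blob <oid>`, so the layout of the trees no longer matters for reads.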

Right, storing blob OIDs in Xapian means tree lookups are irrelevant
to read performance.  Since we can rely on Xapian for v2, we can
fix the graph traversal problem by simplifying the trees, and speed
up writes by keeping those trees small.
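To make the traversal cost concrete, here is a toy model (not git's real object encoding) that imports N messages under two assumed layouts, as I understand the formats: a v1-style tree fanning out by the SHA-1 of the Message-ID (`ab/cdef...`), and a v2-style tree holding a single blob named `m` per commit. The `walk` function visits each reachable object once, roughly as `git rev-list --objects --all` does, and counts the tree entries it has to parse. All names here are illustrative.

```python
from __future__ import annotations

import hashlib


def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode()).hexdigest()


class Store:
    """Toy content-addressed object store (not git's real object format)."""

    def __init__(self) -> None:
        self.objs: dict[str, tuple[str, object]] = {}

    def put(self, kind: str, data: object) -> str:
        # The ID depends on kind and full content, as with git objects.
        oid = sha1_hex(f"{kind}\0{data!r}")
        self.objs[oid] = (kind, data)
        return oid


def import_v1(store: Store, n: int) -> str | None:
    """Assumed v1 layout: each message at ab/cdef... from sha1(Message-ID)."""
    files: dict[str, dict[str, str]] = {}  # dir -> {filename: blob OID}
    subtrees: dict[str, str] = {}  # dir -> subtree OID
    tip = None
    for i in range(n):
        h = sha1_hex(f"{i}@example.org")
        d, f = h[:2], h[2:]
        files.setdefault(d, {})[f] = store.put("blob", f"message {i}")
        # Every write rewrites the touched subtree and the whole root tree.
        subtrees[d] = store.put("tree", tuple(sorted(files[d].items())))
        root = store.put("tree", tuple(sorted(subtrees.items())))
        tip = store.put("commit", (root, tip))
    return tip


def import_v2(store: Store, n: int) -> str | None:
    """Assumed v2 layout: each commit's tree holds one blob named 'm'."""
    tip = None
    for i in range(n):
        blob = store.put("blob", f"message {i}")
        root = store.put("tree", (("m", blob),))
        tip = store.put("commit", (root, tip))
    return tip


def walk(store: Store, tip: str | None) -> tuple[int, int]:
    """Visit each reachable object once, roughly like `git rev-list --objects`.

    Returns (objects visited, tree entries parsed).
    """
    seen: set[str] = set()
    entries = 0
    commit = tip
    while commit is not None:
        seen.add(commit)
        root, parent = store.objs[commit][1]
        stack = [root]
        while stack:
            oid = stack.pop()
            if oid in seen:
                continue  # already reached from a newer commit
            seen.add(oid)
            kind, data = store.objs[oid]
            if kind == "tree":
                entries += len(data)
                stack.extend(child for _, child in data)
        commit = parent
    return len(seen), entries
```

With 1,000 messages, the v1 layout yields 4,000 objects and the v2 layout 3,000, so the object counts are similar. However, every v1 commit carries a fresh root tree with up to 256 entries, so the walk parses more than a hundred times as many tree entries as the single-blob layout, which parses exactly one entry per commit. The same smaller trees are what make each write cheaper.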

The only remaining performance pain point is the overall size of
repos (which we work around by partitioning).
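A hypothetical sketch of size-based partitioning: new messages go to the newest repository until adding one would push it past a threshold, at which point a fresh repository is started. The `Partitioner` class, the `max_bytes` threshold, and the `N.git` naming are illustrative assumptions. Raw message size also stands in for the packed on-disk size that a real implementation would presumably track.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Partitioner:
    """Hypothetical size-based rollover across several git repositories."""

    max_bytes: int  # assumed rollover threshold per repository
    sizes: list[int] = field(default_factory=lambda: [0])

    def assign(self, msg_bytes: int) -> int:
        """Return the index of the partition this message is written to."""
        cur = len(self.sizes) - 1
        # Start a new partition if this message would overflow a non-empty one.
        if self.sizes[cur] > 0 and self.sizes[cur] + msg_bytes > self.max_bytes:
            self.sizes.append(0)
            cur += 1
        self.sizes[cur] += msg_bytes
        return cur

    def repo_name(self, idx: int) -> str:
        # Illustrative naming only.
        return f"{idx}.git"
```

Capping each repository this way also caps how much of the object graph a full walk or repack has to cover in any single repository.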


Thread overview: 12+ messages
2018-03-05  0:54 Relationship between public-inbox and ssoma? Nicolás Ojeda Bär
2018-03-05  2:07 ` Eric Wong
2018-03-05 11:45   ` Nicolás Ojeda Bär
2018-03-05 17:50     ` Eric Wong
2018-03-05 18:06       ` Nicolás Ojeda Bär
2018-03-19  7:43         ` watch performance [was: Relationship between public-inbox and ssoma?] Eric Wong
2018-03-15 15:30   ` internal format (was: Relationship between public-inbox and ssoma?) Stefan Monnier
2018-03-15 16:40     ` Eric Wong
2018-03-15 18:49       ` internal format Stefan Monnier
2018-03-15 20:14         ` Eric Wong
2018-03-15 21:05           ` Stefan Monnier
2018-03-15 21:21             ` Eric Wong [this message]

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
