user/dev discussion of public-inbox itself
* Threading in git repo?
@ 2019-03-13 23:07 Bjorn Helgaas
  2019-03-14  7:44 ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2019-03-13 23:07 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Hi Eric,

As far as I can tell, pi git repos have no branching: each new message
is added as a child commit of the most recent message, even if it is a
response to an older message.  Have you considered making the new
message a child of the message it is responding to?

I'm fiddling with making neomutt read a pi git repo.  Currently I only
read the git log info (not the commit bodies).  It's pretty fast to
read the author, date, and subject (since you conveniently stash them
in the commit metadata), but since I'm not reading the mail headers,
neomutt can't do all its threading magic.
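
A minimal sketch of the kind of log-only read described above, in Python rather than neomutt's C. The `LogEntry` type, the helper names, and the default `HEAD` revision are illustrative assumptions, not part of neomutt or public-inbox; only the `git log` pretty-format placeholders (`%H`, `%an`, `%ae`, `%aI`, `%s`, `%xNN`) are real git features.

```python
import subprocess
from typing import List, NamedTuple

# Separator bytes chosen because they should not appear in names or subjects.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
# %xNN makes git emit the byte itself, so no control bytes go on the command line.
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%s%x1e"


class LogEntry(NamedTuple):
    commit: str
    author_name: str
    author_email: str
    date: str      # ISO 8601 author date
    subject: str


def parse_log(output: str) -> List[LogEntry]:
    """Split `git log --format=LOG_FORMAT` output into entries."""
    entries = []
    for record in output.split(RECORD_SEP):
        record = record.strip("\n")  # git ends each record with a newline
        if record:
            entries.append(LogEntry(*record.split(FIELD_SEP)))
    return entries


def read_log(repo: str, rev: str = "HEAD") -> List[LogEntry]:
    """Read commit metadata only; no tree or blob is opened."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--format={LOG_FORMAT}", rev],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)
```

Nothing here exposes Message-ID, References, or In-Reply-To, which is why threading is not possible from this data alone.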

It seems like working out the threading could be done once at the time
the message is added to the git repo, and threads could appear as
branches in the repo.

Bjorn


* Re: Threading in git repo?
  2019-03-13 23:07 Threading in git repo? Bjorn Helgaas
@ 2019-03-14  7:44 ` Eric Wong
  2019-03-18 21:38   ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Wong @ 2019-03-14  7:44 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: meta

Bjorn Helgaas <helgaas@kernel.org> wrote:
> Hi Eric,
> 
> As far as I can tell, pi git repos have no branching: each new message
> is added as a child commit of the most recent message, even if it is a
> response to an older message.  Have you considered making the new
> message a child of the message it is responding to?

Correct, there is no branching.  Doing threading in git does not
work because of out-of-order message delivery (which is common
in SMTP).  public-inbox-index scanning (along with notmuch and
mairix) are all resilient to out-of-order message delivery when
doing threading.

> I'm fiddling with making neomutt read a pi git repo.  Currently I only
> read the git log info (not the commit bodies).  It's pretty fast to
> read the author, date, and subject (since you conveniently stash them
> in the commit metadata), but since I'm not reading the mail headers,
> neomutt can't do all its threading magic.

neomutt could read the over.sqlite3 database...
However, I can't guarantee its stability, either (since it's
in the "xap$VER" directory where $VER is 15, now).

Perhaps improving NNTP support in neomutt is the best way to go?

public-inbox-nntpd has room for improvement, too (see TODO)

> It seems like working out the threading could be done once at the time
> the message is added to the git repo, and threads could appear as
> branches in the repo.

Not really.  It'd still have to support "ghost" messages to
account for out-of-order message delivery; and the threading
logic can be improved and tweaked:

  https://public-inbox.org/meta/20190129075644.3917-1-e@80x24.org/
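
To illustrate why "ghost" messages are needed, here is a small Python sketch of threading that tolerates a reply arriving before its parent. It is a generic illustration under simplifying assumptions (a single In-Reply-To parent, no loop detection); the `Node` and `Threader` names are hypothetical, and this is not public-inbox's actual threading code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    message_id: str
    subject: Optional[str] = None  # None while the node is only a ghost
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    @property
    def is_ghost(self) -> bool:
        return self.subject is None


class Threader:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def _get(self, message_id: str) -> Node:
        # Return the existing node, or create a ghost placeholder for it.
        node = self.nodes.get(message_id)
        if node is None:
            node = self.nodes[message_id] = Node(message_id)
        return node

    def add(self, message_id: str, subject: str,
            in_reply_to: Optional[str] = None) -> Node:
        node = self._get(message_id)  # may already exist as a ghost
        node.subject = subject        # the ghost becomes a real message
        if in_reply_to and in_reply_to != message_id and node.parent is None:
            parent = self._get(in_reply_to)  # ghost if not yet delivered
            node.parent = parent
            parent.children.append(node)
        return node
```

When the parent eventually arrives, `add()` fills in the existing ghost, so replies already attached to it keep their place. A git commit's parent, by contrast, is fixed when the commit is created and cannot be filled in later without rewriting history.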

If the git commit messages all had key headers
(Message-ID/From/To/Cc/References/In-Reply-To/Subject), then
yes; then a SQLite/Xapian-agnostic client could be taught to
read and do threading based on that; with fewer git ODB
accesses.  I don't think it's worth introducing at this
time, though.

NNTP seems the best and least-fragile way forward.


* Re: Threading in git repo?
  2019-03-14  7:44 ` Eric Wong
@ 2019-03-18 21:38   ` Bjorn Helgaas
  2019-03-18 23:04     ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2019-03-18 21:38 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Thu, Mar 14, 2019 at 07:44:47AM +0000, Eric Wong wrote:
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> > As far as I can tell, pi git repos have no branching: each new message
> > is added as a child commit of the most recent message, even if it is a
> > response to an older message.  Have you considered making the new
> > message a child of the message it is responding to?
> 
> Correct, there is no branching.  Doing threading in git does not
> work because of out-of-order message delivery (which is common
> in SMTP).  public-inbox-index scanning (along with notmuch and
> mairix) are all resilient to out-of-order message delivery when
> doing threading.

Oh, I hadn't thought about out-of-order delivery.  That definitely is
a problem.

> > I'm fiddling with making neomutt read a pi git repo.  Currently I only
> > read the git log info (not the commit bodies).  It's pretty fast to
> > read the author, date, and subject (since you conveniently stash them
> > in the commit metadata), but since I'm not reading the mail headers,
> > neomutt can't do all its threading magic.
> 
> neomutt could read the over.sqlite3 database...
> However, I can't guarantee its stability, either (since it's
> in the "xap$VER" directory where $VER is 15, now).
> 
> Perhaps improving NNTP support in neomutt is the best way to go?

I'm still hoping to get to a solution using a local public-inbox
archive, without requiring a network connection or even additional
local servers.

> If the git commit messages all had key headers
> (Message-ID/From/To/Cc/References/In-Reply-To/Subject), then
> yes; then a SQLite/Xapian-agnostic client could be taught to
> read and do threading based on that; with fewer git ODB
> accesses.  I don't think it's worth introducing at this
> time, though.

If I understand correctly (sorry, I'm a newbie to public-inbox and git
internals) you're saying that one approach would be to copy some of
the headers from the message body into the commit message, which means
they'd be in the git "commit" object in addition to being in the blob,
which in turn would mean a client could do threading by reading only
the commit objects without reading the tree and blob objects.

I agree that's probably not worthwhile because it seems a little
kludgy and would only reduce the number of objects to read by a factor
of three.
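
For concreteness, a Python sketch of what a client might do if commit messages did carry those headers. The layout assumed here (RFC 822-style header lines as the text handed to the parser) is purely hypothetical, since public-inbox does not write commits this way, and the function names are made up for illustration.

```python
from email import message_from_string
from typing import Dict, List

# The key headers named earlier in this thread.
KEY_HEADERS = ("Message-ID", "From", "To", "Cc",
               "References", "In-Reply-To", "Subject")


def threading_headers(commit_message: str) -> Dict[str, str]:
    """Extract key headers from a hypothetical header-bearing commit message."""
    msg = message_from_string(commit_message)
    return {name: msg[name] for name in KEY_HEADERS if msg[name] is not None}


def parent_candidates(headers: Dict[str, str]) -> List[str]:
    """Message-IDs this message refers to, oldest first."""
    ids = headers.get("References", "").split()
    in_reply_to = headers.get("In-Reply-To", "").strip()
    if in_reply_to and in_reply_to not in ids:
        ids.append(in_reply_to)
    return ids
```

With this, threading would need only the commit object per message, instead of the commit, its tree, and the blob.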

Bjorn


* Re: Threading in git repo?
  2019-03-18 21:38   ` Bjorn Helgaas
@ 2019-03-18 23:04     ` Eric Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2019-03-18 23:04 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: meta

Bjorn Helgaas <helgaas@kernel.org> wrote:
> I'm still hoping to get to a solution using a local public-inbox
> archive, without requiring a network connection or even additional
> local servers.

Would a Perl script which works like mairix(*) be acceptable to you?

I have something planned along those lines...
scripts/dupe-finder could be used as a starting point.

Maybe a FUSE filesystem to export an inbox as a Maildir or
mbox would work, too.  But it's probably too expensive and
slow, especially for Maildirs.

* mairix indexes Maildirs/mboxes, and dumps search results to
  a new Maildir/mbox which mutt can understand:
  git clone https://github.com/vandry/mairix.git
  git clone https://github.com/rc0/mairix.git

> On Thu, Mar 14, 2019 at 07:44:47AM +0000, Eric Wong wrote:
> > If the git commit messages all had key headers
> > (Message-ID/From/To/Cc/References/In-Reply-To/Subject), then
> > yes; then a SQLite/Xapian-agnostic client could be taught to
> > read and do threading based on that; with fewer git ODB
> > accesses.  I don't think it's worth introducing at this
> > time, though.
> 
> If I understand correctly

<snip>

Yup :)



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
