From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN: AS2044 198.145.29.0/24
X-Spam-Status: No, score=-3.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
 DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI,
 SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by dcvr.yhbt.net (Postfix) with ESMTPS id 59EF920248
 for ; Tue, 19 Mar 2019 02:26:08 +0000 (UTC)
Received: from localhost (odyssey.drury.edu [64.22.249.253])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPSA id 3083D20850;
 Mon, 18 Mar 2019 21:38:18 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=default; t=1552945098;
 bh=4vHjszRejFLlhXNVupe6OsHnnaXX4UAYoCN5ZM3Fato=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
 b=BFZKJIkJD8g93YHPg7SAWeAG0maOC5/sXvVPWRDmNo88ACXbbgVb+IRqjmmEkNmfQ
 LicHo5hbNrarYHdVJsV1FPAAUE2oqc18aIw8EaaocE46F9exunNHCz9LQ8dk9aT359
 SpXtVIUlJemYMa+j26zFQVldC1Y5j/Zi4+8CejoU=
Date: Mon, 18 Mar 2019 16:38:17 -0500
From: Bjorn Helgaas
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: Threading in git repo?
Message-ID: <20190318213817.GA88541@google.com>
References: <20190313230707.GB210027@google.com>
 <20190314074447.GA8156@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190314074447.GA8156@dcvr>
User-Agent: Mutt/1.10.1 (2018-07-13)
List-Id:

On Thu, Mar 14, 2019 at 07:44:47AM +0000, Eric Wong wrote:
> Bjorn Helgaas wrote:
> > As far as I can tell, pi git repos have no branching: each new message
> > is added as a child commit of the most recent message, even if it is a
> > response to an older message.
> > Have you considered making the new
> > message a child of the message it is responding to?
>
> Correct, there is no branching. Doing threading in git does not
> work because of out-of-order message delivery (which is common
> in SMTP). public-inbox-index scanning (along with notmuch and
> mairix) are all resilient to out-of-order message delivery when
> doing threading.

Oh, I hadn't thought about out-of-order delivery. That definitely is
a problem.

> > I'm fiddling with making neomutt read a pi git repo. Currently I only
> > read the git log info (not the commit bodies). It's pretty fast to
> > read the author, date, and subject (since you conveniently stash them
> > in the commit metadata), but since I'm not reading the mail headers,
> > neomutt can't do all its threading magic.
>
> neomutt could read the over.sqlite3 database...
> However, I can't guarantee it's stability, either (since it's
> in the "xap$VER" directory where $VER is 15, now).
>
> Perhaps improving NNTP support in neomutt is the best way to go?

I'm still hoping to get to a solution using a local public-inbox
archive, without requiring a network connection or even additional
local servers.

> If the git commit messages all had key headers
> (Message-ID/From/To/Cc/References/In-Reply-To/Subject), then
> yes; then a SQLite/Xapian-agnostic client could be taught to
> read and do threading based on that; with fewer git ODB
> accesses. I don't think it's worth introducing at this
> time, though.

If I understand correctly (sorry, I'm a newbie to public-inbox and git
internals) you're saying that one approach would be to copy some of
the headers from the message body into the commit message, which means
they'd be in the git "commit" object in addition to being in the blob,
which in turn would mean a client could do threading by reading only
the commit objects without reading the tree and blob objects.
I agree that's probably not worthwhile because it seems a little
kludgy and would only reduce the number of objects to read by a factor
of three.

Bjorn
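
The out-of-order point discussed above can be illustrated with a minimal
sketch. This is an illustration only, not public-inbox or neomutt code:
the function name `build_threads` and the example Message-IDs are made
up. A git commit records its parent's object ID, so the parent has to
exist before the child can be written; a header-based index has no such
constraint and can link a reply to its parent whenever the parent
eventually shows up.

```python
from email import message_from_string


def build_threads(raw_messages):
    """Link messages by In-Reply-To, regardless of arrival order.

    Hypothetical helper for illustration; returns (roots, children) where
    roots is a list of Message-IDs with no known parent and children maps
    a parent Message-ID to the Message-IDs that reply to it.
    """
    parent_of = {}  # Message-ID -> In-Reply-To (or None), in arrival order
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = msg["Message-ID"].strip()
        parent = (msg["In-Reply-To"] or "").strip() or None
        parent_of[mid] = parent

    # Links are resolved only after everything has been seen, so a reply
    # that arrived before its parent is still attached correctly.
    children = {}
    roots = []
    for mid, parent in parent_of.items():
        if parent is not None and parent in parent_of:
            children.setdefault(parent, []).append(mid)
        else:
            # No parent header, or the parent never arrived.
            roots.append(mid)
    return roots, children


# The reply is delivered before the message it answers.
reply = "Message-ID: <b@example>\nIn-Reply-To: <a@example>\nSubject: Re: hi\n\nbody\n"
original = "Message-ID: <a@example>\nSubject: hi\n\nbody\n"

roots, children = build_threads([reply, original])
print(roots)
print(children)
```

A real threader would also consult the References header and handle
missing or duplicate Message-IDs; those cases are left out here to keep
the sketch short.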