From: Konstantin Ryabitsev
Date: Sun, 27 Jan 2019 20:37:44 +0200
Subject: RFC: Using public-inbox v2 repos for distributed patch lifecycle tracking
To: meta@public-inbox.org

Hi, all:

Here's something I've been mulling over for a while, and it sorta goes hand-in-hand with what Eric has been doing with the diff highlights work.

We have significant duplication of functionality on lore.kernel.org between the public-inbox repository for LKML and the patchwork instance for the same list (https://lore.kernel.org/patchwork). I am currently working on a tool that would move a lot of patchwork's functionality to the developer's workstation by using the public-inbox archive of the mailing list that receives patches. Once it is done, the tool should be able to do all of the following:

- Keep track of patches and patch series, including series revisions
- Allow developers to easily apply patch series directly from the public-inbox repository (by creating an mbox file of the series behind the scenes with adjusted git by-lines -- e.g. if someone replies to a patch with an "Acked-by: Foo Dev" or "Tested-by: Foo Bot", the original patch trailers are adjusted to reflect that new data)
- Automatically recognise when patches have been applied to a repo and auto-"accept" them
- Allow developers to see changes between series using interdiff

Currently, the tool sticks various patch tracking information into a custom sqlite3 db, but I'm increasingly wondering if it makes more sense to have this machine-parseable data available as part of the repository itself -- say, as a git note on the commit-id of the message.
In other words, if the patch is in a message in abcd1234:m, a refs/notes/patches entry for abcd1234 would have JSON-formatted data with patch tracking information similar to what patchwork has for it (see, for example, https://lore.kernel.org/patchwork/api/1.1/patches/1035986/), but without patchwork-specific bits and duplicate info like headers and patch contents.

If we add these notes on lore.kernel.org itself, then we save a lot of redundant data processing on the client end. A developer who has mirror-cloned a public-inbox archive of a development mailing list would be able to start applying patches and series without having to parse tens of thousands of messages.

I have limited experience with notes, however, and I'm curious if they are a good candidate for such a task. A public-inbox repo of LKML, even after sharding, contains hundreds of thousands of messages. If many of them carry such notes, would that significantly increase the repository size and reduce its performance?

I have similar thoughts about publishing CI-related information as notes to public-inbox commits, as well. This would allow centralised archival of distributed CI efforts, the lack of which is a common problem in distributed projects. Currently, bots flood lists with automated email as patch follow-ups, but if they could publish their CI information as refs/notes/ci/projectname in a public repository, we could mirror that back to lore.kernel.org via regular pulls of that ref for each defined project, and developers would be able to view CI reports from multiple distributed projects inside the same interface.

Again, I'm not sure if git notes is the right tool for this. Another way to go about it would be to use something like a custom blockchain ledger, but I'm afraid that picking that would result in a lower adoption rate due to the general unease people have about blockchain (it's new, it's overhyped, and it's tainted by association with cryptocoins).

I'd love to hear your thoughts.
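To make the refs/notes/patches idea above a bit more concrete, here is a rough Python sketch of how a tracking record might be serialised and attached to a message commit. It is only an illustration under stated assumptions: the field names (state, series, revision, trailers) and the helper names are hypothetical and do not reflect patchwork's schema or the actual tool; only the `git notes --ref=... add -f -m ...` invocation is standard git.

```python
import json
import subprocess

NOTES_REF = "refs/notes/patches"  # ref name suggested in the message above


def build_patch_note(state, series, revision, trailers):
    """Serialise hypothetical patch-tracking fields as compact JSON.

    Keys are sorted and trailers are ordered so that the same data always
    produces the same note text (and therefore the same blob in git).
    """
    record = {
        "state": state,
        "series": series,
        "revision": revision,
        "trailers": sorted(trailers),
    }
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def notes_add_command(commit, note, ref=NOTES_REF):
    """Return the argv that attaches `note` to `commit` under `ref`."""
    return ["git", "notes", f"--ref={ref}", "add", "-f", "-m", note, commit]


def attach_note(repo_path, commit, note, ref=NOTES_REF):
    """Run git in `repo_path` to write the note (requires git on PATH)."""
    subprocess.run(notes_add_command(commit, note, ref), cwd=repo_path, check=True)


if __name__ == "__main__":
    # Hypothetical values; abcd1234 is the placeholder commit from the text.
    note = build_patch_note(
        state="new",
        series="example-series",
        revision=2,
        trailers=["Tested-by: Foo Bot", "Acked-by: Foo Dev"],
    )
    print(" ".join(notes_add_command("abcd1234", note)))
```

A client with a mirror clone could then read the record back with `git notes --ref=refs/notes/patches show abcd1234` instead of re-parsing the message. The same pattern would presumably apply to a per-project CI ref such as refs/notes/ci/projectname, though whether notes scale to hundreds of thousands of entries is exactly the open question raised here.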
One of my goals is to find a way to keep the distributed nature of Linux Kernel development without locking it into a single vendor (like GitHub) or a suite of tools (like GitLab, CircleCI, whatnot). We need to find a way to preserve and archive the data generated by such tools in a way that is easy to replicate and verify.

-K