Date: Sun, 1 Aug 2021 16:40:24 -0400
From: Konstantin Ryabitsev
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: --batch-size and --jobs combination
Message-ID: <20210801204024.ieab3lr6yl2yqpsd@nitro.local>
References: <20210729202836.7qdwxojjel6jmxh6@nitro.local>
 <20210729211321.GA23521@dcvr>
 <20210729212444.3jnmaq4vo4dnudk3@nitro.local>
 <20210729220629.GA29593@dcvr>
In-Reply-To: <20210729220629.GA29593@dcvr>

On Thu, Jul 29, 2021 at 10:06:29PM +0000, Eric Wong wrote:
> My gut says 1g batch-size seems too high (Xapian has extra
> overhead) and could still eat too much into the kernel cache
> (and slow down reads). 100m might be a more reasonable limit
> for jobs=4 and 128G RAM.

Okay, I have things up and running on one of the 4 edge nodes. You can
access it and kick the tires at https://x-lore.kernel.org/.
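(For reference, the batch-size/jobs combination we've been discussing
corresponds to an indexing invocation roughly like the one below. This
is just a sketch with a placeholder inbox path, not the exact command I
ran:

    # index one inbox with the suggested limits:
    # 100m Xapian batch size, 4 parallel shard jobs
    public-inbox-index --batch-size=100m --jobs=4 /path/to/inbox

If I'm reading the docs right, the same options should also apply to
public-inbox-extindex for the /all/ extindex.)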
Initial observations:

- I can't give any kind of reliable numbers for the initial
  importing/indexing, as I was doing it piecemeal for a while to make
  sure that the indexer hooks were doing the right thing. Besides, this
  is a live system serving a lot of (static) content from the same
  partition where the indexing was done, so I/O was routinely under
  high and unpredictable load. The final import/index took 40+ hours,
  but I'll have more reliable numbers once I do it on the 3 other
  systems.

- Performance in /all/ seems laggy at times, probably depending on
  whether lvmcache has the Xapian DBs in the SSD cache or not. After a
  period of laggy performance, speed improves dramatically, which is
  probably when most of the backend is in cache.

- URLs are mapped a bit wonkily right now: / redirects to /all/, since
  I expect that is what most devs would want (pending feedback; I could
  be totally wrong). The wwwlisting is mapped to
  https://x-lore.kernel.org/lists.html, since that is the URL currently
  containing the full archive. All of this may, and probably will,
  change.

- I will bring up the rest of the nodes throughout the week, so
  x-lore.kernel.org will become more geoip-balanced.

I will share any other observations once I have more data. Once all 4
nodes are up, I will share this more widely with kernel devs so they
can kick some tires and report whether they are seeing decreased
performance compared to the current lore.kernel.org.

It's entirely possible that my plan to use mirrors.edge.kernel.org
nodes for this isn't one of my brightest ideas, in which case I may
bring up several dedicated instances in multiple clouds instead.

Thanks for all your work, Eric.

Best regards,
-K