Date: Mon, 1 Apr 2024 13:21:45 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: sample robots.txt to reduce WWW load
Message-ID: <20240401132145.M567778@dcvr>

Performance is still slow, and crawler traffic patterns tend to
do bad things with caches at all levels, so I've regretfully had
to experiment with robots.txt to mitigate performance problems.

The /s/ solver endpoint remains expensive but commit
8d6a50ff2a44 (www: use a dedicated limiter for blob solver, 2024-03-11)
seems to have helped significantly.

All the multi-message endpoints (/[Tt]*) are of course expensive
and have always been.  git blob access over SATA 2 SSD isn't too
fast, and HTML rendering is quite expensive in Perl.

Keeping multiple zlib contexts for HTTP gzip also hurts memory
usage, so we want to minimize the amount of time clients keep
longer-lived allocations.
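
As a rough illustration of how wildcard robots.txt rules would cover
the expensive endpoints named above, here is a minimal Python sketch
of RFC 9309-style matching ('*' matches any run of characters, the
longest matching pattern wins, Allow wins a tie).  It is not code from
public-inbox or from any particular crawler; rule_to_regex(),
is_allowed(), the RULES list and the sample paths are hypothetical
names made up for the example, and real crawlers may differ in
details.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # trailing '$' pins the end of the path
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

# Hypothetical rule list for the /s/ and /[Tt]* endpoints
RULES = [
    ("disallow", "/*/s/"),
    ("disallow", "/*/T/"),
    ("disallow", "/*/t/"),
    ("disallow", "/*/t.atom"),
    ("disallow", "/*/t.mbox.gz"),
    ("allow", "/"),
]

def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; 'allow' wins a tie."""
    best_len, verdict = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        # re.match() anchors at the start of the path
        if rule_to_regex(pattern).match(path):
            allow = (kind == "allow")
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, verdict = len(pattern), allow
    return verdict
```

With these rules, is_allowed("/meta/some-msgid/T/") is False because
"/*/T/" is a longer match than "/", while a single-message path such
as "/meta/some-msgid/" only matches "Allow: /" and stays crawlable.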

Anyways, this robots.txt is what I've been experimenting with
and (after a few days when bots pick it up) it seems to have
significantly cut load on my system so I can actually work on
performance problems[1] which show up.

==> robots.txt <==
User-Agent: *
Disallow: /*/s/
Disallow: /*/T/
Disallow: /*/t/
Disallow: /*/t.atom
Disallow: /*/t.mbox.gz
Allow: /

I also disable git-archive snapshots for cgit || WwwCoderepo:

	Disallow: /*/snapshot/*

[1] I'm testing a glibc patch which hopefully reduces fragmentation.
    I've gotten rid of many of the Disallow: entries temporarily
    since