From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id A16481F5AE
	for ; Fri, 17 Jul 2020 03:57:24 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH] doc: add some recommendations around slow HDDs
Date: Fri, 17 Jul 2020 03:57:24 +0000
Message-Id: <20200717035724.18244-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

grok-pull is still painful with serialization on an old
USB 2.0 HDD, but at least it can finish with flock(1)
and disabling parallelization.

While parallel "git fetch" doesn't seem so bad, slow seeks
are exacerbated by parallel reads in Xapian.  That means
some updates can take days instead of hours.

The same updates take only seconds or minutes on an SSD.
---
 Documentation/public-inbox-index.pod   | 10 ++++++++++
 examples/grok-pull.post_update_hook.sh |  6 ++++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index b1b24917b..ff2e54867 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -32,6 +32,16 @@ normal search functionality.
 
 =over
 
+=item --jobs=JOBS, -j
+
+Control the number of Xapian indexing jobs in a
+(L) inbox.
+
+C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
+to disable parallel indexing.
+
+Default: the number of existing Xapian shards
+
 =item --compact / -c
 
 Compacts the Xapian DBs after indexing.  This is recommended
diff --git a/examples/grok-pull.post_update_hook.sh b/examples/grok-pull.post_update_hook.sh
index 3ead39440..ec4ae93e8 100755
--- a/examples/grok-pull.post_update_hook.sh
+++ b/examples/grok-pull.post_update_hook.sh
@@ -1,4 +1,9 @@
 #!/bin/sh
+
+# use flock(1) from util-linux to avoid seek contention on slow HDDs
+# when using multiple `pull_threads' with grok-pull:
+# [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock "$0" "$0" "$@" || :
+
 # post_update_hook for repos.conf as used by grok-pull, takes a full
 # git repo path as it's first and only arg.
 full_git_dir="$1"
@@ -119,6 +124,7 @@ then
 		: v2 inboxes may be init-ed with an empty msgmap
 		;;
 	*)
+		# if on HDD and limited RAM, add `-j0' w/ public-inbox 1.6.0+
 		$EATMYDATA public-inbox-index -v "$inbox_dir"
 		;;
 	esac
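
For illustration only (not part of the patch): the commented-out FLOCKER
line in the hook re-executes the script under flock(1), so concurrent hook
invocations from multiple `pull_threads' run one at a time.  The sketch
below shows the same serialization effect using an explicit lock file
instead of "$0".  It assumes flock(1) from util-linux is installed;
`serialized_hook', `lock' and `out' are hypothetical names, and the
sleep stands in for the real indexing work.

```shell
#!/bin/sh
# Sketch: serialize concurrent jobs with flock(1) from util-linux.
# All names below are hypothetical, chosen for this example only.
lock=$(mktemp)	# file used purely as a lock target
out=$(mktemp)	# shared log that both jobs append to

serialized_hook() {
	# flock takes an exclusive lock on "$lock", runs the command,
	# and releases the lock when the command exits.  A second caller
	# blocks here until the first one has finished.
	flock "$lock" sh -c '
		echo "start $1" >>"$2"
		sleep 1		# stand-in for slow, seek-heavy work
		echo "end $1" >>"$2"
	' sh "$1" "$out"
}

# launch two jobs in parallel, as grok-pull would with several threads
serialized_hook a &
serialized_hook b &
wait

# each "start X" is immediately followed by its "end X";
# which job runs first is not deterministic
cat "$out"
```

Without the flock call, the two "start" lines would normally appear
before either "end" line, which is the kind of overlap that causes seek
contention on a slow HDD.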