author    Eric Wong <e@yhbt.net>  2020-07-17 03:57:24 +0000
committer Eric Wong <e@yhbt.net>  2020-07-17 18:54:56 +0000
commit    2ca7db34a51b858c9d7f6f7366afb9fffee86b6e
tree      7b95d39de34415f423b116835939a812fefda9c0
parent    d87dd0e6795870439422ee4f0039d0d76d1974b3

grok-pull is still painful with serialization on an old USB 2.0
HDD, but at least it can finish with flock(1) and disabling
parallelization.  While parallel "git fetch" doesn't seem so
bad, slow seeks are exacerbated by parallel reads in Xapian.
That means some updates can take days instead of hours.  The
same updates take only seconds or minutes on an SSD.
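
The serialization described above can be sketched roughly as follows. This is only an illustration, assuming flock(1) from util-linux is installed; the `run_serialized' helper and the LOCK_FILE path are hypothetical names and are not part of this commit.

```shell
#!/bin/sh
# Hypothetical sketch (not part of this commit): run a command while
# holding an exclusive lock, so concurrent hook invocations started by
# multiple grok-pull `pull_threads' execute one at a time.

# assumed lock location; any path writable by every hook invocation works
LOCK_FILE="${LOCK_FILE:-${TMPDIR:-/tmp}/grok-pull-index.lock}"

run_serialized() {
	# flock(1) creates the lock file if it is missing, blocks until the
	# exclusive lock is available, runs the given command, and exits
	# with that command's exit status
	flock "$LOCK_FILE" "$@"
}

# Possible use inside a post_update_hook, with public-inbox 1.6.0+
# (`-j0' disables parallel Xapian indexing, per the documentation
# change in this commit):
#
#	run_serialized public-inbox-index -j0 -v "$inbox_dir"
```

Combining the lock with `-j0' keeps reads sequential both across hook invocations and within a single indexing run, which is the case the commit message describes for slow HDDs.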
-rw-r--r--  Documentation/public-inbox-index.pod     10
-rwxr-xr-x  examples/grok-pull.post_update_hook.sh    6
2 files changed, 16 insertions, 0 deletions

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index b1b24917..ff2e5486 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -32,6 +32,16 @@ normal search functionality.
 
 =over
 
+=item --jobs=JOBS, -j
+
+Control the number of Xapian indexing jobs in a
+(L<public-inbox-v2-format(5)>) inbox.
+
+C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
+to disable parallel indexing.
+
+Default: the number of existing Xapian shards
+
 =item --compact / -c
 
 Compacts the Xapian DBs after indexing.  This is recommended
diff --git a/examples/grok-pull.post_update_hook.sh b/examples/grok-pull.post_update_hook.sh
index 3ead3944..ec4ae93e 100755
--- a/examples/grok-pull.post_update_hook.sh
+++ b/examples/grok-pull.post_update_hook.sh
@@ -1,4 +1,9 @@
 #!/bin/sh
+
+# use flock(1) from util-linux to avoid seek contention on slow HDDs
+# when using multiple `pull_threads' with grok-pull:
+# [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock "$0" "$0" "$@" || :
+
 # post_update_hook for repos.conf as used by grok-pull, takes a full
 # git repo path as it's first and only arg.
 full_git_dir="$1"
@@ -119,6 +124,7 @@ then
                 : v2 inboxes may be init-ed with an empty msgmap
                 ;;
         *)
+                # if on HDD and limited RAM, add `-j0' w/ public-inbox 1.6.0+
                 $EATMYDATA public-inbox-index -v "$inbox_dir"
                 ;;
         esac