From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/95] clone: multi-inbox/repo support...
Date: Mon, 28 Nov 2022 05:30:57 +0000
Message-Id: <20221128053232.291618-1-e@80x24.org>

A large patchset, and not done, yet :P

It's only tested live, but it seems to work reasonably well
against live hosts...

Behavior changes to public-inbox-clone are NOT final; but
public-inbox-fetch|PublicInbox::Fetch will probably become
thin wrappers around LeiMirror.

--include= and --exclude= are now supported, with globs.

--keep-going and --dry-run support added, too, since it's
make(1) influenced (more below)

It supports coderepos, too, using --inbox-config=never
(default: always); --project-list=, --manifest=, --objstore=,
and --prune.

Key differences from grok-pull (grokmirror) for coderepos:

* uses relative paths on the FS (dumb HTTP untested, but
  dumb HTTP is a goal for memory-constrained hosts).
  This means I can relocate coderepos freely within my FS or do
  sneakernet transfers across machines without having to
  `perl -i -pe s/x/y/' on hundreds of info/alternates and config
  files.

* CLI-only, no extra config files (may generate a Makefile,
  like individual inbox clones)

* objstore repos fetch from remotes directly
  (does not need, use, nor benefit from hardlinks at all)

* no sleep states

It is not a full replacement for grokmirror:

* reliant on default `git gc' behavior for repack.  This is OK
  since there are only one-way relationships between objstore
  and non-objstore repos.

* no fsck support (probably will be in generated Makefile)

* doesn't generate forkgroups nor manifest.js.gz
  (I may do this for coderepo Xapian indexing)

It relies on parallel git-fetch for objstores, so `-j $NUM'
calculations may end up being ($NUM * $NUM) in the worst case.
Not sure how to best approach this...  Maybe `-j $M,$N' similar
to `lei q -j$M,$N' is a solution...

Design note:

This is an exercise in building make(1)-like parallelism using
->DESTROY callbacks for prerequisites; so it's a newish paradigm
for me.  It forced me to fix a reference cycle, already.

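
A rough illustration of the ->DESTROY idea from the design note,
written in Python rather than Perl so it can be run standalone.
It is only a sketch and not the LeiMirror code: the names
OnDestroy, Job, run_all and demo are hypothetical, and it assumes
CPython, where reference counting runs __del__ as soon as the last
reference is dropped (roughly how Perl invokes DESTROY).  Each
prerequisite job holds a reference to a guard object; the
dependent target runs when the last job lets go of it.

```python
class OnDestroy:
    """Hypothetical guard: run a callback when the last reference dies."""

    def __init__(self, callback, *args):
        self._callback = callback
        self._args = args

    def __del__(self):
        # Fires when the reference count reaches zero (CPython behavior).
        self._callback(*self._args)


class Job:
    """A prerequisite; it keeps the guards of its dependent targets alive."""

    def __init__(self, name, *targets):
        self.name = name
        self.targets = targets


def run_all(jobs, log):
    # Run jobs in order; finishing a job releases its guard references.
    while jobs:
        job = jobs.pop(0)
        log.append(job.name)
        del job  # last reference to this Job, so its guards are released


def demo():
    log = []
    # "index" must only happen once both fetches have finished.
    index = OnDestroy(log.append, "index")
    jobs = [Job("fetch epoch 0", index), Job("fetch epoch 1", index)]
    del index  # from here on, only the jobs keep the guard alive
    run_all(jobs, log)
    return log


if __name__ == "__main__":
    print(demo())
```

If a job and its guard ever referenced each other, the count would
never reach zero and the dependent target would not fire on time,
which is presumably why a reference cycle is fatal to this scheme.
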
TODO: repo|symlink pruning, --exit-code, retry/refetch,
manpage updates

Eric Wong (95):
  clone: support multi-inbox clone
  clone: support --include and --exclude with multi-clone
  clone: parallelize v2 epoch clones
  lei_mirror: async config retrieval for v2 w/ manifest
  lei_mirror: rely on DESTROY to index v2 inbox
  lei_mirror: rely on global process reaper
  clone: support parallel v1 clones
  lei_mirror: default to single job by default
  lei_mirror: move directory creation to v2-only path
  lei_mirror: retrieve description text asynchronously, too
  switch inotify/kevent stuff to v5.12
  manifest: update module blurb + v5.12
  lei_mirror: simplify _get_txt_start callers
  lei_mirror: elide description retrieval for v1|coderepo
  lei_mirror: add a hint for skipped epoch permissions
  lei_mirror: consolidate clone process management
  lei_mirror: load File::Path unconditionally
  lei_mirror: load most modules up-front
  lei_mirror: set gitweb.owner from manifest
  clone: support --dry-run / -n flag
  lei_mirror: initialize placeholders with "head" from manifest
  lei_mirror: support {reference} for v1 manifest clones
  lei_mirror: reduce noise on interrupted clones
  clone: support --inbox-config option
  lei_mirror: retrieve v2 description properly
  lei_mirror: reduce scope of v2 lock
  lei_mirror: allow --epoch on mixed v1/v2 clones
  lei_mirror: fix infinite loop in dependency resolution
  lei_mirror: defend against infinite loops
  lei_mirror: do not fetch descriptions if using manifest
  lei_mirror: require PublicInbox::Lock at use
  lei_mirror: fix glob semantics to match end-of-path
  lei_mirror: differentiate -entv vs -ent
  lei_mirror: support manifest {references} for v2 epochs
  lei_mirror: simplify v2 code paths
  clone: support --inbox-version
  lei_mirror: require Perl v5.12+
  lei_mirror: ensure curl exits 22 on HTTP 404 responses
  lei_mirror: cleanup File::Temp OO usage
  lei_mirror: add `index' target to generated Makefile
  lei_mirror: do not write Makefile for --inbox-config=never
  lei_mirror: hoist out dump_manifest sub
  lei_mirror: avoid convoluted lazy_cb usage
  lei_mirror: simplify clone_v2_prep
  lei_mirror: support --objstore and forkgroups
  lei_mirror: cleanup process reaping logic
  lei_mirror: ensure git <1.8.5 fallback can use torsocks
  clone: flesh out --objstore behavior and document
  lei_mirror: always pack refs for coderepos
  lei_mirror: set description for non-inboxes, too
  lei_mirror: force --no-tags when fetching forkgroups
  lei_mirror: preserve permissions of existing alternates file
  lei_mirror: do not show ref updates w/o --verbose
  lei_mirror: drop git <1.8.5 support
  lei_mirror: make basename more descriptive
  lei_mirror: fix --dry-run for forkgroups
  lei_mirror: forkgroups use `git fetch --multiple'
  clone: move --dry-run handling to lei_mirror
  clone: drop unnecessary requires
  clone: use v5.12
  clone: require `--objstore=' for default location
  lei_mirror: shorten remote names
  fetch: use v5.12
  fetch: eliminate File::Temp->filename var
  lei_mirror: properly pack-refs in non-forkgroup repos
  lei_mirror: show child error error code
  on_destroy: support ->cancel callback
  lei_mirror: support resuming multi-repo clones
  lei_mirror: check fingerprints before fetching
  clone: support loading manifest.js.gz from destination
  lei_mirror: delay configuring forkgroups
  clone: canonicalize destination path from CLI
  clone|fetch: support passing --prune(-tags) to `git fetch'
  lei_mirror: avoid needless FD passing
  clone: support --keep-going/-k like make(1)
  lei_mirror: don't warn on missing manifest on initial clone
  lei_mirror: respect `./' and `../' prefixes for CLI args
  lei_mirror: --manifest= affects destination, too
  lei_mirror: update fingerprints when writing local manifest.js.gz
  lei_mirror: remove janky mirror.done stamp file
  lei_mirror: simplify most process spawning
  lei_mirror: run v1_done earlier on forkgroup done
  lei_mirror: simplify forkgroup-related subs
  lei_mirror: shorten scope mirror objects
  lei_mirror: set {head} from manifest
  lei_mirror: support {symlinks} from manifest
  lei_mirror: eliminate circular references
  lei_mirror: use curl -z/--timecond if manifest exists
  lei_mirror: avoid redundant curl `-f' use
  lei_mirror: omit trailing slash for git remote.*.url
  lei_mirror: set info/web/last-modified from manifest
  lei_mirror: don't clobber inbox.config.example if it exists
  lei_mirror: break out of fgrp fetch iteration early
  clone: support --project-list= for cgit
  lei_mirror: handle forkgroup changes

 Documentation/lei-add-external.pod   |    4 +-
 Documentation/public-inbox-clone.pod |   76 ++
 Documentation/public-inbox-fetch.pod |    6 +
 lib/PublicInbox/DSKQXS.pm            |    5 +-
 lib/PublicInbox/DirIdle.pm           |    4 +-
 lib/PublicInbox/FakeInotify.pm       |   13 +-
 lib/PublicInbox/Fetch.pm             |   50 +-
 lib/PublicInbox/In2Tie.pm            |    4 +-
 lib/PublicInbox/InboxIdle.pm         |    2 +-
 lib/PublicInbox/KQNotify.pm          |   12 +-
 lib/PublicInbox/LEI.pm               |    5 +-
 lib/PublicInbox/LeiMirror.pm         | 1104 +++++++++++++++++++++-----
 lib/PublicInbox/ManifestJsGz.pm      |    8 +-
 lib/PublicInbox/OnDestroy.pm         |    5 +-
 lib/PublicInbox/TestCommon.pm        |    1 +
 script/public-inbox-clone            |   23 +-
 script/public-inbox-fetch            |    4 +-
 t/on_destroy.t                       |    8 +-
 t/www_listing.t                      |   71 +-
 19 files changed, 1148 insertions(+), 257 deletions(-)