path: root/script/public-inbox-clone
Date       Commit message
2024-06-07 treewide: use \*STD(IN|OUT|ERR) consistently
The {IO} slot may not always be populated, and referencing it may not work (e.g. with the `-t' filetest) if there's no IO handle. Merely using `\*' is shorter than typing out `{GLOB}', so just use the shortest form consistently. This may fix occasional, difficult-to-reproduce failures from redirecting STDERR in t/imap_searchqp.t.
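The glob-reference form can be passed anywhere Perl expects a filehandle, including `open' dups and the `-t' filetest. A minimal sketch of the save/redirect/restore sequence a test needs when capturing STDERR; the temporary file and message are illustrative and not taken from t/imap_searchqp.t.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Save a duplicate of the current STDERR via the \*STDERR glob ref
open(my $saved_err, '>&', \*STDERR) or die "dup STDERR: $!";

# Point STDERR at a temporary file
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
open(STDERR, '>&', $tmp_fh) or die "redirect STDERR: $!";
print STDERR "captured\n";

# Reopening STDERR closes (and flushes) the redirected handle first
open(STDERR, '>&', $saved_err) or die "restore STDERR: $!";

# -t accepts the same glob ref form; true only if STDIN is a terminal
my $is_tty = (-t \*STDIN) ? 1 : 0;

open(my $rd, '<', $tmp_path) or die "open $tmp_path: $!";
my $captured = do { local $/; <$rd> };
close $rd;
print "stdin is a tty: $is_tty\n";
print "captured: $captured";
```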
2023-03-18 clone: support --purge to delete remotely-deleted repos
This lets us clean up disk space when repos are removed on the remote side.
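Deciding what to purge amounts to a set difference between the repos present locally and the repos the remote manifest still lists. A minimal sketch, assuming manifest keys are repository paths as in grokmirror's manifest.js.gz; `repos_to_purge' is a hypothetical helper, not public-inbox's code.

```perl
use strict;
use warnings;

# Hypothetical helper: local repos the remote manifest no longer lists
sub repos_to_purge {
    my ($remote_manifest, $local_repos) = @_;
    my %remote = map { $_ => 1 } keys %$remote_manifest;
    my @gone = grep { !$remote{$_} } @$local_repos;
    return sort @gone;
}

my $manifest = {
    '/git/a.git' => { fingerprint => 'aaa' },
    '/git/b.git' => { fingerprint => 'bbb' },
};
my @local = ('/git/a.git', '/git/b.git', '/git/old.git');
my @purge = repos_to_purge($manifest, \@local);
print "would purge: @purge\n";
```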
2023-03-07 doc: update public-inbox-clone examples and help
Basically, public-inbox-clone has become grok-pull without config files or absolute paths.
2023-02-21 lei_mirror: support --remote-manifest=URL
Since PublicInbox::WWW already generates manifest.js.gz for inboxes, I'm using PublicInbox::WwwStatic to serve the manifest.js.gz for coderepos from an alternate path. The following snippet lets me host https://yhbt.net/lore/pub/manifest.js.gz for mirrored git repositories, while https://yhbt.net/lore/manifest.js.gz (no `pub') remains for inbox mirroring.

==> sample.psgi <==
	use Plack::Builder; # provides builder, enable, and mount
	use PublicInbox::WWW;
	use PublicInbox::WwwStatic;
	my $www = PublicInbox::WWW->new; # use default PI_CONFIG
	my $st = PublicInbox::WwwStatic->new(docroot => '/path/to/code');
	my $www_cb = sub {
		my ($env) = @_;
		# serve the coderepo manifest from the static docroot
		if ($env->{PATH_INFO} eq '/pub/manifest.js.gz') {
			local $env->{PATH_INFO} = '/manifest.js.gz';
			my $res = $st->call($env);
			return $res if $res->[0] != 404;
		}
		$www->call($env); # everything else goes to the inbox UI
	};
	builder {
		enable 'ReverseProxy';
		enable 'Head';
		mount '/lore' => $www_cb;
	}
2023-01-18 ipc+lei: switch to awaitpid
This avoids awkwardly stuffing an arrayref into callbacks which expect multiple arguments. IPC->awaitpid_init now allows pre-registering callbacks before spawning workers.
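Registering each callback together with its own argument list lets the reaper call it directly instead of unpacking an arrayref. A minimal sketch of that idea; `awaitpid_sketch' and `reap_registered' are hypothetical names and do not reproduce public-inbox's actual awaitpid API.

```perl
use strict;
use warnings;
use POSIX ();

my %AWAIT_PIDS; # pid => [ $callback, @extra_args ]

# Hypothetical: remember a callback and its extra args for one child
sub awaitpid_sketch {
    my ($pid, $cb, @args) = @_;
    $AWAIT_PIDS{$pid} = [ $cb, @args ];
}

# Block until every registered child has exited, firing callbacks
sub reap_registered {
    while (%AWAIT_PIDS) {
        my $pid = waitpid(-1, 0);
        last if $pid <= 0; # no children left to wait for
        my $ent = delete $AWAIT_PIDS{$pid} or next; # not one of ours
        my ($cb, @args) = @$ent;
        $cb->($pid, $?, @args);
    }
}

my %exit_codes;
for my $n (0 .. 2) {
    my $pid = fork // die "fork: $!";
    POSIX::_exit($n) if $pid == 0; # child: exit with its index
    awaitpid_sketch($pid, sub {
        my ($child, $status, $label) = @_;
        $exit_codes{$label} = $status >> 8;
    }, "worker$n");
}
reap_registered();
print "$_ exited $exit_codes{$_}\n" for sort keys %exit_codes;
```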
2023-01-06 clone: implement --exit-code
Since public-inbox-clone is now useful for incremental updates via manifest.js.gz, --exit-code belongs here, too.
2022-12-30 clone: support --post-update-hook= from grokmirror
This should be compatible with both grokmirror 1 and 2 behavior and serialized on a per-repo basis.
2022-11-28 clone: support --project-list= for cgit
grokmirror supports it, and we also support cgit, so this should make running mirrors easier. This will be useful for scripting purposes, too.
2022-11-28 clone: support --keep-going/-k like make(1)
This can be useful for intermittent network errors, and the required code changes make it less dependent on global state.
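The make(1)-style behavior can be sketched as a loop that either dies on the first failure or records it and moves on; passing state explicitly rather than through globals matches the commit's motivation. `clone_all' and the sample repo names are hypothetical, not public-inbox's code.

```perl
use strict;
use warnings;

# Hypothetical driver: without keep-going the first failure is fatal;
# with it, failures are collected and the remaining repos still run
sub clone_all {
    my ($repos, $keep_going, $clone_one) = @_;
    my @failed;
    for my $repo (@$repos) {
        next if eval { $clone_one->($repo); 1 };
        my $err = $@ || "unknown error\n";
        die $err unless $keep_going;
        warn "W: $repo: $err";
        push @failed, $repo;
    }
    return @failed;
}

my @done;
my $clone = sub {
    my ($repo) = @_;
    die "network error\n" if $repo eq 'flaky.git';
    push @done, $repo;
};
my @failed = clone_all([qw(a.git flaky.git c.git)], 1, $clone);
print "cloned: @done; failed: @failed\n";

# Without keep-going, the same kind of failure aborts the run
my $aborted = eval { clone_all([qw(a.git)], 0, sub { die "boom\n" }); 1 } ? 0 : 1;
print "aborted without -k: $aborted\n";
```

Like make -k, a caller would presumably still exit non-zero when @failed is non-empty.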
2022-11-28 clone|fetch: support passing --prune(-tags) to `git fetch'
We need to be able to get rid of removed branches and tags on the remote. --prune-tags is implied for non-objstore repos, and incompatible with objstore repos.
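One reading of these rules, sketched as a helper that builds the `git fetch' argument list. Assumptions: "implied" means --prune-tags is added whenever pruning is requested for an ordinary repo, and an explicit --prune-tags on an objstore repo is an error. `fetch_cmd' is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build `git fetch' args for one repository
sub fetch_cmd {
    my ($opt, $is_objstore) = @_;
    die "--prune-tags is incompatible with objstore repos\n"
        if $opt->{'prune-tags'} && $is_objstore;
    my @cmd = qw(git fetch);
    if ($opt->{prune} || $opt->{'prune-tags'}) {
        push @cmd, '--prune';
        # git only honors --prune-tags when --prune is also enabled
        push @cmd, '--prune-tags' unless $is_objstore;
    }
    return @cmd;
}

my @plain    = fetch_cmd({ prune => 1 }, 0);
my @objstore = fetch_cmd({ prune => 1 }, 1);
my $rejected = eval { fetch_cmd({ 'prune-tags' => 1 }, 1); 1 } ? 0 : 1;
print "@plain\n@objstore\nrejected: $rejected\n";
```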
2022-11-28 clone: canonicalize destination path from CLI
We'll probably save the destination path somewhere, so ensure the path doesn't have redundant slashes and the like.
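Perl's core File::Spec performs this kind of purely logical cleanup. A minimal sketch, assuming canonpath-style normalization is the goal; it is not necessarily the exact call public-inbox-clone makes. canonpath never touches the filesystem and does not resolve `..' (symlinks could make that wrong).

```perl
use strict;
use warnings;
use File::Spec;

# Collapse repeated slashes and "/./", drop the trailing slash
my $dst = File::Spec->canonpath('/srv//mirrors/./lkml/');

# A relative DESTINATION is joined to a base, then canonicalized
my $abs = File::Spec->rel2abs('mirrors//lkml/', '/srv');

print "$dst\n$abs\n"; # both /srv/mirrors/lkml on Unix-like systems
```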
2022-11-28 clone: support loading manifest.js.gz from destination
This will allow us to quickly check fingerprints against remotes with a single HTTP(S) request, saving us numerous `git show-ref' invocations.
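With the previous manifest.js.gz kept in the destination, deciding which repos need fetching reduces to comparing two decoded manifests. A minimal sketch using only core modules; the manifests are built in memory, the repo paths are examples, and the fingerprint definition (SHA-1 hex of `git show-ref' output, as grokmirror computes it) is an assumption.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use JSON::PP;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build a gzipped manifest in memory (stands in for manifest.js.gz)
sub make_manifest {
    my ($data) = @_;
    my $enc = JSON::PP->new->canonical->encode($data);
    gzip(\$enc => \my $gz) or die "gzip: $GzipError\n";
    return $gz;
}

# Decode a gzipped manifest into a hashref keyed by repo path
sub load_manifest {
    my ($gz) = @_;
    gunzip(\$gz => \my $raw) or die "gunzip: $GunzipError\n";
    return JSON::PP->new->decode($raw);
}

# Assumed fingerprint: SHA-1 hex of sample `git show-ref' output
my $fp_old = sha1_hex(('1' x 40) . " refs/heads/master\n");
my $fp_new = sha1_hex(('2' x 40) . " refs/heads/master\n");

# Copy saved in DESTINATION vs. the one fetched from the remote
my $local = load_manifest(make_manifest({
    '/lkml/git/0.git' => { fingerprint => $fp_old },
    '/lkml/git/1.git' => { fingerprint => $fp_old },
}));
my $remote = load_manifest(make_manifest({
    '/lkml/git/0.git' => { fingerprint => $fp_old },
    '/lkml/git/1.git' => { fingerprint => $fp_new },
}));

# Only repos whose fingerprints differ need a `git fetch'
my @stale = grep {
    ($local->{$_}{fingerprint} // '') ne $remote->{$_}{fingerprint}
} keys %$remote;
@stale = sort @stale;
print "needs fetch: @stale\n";
```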
2022-11-28 clone: require `--objstore=' for default location
Allowing just `--objstore' without `=' was confusing, since it could eat one of the required parameters (URL or DESTINATION).
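The confusion comes from optional option values: Getopt::Long lets an optional string value swallow the next non-option argument. A minimal sketch contrasting the two specs; the spec strings are illustrative and not copied from public-inbox's option table. Getopt::Long only accepts an empty `--objstore=' for a mandatory value when gnu_compat is enabled.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Pitfall: with an optional value (objstore:s), a bare --objstore
# swallows the next non-option argument, here the URL
my @argv = ('--objstore', 'https://example.com/lkml', '/srv/lkml');
my %opt;
GetOptionsFromArray(\@argv, \%opt, 'objstore:s') or die "bad options\n";
print "objstore=$opt{objstore} left=@argv\n";

# Requiring a value (objstore=s) with gnu_compat: `--objstore='
# yields an empty string, meaning "use the default location"
Getopt::Long::Configure('gnu_compat');
my @argv2 = ('--objstore=', 'https://example.com/lkml', '/srv/lkml');
my %opt2;
GetOptionsFromArray(\@argv2, \%opt2, 'objstore=s') or die "bad options\n";
print "objstore='$opt2{objstore}' left=@argv2\n";
```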
2022-11-28 clone: use v5.12
Another small step in what will probably be a decades-long quest to reduce startup time by a few milliseconds.
2022-11-28 clone: drop unnecessary requires
These packages are all require-ed elsewhere.
2022-11-28 clone: move --dry-run handling to lei_mirror
lei will probably support dry-run in more places, too.
2022-11-28 lei_mirror: support --objstore and forkgroups
The {forkgroup} directive of grokmirror 2.x manifest.js.gz can facilitate more space savings and improved pack performance with pack.islands.
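Repos that share a {forkgroup} value are forks of one another, so they can share a single object store. A minimal sketch of the grouping step; the sample manifest is made up and the `$objstore/$forkgroup.git' layout is a hypothetical naming scheme, not necessarily what lei_mirror uses.

```perl
use strict;
use warnings;

# Sample manifest entries; only some carry a forkgroup
my $manifest = {
    '/linux/torvalds.git' => { forkgroup => 'fg-1' },
    '/linux/stable.git'   => { forkgroup => 'fg-1' },
    '/tools/git.git'      => {},
};
my $objstore = '/srv/objstore'; # hypothetical --objstore location

# Group repo paths by forkgroup; repos without one stay standalone
my %groups;
for my $repo (sort keys %$manifest) {
    my $fg = $manifest->{$repo}{forkgroup} // next;
    push @{ $groups{$fg} }, $repo;
}
for my $fg (sort keys %groups) {
    print "$objstore/$fg.git <= @{ $groups{$fg} }\n";
}
```

Each member repo would then typically borrow objects from the shared store via git's objects/info/alternates.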
2022-11-28 clone: support --inbox-version
This is part of `lei add-external --mirror', and it's useful for development and testing. We'll also add a fallback in case somebody tries --inbox-version and fails due to a newer remote instance of public-inbox.
2022-11-28 clone: support --inbox-config option
This allows avoiding 404s when trying _/text/config/raw on code repositories.
2022-11-28 clone: support --dry-run / -n flag
It still makes HTTP(S) requests to retrieve the manifest or scrape HTML, but doesn't make permanent changes to the FS (aside from modifying {acm}time of ${TMPDIR-/tmp}).
2022-11-28 clone: parallelize v2 epoch clones
This is a first step in supporting completely parallelized clones. Eventually, everything will be parallelized and dependencies will be managed via callbacks.
2022-11-28 clone: support --include and --exclude with multi-clone
These will be handy when someone is interested in a subset of inboxes on a large hosting site.
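Selecting a subset of inboxes can be sketched as include-then-exclude filtering over inbox names. The wildcard rules (only `*' and `?'), the anchoring, and the precedence below are assumptions for illustration; `glob2re' and `filter_names' are hypothetical helpers, so check the public-inbox-clone(1) manual for the actual matching semantics.

```perl
use strict;
use warnings;

# Hypothetical: turn a wildcard with * and ? into an anchored regex
sub glob2re {
    my ($glob) = @_;
    my $re = join '', map {
        $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_)
    } split /([*?])/, $glob;
    return qr/\A$re\z/;
}

# Keep names matching any include (all names if none given), then
# drop names matching any exclude
sub filter_names {
    my ($names, $include, $exclude) = @_;
    my @inc = map { glob2re($_) } @$include;
    my @exc = map { glob2re($_) } @$exclude;
    return grep {
        my $n = $_;
        (!@inc || grep { $n =~ $_ } @inc) && !grep { $n =~ $_ } @exc
    } @$names;
}

my @inboxes = qw(/lkml /linux-mm /git /netdev /linux-doc);
my @picked = filter_names(\@inboxes, ['/linux-*', '/lkml'], ['*-doc']);
print "@picked\n";
```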
2021-09-24 clone|--mirror: support --epoch=RANGE for partial clones
Partial (v2) clones should be a useful addition for users wanting to conserve storage while having fast access to recent messages. Continuing work started in 876e74283ff3 (fetch: ignore non-writable epoch dirs, 2021-09-17), this creates bare, read-only git repos for skipped epochs. These git repos have their remotes pre-configured, but no objects are fetched into them. The goal is to allow users to set the writable bit on a previously-skipped epoch and start fetching it. Shell completion support may not be necessary given how short the epoch ranges are, here.

Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
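A range like this has to be resolved against the newest epoch on the remote before cloning. A minimal sketch of such a parser; the syntax handled here (a single N, an optionally open-ended LOW..HIGH, and a `~' prefix meaning an offset from the newest epoch) is our assumption and should be checked against public-inbox-clone(1). `epoch_range' is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parser: returns the list of epoch numbers to clone
sub epoch_range {
    my ($spec, $max) = @_; # $max: newest epoch on the remote
    my $num = sub {
        my ($s, $default) = @_;
        return $default if $s eq '';
        $s =~ /\A(~?)([0-9]+)\z/ or die "bad epoch value `$s'\n";
        return $1 ? $max - $2 : 0 + $2;
    };
    my ($lo, $hi);
    if (my ($l, $h) = $spec =~ /\A(.*?)\.\.(.*)\z/s) {
        ($lo, $hi) = ($num->($l, 0), $num->($h, $max));
    } else {
        $lo = $hi = $num->($spec, undef);
        die "empty --epoch value\n" unless defined $lo;
    }
    # Epochs outside 0..$max do not exist on the remote
    return grep { $_ >= 0 && $_ <= $max } ($lo .. $hi);
}

my @latest = epoch_range('~0', 5);
my @recent = epoch_range('~2..', 5);
my @first  = epoch_range('..1', 5);
print "@latest | @recent | @first\n";
```

Epochs left out of the returned list are the ones that would get the bare, read-only placeholder repos described above.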
2021-09-22 lei: drop redundant WQ EOF callbacks
Redundant code is noise and therefore confusing :<
2021-09-12 new public-inbox-{clone,fetch} commands
Setting up and maintaining git-only mirrors of v2 inboxes is complex since multiple commands are required to clone and fetch into epochs. Unlike grokmirror, these commands do not require any configuration. Instead, they rely on existing git config files and work like "git clone --mirror" and "git fetch", respectively. Like grokmirror, they use manifest.js.gz, but only on a per-inbox basis so users won't have to clone every inbox of a large instance nor edit config files to include/exclude inboxes they're interested in.
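For context, this is roughly the per-epoch work a user would otherwise do by hand for a v2 inbox. A minimal sketch that only builds the commands; it assumes the usual layout where epoch N is served at $URL/N and mirrored into $DESTINATION/git/N.git. `epoch_clone_cmds' is a hypothetical helper, not part of public-inbox.

```perl
use strict;
use warnings;

# Hypothetical: one `git clone --mirror' argv per epoch
sub epoch_clone_cmds {
    my ($url, $dst, @epochs) = @_;
    $url =~ s!/+\z!!; # avoid a double slash in "$url/$n"
    return map {
        my $n = $_;
        [ 'git', 'clone', '--mirror', "$url/$n", "$dst/git/$n.git" ];
    } @epochs;
}

my @cmds = epoch_clone_cmds('https://lore.kernel.org/lkml/', 'lkml', 0 .. 2);
print join(' ', @$_), "\n" for @cmds;
```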