=head1 NAME

public-inbox-clone - "git clone --mirror" wrapper

=head1 SYNOPSIS

public-inbox-clone [OPTIONS] INBOX_URL [INBOX_DIR]

public-inbox-clone [OPTIONS] ROOT_URL [DESTINATION] # public-inbox 2.0+

=head1 DESCRIPTION

public-inbox-clone is a wrapper around C<git clone --mirror> for
making the initial clone of a remote HTTP(S) public-inbox.  It
allows cloning multi-epoch v2 inboxes with a single command and
zero configuration.

In public-inbox 2.0+, public-inbox-clone can create and maintain
a mirror of multiple inboxes or code repositories using manifest.js.gz
files like L<grok-pull(1)> from grokmirror.  L<public-inbox-fetch(1)> is
NOT required when using this mode.

It runs neither L<public-inbox-init(1)> nor
L<public-inbox-index(1)>.  Those commands must be run separately
if serving/searching the mirror is required.  As-is,
public-inbox-clone is suitable for creating a git-only backup
without Xapian and SQLite indices.

When cloning a single inbox, public-inbox-clone creates a Makefile
with handy targets to update the inbox once indexed.
This Makefile may be edited by the user; it will
not be rewritten by L<public-inbox-fetch(1)> unless it is removed
completely.

public-inbox-clone neither uses nor requires any extra
configuration files (not even C<~/.public-inbox/config>),
but it can download snippets suitable for adding to any
L<public-inbox-config(5)> file.

L<public-inbox-fetch(1)> may be used to keep a single C<INBOX_DIR>
up-to-date.

For v2 inboxes, it will create a C<$INBOX_DIR/manifest.js.gz>
file to speed up subsequent L<public-inbox-fetch(1)>.

=head1 OPTIONS

=over

=item --epoch=RANGE

Restrict clones of L<public-inbox-v2-format(5)> inboxes to the
given range of epochs.  The range may be a single non-negative
integer or a (possibly open-ended) C<LOW..HIGH> range of
non-negative integers.  C<~> may be prefixed to either (or both)
integer values to represent the offset from the maximum possible
value.

For example, C<--epoch=~0> alone clones only the latest epoch, and
C<--epoch=~2..> clones the three latest epochs.

Default: C<0..~0> or C<0..> or C<..~0>
(all epochs, all three examples are equivalent)
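
The endpoint arithmetic can be sketched in POSIX shell.  This is only an
illustration, not the code public-inbox-clone runs: the function names
C<epoch_endpoint> and C<epoch_range> are hypothetical, the highest epoch
number is assumed to be known already and is passed in as C<MAX>, and
offsets larger than C<MAX> are not handled.

```shell
# Hypothetical helpers illustrating --epoch=RANGE; not part of public-inbox.

# epoch_endpoint VALUE MAX: "~N" means MAX - N, a plain N is used as-is
epoch_endpoint() {
	case "$1" in
	"~"*)
		off=${1#"~"}
		echo $(( $2 - $off ))
		;;
	*)
		echo "$1"
		;;
	esac
}

# epoch_range RANGE MAX: print "LOW HIGH" as absolute epoch numbers
epoch_range() {
	range=$1
	max=$2
	case "$range" in
	*..*)
		lo=${range%%..*}
		hi=${range##*..}
		;;
	*)
		# a single value selects exactly one epoch
		lo=$range
		hi=$range
		;;
	esac
	[ -n "$lo" ] || lo=0		# open low end starts at epoch 0
	[ -n "$hi" ] || hi="~0"		# open high end stops at the latest epoch
	echo "$(epoch_endpoint "$lo" "$max") $(epoch_endpoint "$hi" "$max")"
}

epoch_range "~2.." 5
```

With epochs 0 through 5 on the remote, C<~2..> resolves to epochs 3
through 5, which matches the "three latest epochs" wording above.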

=item -I PATTERN

=item --include=PATTERN

When cloning a top-level with multiple inboxes via manifest,
only clone inboxes and repositories matching a given wildcard pattern
(C<*>, C<?>, and C<[]> are supported).

This is a new option in public-inbox 2.0+

=item --exclude=PATTERN

When cloning a top-level with multiple inboxes via manifest,
ignore inboxes and repositories matching the given wildcard pattern.
Supports the same wildcards as L</--include>.

This is a new option in public-inbox 2.0+

=item --inbox-config=always|v2|v1|never

Whether or not to retrieve the C<$INBOX/_/text/config/raw> HTTP(S)
endpoint when cloning.

Since we can't deduce v1 inboxes from code repositories, setting this
to C<v2> or C<never> can allow faster clones of code repositories if
no v1 inboxes are present.

Default: C<always>

This is a new option in public-inbox 2.0+

=item --inbox-version=NUM

Force a remote public-inbox version (must be C<1> or C<2>).
This is auto-detected by default, and this option exists mainly
for testing.

This is a new option in public-inbox 2.0+

=item --objstore=DIR

Enables space savings when the remote C<manifest.js.gz>
includes C<forkgroup> entries as generated by grokmirror 2.x.

If C<DIR> does not start with C</>, C<./>, or C<../>, it is treated
as relative to the C<DESTINATION> directory.  If only C<--objstore=>
is specified where C<DIR> is an empty string (C<"">), then C<objstore>
(C<$DESTINATION/objstore>) is the implied value of C<DIR>.

This is a new option in public-inbox 2.0+

=item --manifest=FILE

When incrementally updating an existing mirror, load the given
manifest (typically C<manifest.js.gz>) to speed up updates.

By default, public-inbox-clone writes the retrieved manifest to
C<$DESTINATION/manifest.js.gz>.  This option also
changes that destination to the specified C<FILE>.

If C<FILE> does not start with C</>, C<./>, or C<../>, it is treated
as relative to the C<DESTINATION> directory.  If only C<--manifest=>
is specified where C<FILE> is an empty string (C<"">), then C<manifest.js.gz>
(C<$DESTINATION/manifest.js.gz>) is the implied value of C<FILE>.

When updating manifests with many forks using the same objstore,
git 2.41+ is highly recommended for performance as we automatically
use the C<fetch.hideRefs> feature to speed up negotiation.

C<--manifest=> is a new option in public-inbox 2.0+

=item --remote-manifest=URL|RELATIVE_PATH

Use an alternate location for the remote manifest.js.gz file.
This may be specified as a full absolute URL (e.g.
C<--remote-manifest=https://80x24.org/lore/pub/manifest.js.gz>),
or a pathname relative to the ROOT_URL (e.g.
C<--remote-manifest=pub/manifest.js.gz> when ROOT_URL is
C<https://80x24.org/lore/>).

By default, C<ROOT_URL/manifest.js.gz> is used.

This is a new option in public-inbox 2.0+

=item --project-list=FILE

When cloning code repos from a manifest, generate a cgit-compatible
project list.

If C<FILE> does not start with C</>, C<./>, or C<../>, it is treated
as relative to the C<DESTINATION> directory.  If only C<--project-list=>
is specified where C<FILE> is an empty string (C<"">), then C<projects.list>
(C<$DESTINATION/projects.list>) is the implied value of C<FILE>.

This is a new option in public-inbox 2.0+

=item --post-update-hook=COMMAND

Hook to run after a repository is cloned or updated.  C<COMMAND> will
have the bare git repository destination given as its first and only
argument.

For v2 inboxes, this operates on a per-epoch basis.

May be specified multiple times to run multiple commands in the
order specified on the command-line.

This is a new option in public-inbox 2.0+

=item -p

=item --prune

Pass the C<--prune> and C<--prune-tags> flags to L<git-fetch(1)>
calls on incremental clones.

This is a new option in public-inbox 2.0+

=item --purge

Deletes entire repos which no longer exist in the remote manifest,
or are filtered out by C<--include=> or C<--exclude=>.

This is only useful when using C<--manifest>.

This is a new option in public-inbox 2.0+

=item --exit-code

Exit with C<127> if no updates are done when relying on a manifest.
Updates include fingerprint mismatches in the manifest, new symlinks,
new repositories, and removed repositories from the L</--project-list>.

This is a new option in public-inbox 2.0+

=item -k

=item --keep-going

Continue as much as possible after an error.

This is a new option in public-inbox 2.0+

=item -n

=item --dry-run

Show what would be done, without making any changes.

This is a new option in public-inbox 2.0+

=item -q

=item --quiet

Quiets down progress messages, also passed to L<git-fetch(1)>.

=item -v

=item --verbose

Increases verbosity, also passed to L<git-fetch(1)>.

=item --torsocks=auto|no|yes

=item --no-torsocks

Whether to wrap L<git(1)> and L<curl(1)> commands with L<torsocks(1)>.

Default: C<auto>

=item -j JOBS

=item --jobs=JOBS

The number of parallel processes to spawn at once for various network
operations using L<git(1)> and/or L<curl(1)>.

=back
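
The C<DIR> and C<FILE> arguments of C<--objstore>, C<--manifest>, and
C<--project-list> follow the same resolution rule.  A minimal POSIX shell
sketch of that rule, assuming a hypothetical helper named
C<resolve_opt_path> (it is not part of public-inbox and only mirrors the
wording above):

```shell
# Hypothetical helper; mirrors the documented rule, not the real implementation.
# resolve_opt_path DESTINATION VALUE DEFAULT_NAME
resolve_opt_path() {
	case "$2" in
	"")
		# empty value (e.g. --objstore=): fall back to the default name
		printf '%s\n' "$1/$3"
		;;
	/*|./*|../*)
		# starts with /, ./ or ../: used as given
		printf '%s\n' "$2"
		;;
	*)
		# anything else is relative to DESTINATION
		printf '%s\n' "$1/$2"
		;;
	esac
}

resolve_opt_path /path/to/code-mirror "" objstore
resolve_opt_path /path/to/code-mirror ../shared.list projects.list
resolve_opt_path /path/to/code-mirror m.js.gz manifest.js.gz
```

The three calls cover the empty value, a value starting with C<../>, and
a bare name that lands inside C<DESTINATION>.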

=head1 EXAMPLES

=for comment
Sticking to smaller projects in examples to minimize load on servers

=over

=item To mirror the most recent epochs of dwarves and LTTng inboxes:

  public-inbox-clone --epoch=~0 \
	--include='*lttng*' --include='*dwarves' \
	https://80x24.org/lore/ /path/to/inbox-mirror

C<https://lore.kernel.org/> may be used instead of C<https://80x24.org/lore/>.

=item To mirror all code repos of the sparse project:

  public-inbox-clone --objstore= --project-list= --prune \
	--include='*sparse*' --inbox-config=never \
	--remote-manifest=https://80x24.org/lore/pub/manifest.js.gz \
	https://80x24.org/lore/ /path/to/code-mirror

C<https://git.kernel.org/> may be used instead of C<https://80x24.org/lore/>
and the C<--remote-manifest> option can be omitted.

=back
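
For periodic updates of a manifest-based mirror, a wrapper script can use
C<--exit-code> to tell a no-op run apart from one that changed something.
A minimal POSIX shell sketch, assuming a hypothetical function named
C<classify_run>, that an exit status of C<0> means updates were made, and
that any status other than C<0> or C<127> is an error (only C<127> is
documented above):

```shell
# Hypothetical wrapper; runs the given command and labels its exit status.
classify_run() {
	rc=0
	"$@" || rc=$?
	case $rc in
	0)   echo updated ;;		# assumption: success with changes
	127) echo unchanged ;;		# documented --exit-code result
	*)   echo "failed: $rc" ;;	# assumption: anything else is an error
	esac
}

# Intended use (not executed here, it needs network access):
# classify_run public-inbox-clone --exit-code --manifest= \
#	--include='*sparse*' --inbox-config=never \
#	--remote-manifest=https://80x24.org/lore/pub/manifest.js.gz \
#	https://80x24.org/lore/ /path/to/code-mirror
```

Shells also report C<127> when a command cannot be found, so a wrapper
like this should first confirm that public-inbox-clone is in C<PATH>.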

=head1 CONTACT

Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>

The mail archives are hosted at L<https://public-inbox.org/meta/> and
L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

=head1 COPYRIGHT

Copyright all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>

=head1 SEE ALSO

L<public-inbox-fetch(1)>, L<public-inbox-init(1)>, L<public-inbox-index(1)>
