user/dev discussion of public-inbox itself
* [PATCH 5/5] fetch: support v2 w/o manifest on old WWW
  2021-09-24 10:56  7% [PATCH 0/5] clone|fetch: flesh out partial mirror support Eric Wong
@ 2021-09-24 10:56  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2021-09-24 10:56 UTC (permalink / raw)
  To: meta

There may still be pre-manifest.js.gz versions of
PublicInbox::WWW running and serving v2 inboxes.

While -clone and "add-external --mirror" were working, -fetch
was failing due to a 301 redirect to $INBOX_URL/manifest.js.gz/
instead of the expected 404.  Update the code to deal with a JSON
decode error (from the 301) and ensure v2 epoch detection is
correct (and not using a shadowed variable).
---
 lib/PublicInbox/Fetch.pm | 12 +++++++-----
 t/v2mirror.t             |  8 ++++++++
 2 files changed, 15 insertions(+), 5 deletions(-)
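
For readers skimming the diff, here is a standalone sketch of the
fallback idea described above: a body that cannot be decoded as a
manifest is treated like a 404, and the manifest URI sits in the same
slot of both results.  It is not the code in Fetch.pm.  The helper
names decode_manifest_body and manifest_result, the example URL, and
the reduced three-element success result are assumptions made for
illustration only.

```perl
use strict;
use warnings;
use JSON::PP ();
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper (not part of public-inbox): decode a gzipped
# JSON manifest body, dying on anything that is not a gzipped JSON
# object (e.g. the HTML body of a redirect).
sub decode_manifest_body {
	my ($body) = @_;
	# Transparent => 0 makes gunzip fail on non-gzip input
	gunzip(\$body => \(my $js), Transparent => 0)
		or die "gunzip failed: $GunzipError\n";
	my $m = JSON::PP->new->utf8->decode($js); # dies on invalid JSON
	ref($m) eq 'HASH' or die "manifest is not a JSON object\n";
	$m;
}

# Hypothetical wrapper: status code first, manifest URI second, so the
# caller can unpack 200 and 404 results with the same leading fields.
sub manifest_result {
	my ($muri, $body) = @_;
	my $m1 = eval { decode_manifest_body($body) }
		or return [ 404, $muri ]; # undecodable => "no manifest"
	[ 200, $muri, $m1 ];
}

# A redirect page instead of a manifest falls back to the 404 path:
my $res = manifest_result('https://example.com/inbox/manifest.js.gz',
			'<html>301 Moved Permanently</html>');
my ($code, $muri) = @$res;
print "code=$code uri=$muri\n";
```

Keeping the URI in slot 1 of every result means the caller's list
assignment does not need to know which path was taken before it can
report the URL it tried.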

diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
index 7f60b619..7881b402 100644
--- a/lib/PublicInbox/Fetch.pm
+++ b/lib/PublicInbox/Fetch.pm
@@ -60,11 +60,13 @@ sub do_manifest ($$$) {
 	$opt->{$_} = $lei->{$_} for (0..2);
 	my $cerr = PublicInbox::LeiMirror::run_reap($lei, $curl_cmd, $opt);
 	if ($cerr) {
-		return [ 404 ] if ($cerr >> 8) == 22; # 404 Missing
+		return [ 404, $muri ] if ($cerr >> 8) == 22; # 404 Missing
 		$lei->child_error($cerr, "@$curl_cmd failed");
 		return;
 	}
-	my $m1 = PublicInbox::LeiMirror::decode_manifest($ft, $fn, $muri);
+	my $m1 = eval {
+		PublicInbox::LeiMirror::decode_manifest($ft, $fn, $muri);
+	} or return [ 404, $muri ];
 	my $mdiff = { %$m1 };
 
 	# filter out unchanged entries.  We check modified, too, since
@@ -83,7 +85,7 @@ sub do_manifest ($$$) {
 	}
 	my (undef, $v1_path, @v2_epochs) =
 		PublicInbox::LeiMirror::deduce_epochs($mdiff, $ibx_uri->path);
-	[ 200, $v1_path, \@v2_epochs, $muri, $ft, $mf, $m1 ];
+	[ 200, $muri, $v1_path, \@v2_epochs, $ft, $mf, $m1 ];
 }
 
 sub get_fingerprint2 {
@@ -106,7 +108,7 @@ sub do_fetch { # main entry point
 	} else { # v2:
 		require PublicInbox::MultiGit;
 		$mg = PublicInbox::MultiGit->new($dir, 'all.git', 'git');
-		my @epochs = $mg->git_epochs;
+		@epochs = $mg->git_epochs;
 		my ($git_url, $epoch);
 		for my $nr (@epochs) { # try newest epoch, first
 			my $edir = "$dir/git/$nr.git";
@@ -135,7 +137,7 @@ EOM
 	PublicInbox::LeiMirror::write_makefile($dir, $ibx_ver);
 	$lei->qerr("# inbox URL: $ibx_uri/");
 	my $res = do_manifest($lei, $dir, $ibx_uri) or return;
-	my ($code, $v1_path, $v2_epochs, $muri, $ft, $mf, $m1) = @$res;
+	my ($code, $muri, $v1_path, $v2_epochs, $ft, $mf, $m1) = @$res;
 	if ($code == 404) {
 		# any pre-manifest.js.gz instances running? Just fetch all
 		# existing ones and unconditionally try cloning the next
diff --git a/t/v2mirror.t b/t/v2mirror.t
index fa4a717d..a625646d 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -376,6 +376,14 @@ EOM
 	my @g_last = grep { -w $_ } glob("$dst/git/*.git");
 	is_deeply(\@g_last, [ $g_all[-1] ], 'partial clone of ~0 worked');
 
+	chmod(0755, $g_all[0]) or xbail "chmod $!";
+	my @before = glob("$g_all[0]/objects/*/*");
+	run_script([qw(-fetch -v)], undef, { -C => $dst, 2 => \($err = '') });
+	is($?, 0, 'scraping fetch on old PublicInbox::WWW') or diag $err;
+	my @after = glob("$g_all[0]/objects/*/*");
+	ok(scalar(@before) < scalar(@after),
+		'fetched 0.git after enabling write-bit');
+
 	$td->join('TERM');
 }
 


* [PATCH 0/5] clone|fetch: flesh out partial mirror support
@ 2021-09-24 10:56  7% Eric Wong
  2021-09-24 10:56  7% ` [PATCH 5/5] fetch: support v2 w/o manifest on old WWW Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2021-09-24 10:56 UTC (permalink / raw)
  To: meta

The --epoch=RANGE feature discussed last week[1] is implemented.
There's also a bunch of fixes and improvements for handling
partial fetches from work started last week.

There's also a significant amount of work done to ensure the
client-side code works on servers running old, pre-manifest
versions of public-inbox.

I'm not sure if there are pre-manifest.js.gz versions of
public-inbox out there, but manifest support is only ~2 years
old and I can understand if some admins have been preoccupied
with the pandemic and unable to upgrade :/

[1] https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u

Eric Wong (5):
  clone|--mirror: support --epoch=RANGE for partial clones
  fetch: fix skipping with multi-epoch inboxes
  clone|--mirror: fix and test against pre-manifest WWW
  clone|fetch|--mirror: cull manifest in partial mirrors
  fetch: support v2 w/o manifest on old WWW

 Documentation/lei-add-external.pod   |  15 +++
 Documentation/public-inbox-clone.pod |  15 +++
 lib/PublicInbox/Fetch.pm             |  27 ++++--
 lib/PublicInbox/LEI.pm               |   2 +-
 lib/PublicInbox/LeiMirror.pm         | 130 +++++++++++++++++++++++---
 lib/PublicInbox/TestCommon.pm        |   1 +
 script/public-inbox-clone            |   3 +-
 t/lei-mirror.t                       |   8 ++
 t/v2mirror.t                         | 135 +++++++++++++++++++++++++--
 9 files changed, 306 insertions(+), 30 deletions(-)


Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
