* [PATCH 2/2] lei_mirror: fetch most-recently-updated repos, first
2023-02-12 23:18 6% [PATCH 0/2] lei_mirror: more tweaks Eric Wong
@ 2023-02-12 23:18 7% ` Eric Wong
0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2023-02-12 23:18 UTC (permalink / raw)
To: meta
Within the same forkgroup, we can assume the most recently updated
repo has the most data, so fetch those, first. We'll save new clones
for last since we can preserve {reference} ordering for them.
---
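An illustrative sketch of the ordering described above, not part of the
patch itself. The `name' field, the timestamps, and the standalone arrays
are hypothetical stand-ins for the forkgroup objects that LeiMirror.pm
queues; only the `-sort' key and the [ old, new ] split come from the diff
below.

```perl
use strict;
use warnings;

# Hypothetical repos being resumed; -sort holds a made-up "modified" time.
my @resume = (
    { name => 'a.git', '-sort' => 1676000000 },
    { name => 'b.git', '-sort' => 1676200000 },
    { name => 'c.git', '-sort' => 1676100000 },
);
# Hypothetical new clones: no -sort key, already in {reference} order.
my @new = ({ name => 'x.git' }, { name => 'y.git' });

# Split the queue into [ old, new ] buckets, keyed on whether -sort is set.
my $dst = [ [], [] ];
for my $fgrp (@resume, @new) {
    push @{$dst->[defined($fgrp->{-sort}) ? 0 : 1]}, $fgrp;
}

# Most recently updated repos first, then new clones in their given order.
my ($fgrpv, $new) = @$dst;
my @order = sort { $b->{-sort} <=> $a->{-sort} } @$fgrpv;
push @order, @$new;

print join(' ', map { $_->{name} } @order), "\n";
# prints "b.git c.git a.git x.git y.git"
```

Keeping new clones in a separate bucket means they are never compared by
the sort block, so their incoming order is preserved.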
lib/PublicInbox/LeiMirror.pm | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index dd6356bb..4dedac9b 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -23,7 +23,7 @@ use PublicInbox::SHA qw(sha256_hex sha1_hex);
use POSIX qw(strftime);
our $LIVE; # pid => callback
-our $FGRP_TODO; # objstore -> [ fgrp mirror objects ]
+our $FGRP_TODO; # objstore -> [[ to resume ], [ to clone ]]
our $TODO; # reference => [ non-fgrp mirror objects ]
our @PUH; # post-update hooks
@@ -404,9 +404,12 @@ sub fgrp_fetch_all {
(fetch_args($self->{lei}, $opt), qw(--no-tags --multiple));
};
push(@fetch, "-j$j") if $j;
- while (my ($osdir, $fgrpv) = each %$todo) {
+ while (my ($osdir, $fgrp_old_new) = each %$todo) {
my $f = "$osdir/config";
return if !keep_going($self);
+ my ($fgrpv, $new) = @$fgrp_old_new;
+ @$fgrpv = sort { $b->{-sort} <=> $a->{-sort} } @$fgrpv;
+ push @$fgrpv, @$new; # $new is ordered by references
my $cmd = ['git', "--git-dir=$osdir", qw(config -f), $f ];
# clobber group from previous run atomically
@@ -568,7 +571,8 @@ sub fgrp_enqueue {
my ($fgrp, $end) = @_; # $end calls fgrp_fetch_all
return if !keep_going($fgrp);
++$fgrp->{chg}->{nr_chg};
- push @{$FGRP_TODO->{$fgrp->{-osdir}}}, $fgrp;
+ my $dst = $FGRP_TODO->{$fgrp->{-osdir}} //= [ [], [] ]; # [ old, new ]
+ push @{$dst->[defined($fgrp->{-sort}) ? 0 : 1]}, $fgrp;
}
sub clone_v1 {
@@ -586,8 +590,12 @@ sub clone_v1 {
my $resume = -d $dst;
if (my $fgrp = forkgroup_prep($self, $uri)) {
$fgrp->{-fini} = $fini;
- $resume ? cmp_fp_do($fgrp, \&fgrp_enqueue, $end)
- : fgrp_enqueue($fgrp, $end);
+ if ($resume) {
+ $fgrp->{-sort} = $fgrp->{-ent}->{modified};
+ cmp_fp_do($fgrp, \&fgrp_enqueue, $end);
+ } else { # new repo, save for last
+ fgrp_enqueue($fgrp, $end);
+ }
} elsif ($resume) {
cmp_fp_do($self, \&resume_fetch, $uri, $fini);
} else { # normal clone
* [PATCH 0/2] lei_mirror: more tweaks
@ 2023-02-12 23:18 6% Eric Wong
2023-02-12 23:18 7% ` [PATCH 2/2] lei_mirror: fetch most-recently-updated repos, first Eric Wong
0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2023-02-12 23:18 UTC (permalink / raw)
To: meta
The proposed-for-git `fetch.hideRefs' isn't supported yet;
I'm still testing to see if it's harmful for new clones
(I suspect so), and how to reduce its impact while still
being able to clone all kernel forks on kernel.org and
support RAM-constrained systems.
https://public-inbox.org/git/20230212090426.M558990@dcvr/
("fetch: support hideRefs to speed up connectivity checks")
Eric Wong (2):
lei_mirror: further reduce `git config' calls
lei_mirror: fetch most-recently-updated repos, first
lib/PublicInbox/LeiMirror.pm | 80 ++++++++++++++++++++++--------------
1 file changed, 49 insertions(+), 31 deletions(-)
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git