* [PATCH] extsearchidx: deduplicate alternates based on st_dev + st_ino
@ 2020-11-23 23:32 Eric Wong
From: Eric Wong @ 2020-11-23 23:32 UTC
To: meta
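This allows us to filter out duplicate entries in objects/info/alternates
when the same objects directory is reachable via more than one path, as
happens when symlinks or bind mounts are in play. I (and perhaps some
other users) tend to use symlinks and/or bind mounts heavily.

Entries are now compared by the st_dev + st_ino pair from stat(2)
instead of by path string, so two paths naming the same directory are
treated as one. An entry which cannot be stat(2)-ed is kept as-is and a
warning is emitted.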
---
lib/PublicInbox/ExtSearchIdx.pm | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 2cdc31cb..7ab0c4af 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -396,18 +396,28 @@ sub idx_init { # similar to V2Writable
 	my $info_dir = "$ALL/objects/info";
 	my $alt = "$info_dir/alternates";
 	my $mode = 0644;
-	my (%old, @old, %new, @new);
+	my (@old, @new, %seen); # seen: st_dev + st_ino
 	if (-e $alt) {
 		open(my $fh, '<', $alt) or die "open $alt: $!";
 		$mode = (stat($fh))[2] & 07777;
-		while (<$fh>) {
-			push @old, $_ if !$old{$_}++;
+		while (my $line = <$fh>) {
+			chomp(my $d = $line);
+			if (my @st = stat($d)) {
+				next if $seen{"$st[0]\0$st[1]"}++;
+			} else {
+				warn "W: stat($d) failed (from $alt): $!\n";
+			}
+			push @old, $line;
 		}
 	}
 	for my $ibx (@{$self->{ibx_list}}) {
 		my $line = $ibx->git->{git_dir} . "/objects\n";
-		next if $old{$line};
-		$new{$line} = 1;
+		chomp(my $d = $line);
+		if (my @st = stat($d)) {
+			next if $seen{"$st[0]\0$st[1]"}++;
+		} else {
+			warn "W: stat($d) failed (from $ibx->{inboxdir}): $!\n";
+		}
 		push @new, $line;
 	}
 	if (scalar @new) {
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git