user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH] extsearchidx: deduplicate alternates based on st_dev + st_ino
Date: Mon, 23 Nov 2020 23:32:29 +0000
Message-ID: <20201123233229.17125-1-e@80x24.org>

Compare alternates entries by the st_dev + st_ino of the directory
they point to instead of by path string.  This filters out
duplicate entries when symlinks or bind mounts are in play, as I
(and perhaps some other users) tend to use both heavily.
---
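 For readers unfamiliar with the technique, here is a minimal
 standalone sketch of deduplicating paths by st_dev + st_ino.  It is
 not part of this patch: uniq_by_dev_ino() is a hypothetical helper
 written only for illustration, and the temporary directory layout is
 an assumption.  It relies on stat() following symlinks, so a symlink
 and its target yield the same key.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not in public-inbox): return the given paths
# with later duplicates dropped, where "duplicate" means the same
# st_dev + st_ino pair.  Paths which cannot be stat()-ed are kept,
# mirroring the warn-and-keep behavior in the diff below.
sub uniq_by_dev_ino {
	my %seen; # key: "st_dev\0st_ino"
	my @uniq;
	for my $path (@_) {
		if (my @st = stat($path)) { # stat() follows symlinks
			next if $seen{"$st[0]\0$st[1]"}++;
		} else {
			warn "W: stat($path) failed: $!\n";
		}
		push @uniq, $path;
	}
	@uniq;
}

# Assumed layout: one real directory plus a symlink pointing at it
my $tmp = tempdir(CLEANUP => 1);
mkdir "$tmp/objects" or die "mkdir: $!";
symlink 'objects', "$tmp/alias" or die "symlink: $!";

# three spellings of the same directory
my @uniq = uniq_by_dev_ino("$tmp/objects", "$tmp/alias", "$tmp/objects/");
print "$_\n" for @uniq;
```

 All three spellings refer to one directory, so only the first is
 kept.  A plain string comparison would have treated them as three
 distinct entries.
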
 lib/PublicInbox/ExtSearchIdx.pm | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 2cdc31cb..7ab0c4af 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -396,18 +396,28 @@ sub idx_init { # similar to V2Writable
 	my $info_dir = "$ALL/objects/info";
 	my $alt = "$info_dir/alternates";
 	my $mode = 0644;
-	my (%old, @old, %new, @new);
+	my (@old, @new, %seen); # seen: st_dev + st_ino
 	if (-e $alt) {
 		open(my $fh, '<', $alt) or die "open $alt: $!";
 		$mode = (stat($fh))[2] & 07777;
-		while (<$fh>) {
-			push @old, $_ if !$old{$_}++;
+		while (my $line = <$fh>) {
+			chomp(my $d = $line);
+			if (my @st = stat($d)) {
+				next if $seen{"$st[0]\0$st[1]"}++;
+			} else {
+				warn "W: stat($d) failed (from $alt): $!\n";
+			}
+			push @old, $line;
 		}
 	}
 	for my $ibx (@{$self->{ibx_list}}) {
 		my $line = $ibx->git->{git_dir} . "/objects\n";
-		next if $old{$line};
-		$new{$line} = 1;
+		chomp(my $d = $line);
+		if (my @st = stat($d)) {
+			next if $seen{"$st[0]\0$st[1]"}++;
+		} else {
+			warn "W: stat($d) failed (from $ibx->{inboxdir}): $!\n";
+		}
 		push @new, $line;
 	}
 	if (scalar @new) {

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git