* [PATCH 0/4] favor shorter binary OID comparisons
@ 2021-07-25 0:43 Eric Wong
From: Eric Wong @ 2021-07-25 0:43 UTC
To: meta
We were doing unnecessary 40-byte hex comparisons of SHA-1 OIDs
in our code when 20-byte binary comparisons would've been sufficient.
3/4 is an unrelated typo fix which didn't trigger any warnings
before, either...
Eric Wong (4):
extsearchidx: favor binary comparison in common case
lei_search: favor binary OID comparisons
lei_inspect: fix typo
lei_mail_sync: locations_for API uses oidbin for comparisons
lib/PublicInbox/ExtSearchIdx.pm | 4 ++--
lib/PublicInbox/LeiExportKw.pm | 7 +++----
lib/PublicInbox/LeiInspect.pm | 7 ++++---
lib/PublicInbox/LeiMailSync.pm | 7 ++++---
lib/PublicInbox/LeiSearch.pm | 6 +++---
t/lei_mail_sync.t | 4 ++--
6 files changed, 18 insertions(+), 17 deletions(-)
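For anyone less familiar with Perl's pack/unpack, here is a minimal standalone sketch (not part of the series) of the size difference behind this change. The two OIDs are just well-known values picked for illustration: the SHA-1 of the empty string and git's empty-blob OID.

```perl
use strict;
use warnings;

# A SHA-1 OID in hex is 40 bytes; pack('H*', ...) turns each pair of
# hex digits into one byte, giving the 20-byte binary form.
my $oidhex = 'da39a3ee5e6b4b0d3255bfef95601890afd80709'; # SHA-1 of ""
my $oidbin = pack('H*', $oidhex);
printf "hex: %d bytes, binary: %d bytes\n", length($oidhex), length($oidbin);
# prints "hex: 40 bytes, binary: 20 bytes"

# unpack('H*', ...) reverses the conversion, emitting lowercase hex
my $roundtrip = unpack('H*', $oidbin);
print "round-trip ok\n" if $roundtrip eq $oidhex;

# Equality on binary forms agrees with equality on lowercase hex forms,
# but each comparison touches at most 20 bytes instead of 40.
my $other = pack('H*', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391'); # git's empty blob
print "different OIDs compare unequal\n" if $oidbin ne $other;
```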
* [PATCH 1/4] extsearchidx: favor binary comparison in common case
From: Eric Wong @ 2021-07-25 0:43 UTC
To: meta
We'll compare 20-byte binary SHA-1 OIDs instead of their 40-byte
hex representations for a minor reduction in memory traffic, and
only generate the hex form when we need it for a warning.
---
lib/PublicInbox/ExtSearchIdx.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 51dbf54f..fb1f511e 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -784,16 +784,16 @@ ORDER BY docid,xnum ASC LIMIT 10000
$fetching = $min = $docid;
my $smsg = $ibx->over->get_art($xnum);
- my $oidhex = unpack('H*', $oidbin);
my $err;
if (!$smsg) {
$err = 'stale';
- } elsif ($smsg->{blob} ne $oidhex) {
+ } elsif (pack('H*', $smsg->{blob}) ne $oidbin) {
$err = "mismatch (!= $smsg->{blob})";
} else {
next; # likely, all good
}
# current_info already has eidx_key
+ my $oidhex = unpack('H*', $oidbin);
warn "$xnum:$oidhex (#$docid): $err\n";
my $del = $self->{oidx}->dbh->prepare_cached(<<'');
DELETE FROM xref3 WHERE ibx_id = ? AND xnum = ? AND oidbin = ?
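A standalone sketch of the pattern in 1/4, using hypothetical in-memory stand-ins (%art and @rows) for the over.sqlite3 and xref3 lookups; it is not the actual ExtSearchIdx code. The matching case compares bytes directly, and hex is only built on the rare error path:

```perl
use strict;
use warnings;

# Hypothetical stand-ins: %art mimics $ibx->over->get_art($xnum),
# @rows mimics [ xnum, oidbin ] results from xref3.
my %art = ( # xnum => { blob => $hex }
	1 => { blob => 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391' },
	2 => { blob => 'da39a3ee5e6b4b0d3255bfef95601890afd80709' },
);
my @rows = (
	[ 1, pack('H*', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391') ],
	[ 2, pack('H*', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391') ],
	[ 3, pack('H*', 'da39a3ee5e6b4b0d3255bfef95601890afd80709') ],
);

my @bad;
for my $r (@rows) {
	my ($xnum, $oidbin) = @$r;
	my $smsg = $art{$xnum};
	my $err;
	if (!$smsg) {
		$err = 'stale';
	} elsif (pack('H*', $smsg->{blob}) ne $oidbin) {
		$err = "mismatch (!= $smsg->{blob})";
	} else {
		next; # common case: no hex string is built for $oidbin
	}
	# rare case: only now convert to hex for a human-readable message
	my $oidhex = unpack('H*', $oidbin);
	push @bad, "$xnum:$oidhex: $err";
}
print "$_\n" for @bad;
```

The trade-off is one pack() of the stored hex blob per row instead of one unpack() of $oidbin, with a 20-byte rather than 40-byte comparison.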
* [PATCH 2/4] lei_search: favor binary OID comparisons
From: Eric Wong @ 2021-07-25 0:43 UTC
To: meta
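LeiExportKw callbacks already receive a binary OID from
LeiMailSync->each_src, so pass it straight to the new
oidbin_keywords method instead of unpacking it to hex first.
This reduces memory traffic and code, too.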
---
lib/PublicInbox/LeiExportKw.pm | 7 +++----
lib/PublicInbox/LeiSearch.pm | 6 +++---
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/LeiExportKw.pm b/lib/PublicInbox/LeiExportKw.pm
index 671a84df..42a5ff22 100644
--- a/lib/PublicInbox/LeiExportKw.pm
+++ b/lib/PublicInbox/LeiExportKw.pm
@@ -10,8 +10,7 @@ use Errno qw(EEXIST ENOENT);
sub export_kw_md { # LeiMailSync->each_src callback
my ($oidbin, $id, $self, $mdir) = @_;
- my $oidhex = unpack('H*', $oidbin);
- my $sto_kw = $self->{lse}->oid_keywords($oidhex) or return;
+ my $sto_kw = $self->{lse}->oidbin_keywords($oidbin) or return;
my $bn = $$id;
my ($md_kw, $unknown, @try);
if ($bn =~ s/:2,([a-zA-Z]*)\z//) {
@@ -57,13 +56,13 @@ sub export_kw_md { # LeiMailSync->each_src callback
# both tries failed
my $e = $!;
my $orig = '['.join('|', @fail).']';
+ my $oidhex = unpack('H*', $oidbin);
$lei->child_error(1, "link($orig, $dst) ($oidhex): $e");
}
sub export_kw_imap { # LeiMailSync->each_src callback
my ($oidbin, $id, $self, $mic) = @_;
- my $oidhex = unpack('H*', $oidbin);
- my $sto_kw = $self->{lse}->oid_keywords($oidhex) or return;
+ my $sto_kw = $self->{lse}->oidbin_keywords($oidbin) or return;
$self->{imap_mod_kw}->($self->{nwr}, $mic, $id, [ keys %$sto_kw ]);
}
diff --git a/lib/PublicInbox/LeiSearch.pm b/lib/PublicInbox/LeiSearch.pm
index 37bfc65e..79b2fd7d 100644
--- a/lib/PublicInbox/LeiSearch.pm
+++ b/lib/PublicInbox/LeiSearch.pm
@@ -42,9 +42,9 @@ sub _oid_kw { # retry_reopen callback
}
# returns undef if blob is unknown
-sub oid_keywords {
- my ($self, $oidhex) = @_;
- my @num = $self->over->blob_exists($oidhex) or return;
+sub oidbin_keywords {
+ my ($self, $oidbin) = @_;
+ my @num = $self->over->oidbin_exists($oidbin) or return;
$self->retry_reopen(\&_oid_kw, \@num);
}
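A minimal sketch of the API shape after 2/4, assuming a hypothetical DemoStore class that keeps keywords in a plain hash keyed by binary OID (the real LeiSearch looks the blob up via over->oidbin_exists). Like oidbin_keywords, it returns undef for unknown blobs, and an each_src-style callback passes its $oidbin through untouched:

```perl
use strict;
use warnings;

# Hypothetical stand-in for LeiSearch, for illustration only
package DemoStore;
sub new { bless { kw => {} }, shift }
sub set_kw {
	my ($self, $oidbin, @kw) = @_;
	$self->{kw}->{$oidbin} = { map { ($_ => 1) } @kw };
}
# returns undef if blob is unknown, like oidbin_keywords
sub oidbin_keywords {
	my ($self, $oidbin) = @_;
	$self->{kw}->{$oidbin};
}

package main;

# Hypothetical callback shaped like export_kw_md/export_kw_imap:
# it already holds $oidbin, so no unpack('H*', ...) is needed.
sub export_kw_demo {
	my ($oidbin, $lse) = @_;
	my $sto_kw = $lse->oidbin_keywords($oidbin) or return;
	return join(',', sort keys %$sto_kw);
}

my $lse = DemoStore->new;
my $known = pack('H*', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391');
my $unknown = pack('H*', 'da39a3ee5e6b4b0d3255bfef95601890afd80709');
$lse->set_kw($known, qw(seen flagged));

my $kw = export_kw_demo($known, $lse);     # "flagged,seen"
my $none = export_kw_demo($unknown, $lse); # undef: blob unknown
```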
* [PATCH 3/4] lei_inspect: fix typo
From: Eric Wong @ 2021-07-25 0:43 UTC
To: meta
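The fat comma "=>" was used where an assignment "=" was intended,
so $ent->{description} was never set. Not sure how this wasn't
caught earlier...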
---
lib/PublicInbox/LeiInspect.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/LeiInspect.pm b/lib/PublicInbox/LeiInspect.pm
index 574da7a7..c277520c 100644
--- a/lib/PublicInbox/LeiInspect.pm
+++ b/lib/PublicInbox/LeiInspect.pm
@@ -74,7 +74,7 @@ sub inspect_docid ($$;$) {
my $data = $doc->get_data;
$ent->{docid} = $docid;
$ent->{data_length} = length($data);
- $ent->{description} => $doc->get_description;
+ $ent->{description} = $doc->get_description;
$ent->{$_} = $doc->$_ for (qw(termlist_count values_count));
my $cur = $doc->termlist_begin;
my $end = $doc->termlist_end;
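A tiny standalone demonstration (not from the series) of why the old line was a no-op: per perlop, "=>" behaves like "," apart from quoting a bareword on its left, so both operands are evaluated and the results discarded.

```perl
use strict;
use warnings;
no warnings 'void'; # silence "Useless use ..." for the buggy line

my $ent = {};

# Buggy: a comma expression in void context; nothing is stored,
# and reading a missing hash element does not create the key.
$ent->{description} => 'some description';
my $set_by_comma = exists $ent->{description} ? 1 : 0;

# Fixed: a real assignment stores the value.
$ent->{description} = 'some description';
my $set_by_assign = exists $ent->{description} ? 1 : 0;
```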
* [PATCH 4/4] lei_mail_sync: locations_for API uses oidbin for comparisons
From: Eric Wong @ 2021-07-25 0:43 UTC
To: meta
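Favor oidbin internally to reduce memory traffic. locations_for
now takes a binary OID instead of a hex string, so callers holding
hex (such as inspect_blob) pack it once up front, and hex is only
regenerated for the unknown-fid warning.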
---
lib/PublicInbox/LeiInspect.pm | 5 +++--
lib/PublicInbox/LeiMailSync.pm | 7 ++++---
t/lei_mail_sync.t | 4 ++--
3 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/LeiInspect.pm b/lib/PublicInbox/LeiInspect.pm
index c277520c..bf7a4836 100644
--- a/lib/PublicInbox/LeiInspect.pm
+++ b/lib/PublicInbox/LeiInspect.pm
@@ -14,10 +14,11 @@ sub inspect_blob ($$) {
my ($lei, $oidhex) = @_;
my $ent = {};
if (my $lse = $lei->{lse}) {
- my @docids = $lse ? $lse->over->blob_exists($oidhex) : ();
+ my $oidbin = pack('H*', $oidhex);
+ my @docids = $lse ? $lse->over->oidbin_exists($oidbin) : ();
$ent->{'lei/store'} = \@docids if @docids;
my $lms = $lse->lms;
- if (my $loc = $lms ? $lms->locations_for($oidhex) : undef) {
+ if (my $loc = $lms ? $lms->locations_for($oidbin) : undef) {
$ent->{'mail-sync'} = $loc;
}
}
diff --git a/lib/PublicInbox/LeiMailSync.pm b/lib/PublicInbox/LeiMailSync.pm
index 49e521da..82740d59 100644
--- a/lib/PublicInbox/LeiMailSync.pm
+++ b/lib/PublicInbox/LeiMailSync.pm
@@ -206,16 +206,16 @@ SELECT $op(uid) FROM blob2num WHERE fid = ?
# returns a { location => [ list-of-ids-or-names ] } mapping
sub locations_for {
- my ($self, $oidhex) = @_;
+ my ($self, $oidbin) = @_;
my ($fid, $sth, $id, %fid2id);
my $dbh = $self->{dbh} //= dbh_new($self);
$sth = $dbh->prepare('SELECT fid,uid FROM blob2num WHERE oidbin = ?');
- $sth->execute(pack('H*', $oidhex));
+ $sth->execute($oidbin);
while (my ($fid, $uid) = $sth->fetchrow_array) {
push @{$fid2id{$fid}}, $uid;
}
$sth = $dbh->prepare('SELECT fid,name FROM blob2name WHERE oidbin = ?');
- $sth->execute(pack('H*', $oidhex));
+ $sth->execute($oidbin);
while (my ($fid, $name) = $sth->fetchrow_array) {
push @{$fid2id{$fid}}, $name;
}
@@ -225,6 +225,7 @@ sub locations_for {
$sth->execute($fid);
my ($loc) = $sth->fetchrow_array;
unless (defined $loc) {
+ my $oidhex = unpack('H*', $oidbin);
warn "E: fid=$fid for $oidhex unknown:\n", map {
'E: '.(ref() ? $$_ : "#$_")."\n";
} @$ids;
diff --git a/t/lei_mail_sync.t b/t/lei_mail_sync.t
index f0605092..a5e5f5d3 100644
--- a/t/lei_mail_sync.t
+++ b/t/lei_mail_sync.t
@@ -24,7 +24,7 @@ is_deeply([$ro->folders($imap)], [$imap], 'IMAP folder with full GLOB');
is_deeply([$ro->folders('imaps://bob@[::1]/INBOX')], [$imap],
'IMAP folder with partial GLOB');
-is_deeply($ro->locations_for('deadbeef'),
+is_deeply($ro->locations_for("\xde\xad\xbe\xef"),
{ $imap => [ 1 ] }, 'locations_for w/ imap');
my $maildir = 'maildir:/home/user/md';
@@ -33,7 +33,7 @@ $lms->lms_begin;
ok($lms->set_src('deadbeef', $maildir, \$fname), 'set Maildir once');
ok($lms->set_src('deadbeef', $maildir, \$fname) == 0, 'set Maildir again');
$lms->lms_commit;
-is_deeply($ro->locations_for('deadbeef'),
+is_deeply($ro->locations_for("\xde\xad\xbe\xef"),
{ $imap => [ 1 ], $maildir => [ $fname ] },
'locations_for w/ maildir + imap');
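The test change above swaps the hex string 'deadbeef' for the equivalent raw bytes "\xde\xad\xbe\xef". A standalone sketch of that equivalence, and of where each conversion now happens:

```perl
use strict;
use warnings;

# 'deadbeef' is 8 characters of hex text; "\xde\xad\xbe\xef" is the
# same value as 4 raw bytes, which is what locations_for now expects.
my $hex = 'deadbeef';
my $bin = "\xde\xad\xbe\xef";

# Callers holding hex (e.g. inspect_blob) pack once before calling
my $packed = pack('H*', $hex);

# Hex is only rebuilt for the error message about an unknown fid
my $unpacked = unpack('H*', $bin);

printf "%d hex chars -> %d bytes, equal: %s\n",
	length($hex), length($packed), ($packed eq $bin ? 'yes' : 'no');
# prints "8 hex chars -> 4 bytes, equal: yes"
```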
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git