* [PATCH 6/6] cindex: switch --join to use dfpost7 by default
From: Eric Wong @ 2023-12-08 3:54 UTC (permalink / raw)
To: meta

Post-image blob OIDs are what solver already works with, and
longer OIDs may not be available in historical mail archives.

`patchid' turns out to be unsuitable since:

1) git's default diff algorithm has changed over time
2) users may use different diff options to improve readability

Of course, we could eventually run `lei rediff' during the index
phase to regenerate patchids, but that's out-of-scope for now
and likely to be too expensive.
---
lib/PublicInbox/CodeSearchIdx.pm | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index 967933f2..5d420de2 100644
--- a/lib/PublicInbox/CodeSearchIdx.pm
+++ b/lib/PublicInbox/CodeSearchIdx.pm
@@ -34,9 +34,9 @@
# The $IBX_OFF here is ephemeral (per-join_data) and NOT related to
# the `ibx_off' column of `over.sqlite3' for extindex.
# @ROOT_COMMIT_OID_OFFS is space-delimited
-# In both cases, $PFX is typically the value of the patchid (XDFID) but it
-# can be configured to use any combination of patchid, dfpre, dfpost or
-# dfblob.
+# In both cases, $PFX is typically the value of the 7-(hex)char dfpost
+# XDFPOST but it can be configured to use any combination of patchid,
+# dfpre, dfpost or dfblob.
#
# WARNING: this is vulnerable to arbitrary memory usage attacks if we
# attempt to index or join against malicious coderepos with
@@ -1199,11 +1199,13 @@ sub init_join_prefork ($) {
require PublicInbox::CidxXapHelperAux;
require PublicInbox::XapClient;
my @unknown;
- my $pfx = $JOIN{prefixes} // 'patchid';
- for (split /\+/, $pfx) {
- my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$_} //
- push(@unknown, $_);
- push(@JOIN_PFX, split(/ /, $v));
+ my $pfx = $JOIN{prefixes} // 'dfpost7';
+ for my $p (split /\+/, $pfx) {
+ my $n = '';
+ $p =~ s/([0-9]+)\z// and $n = $1;
+ my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$p} //
+ push(@unknown, $p);
+ push(@JOIN_PFX, map { $_.$n } split(/ /, $v));
}
@unknown and die <<EOM;
E: --join=prefixes= contains unsupported prefixes: @unknown
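
The prefix handling in the hunk above can be sketched in isolation. This is a minimal illustration, not the code from the patch: `expand_join_prefixes' is a hypothetical helper, and the `%PATCH_BOOL' table is an assumed stand-in for `%PublicInbox::Search::PATCH_BOOL_COMMON' (only the XDFID and XDFPOST values appear in the patch text; the other entries are guesses). Unlike the patch, it collects unknown names separately instead of relying on the `//' fallback.

```perl
use strict;
use warnings;

# Assumed stand-in for %PublicInbox::Search::PATCH_BOOL_COMMON.
# Only XDFID and XDFPOST are confirmed by the patch text above.
my %PATCH_BOOL = (
	patchid => 'XDFID',
	dfpre => 'XDFPRE',
	dfpost => 'XDFPOST',
	dfblob => 'XDFPRE XDFPOST',
);

# Hypothetical helper: expand a "+"-delimited spec such as
# "dfpost7+patchid" into Xapian term prefixes.
sub expand_join_prefixes {
	my ($spec) = @_;
	my (@pfx, @unknown);
	for my $p (split /\+/, $spec) {
		my $n = '';
		# strip an optional trailing length ("dfpost7" => "dfpost", 7)
		$p =~ s/([0-9]+)\z// and $n = $1;
		my $v = $PATCH_BOOL{$p};
		if (defined $v) {
			# one name may map to several space-delimited prefixes
			push @pfx, map { $_.$n } split(/ /, $v);
		} else {
			push @unknown, $p.$n;
		}
	}
	(\@pfx, \@unknown);
}

my ($pfx, $unknown) = expand_join_prefixes('dfpost7+patchid');
print "prefixes: @$pfx\n";
print "unknown: @$unknown\n";
```

The numeric suffix is appended to every prefix a name expands to, so a name that maps to two prefixes yields two length-qualified prefixes.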
* [PATCH 0/6] cindex join stuff
From: Eric Wong @ 2023-12-08 3:54 UTC (permalink / raw)
To: meta
Patches 1-2 are small speedups, 3-4 are developer-facing
improvements, and 5-6 should improve and future-proof join accuracy.

Eric Wong (6):
  *search: simplify handling of Xapian term iterators
  *search: favor wantarray form of xap_terms
  xap_helper_cxx: drop chdir usage in build
  makefile: add `check-build' target
  xap_helper: support term length limit
  cindex: switch --join to use dfpost7 by default

Makefile.PL | 13 +++++++
lib/PublicInbox/CodeSearch.pm | 15 ++++----
lib/PublicInbox/CodeSearchIdx.pm | 18 +++++-----
lib/PublicInbox/LeiInspect.pm | 1 -
lib/PublicInbox/LeiSearch.pm | 17 ++++-----
lib/PublicInbox/LeiStore.pm | 13 +++----
lib/PublicInbox/Search.pm | 19 +++++-----
lib/PublicInbox/SearchIdx.pm | 13 ++++---
lib/PublicInbox/XapHelper.pm | 24 ++++++++++---
lib/PublicInbox/XapHelperCxx.pm | 19 ++++------
lib/PublicInbox/xap_helper.h | 11 +++++-
lib/PublicInbox/xh_cidx.h | 61 ++++++++++++++++++++++++--------
lib/PublicInbox/xh_mset.h | 2 +-
t/xap_helper.t | 33 +++++++++++++++++
14 files changed, 177 insertions(+), 82 deletions(-)
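
To illustrate why a fixed 7-hex-character post-image abbreviation is a usable join key, the sketch below pulls post-image blob OIDs out of `index' lines in a diff and truncates them to a common length, so mails carrying 7-character and longer abbreviations map to the same term. `dfpost_terms' is a hypothetical helper, and the exact term layout (prefix followed directly by the abbreviated OID) is an assumption; the real indexer may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: build length-limited post-image terms from a diff.
# The "XDFPOST" prefix comes from the patch text; the term layout is assumed.
sub dfpost_terms {
	my ($diff, $len) = @_;
	my %seen;
	# "index <pre>..<post>[ <mode>]" lines carry abbreviated blob OIDs
	while ($diff =~ /^index ([0-9a-f]+)\.\.([0-9a-f]+)\b/mg) {
		my $post = $2;
		next if $post =~ /\A0+\z/; # all-zero OID: no post-image (deletion)
		next if length($post) < $len; # too short to truncate consistently
		$seen{"XDFPOST$len".substr($post, 0, $len)} = 1;
	}
	my @terms = sort keys %seen;
	@terms;
}

my $diff = <<'EOM';
diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index 967933f2..5d420de2 100644
EOM
print "$_\n" for dfpost_terms($diff, 7);
```

Truncating on the indexing side means an archive that only preserved short abbreviations can still be matched against commits, at the cost of more collisions than full-length OIDs would produce.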
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git