user/dev discussion of public-inbox itself
* [PATCH 6/6] cindex: switch --join to use dfpost7 by default
From: Eric Wong @ 2023-12-08  3:54 UTC
  To: meta

Post-image blob OIDs are what the solver already works with, and
longer OID abbreviations may not be available in historical mail
archives, so --join now matches on the first 7 hex characters of
the post-image OID.

`patchid' turns out to be unsuitable since:
1) git's default diff algorithm has changed over time
2) users may use different diff options to improve readability

Of course, we could eventually run `lei rediff' during the index
phase to regenerate patchids, but that's out of scope for now
and likely too expensive.
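
To illustrate why a short dfpost prefix is more robust for joining,
here is a minimal Perl sketch. It assumes diff headers of the usual
`index PRE..POST MODE` form; the helper post_image_term and the longer
abbreviation 5d420de2a9 are hypothetical and not part of public-inbox.
Two index lines that abbreviate the same post-image blob to different
lengths reduce to the same 7-character term:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): extract the post-image
# blob OID from a git diff "index PRE..POST [MODE]" line and shorten it
# to $len hex characters so it can serve as a join term.
sub post_image_term {
	my ($index_line, $len) = @_;
	$index_line =~ /^index [0-9a-f]+\.\.([0-9a-f]+)(?: [0-7]+)?$/
		or return;
	my $oid = $1;
	return if length($oid) < $len; # too short to truncate to $len
	return substr($oid, 0, $len);
}

# The same post-image blob abbreviated to different lengths, as may
# happen with different git versions or core.abbrev settings:
my @lines = (
	'index 967933f2..5d420de2 100644',
	'index 967933f2e0..5d420de2a9 100644', # hypothetical longer abbrev
);
my @terms = map { post_image_term($_, 7) } @lines;
print "@terms\n"; # both lines reduce to the same 7-char term
```

The trade-off is that a shorter prefix raises the chance that unrelated
blobs share a term, so 7 hex characters is a compromise rather than a
guarantee of uniqueness.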
---
 lib/PublicInbox/CodeSearchIdx.pm | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index 967933f2..5d420de2 100644
--- a/lib/PublicInbox/CodeSearchIdx.pm
+++ b/lib/PublicInbox/CodeSearchIdx.pm
@@ -34,9 +34,9 @@
 # The $IBX_OFF here is ephemeral (per-join_data) and NOT related to
 # the `ibx_off' column of `over.sqlite3' for extindex.
 # @ROOT_COMMIT_OID_OFFS is space-delimited
-# In both cases, $PFX is typically the value of the patchid (XDFID) but it
-# can be configured to use any combination of patchid, dfpre, dfpost or
-# dfblob.
+# In both cases, $PFX is typically the value of the 7-(hex)char dfpost
+# XDFPOST but it can be configured to use any combination of patchid,
+# dfpre, dfpost or dfblob.
 #
 # WARNING: this is vulnerable to arbitrary memory usage attacks if we
 # attempt to index or join against malicious coderepos with
@@ -1199,11 +1199,13 @@ sub init_join_prefork ($) {
 	require PublicInbox::CidxXapHelperAux;
 	require PublicInbox::XapClient;
 	my @unknown;
-	my $pfx = $JOIN{prefixes} // 'patchid';
-	for (split /\+/, $pfx) {
-		my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$_} //
-			push(@unknown, $_);
-		push(@JOIN_PFX, split(/ /, $v));
+	my $pfx = $JOIN{prefixes} // 'dfpost7';
+	for my $p (split /\+/, $pfx) {
+		my $n = '';
+		$p =~ s/([0-9]+)\z// and $n = $1;
+		my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$p} //
+			push(@unknown, $p);
+		push(@JOIN_PFX, map { $_.$n } split(/ /, $v));
 	}
 	@unknown and die <<EOM;
 E: --join=prefixes= contains unsupported prefixes: @unknown

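The new loop in init_join_prefork strips trailing digits from each
`+'-separated name and appends them to every prefix that name maps to.
A standalone Perl sketch of that parsing follows. %PFX_MAP is a
hypothetical stand-in for %PublicInbox::Search::PATCH_BOOL_COMMON: only
XDFID (patchid) and XDFPOST (dfpost) are named in the patch, while
XDFPRE and the dfblob mapping are assumed. Unknown names are collected
and reported in one error, as the patch does:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed stand-in for %PublicInbox::Search::PATCH_BOOL_COMMON; values
# may be space-separated lists of term prefixes (hypothetical values).
my %PFX_MAP = (
	patchid => 'XDFID',
	dfpre => 'XDFPRE',
	dfpost => 'XDFPOST',
	dfblob => 'XDFPRE XDFPOST',
);

# Expand a --join=prefixes= value such as "dfpost7" or "patchid+dfpost7"
# into term prefixes, carrying any trailing digits (e.g. "7") over to
# every prefix the name maps to.
sub expand_prefixes {
	my ($spec) = @_;
	my (@pfx, @unknown);
	for my $p (split /\+/, $spec) {
		my $n = '';
		$p =~ s/([0-9]+)\z// and $n = $1;
		my $v = $PFX_MAP{$p};
		if (!defined $v) {
			push @unknown, $p;
			next;
		}
		push @pfx, map { $_ . $n } split(/ /, $v);
	}
	die "E: --join=prefixes= contains unsupported prefixes: @unknown\n"
		if @unknown;
	return @pfx;
}

print join(' ', expand_prefixes('dfpost7')), "\n";
print join(' ', expand_prefixes('patchid+dfblob7')), "\n";
```

With this scheme, patchid+dfpost7 would yield XDFID and XDFPOST7. How
the suffixed prefix is then interpreted is not shown in this patch; it
is presumably related to "xap_helper: support term length limit" from
the same series, but that is an assumption here.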

* [PATCH 0/6] cindex join stuff
From: Eric Wong @ 2023-12-08  3:54 UTC
  To: meta

1-2 are small speedups, 3-4 are dev improvements, and 5-6
ought to actually improve and future-proof join accuracy.

Eric Wong (6):
  *search: simplify handling of Xapian term iterators
  *search: favor wantarray form of xap_terms
  xap_helper_cxx: drop chdir usage in build
  makefile: add `check-build' target
  xap_helper: support term length limit
  cindex: switch --join to use dfpost7 by default

 Makefile.PL                      | 13 +++++++
 lib/PublicInbox/CodeSearch.pm    | 15 ++++----
 lib/PublicInbox/CodeSearchIdx.pm | 18 +++++-----
 lib/PublicInbox/LeiInspect.pm    |  1 -
 lib/PublicInbox/LeiSearch.pm     | 17 ++++-----
 lib/PublicInbox/LeiStore.pm      | 13 +++----
 lib/PublicInbox/Search.pm        | 19 +++++-----
 lib/PublicInbox/SearchIdx.pm     | 13 ++++---
 lib/PublicInbox/XapHelper.pm     | 24 ++++++++++---
 lib/PublicInbox/XapHelperCxx.pm  | 19 ++++------
 lib/PublicInbox/xap_helper.h     | 11 +++++-
 lib/PublicInbox/xh_cidx.h        | 61 ++++++++++++++++++++++++--------
 lib/PublicInbox/xh_mset.h        |  2 +-
 t/xap_helper.t                   | 33 +++++++++++++++++
 14 files changed, 177 insertions(+), 82 deletions(-)

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
