From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 00/14] IT'S ALIVE! www loads cindex join data
Date: Tue, 28 Nov 2023 14:56:13 +0000
Message-ID: <20231128145628.1455176-1-e@80x24.org>

8/14 is the killer one: it actually makes the cindex data
useful for WWW and for powering solver.  Keep in mind, I've had
to cap solver at 3 coderepos as a temporary measure since
there are a lot of "weak" joins we should be weeding out.

More documentation is coming, but cindex joins are very much
a fuzzy thing and will have to deal with false positives
and such.  So figuring out a scoring scheme that keeps the
results sane would make sense...
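
To make the scoring idea concrete, here is a minimal sketch in Python
(not the project's Perl, and not code from this series) of weeding out
weak joins before applying the cap.  The `select_coderepos' function,
the score values, and the `min_score' threshold are all hypothetical;
only the cap of 3 comes from this cover letter.

```python
from typing import Dict, List

MAX_CODEREPOS = 3  # the temporary solver cap mentioned above


def select_coderepos(join_scores: Dict[str, float],
                     min_score: float = 0.5,
                     cap: int = MAX_CODEREPOS) -> List[str]:
    """Return at most `cap` coderepos for one inbox, strongest join first.

    `join_scores` maps a coderepo name to a hypothetical join score;
    a higher score means a stronger inbox <-> coderepo association.
    """
    # Drop "weak" joins (likely false positives) below the threshold.
    strong = [(score, name) for name, score in join_scores.items()
              if score >= min_score]
    # Sort by descending score, breaking ties by name for stable output.
    strong.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _score, name in strong[:cap]]


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    scores = {"a.git": 0.9, "b.git": 0.2, "c.git": 0.8,
              "d.git": 0.6, "e.git": 0.7}
    print(select_coderepos(scores))  # ['a.git', 'c.git', 'e.git']
```

Filtering before capping matters here: capping first could spend one
of the 3 slots on a false positive while a stronger join is dropped.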

Fortunately, --join=aggressive,reset only takes ~1 hour for me,
so probably 1/3 of that on modern hardware.  Incremental
`-cindex --join' (no suboptions) usually takes <5 minutes if
done frequently.

New performance problem: solver could definitely be smarter
about dealing with common roots/groups.  For the longest time,
I've only had 1 coderepo per inbox; having hundreds is wacky.
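
One way solver might get smarter about common roots is to group
coderepos first and only query one member per group.  The sketch below
is in Python and is purely illustrative: `group_by_roots',
`representatives', and the "root-a"/"root-b" IDs are hypothetical, and
it assumes repos with identical root commits (forks, mirrors) mostly
share objects.  A fork can still hold objects its representative
lacks, so a real implementation would need a fallback.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def group_by_roots(repo_roots: Dict[str, FrozenSet[str]]) -> List[List[str]]:
    """Group coderepos that share an identical set of root commit IDs."""
    groups = defaultdict(list)
    for repo, roots in repo_roots.items():
        groups[roots].append(repo)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(repos) for repos in groups.values())


def representatives(repo_roots: Dict[str, FrozenSet[str]]) -> List[str]:
    """Pick one coderepo per group (hypothetical policy: first by name)."""
    return [group[0] for group in group_by_roots(repo_roots)]


if __name__ == "__main__":
    # Made-up repo names and root IDs, purely for illustration.
    roots = {
        "project.git": frozenset({"root-a"}),
        "project-fork.git": frozenset({"root-a"}),
        "other.git": frozenset({"root-b"}),
    }
    print(group_by_roots(roots))
    print(representatives(roots))
```

With hundreds of coderepos per inbox, this would cut the number of
lookups from one per repo to one per group in the common case.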

Actual searching against the cindex isn't done yet, but
that's fairly straightforward.

Eric Wong (14):
  test_common: create_*: detect changes all parameters
  t/cindex*: require SCM_RIGHTS for these tests
  codesearch: eliminate redundant substitutions
  solver: schedule cleanup after synchronous git->check
  xap_helper.h: move cindex endpoints to separate file
  xap_helper: implement mset endpoint for WWW, IMAP, etc...
  hval: use File::Spec to make relative paths for href
  www: load and use cindex join data
  git: speed up ->git_path for non-worktrees
  cindex: require `-g GIT_DIR' or `-r PROJECT_ROOT'
  git: speed up Git->new by 5% or so
  admin: resolve_git_dir respects symlinks
  cindex: extra quit checks
  www: start working on a repo listing

 Documentation/public-inbox-cindex.pod |   2 +-
 MANIFEST                              |   3 +
 Makefile.PL                           |   8 +-
 lib/PublicInbox/Admin.pm              |  25 +-
 lib/PublicInbox/CodeSearch.pm         | 162 ++++++++++-
 lib/PublicInbox/CodeSearchIdx.pm      |  52 ++--
 lib/PublicInbox/Config.pm             |  39 ++-
 lib/PublicInbox/Git.pm                |  27 +-
 lib/PublicInbox/Hval.pm               |  12 +-
 lib/PublicInbox/RepoList.pm           |  39 +++
 lib/PublicInbox/Search.pm             |  42 +++
 lib/PublicInbox/SearchIdx.pm          |  10 +-
 lib/PublicInbox/SolverGit.pm          |   9 +-
 lib/PublicInbox/TestCommon.pm         |  35 ++-
 lib/PublicInbox/View.pm               |   7 +-
 lib/PublicInbox/WWW.pm                |   1 +
 lib/PublicInbox/WwwCoderepo.pm        |  44 ++-
 lib/PublicInbox/WwwStream.pm          |  11 +-
 lib/PublicInbox/WwwText.pm            |  19 +-
 lib/PublicInbox/XapHelper.pm          |  51 ++--
 lib/PublicInbox/XapHelperCxx.pm       |  14 +-
 lib/PublicInbox/xap_helper.h          | 379 +++++++-------------------
 lib/PublicInbox/xh_cidx.h             | 244 +++++++++++++++++
 lib/PublicInbox/xh_mset.h             |  96 +++++++
 script/public-inbox-cindex            |  38 ++-
 t/admin.t                             |  12 +
 t/cindex-join.t                       |   9 +-
 t/cindex.t                            |  91 ++++++-
 t/xap_helper.t                        |  53 +++-
 xt/solver.t                           |   3 +-
 30 files changed, 1111 insertions(+), 426 deletions(-)
 create mode 100644 lib/PublicInbox/RepoList.pm
 create mode 100644 lib/PublicInbox/xh_cidx.h
 create mode 100644 lib/PublicInbox/xh_mset.h