path: root/lib
Date Commit message
2017-04-13 search: allow searching within mail diffs (branch: repobrowse)
This can be tied into a repository browser to browse in-flight topics on a mailing list.
2017-04-12 Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
  search: fix help message for searching within quotes
  learn: scan all inboxes when learning spam
  watchmaildir: do not reject lowercase flags on Maildir files
  searchview: show full (&x=t) messages in ascending chronological order
  searchview: add "t" id to link to thread overview
  extmsg: use updated mail-archive.com URL
  view: escape HTML description name
2017-04-11 search: fix help message for searching within quotes
I'm not sure if people use either, and it's not in mairix (which we base our abbreviations on). Let's go with the shorter prefix since it's easier to type.
2017-04-04 watchmaildir: do not reject lowercase flags on Maildir files
Dovecot uses 'a'..'z' (lowercase) to designate keywords in Maildir flags. This was preventing certain messages from being marked as spam. https://wiki2.dovecot.org/MailboxFormat/Maildir
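The flag handling described above can be sketched as follows. This is an illustration in Python (the project itself is Perl), and `maildir_flags` is a hypothetical helper, not public-inbox code. It assumes the usual Maildir `:2,` info suffix, with uppercase letters as standard flags and lowercase `a`..`z` as Dovecot keywords.

```python
import re

# Maildir filenames end in ":2,<flags>". Uppercase letters are standard
# flags (e.g. S = seen); lowercase a..z are Dovecot keywords.
INFO_RE = re.compile(r":2,([A-Za-z]*)\Z")


def maildir_flags(filename):
    """Return (standard_flags, keywords), or None without an info suffix."""
    m = INFO_RE.search(filename)
    if m is None:
        return None
    flags = m.group(1)
    standard = "".join(c for c in flags if c.isupper())
    keywords = "".join(c for c in flags if c.islower())
    return standard, keywords
```

A parser that accepted only `[A-Z]*` here would fail to match any filename carrying a keyword, which is the kind of rejection the commit message describes.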
2017-03-24 searchview: show full (&x=t) messages in ascending chronological order
When displaying search results with full messages, it makes more sense to show them in ascending chronological order when going by date. Reverse chronological order makes more sense for search results which only show the subject.
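A minimal sketch of that ordering rule, in Python for illustration only. `order_results` and the `(timestamp, subject)` tuple layout are assumptions, not the project's actual data model.

```python
def order_results(results, full_messages):
    """Sort (timestamp, subject) tuples for display."""
    # Full-message view reads like a conversation: oldest first.
    # Subject-only view favors recency: newest first.
    return sorted(results, key=lambda r: r[0], reverse=not full_messages)
```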
2017-03-24 searchview: add "t" id to link to thread overview
At least for the thread view (&x=t), this will make it easy to link to the overview.
2017-03-22 extmsg: use updated mail-archive.com URL
Apparently mid.mail-archive.com does not support HTTPS, and the HTTP version redirects to the search query, anyways.
2017-03-14 view: escape HTML description name
Otherwise funky filenames can cause HTML injection vulnerabilities (hope you have JavaScript disabled!)
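To illustrate the class of bug, here is a small Python sketch (not the project's Perl code). `description_html` is a hypothetical function and the `<b>` wrapper is an arbitrary choice for the example.

```python
import html


def description_html(name):
    # Escape &, <, > and quotes so an attacker-chosen name is rendered
    # as text instead of being interpreted as markup.
    return "<b>" + html.escape(name, quote=True) + "</b>"
```

With this in place, a name such as `<script>alert(1)</script>` is emitted as inert text rather than as a script element.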
2017-03-04 repobrowse: explicit EOF handling for git async callback
We need to ensure we've fully drained the pipe before signalling EOF to the callback, since pipelining may not be the best choice with detachable processes in the future.
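The ordering guarantee (all data delivered, then EOF) can be sketched in Python. This is a simplified blocking loop under stated assumptions, not the event-driven code the commit touches; `drain_then_eof`, `on_data`, and `on_eof` are hypothetical names.

```python
import os


def drain_then_eof(read_fd, on_data, on_eof, bufsize=65536):
    """Deliver every byte from read_fd, then signal EOF exactly once."""
    while True:
        chunk = os.read(read_fd, bufsize)
        if not chunk:  # empty read: the write end is closed and drained
            break
        on_data(chunk)
    on_eof()
```

The point of the sketch is only the invariant: the EOF callback never fires while unread data remains in the pipe.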
2017-03-04 repobrowse: stop abbreviating object names
Ending up with potentially ambiguous identifiers in the future is not worth saving some bytes, in this case.
2017-03-04 repobrowse: fixup format-patch display
We need to take the revision into account when generating patches :P While we're at it, disambiguate URLs by resolving refnames to (un-SHAttered) hex identifiers.
2017-03-03 repobrowse: raw: show the resulting tree for commits and tags
Seeing the raw tag or commit is not very useful, but people tend to treat them as trees. This behavior is also shared by the "plain" endpoint in cgit.
2017-03-03 repobrowse: src: show a nicer message for big files
It should be unlikely for a code repository to need any source files over 64K; and we can't display binaries in a meaningful way in HTML, anyways.
2017-03-03 repobrowse: src/ endpoint requires a tip to be specified
Implying a tip would make for ambiguous URLs and ruin caching, so try to get everybody to hit the same URL. This also simplifies some of our other code since the tip is always in the request.
2017-03-03 repobrowse: raw display avoids forking for small files
This is more efficient for the majority of source files which fit into a stock 64K Linux pipe buffer used by our interaction with git-cat-file.
2017-03-03 repobrowse: avoid excessive buffering in raw endpoint
Relying on qspawn allows us to serve arbitrarily large files without excessive buffering. We'll special-case small files in the future to avoid qspawn, as those small files should fit comfortably in socket buffers.
2017-03-03 repobrowse: remove unused "blob" endpoint
This is redundant with the "raw" endpoint.
2017-03-03 repobrowse: consistently set text charset
For everything with relevant content, we'll try to set UTF-8 charset and reduce duplication when generating response headers.
2017-03-02 repobrowse: rename "tree" endpoint to "src"
This is shorter, and makes more sense as the endpoint displays both tree listings and actual blob sources. This will also make rewriting existing URLs from cgit installations easier.
2017-03-02 repobrowse: rework source view to use async cat-file API
This will allow most source files to be displayed without blocking public-inbox-httpd on slow disk access. However, we no longer support displaying source files larger than 65536 bytes (the default pipe buffer size on current Linux).
2017-02-24 repobrowse: update documentation and variable naming
Another change from abandoning the cgit URL format.
2017-02-24 repobrowse: update documentation for git patch generation
We abandoned cgit-compatible URLs, so update the documentation to match.
2017-02-24 repobrowse: git tree view checks object asynchronously
... when inside public-inbox-httpd. This will allow the server to handle other requests/responses while waiting on "git cat-file --batch-check"
2017-02-24 git: move async detection to runtime
We don't actually know what context we'll be called under, so detecting the mere usability of Danga::Socket is not sufficient.
2017-02-22 repobrowse: eliminate unused query parameters
We will try to reduce the number of query parameters as much as possible to make URLs more amenable to caching at various levels.
2017-02-22 repobrowse: fixup revision handling
Revisions passed in the URL must not be ignored. This fixes some bugs introduced in commit f6244586ba4f5a5e7575e1254be8c9bbe303fce9 ("repobrowse: switch to new URL format to avoid query strings")
2017-02-21 repobrowse: stop abbreviating commit hashes
Abbreviations can become ambiguous over time, and it seems other tools are fine with displaying unabbreviated hashes for commits. This should reduce workload for the search engines, too.
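How an abbreviation goes stale can be shown with a short Python sketch. `resolve` is a hypothetical stand-in for prefix lookup, and both object IDs are made up for the example.

```python
def resolve(prefix, object_ids):
    """Return the single object ID starting with prefix, else raise."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(prefix)
    if len(matches) > 1:
        raise ValueError("ambiguous prefix: " + prefix)
    return matches[0]


# Made-up 40-character IDs sharing a 7-character prefix.
first = "a1b2c3d" + "0" * 33
later = "a1b2c3d" + "f" * 33
```

`resolve("a1b2c3d", [first])` succeeds while only `first` exists, but the same call raises once `later` is added. A link using the full ID keeps resolving either way.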
2017-02-19 repobrowse: unconditionally remove trailing slash handling
We do not need specialized trailing slashes if we break URL compatibility with cgit here. Removing trailing (and redundant) slashes improves our cache hit rates across both server-side (varnish, squid) and client-side (browser) layers.
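The caching argument is that one resource should map to one URL. A minimal Python sketch of such a normalization follows; `canonical_path` is hypothetical and may not match how the project actually treats these URLs.

```python
import re


def canonical_path(path):
    """Collapse repeated slashes and drop a trailing slash (except for "/")."""
    path = re.sub(r"/{2,}", "/", path)
    if len(path) > 1:
        path = path.rstrip("/")
    return path
```

With this, `/repo.git//src/` and `/repo.git/src` share a single cache entry instead of two.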
2017-02-19 repobrowse: return git errors as text/plain, for now
For now, this avoids an HTML injection vector. We'll try to have more consistent error reporting in the future.
2017-02-17 repobrowse: minor style cleanups
Avoid using '=>' arrow notation for arrays and array references; it is confusing and more verbose. Additionally, combine "use constant" statements when possible.
2017-02-17 repobrowse: remove unnecessary import
We do not need to escape URIs in this file.
2017-02-17 repobrowse: rename "plain" endpoint to "raw"
This name is shorter and matches terminology in gitweb and other popular git web viewers.
2017-02-16 repobrowse: memoize git symbolic-ref resolution
The "HEAD" symbolic ref is rarely changed, so memoize it for now and avoid exposing it in URLs.
2017-02-16 repobrowse: shorten "repo_info" to "-repo"
This makes it more consistent with how we use the Inbox objects for the main code.
2017-02-16 repo: only read description if git
Other VCSes have other means of providing the description.
2017-02-16 repobrowse: switch to new URL format to avoid query strings
Query strings make endpoint caching more difficult since they're order-independent. They are also more likely to be lost or truncated inadvertently when copy+pasting, so try to avoid them for default endpoints. There are still some things which are broken, and followup commits will be needed to fix them.
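The order-independence problem can be illustrated in Python. The URLs and parameter names below are hypothetical, and `same_request` is an illustrative helper rather than anything from the project.

```python
from urllib.parse import parse_qsl, urlsplit


def same_request(url_a, url_b):
    """True if both URLs have the same path and query parameters."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.path == b.path and
            sorted(parse_qsl(a.query)) == sorted(parse_qsl(b.query)))


# Hypothetical URLs: same parameters, different order.
u1 = "/repo.git/tree?id=v1.0&path=lib"
u2 = "/repo.git/tree?path=lib&id=v1.0"
```

`same_request(u1, u2)` is true, yet `u1 != u2` as strings, so a cache keyed on the raw URL stores the same response twice. Encoding the same information in the path leaves only one spelling.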
2017-02-15 config: avoid circular loading dependency
We must lazily load one of them, so load Inbox later since we need to parse the config first.
2017-02-14 repobrowse: do not unescape PATH_INFO twice
PSGI specs already require PATH_INFO to be unescaped.
Followup-to: commit 364de65f8a6b5729027cb70228312a141430122f ("www: do not unescape PATH_INFO twice")
2017-02-14 Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
  www: do not unescape PATH_INFO twice
  t/mime: quiet warnings for old versions of Email::Simple
  handle repeated References and In-Reply-To headers
2017-02-14 searchidx: switch to accounting by message bytes
Xapian memory usage is tied to the size of the indexed text, so take the raw message size into account when deciding when to flush Xapian data. More importantly, we now flush Xapian before we have it buffer beyond our maximum; and we do it unconditionally to prevent even high priority processes from OOM-ing.
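The accounting idea can be sketched as a batcher that flushes before its byte budget would be exceeded. This is a Python illustration with hypothetical names (`ByteBudgetBatcher`, `commit`); it does not use Xapian and is not the project's indexer.

```python
class ByteBudgetBatcher:
    """Buffer raw messages and flush before exceeding max_bytes."""

    def __init__(self, max_bytes, commit):
        self.max_bytes = max_bytes
        self.commit = commit  # called with the list of buffered messages
        self.pending = []
        self.pending_bytes = 0

    def add(self, raw):
        # Flush first if this message would push the buffer past the limit.
        if self.pending and self.pending_bytes + len(raw) > self.max_bytes:
            self.flush()
        self.pending.append(raw)
        self.pending_bytes += len(raw)

    def flush(self):
        if self.pending:
            self.commit(self.pending)
            self.pending = []
            self.pending_bytes = 0
```

Counting bytes rather than messages means a few very large messages trigger a flush as readily as many small ones.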
2017-02-14 www: do not unescape PATH_INFO twice
PSGI specs already require PATH_INFO to be unescaped; so our tests were wrong, too.
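Why a second unescape is harmful can be seen with a short Python example; the path is made up for illustration.

```python
from urllib.parse import unquote

# The client wants the literal text "100%25", so "%" is sent as "%25".
raw_request_path = "/inbox/100%2525"

path_info = unquote(raw_request_path)  # what the server hands to the app
twice = unquote(path_info)             # an extra, incorrect unescape
```

Here `path_info` is `/inbox/100%25`, which is what the client meant, while `twice` is `/inbox/100%`, a different resource.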
2017-02-11 handle repeated References and In-Reply-To headers
It seems possible for git-send-email(1) to generate repeated instances of References and In-Reply-To headers, as evidenced in: https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw
This causes a mismatch between how our search indexer threads and how our HTML view handles threading. In the future, View.pm will use the smsg-parsed {references} field and avoid redoing Email::MIME header parsing. We will still need to figure out a way to deal with messages with repeated Message-IDs, at some point, too.
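Collecting every instance of a repeated header, rather than a single one, can be sketched with Python's standard `email` package. `all_references`, the regular expression, and the sample message are assumptions for illustration, not the project's Perl implementation.

```python
import re
from email import message_from_string

# Made-up message with repeated threading headers.
RAW = (
    "Message-ID: <c@example>\n"
    "References: <a@example>\n"
    "References: <b@example>\n"
    "In-Reply-To: <b@example>\n"
    "In-Reply-To: <b@example>\n"
    "\n"
    "body\n"
)


def all_references(msg):
    """Collect Message-IDs from every References/In-Reply-To instance."""
    ids = []
    for name in ("References", "In-Reply-To"):
        for value in msg.get_all(name, []):
            for mid in re.findall(r"<([^<>\s]+)>", value):
                if mid not in ids:  # de-duplicate, keeping first-seen order
                    ids.append(mid)
    return ids
```

If one code path reads every instance and another reads only one, the two derive different thread structures from the same message, which is the mismatch described above.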
2017-02-11 repo: lazily read description and cloneurl
This improves startup speed at the cost of CoW-friendliness for long-lived daemons (which can be fixed, later).
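Lazy loading of this kind can be sketched in Python with `functools.cached_property` (Python 3.8+). The `Repo` class here is a hypothetical stand-in, and returning an empty string on a missing file is an assumption.

```python
import os
from functools import cached_property


class Repo:
    def __init__(self, git_dir):
        self.git_dir = git_dir  # no file I/O at construction time

    @cached_property
    def description(self):
        # Read on first access only; the result is cached on the instance.
        try:
            path = os.path.join(self.git_dir, "description")
            with open(path, encoding="utf-8") as fh:
                return fh.read().strip()
        except OSError:
            return ""
```

The trade-off from the commit message shows up here too: the value is filled in after startup, so memory written post-fork is no longer shared copy-on-write between long-lived worker processes.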
2017-02-10 config: move try_cat function from inbox
This allows RepoConfig to be independent of the PublicInbox::Inbox class.
2017-02-10 repo: add class for representing a code repo
This should hopefully allow us to organize our code better.
2017-02-10 repogit: add prototypes for error checking
And add a note to remove git_commit_title
2017-02-10 repo: search index flushes for excessive active refs
For certain repos, having too many active refs will cause memory usage problems. Mitigate the Xapian problems, for now, and consider a switch to GDBM_File or similar for repos with more refs.
2017-02-10 search: remove unnecessary abstractions and functionality
This simplifies the code a bit and reduces the translation overhead for looking directly at data from tools shipped with Xapian. While we're at it, fix thread-all.t :)
2017-02-10 repo: search index no longer indexes for --contains
It's extraordinarily expensive to add these terms for each and every commit.
2017-02-09 repo: increase search index flush granularity
We need to flush Xapian more frequently to account for gigantic commits which introduce lots of text, so do it when accounting for each line processed, and not for each commit processed.