In the beginning, there was readnews.
There were a couple hundred newsgroups, and anyone capable of reading without drooling on their terminal could read all the news in a not-unreasonable amount of time. (This is referenced in the Hackers Test...)
Today, with thousands of newsgroups and millions of posters, you can't review a list of all the newsgroups in a reasonable period of time.
The second news reader, rn, provided "kill files" that allowed messages to be premarked as read based on article headers.
The "kill rules," highly representative of the "rule-based" approach, come in two varieties:
Local: applied to a particular newsgroup
Global: applied across all newsgroups
Scoring is an "obvious" extension of kill files; various keywords can be combined with different weights to build up an article "score."
Articles with high scores are likely to be the "most interesting" and should be read first; articles with poor scores (typically below some threshold) may be eliminated forthwith.
One might have things like the following SLRN scoring rules:
% These rules are for all newsgroups...
[*]
% AOL has a slight tendency towards having bozos...
Score: -5
From: aol.com
% Slight reversal for MIT...
Score: 5
From: mit.edu
% I certainly want to see anything coming from linus...
Score: 5000
From: Linus Torvalds
% And nothing from Bob Allistat...
Score: -9999
From: Bob Allistat
% If Linux is mentioned, favor the article a bit...
Score: 5
Subject: Linux
% In a Linux group, a "Linux" subject is rather uninformative...
[*linux*]
Score: -5
Subject: Linux
% In database newsgroups, I want to highlight Linux stuff...
[comp.database*]
Score: 100
Subject: linux
% And vice-versa
[comp.os.linux.*]
Score: 100
Subject: data
Subject: base
Score: 100
Subject: dbm
% Ditto for spreadsheets for Linux
[comp.apps.spreadsheets]
Score: 100
Subject: linux
% And the converse rule
[comp.os.linux.*]
Score: 100
Subject: spread
Subject: sheet
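How rules of this kind add up to an article score can be sketched in Python. This is only an illustration, not slrn's implementation: the Rule class, the score_article function, the wildcard matching on group names via fnmatch, and the case-insensitive substring matching on headers are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Rule:
    group_pattern: str   # wildcard over newsgroup names, e.g. "comp.os.linux.*"
    score: int           # weight added when every condition matches
    conditions: list     # (header, substring) pairs; all of them must match


def score_article(group, headers, rules):
    """Sum the scores of all rules whose group pattern and conditions match."""
    total = 0
    for rule in rules:
        if not fnmatch(group, rule.group_pattern):
            continue
        if all(text.lower() in headers.get(header, "").lower()
               for header, text in rule.conditions):
            total += rule.score
    return total


# A few of the rules above, restated in the hypothetical Rule form.
rules = [
    Rule("*", -5, [("From", "aol.com")]),
    Rule("*", 5, [("Subject", "Linux")]),
    Rule("comp.database*", 100, [("Subject", "linux")]),
    Rule("comp.os.linux.*", 100, [("Subject", "data"), ("Subject", "base")]),
]

article = {"From": "someone@aol.com", "Subject": "Linux database tuning"}
print(score_article("comp.databases", article, rules))
```

A reader would then sort articles by this total and drop those that fall below whatever threshold the user has chosen.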
The most sophisticated scoring system comes in the Gnus news reader that integrates into GNU Emacs and XEmacs.
Anyone needing inspiration for improved features in a news reader should consult the Gnus documentation, and "steal" features from there.
Neat idea: Dynamic Scoring
If I read an article, give the topic/author a little positive score.
If I don't read an article, give 'em a little negative score.
If I follow up or reply, give a big positive score, as I clearly found the article interesting.
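The three adjustments above can be sketched as follows. The function name update_scores, the action names, and the sizes of the adjustments are hypothetical; the idea as described only says "little" and "big."

```python
from collections import defaultdict

# Hypothetical adjustment sizes, chosen only for illustration.
READ_BONUS = 1
SKIP_PENALTY = -1
REPLY_BONUS = 10


def update_scores(scores, author, subject, action):
    """Adjust the author and subject scores after the user acts on an article."""
    delta = {
        "read": READ_BONUS,
        "skipped": SKIP_PENALTY,
        "replied": REPLY_BONUS,
    }[action]
    scores[("From", author)] += delta
    scores[("Subject", subject)] += delta
    return scores


scores = defaultdict(int)
update_scores(scores, "alice@example.org", "DBM files", "read")
update_scores(scores, "alice@example.org", "DBM files", "replied")
update_scores(scores, "bob@example.org", "Make money fast", "skipped")
print(dict(scores))
```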
See also:
strn has a scoring system;
Brad Templeton created the NewsClip[tm] News Filtering Language.
The Gravity (Windoze) newsreader has a scoring system as do several other Windows-based newsreaders.
Windows developers don't seem to be very good at conforming to NNTP specifications; they are even less skillful at integrating common scoring systems.
Modular Article Filtering defines a protocol by which news readers can request score information from a server daemon.
Unfortunately, Tim Pierce's scheme is currently oriented towards writing small "scoring programs" to evaluate the scores in Perl or Tcl. While I have no problem with the use of these languages, as it certainly makes the scheme powerful, I don't think it is realistic to expect users (particularly those who may not be programming-literate) to write such code.
Netscape does not offer a filtering system, which is the primary reason I strongly recommend against using it as a newsreader.
newsBot takes Yet Another Approach:
You create a "rule file" not dissimilar to the SLRN scheme
You run newsBot:
It reads your .newsrc file, and your "rules" file
It heads over to your NNTP server
It determines which articles should be discarded based on the rules
It updates your .newsrc file, marking "undesirable" articles as being already read.
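The last two steps can be sketched in Python. This is not newsBot's code: the function names are hypothetical, the NNTP fetch is left out (the articles dictionary stands in for headers already retrieved from the server), and the only .newsrc feature assumed is the usual "group: 1-5,8" list of read article ranges.

```python
def parse_ranges(text):
    """Turn a .newsrc range list such as '1-5,8' into a set of article numbers."""
    seen = set()
    for part in filter(None, (p.strip() for p in text.split(","))):
        low, _, high = part.partition("-")
        seen.update(range(int(low), int(high or low) + 1))
    return seen


def format_ranges(numbers):
    """Turn a set of article numbers back into a compact range list."""
    runs, run = [], []
    for n in sorted(numbers):
        if run and n == run[-1] + 1:
            run.append(n)
            continue
        if run:
            runs.append(run)
        run = [n]
    if run:
        runs.append(run)
    return ",".join(
        str(r[0]) if len(r) == 1 else f"{r[0]}-{r[-1]}" for r in runs
    )


def mark_undesirable(newsrc_line, articles, is_undesirable):
    """Return the .newsrc line with every undesirable article marked as read."""
    group, _, ranges = newsrc_line.partition(":")
    read = parse_ranges(ranges)
    read.update(num for num, hdrs in articles.items() if is_undesirable(hdrs))
    return f"{group}: {format_ranges(read)}"


articles = {
    6: {"From": "spammer@example.com", "Subject": "MAKE MONEY FAST"},
    7: {"From": "alice@example.org", "Subject": "SCSI tape drives"},
    8: {"From": "spammer@example.com", "Subject": "More money"},
}
line = mark_undesirable(
    "comp.os.linux.hardware: 1-5",
    articles,
    lambda hdrs: "spammer@example.com" in hdrs.get("From", ""),
)
print(line)  # comp.os.linux.hardware: 1-6,8
```

Because the result is just an updated .newsrc, any news reader that honours that file sees the discarded articles as already read, without needing a scoring system of its own.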
Scoring newsreaders all have some sort of "score database," and I think it important that a score server have some interface that allows automated insertion/modification of rules. This should involve the following major features:
Commands in the protocol for inserting score rules into "the score database."
Utilities that can read score files in the native formats used by various news readers and insert equivalent score rules into "the score database."
Utilities to extract score rules in formats compatible with sundry news readers.
Major features that can come out of the use of a common rule database include:
Efficient handling of large rule databases
The scoring systems that I have used tend to significantly degrade in performance as the number of rules increases. For instance, in slrn, each rule consumes memory, and when I had about 5000 rules in my score file, loading and unloading the score "database" took vastly increasing amounts of time, and resulted in the slrn process consuming vast amounts of memory.
A more systematic "rule manager" could be designed to be memory efficient, perhaps storing rules in disk-based DBM files so as to minimize memory usage.
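A disk-based rule store along these lines might look like the following sketch, which uses Python's standard dbm module. The layout is an assumption made for the example: rules are keyed by newsgroup name (with "*" for global rules) and stored as JSON, so that reading one group loads only the global rules and that group's rules while everything else stays on disk. The names add_rule and rules_for are hypothetical.

```python
import dbm
import json
import os
import tempfile


def add_rule(db, group, header, pattern, score):
    """Append one rule to the list stored under a newsgroup name ("*" = global)."""
    key = group.encode()
    rules = json.loads(db[key]) if key in db else []
    rules.append([header, pattern, score])
    db[key] = json.dumps(rules).encode()


def rules_for(db, group):
    """Load the global rules plus those for one group; the rest stays on disk."""
    found = []
    for key in (b"*", group.encode()):
        if key in db:
            found.extend(json.loads(db[key]))
    return found


# A throwaway location for the example database.
path = os.path.join(tempfile.mkdtemp(), "scores")
with dbm.open(path, "c") as db:
    add_rule(db, "*", "From", "aol.com", -5)
    add_rule(db, "comp.os.linux.misc", "Subject", "Linux", -5)
    add_rule(db, "can.politics", "Subject", "election", 10)
    loaded = rules_for(db, "comp.os.linux.misc")

print(loaded)
```

Exact group names are used as keys here for simplicity; wildcard group patterns such as [comp.os.linux.*] would need an extra lookup step that this sketch does not attempt.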
Rule Sharing
I've got a list of scoring rules for filtering out "SPAM" that have proved highly effective. (Just suppose.) Wouldn't it be nice to be able to import my antispam rules into your database? This suggests that it's necessary to have date/time/"IMPORT ID" information in the database so as to be able to purge/reload groups of related scoring rules.
In addition, this allows sharing of rules within a network domain.
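The purge/reload idea can be sketched with an import ID and timestamp stored beside each rule. SQLite is used here purely for illustration; the table layout, the import_rules function, and the import ID strings are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE rules (
           import_id   TEXT NOT NULL,   -- which shared batch the rule came from
           imported_at TEXT NOT NULL,   -- when that batch was loaded
           header      TEXT NOT NULL,
           pattern     TEXT NOT NULL,
           score       INTEGER NOT NULL
       )"""
)


def import_rules(conn, import_id, rules):
    """Replace any earlier batch carrying the same import ID with the new rules."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction, so the purge and the reload happen together
        conn.execute("DELETE FROM rules WHERE import_id = ?", (import_id,))
        conn.executemany(
            "INSERT INTO rules VALUES (?, ?, ?, ?, ?)",
            [(import_id, now, h, p, s) for h, p, s in rules],
        )


import_rules(conn, "shared-antispam", [("Subject", "MAKE MONEY FAST", -9999)])
import_rules(conn, "local", [("From", "mit.edu", 5)])
# Reloading a batch purges its old rules but leaves other batches alone.
import_rules(conn, "shared-antispam", [("Subject", "free money", -9999),
                                       ("From", "ickysite.com", -50)])

total = conn.execute("SELECT COUNT(*) FROM rules").fetchone()[0]
print(total)  # 3
```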
Automatic query optimization
Score rules typically have multiple components. For example, here is an SLRN rule:
Score: -50 # Score value
From: joeblow@ickysite.com
Newsgroup: comp.os.linux.networking
This rule indicates that messages from joeblow that are posted to the Linux networking newsgroup should have a score of -50 attached to them. If we're applying rules to the can.politics newsgroup, it is somewhat unlikely that this rule will apply.
The two "subrules" can probably be applied in an optimal order that will allow us to determine most quickly whether or not the aggregate rule applies.
For instance, if Joe posts a lot to other newsgroups, it would make sense to apply the Newsgroup: rule first, as it will more quickly eliminate the rule when it does not apply.
On the other hand, if Joe is extremely prolific in this newsgroup, and posts relatively little elsewhere, it would be sensible to try to match the "From:" rule first.
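One way to pick the order is to measure, on a sample of recent articles, how often each subrule matches, and to try the one that matches least often first. The sketch below assumes subrules are (name, predicate) pairs; order_subrules and rule_applies are hypothetical names, and the sample data is made up.

```python
def order_subrules(subrules, sample):
    """Sort subrules so the one that matches the fewest sample articles runs first."""
    def match_rate(subrule):
        _name, test = subrule
        return sum(1 for art in sample if test(art)) / max(len(sample), 1)
    return sorted(subrules, key=match_rate)


def rule_applies(subrules, article):
    """all() stops at the first failing subrule, so order changes cost, not result."""
    return all(test(article) for _name, test in subrules)


subrules = [
    ("From", lambda a: "joeblow@ickysite.com" in a["From"]),
    ("Newsgroup", lambda a: a["Newsgroup"] == "comp.os.linux.networking"),
]

# In this sample Joe posts a lot, but mostly outside the networking group.
sample = [
    {"From": "joeblow@ickysite.com", "Newsgroup": "can.politics"},
    {"From": "joeblow@ickysite.com", "Newsgroup": "can.politics"},
    {"From": "joeblow@ickysite.com", "Newsgroup": "comp.os.linux.networking"},
    {"From": "alice@example.org", "Newsgroup": "can.politics"},
]

ordered = order_subrules(subrules, sample)
print([name for name, _test in ordered])  # ['Newsgroup', 'From']
```

With a sample in which Joe dominates the networking group and rarely posts elsewhere, the same function would put the From: subrule first.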
What the world needs is an RFC describing a standardized file format for news scoring, even if only to standardize "rule interchange."
adcomplain is a shell script through which you can redirect "offending" news and mail messages.
The script has the ability (in some cases) to search for the "real" identity of the offender, as well as a best guess of someone to contact at their service provider.
I installed the script as /home/cbbrowne/bin/spam. Character-interfaced news readers can typically pass messages to spam by typing: |spam
Using this utility, I have assisted in getting a number of spammers' ISP accounts removed.
I have experimented with passing my news through the Ifile system. "Spam" gets quite accurately dumped to my MH Spam folder; the sexually oriented material even more successfully flows to Spam/phonesex for later purging.
Beyond that, I defined "virtual newsgroups" relating to topics of interest. (And created some for "trash" to flow into.)
Material looking like Linux advocacy tended to flow to Linux/Advocacy, regardless of where it was originally posted.
Material on Linux hardware gets automatically (and fairly accurately) split up into:
Linux/Hardware/CPU
Linux/Hardware/CDROM (mostly junked)
Linux/Hardware/Cameras (digital cameras)
Linux/Hardware/Disk
Linux/Hardware/Ethernet (mostly junked)
Linux/Hardware/Modems (mostly junked) ...
Linux/Hardware/SCSI
Linux/Hardware/Tape (some keepers)
Linux/Hardware/Video (mostly junked)
This allowed me to fairly quickly target useful hardware information for reading.
Unfortunately, the filtering process still dropped a lot of garbage into my "mail feed," and required that I go through and delete a whopping lot of messages once they had been read. It works, but needs more work to reach "production" quality.
There needs to be some sort of "scoring" ability, with "score thresholds" to indicate that messages can be discarded. I can't think of a decent interface to feed in information to this effect.