Christopher B. Browne's Home Page
cbbrowne@acm.org

4. Internet: News - NNTP

4.1. NNTP

4.2. NNTP News Servers

In order to maximize the usage of my 28.8K modem, and minimize the busy signals people may get in trying to call me, I no longer read news while online. (At least, it's a complete coincidence when this happens, rather than the rule.)

I no longer use an NNTP server on my local host (I used to use Leafnode), but rather use slrn's slrnpull utility, which builds a news spool hierarchy. It provides somewhat better performance than Leafnode for news "pulling," retrieval, and expiration, and it lets me expire news at different rates on a group-by-group basis.

I used to slurp news from my ISP during the day while at work using the Leafnode package. Leafnode is an NNTP server optimized for use with an individual's news feed. If you have a site with only a few users who require a limited news feed, it is a good alternative to a far-more-complex-to-configure, full-featured NNTP server such as INN or CNews.

4.3. News Clients

4.4. SLRN

My favored news reader has often been SLRN. The fundamental reason is that it works well with NNTP news feeds accessed at an ISP and also performs very well on local news feeds. It is a full-featured news reader that includes a C-like (and somewhat FORTH-like) programming language for doing customizations.

Its user interface and various aspects of its functionality bear a marked resemblance to that of the Emacs-based news reader, Gnus. In the past, I have used such luminary news readers as the venerable readnews, rn, trn, and strn.

As of May 1999, it has received the Good Net-Keeping Seal of Approval (GNKSA), thus establishing it as one of the few well-behaved, standards-compliant news readers available.

It makes extensive use of the S-LANG library, which notably provides:

SLRN can be readily extended using S-LANG scripts, and its own C-based source code is "clean" enough that many customizations can be readily added at that level too.

4.4.2. My .slrn.sl file

SLRN permits you to create "macro" programs in a C-like language called S-LANG.

Here's my .slrn.sl file. The major working feature is that I use the O and M keys to archive articles into appropriate MH mailboxes.

4.4.3. Reformat Score Files

The Perl script shsc is used to shorten and otherwise clean up slrn score files, doing the following:

  • It deletes comment lines.

  • It keeps those comment lines that start with three % signs. (This lets you mark comment lines as "important" so that they are retained.)

  • It sorts the scoring rules into order based on the associated newsgroup and then the lexicographic ordering of the concatenation of the rule parameters.

  • It folds rules that match one another together. For instance, if we have the following two score rules:

    [*]
    Score: 5
    From: cbbrowne@io.org

    Score: 15
    From: cbbrowne@io.org

    This will get folded into:

    [*]
    Score: 20
    From: cbbrowne@io.org

  • Since scores are arranged in order of newsgroup, most of the newsgroup ([some.news.group]) lines can be omitted, shortening the score file further.

  • Strips excess whitespace.
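The sorting and folding steps in the list above can be sketched roughly as follows. This is a minimal Python illustration, not the shsc Perl script itself. The function names `fold_rules` and `render`, and the simplified rule representation (newsgroup, score, tuple of match lines), are assumptions made purely for the example.

```python
from collections import defaultdict


def fold_rules(rules):
    """Fold rules that share a newsgroup and identical match lines.

    `rules` is an iterable of (group, score, match_lines) tuples, where
    match_lines is a tuple of strings such as ("From: cbbrowne@io.org",).
    This representation is hypothetical, chosen for illustration only.
    """
    totals = defaultdict(int)
    for group, score, match_lines in rules:
        # Two rules "match one another" when group and match lines are equal.
        totals[(group, tuple(match_lines))] += score
    # Sort by newsgroup, then by the concatenation of the rule parameters.
    keys = sorted(totals, key=lambda k: (k[0], "".join(k[1])))
    return [(group, totals[(group, lines)], lines) for group, lines in keys]


def render(rules):
    """Emit score-file text, writing a [group] line only when the group changes."""
    out, current = [], None
    for group, score, lines in rules:
        if group != current:
            out.append(f"[{group}]")
            current = group
        out.append(f"Score: {score}")
        out.extend(lines)
    return "\n".join(out)


rules = [
    ("*", 5, ("From: cbbrowne@io.org",)),
    ("*", 15, ("From: cbbrowne@io.org",)),
]
folded = fold_rules(rules)
print(render(folded))
```

Because the folded rules come out sorted by newsgroup, `render` only needs to emit a group line when the group changes, which is where the additional shortening comes from.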

At some point in the future, it should be modified as follows:

  • Delete rules that have expired

  • Delete rules that are likely to be irrelevant. For instance, Reference: rules and rules with very small scores are more likely than the average to be irrelevant. With some of the enhancements described below, the score file can potentially grow very large despite "folding" if no pruning of irrelevant rules is done.

    I have since produced a version called weed that eliminates scores that do not exceed 50.

  • Convert SLRN rules into/from some other news scoring format(s)
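The pruning idea behind weed can be sketched in a few lines. This is only an illustration in Python, not the weed script itself. It assumes that "do not exceed 50" refers to the magnitude of the score, so that strongly negative rules survive too; the source does not say so explicitly. The `weed` function name and the (group, score, match_lines) tuple layout are hypothetical.

```python
def weed(rules, threshold=50):
    """Keep only rules whose score magnitude exceeds `threshold`.

    Assumption: negative scores are compared by absolute value, so a
    rule scoring -75 is kept while one scoring -2 is dropped.
    """
    return [rule for rule in rules if abs(rule[1]) > threshold]


rules = [
    ("*", 20, ("From: cbbrowne@io.org",)),
    ("comp.emacs", 150, ("Subject: Emacs 21",)),
    ("comp.emacs", -75, ("Subject: MAKE MONEY FAST",)),
]
kept = weed(rules)
print(kept)
```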

4.4.4. Scoring Enhancements

I have, at different points in time, made modifications to slrn to make it score more flexibly.

In particular, this involves marking articles as "deleted" and as "favored" at score levels other than <0 and >0. I've set my levels at <-50 and >50, respectively. At one point, this required some minor patches to SLRN. As John Davis is a pretty good guy, equivalent changes have now been included in recent versions of SLRN.

This would be particularly useful in conjunction with a "Dynamic scoring" feature which I implemented for version 0.8.8.4.
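The threshold idea can be illustrated with a small Python sketch. It is not SLRN's actual code; the `classify` function and its parameter names are hypothetical, and only the values -50 and 50 come from the text above.

```python
def classify(score, delete_below=-50, favor_above=50):
    """Map an article's total score to a display state.

    With the stock behaviour the two limits would both be 0; here the
    defaults are the <-50 and >50 levels mentioned above.
    """
    if score < delete_below:
        return "deleted"
    if score > favor_above:
        return "favored"
    return "normal"


for score in (-120, -2, 15, 150):
    print(score, classify(score))
```

Widening the neutral band this way means that a handful of small automatic adjustments will not, on their own, kill or highlight an article.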

4.4.5. Dynamic/Automatic Scoring

The general idea (credit for inspiration going to the authors of Gnus) is that I want to have the system attach scores to subjects/authors simply on the consideration of whether or not I actually read the articles.

For example:

  • If I look at the subject/author, and delete the article before reading it, this indicates that similar articles are likely to be "unfavored" and can be given negative scores.

  • If, on the other hand, I respond to an article (reply/followup), this is a good indication that the subject/author is of interest, and should be given a sizable positive score.

Every time an interesting action is performed on an article (read, respond, delete...), a set of score rules is added to the score database. In my 0.8.8.4 implementation, I added the following new variables:

set auto_local_sub_del -3
set auto_local_sub_reply 150
set auto_local_sub_read 15
set auto_local_from_del -2
set auto_local_from_reply 100
set auto_local_from_read 15
set auto_local_ref_del -1
set auto_local_ref_reply 0
set auto_local_ref_read 0
set auto_global_sub_del -2
set auto_global_sub_reply 50
set auto_global_sub_read 10
set auto_global_from_del -2
set auto_global_from_reply 50
set auto_global_from_read 10
set auto_global_ref_del -10
set auto_global_ref_reply 100
set auto_global_ref_read 3

For example, set auto_global_from_del -2 means that every time I mark an article as deleted, a rule is added giving the sender a score of -2 for all newsgroups.

On the other hand, a setting such as set auto_local_ref_reply 100 would indicate that when I reply or follow up to an article, a fairly high positive score is associated, in the local newsgroup, with other articles referencing the original article.
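To make the mechanism concrete, here is a rough Python sketch of how one action on one article could be turned into score rules using the variables listed above. It is not the 0.8.8.4 patch (which was written in C inside SLRN). The `auto_rules` function, the rule tuples, and the decision to skip zero-valued settings are all assumptions for illustration. Real score files also treat Subject and From patterns as regular expressions, which this sketch ignores.

```python
# Values copied from the "set auto_..." variables listed above.
AUTO = {
    "local": {
        "sub": {"del": -3, "reply": 150, "read": 15},
        "from": {"del": -2, "reply": 100, "read": 15},
        "ref": {"del": -1, "reply": 0, "read": 0},
    },
    "global": {
        "sub": {"del": -2, "reply": 50, "read": 10},
        "from": {"del": -2, "reply": 50, "read": 10},
        "ref": {"del": -10, "reply": 100, "read": 3},
    },
}


def auto_rules(action, group, subject, sender, message_id, settings=AUTO):
    """Return (group, score, match_line) rules for one action on one article.

    `action` is "del", "reply" or "read". Local rules apply to `group`;
    global rules apply to every group, written here as "*".
    """
    match_lines = {
        "sub": f"Subject: {subject}",
        "from": f"From: {sender}",
        "ref": f"References: {message_id}",
    }
    rules = []
    for scope, target in (("local", group), ("global", "*")):
        for field, line in match_lines.items():
            score = settings[scope][field][action]
            if score != 0:  # assumption: a zero setting means "add no rule"
                rules.append((target, score, line))
    return rules


for rule in auto_rules("del", "comp.emacs", "Emacs 21",
                       "someone@example.com", "<id@example.com>"):
    print(rule)
```

Every action adds up to six rules in this sketch, which gives a feel for how quickly the score file grows.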

Unfortunately, despite the presence of my score file shortener, SLSC, score files still grow very large (thousands of rules; hundreds of kilobytes long), which has the unfortunate effect of horrendous memory consumption.

Solutions to this would include:

  • Collect statistics on when scoring rules are actually activated. This allows obsolete scoring rules to expire over time.

    Unfortunately, modifying SLRN to collect these statistics proves to be rather difficult. This is a quite surprising result, since many other modifications have proven straightforward.

  • Place the scoring information in a nonrelational database that could have a smaller RAM footprint.

  • Handle scoring via a network connection to a score server daemon which could do all of the above, and potentially more.

    This would have the massive advantage of allowing multiple newsreaders to use common scoring databases. It is rather annoying that every newsreader on the planet seems to implement its own kill/score file format that is not-quite-compatible with those of any other newsreader.
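The first of these ideas, expiring rules according to when they were last activated, can be sketched as follows. This is a Python illustration under stated assumptions and not something SLRN provides: the `RuleStats` class, its method names, and the use of plain string rule identifiers are all hypothetical.

```python
import time

SECONDS_PER_DAY = 86400


class RuleStats:
    """Track when each scoring rule was created or last activated."""

    def __init__(self):
        self.last_hit = {}  # rule id -> timestamp (seconds since the epoch)

    def touch(self, rule_id, now=None):
        """Record that a rule was just created or just matched an article."""
        self.last_hit[rule_id] = time.time() if now is None else now

    def stale(self, max_age_days, now=None):
        """Return the ids of rules not activated within `max_age_days`."""
        now = time.time() if now is None else now
        cutoff = now - max_age_days * SECONDS_PER_DAY
        return sorted(r for r, t in self.last_hit.items() if t < cutoff)


stats = RuleStats()
stats.touch("From: a@example.com", now=0)
stats.touch("Subject: Emacs 21", now=0)
# Forty days later, only the Subject rule matches an article again.
stats.touch("Subject: Emacs 21", now=40 * SECONDS_PER_DAY)
print(stats.stale(30, now=45 * SECONDS_PER_DAY))
```

The same bookkeeping could live in an on-disk database or behind a score server; the sketch only shows what would need to be recorded.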

4.4.6. Really, Really Really Dynamic Scoring...

I have, at some points in time, attached slrn to the Ifile message redirection system. This task is made much easier by the fact that SLRN's slrnpull system uses a filing scheme compatible with MH.

In applying this to slrn, I would "prime the pump" by taking the dictionary of parameters I have gotten from processing mail to date using Ifile, and then start dumping in news messages as if they were mail. Since the spool systems are interoperable, mail and news are treated virtually identically, which is perfect for our purposes.

4.4.7. Required Modifications

  • Inject news thru ifilter

    To process news, I would do something like:

    % find /home/cbbrowne/News/slrnpull/news -type f \
        -exec sh -c '/usr/local/src/Ifile-3.0/ifilter < "$1" && rm "$1"' sh {} \;

    Messages would be classified to various folders which, it is interesting to note, will not necessarily relate at all to the original newsgroup(s) messages were directed to.

    • It is quite likely that spam-like messages would head to the Spam folder;

    • slrn postings would hopefully classify themselves into the Slrn folder;

    • Linux postings relating to the DEC Alpha will probably be further categorized into relevant folders on UDB systems, the EM86 emulator, Kernel-related stuff, ...

    • And so forth...

    After redirecting news to appropriate folders (probably using EXMH, which has a very nice user interface for doing that sort of thing), statistics would quickly build into place to direct news into news folders that are "the most relevant."

    Ifile will quickly learn which "folders" are the most appropriate places for new messages to go. In effect, this handles both scoring and the (often requested) virtual newsgroups without the need to actually implement additional SLRN functionality to that end.

  • Spool Modifications

    The news spool would of course need to be modified so as to have slrn access the ifiltered folders as a news spool.

    This merely requires appropriate directory linking.

  • News Expiry

    Varying policies on a group-by-group basis would clearly be preferable; the Spam newsgroup might ultimately be an expire-before-reading newsgroup. Some newsgroups might turn effectively into "archives" that never expire.

    There probably should be a "Trash" group where news can be dumped that simply indicates that it has expired without causing the filter statistics to be updated. This may simply involve adding some SLANG macros...

  • User Interface

    There needs to be a quick and dirty interface that allows the user to tag a bunch of articles and refile them to another newsgroup (thus updating the dictionary of statistics). While SLRN is not too likely to have as extensive an interface for this purpose as does EXMH, it's certainly necessary to have some functionality within SLRN...

    This is a job for... Super-SLANG!

  • Optimize Ifile

    Done. Ifilter is now quite fast: about 0.5s per message with an 800K "dictionary." knowledge_base.perl takes on the order of 45 minutes to run, but it can do so at slack times and does not need to run every day.

4.5. Gnus

Gnus is a news reader that runs inside the Emacs text editor (both GNU Emacs and XEmacs) and is written largely in LISP. It is far more customizable than other news readers, and knows how to get at news in a wide variety of forms.

4.6. Other News Stuff

Some matters of style: there are good news readers and bad news readers; good approaches to news reader development and bad approaches. It is a broad generalization (but usually true) that developers of PC-based news readers build software that doesn't conform very well to the NNTP standards.

4.7. The Minimalist News Interface

% telnet news.host.com 119

Trying 144.9.158.81...
Connected to news.host.com
Escape character is '^]'.
200 news.host.com Netscape-Collabra/3.52 17222 NNRP ready (posting ok).

group comp.emacs

211 56 8188 8243 comp.emacs

listgroup

211 Article list follows
8188
8189
8190
8191
8192
...
8239
8240
8241
8242
8243
.

article 8235

220 8235 <3A2E8D25.515759F6@is.elta.co.il> article
Path: news.host.com!jerry.sabre.com!feed2.news.psi.net!tokyonet.ad.jp!newsfeed.rim.or.jp!newsfeed.media.kyoto-u.ac.jp!fu-berlin.de!uni-berlin.de!213.8.216.205!not-for-mail
From: Eli Zaretskii <eliz@is.elta.co.il>
Newsgroups: comp.emacs
Subject: Re: beta tester [was Re: where can I get Emacs 21?]
Date: Wed, 06 Dec 2000 21:01:57 +0200
Lines: 9
Message-ID: <3A2E8D25.515759F6@is.elta.co.il>
References: <m3pujvym19.fsf@localhost.localdomain> <3A159030.AFBB6C5E@is.elta.co.il> <m3lmuis51r.fsf@sebold.lcms.org> <3A163D45.C6D5E331@is.elta.co.il> <t1g8r3ct3j4obe@corp.supernews.com> <3PZR5.31$9t1.17285@nnrp.gol.com> <Pine.OSF.4.21.0011191916100.25522-100000@student1.physics.umd.edu> <86zoiv71bb.fsf@wojo.dulug.duke.edu> <877l5jdc8g.fsf@mutsaers.com> <87n1efuwip.fsf@turnbull.sk.tsukuba.ac.jp> <87y9xtmskt.fsf@mutsaers.com>
NNTP-Posting-Host: 213.8.216.205
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: fu-berlin.de 976129316 1700350 213.8.216.205 (16 [61365])
X-Mailer: Mozilla 4.7 [en] (Win98; I)
X-Accept-Language: en

Peter Mutsaers wrote:
>
> Indeed. I don't like toolbars, buttons etc, so I don't use the extra
> features in xemacs (I always disabled such things when using/trying
> xemacs). So the extra's of xemacs I don't need (I can only hope that
> version 21 of GNU emacs won't be spoiled by such candy).

Emacs 21 supports the toolbar, but it can be disabled, just like the
menu bar, with a single line in your .emacs. Is that good enough?
.

quit

205
Connection closed by foreign host.
%
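The replies in the session above are simple enough to pick apart by hand. As a small illustration, the following Python sketch parses the GROUP reply and collects a multi-line reply up to its terminating "." line. The function names `parse_group_response` and `read_multiline` are hypothetical helpers written for this example, and the sketch works on strings rather than opening a network connection.

```python
def parse_group_response(line):
    """Parse a GROUP reply such as '211 56 8188 8243 comp.emacs'.

    The fields after the 211 status code are the estimated article count,
    the first and last article numbers, and the group name.
    """
    code, count, first, last, name = line.split()[:5]
    if code != "211":
        raise ValueError(f"unexpected response: {line!r}")
    return {"count": int(count), "first": int(first),
            "last": int(last), "group": name}


def read_multiline(lines):
    """Collect the lines of a multi-line reply, stopping at the lone '.'."""
    body = []
    for line in lines:
        if line == ".":
            break
        # Dot-stuffing: a line sent as '..text' stands for '.text'.
        body.append(line[1:] if line.startswith("..") else line)
    return body


info = parse_group_response("211 56 8188 8243 comp.emacs")
print(info)
print(read_multiline(["8188", "8189", "8190", ".", "not part of the reply"]))
```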

4.8. RSS Aggregators

4.9. Web Search Engines

Most of these are now defunct...

4.10. Net Kooks

The Internet has permitted some unusual characters to "spew" information.

"The Right Reverend Colin James III" has been known to stir controversy on Usenet. It also appears that he occasionally takes outside action, contacting the organizations through which "disagreeable souls" access the Internet and suggesting that the employee/student/customer may be abusing their Internet access.

That may understate things somewhat; various people have created web pages documenting CJIII's actions in greater detail...
