Christopher B. Browne's Home Page
cbbrowne@acm.org

8. Web Filtering

The dangers of "web surfing" include:

8.1. JunkBuster

JunkBuster (also available in RPM form) is a rule-based "blocker." It can selectively block:

8.1.1. It can block requests for URLs

For instance, certain servers are used exclusively by advertisers that put up annoying images that take a long time to load. These servers also collect statistics on which pages you access.

    # Block servers:
    ad.doubleclick.net
    ad.infoseek.com
    ad.linkexchange.com
    adcount.hollywood.com
    adforce.imgis.com
    focalink.com
    ads.imdb.com
    ...
    # Block paths - this probably should be done more
    # carefully than I show here...
    .com/Ads/
    .com/ad/
    .com/ads/
    .net/Ads/
    .net/ads/
    .net/ad/
    www.geocities.com/images    # Remove annoying GeoCities images
    www.geocities.com/sponsor   # and advertisements...

This makes some web sites somewhat more enjoyable.

This feature can also be useful for blocking access to "inappropriate materials" such as sexually oriented web sites. This requires building up lists of "inappropriate" sites, and/or lists of "permissible" sites.
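To make the matching idea concrete, here is a minimal sketch in Python of how a block list of this general shape might be applied to a URL. It is not JunkBuster's actual matching code: the function names are hypothetical, and the rules (exact host match, or suffix match when the pattern starts with a dot, plus a plain path prefix) are simplifying assumptions of mine.

```python
from urllib.parse import urlsplit

# Hypothetical pattern list, in the spirit of the block list shown earlier.
BLOCK_PATTERNS = [
    "ad.doubleclick.net",
    "ads.imdb.com",
    ".com/ads/",
    "www.geocities.com/sponsor",
]


def parse_pattern(pattern):
    """Split 'host/path' into (host part, path prefix)."""
    host, slash, path = pattern.partition("/")
    return host.lower(), slash + path


def is_blocked(url, patterns=BLOCK_PATTERNS):
    """Return True if the URL matches any pattern (simplified rules)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for pattern in patterns:
        pat_host, pat_path = parse_pattern(pattern)
        if pat_host.startswith("."):
            # Assumption: a leading dot means "any host ending with this".
            host_ok = host.endswith(pat_host)
        else:
            host_ok = host == pat_host
        # An empty path prefix matches every path on that host.
        if host_ok and path.startswith(pat_path):
            return True
    return False
```

A "permissible sites" list would work the same way with the test inverted: refuse everything except URLs that match.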

8.1.2. Cookie Blocking

  • Remote web server gives you a cookie

    Consists of key and value

    Key usually identifies the remote site

    Value is commonly a key to identify your "session"

  • Cookie is held by your browser, and the value is returned on demand.

The cookie mechanism simplifies the creation of online catalogue systems, and was intended for such uses. Abuses are unfortunately possible.

Blocking sample:

    # Allow cookies from really trustworthy organization:
    www.nytimes.com
    # Allow cookies to go out, but don't allow them in...
    >send-user-cookies.com
    # Cookies restricted to "outgoing" for this path:
    >wired.com/news
    # Allow cookies in, but forbid their retrieval
    <keep-user-cookies.com
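The rules above can be read as a small policy table. The following Python sketch shows one way to interpret them; it is an illustration only, with hypothetical function names. It assumes that a leading ">" means cookies may only go out to the server, that a leading "<" means cookies may only come in, that a bare entry allows both directions, and that unlisted sites get neither. Host matching is exact here, which is simpler than what a real filter would do.

```python
def parse_cookie_rules(lines):
    """Return a list of (pattern, allow_out, allow_in) tuples."""
    rules = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if line.startswith(">"):
            rules.append((line[1:], True, False))   # outgoing only
        elif line.startswith("<"):
            rules.append((line[1:], False, True))   # incoming only
        else:
            rules.append((line, True, True))        # both directions
    return rules


def cookie_policy(host, path, rules):
    """Return (allow_out, allow_in); the assumed default blocks both."""
    for pattern, allow_out, allow_in in rules:
        pat_host, slash, pat_path = pattern.partition("/")
        if host.lower() == pat_host.lower() and path.startswith(slash + pat_path):
            return allow_out, allow_in
    return False, False
```

A proxy would consult `allow_out` before forwarding the browser's Cookie header, and `allow_in` before passing the server's Set-Cookie header back.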

8.1.3. Return "Wafers"

Along with every cookie request, JunkBuster can send the server a second cookie (called a "wafer") containing text along the lines of the following:

TO WHOM IT MAY CONCERN: Do not send me any copyrighted information other than the document that I am requesting or any of its necessary components. In particular, do not send me any cookies that are subject to a claim of copyright by anybody. Take notice that I refuse to be bound by any license condition (copyright or otherwise) applying to any cookie.

The remote site may simply ignore the wafer; then again, it may give their server indigestion.
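Mechanically, a wafer is just one more name/value pair tacked onto the outgoing Cookie header. A minimal Python sketch follows; the function name, the cookie name, and the replacement of spaces with underscores are all assumptions for illustration, not a description of what JunkBuster itself emits.

```python
def add_wafer(cookie_header, wafer_name, wafer_text):
    """Append a wafer to an outgoing Cookie header value."""
    # Spaces are replaced so the notice travels as a single cookie value.
    wafer = f"{wafer_name}={wafer_text.replace(' ', '_')}"
    if cookie_header:
        return f"{cookie_header}; {wafer}"
    return wafer
```

For example, `add_wafer("session=abc123", "NOTICE", "TO WHOM IT MAY CONCERN")` yields a header value carrying both the original cookie and the notice.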

8.1.4. Limit personal information passed to the server

Your web browser typically passes out such information as:

  • User agent (e.g. - Lynx 2.7, Mozilla 3.0, Mosaic...)

  • User name

  • Email address

  • Referring URL

JunkBuster can be configured to specify values for most of these on a global or a by-path basis.
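As a rough illustration of the global case, here is a Python sketch of rewriting request headers before they leave the machine. The override table and function name are hypothetical. I am assuming the user agent, email address, and referring URL travel in the User-Agent, From, and Referer headers respectively, and that an override of None means the header is dropped.

```python
# Hypothetical override table: None means "drop the header entirely".
OVERRIDES = {
    "User-Agent": "Mozilla/3.0 (X11; I; Linux)",
    "From": None,
    "Referer": None,
}


def rewrite_headers(headers, overrides=OVERRIDES):
    """Return a copy of headers with the overrides applied.

    Header names are compared case-insensitively; headers that are not
    mentioned in the override table pass through untouched.
    """
    lowered = {name.lower(): value for name, value in overrides.items()}
    result = {}
    for name, value in headers.items():
        key = name.lower()
        if key in lowered:
            if lowered[key] is not None:
                result[name] = lowered[key]  # substitute the configured value
        else:
            result[name] = value
    return result
```

A by-path variant would pick the override table according to the requested URL before applying it.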

Some Linux users have, after using JunkBuster to change their browser ID to indicate that they were running a Linux version of Microsoft Internet Explorer, received irate email messages from people at Microsoft asking how they obtained IE for Linux.

8.2. Muffin


Muffin is a Java-based tool that does various sorts of filtering.

8.3. Web Page Filters/Transducers

The general idea is that web pages are passed through a filter program that recognizes various critical patterns either in the URL or in the page, and rewrites recognized pages. This allows you to:

8.4. Other Web Stuff

There are web and news robots that go out to "harvest" email addresses.

This process can be made less useful by putting either "garbage" or "dangerous" addresses on your web pages:

8.5. Web "Helpers"

There are some web services that use collaborative filtering schemes (not unlike Ifile) to evaluate information from a web search engine for its relevance to individuals.

The approach is to "vote" for or against web pages that are presented to you; the system then returns pages more like what you liked, and less like what you disliked.
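The voting idea can be sketched in a few lines of Python. Note that this is a deliberately simplified, single-user, word-counting stand-in; a real collaborative scheme would also pool votes across many users, and I am not claiming any particular service works this way. The class and method names are hypothetical.

```python
import re
from collections import Counter


class VoteFilter:
    """Score pages by the words found in pages voted for or against."""

    def __init__(self):
        self.liked = Counter()
        self.disliked = Counter()

    @staticmethod
    def _words(text):
        return re.findall(r"[a-z]+", text.lower())

    def vote(self, page_text, like):
        """Record a vote for (like=True) or against (like=False) a page."""
        target = self.liked if like else self.disliked
        target.update(self._words(page_text))

    def score(self, page_text):
        """Higher means more like liked pages, lower more like disliked."""
        words = self._words(page_text)
        return sum(self.liked[w] - self.disliked[w] for w in words)

    def rank(self, pages):
        """Return pages ordered from best to worst score."""
        return sorted(pages, key=self.score, reverse=True)
```

After a few votes, `rank()` pushes pages that resemble the liked ones toward the front of the list.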

The downsides of most of these services include that:

Note

In late-breaking news, there are some new things:

8.6. Gaming...

There are ongoing attempts to "spam" people by scattering links to irritating spam sites across the web, in the hope that Google will then rate web sites for various spam-worthy products highly.

There is also an effort to get people to link widely to the (reasonably balanced) Wikipedia entries about those topics. I'm game to help a little with the latter...

Contact me at cbbrowne@acm.org