The dangers of "web surfing" include:
Information collection
There is a surprising amount of information that web servers can collect about your information requests; the search engine providers are trying to find ways to sell this sort of information.
My brother used to run an internal IBM web server; in monitoring logs on his PC, he sometimes would find that people were having problems. He was known to call people to explain how to accomplish what they apparently wanted to do. They were entirely surprised; they had no idea he could check what they were doing on his server.
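To make the point concrete, here is a minimal sketch of what one line of a server's access log can reveal. It assumes the widely used NCSA "combined" log layout; the sample line, addresses, and the `parse_log_line` helper are made up for illustration and are not taken from any particular server.

```python
import re

# Pattern for the NCSA "combined" access-log layout; the sample line below is invented.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

sample = (
    '192.0.2.10 - - [10/Oct/1999:13:55:36 -0700] '
    '"GET /search?q=linux+laptops HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/3.0 (X11; I; Linux 2.0.36 i586)"'
)

entry = parse_log_line(sample)
for field in ("host", "when", "request", "referer", "agent"):
    print(f"{field:8} {entry[field]}")
```

Even this single line tells the server operator which machine asked, when, what was searched for, which page the visitor came from, and what browser and operating system were in use.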
Cookies
They can be annoying; they circumvent privacy and sometimes even security.
Advertisements
Those GIFs and JPEGs at the various search engine sites take time to load, fill up your screen, and allow third parties to collect information about your "web surfing."
Sexual, Hate-Oriented, Subversive, and Dangerous Material
Mature people with self-control may merely find it "generally offensive," and avoid it when it comes up.
Others may be less tolerant, particularly when the "surfer" may be a child. Parents and schools may feel it appropriate to apply some form of censorship.
JunkBuster (also available in RPM form) is a rule-based "blocker." It can selectively block:
URL requests
Cookies
Passing out user information
For instance, certain servers are exclusively used by those advertisers that put up annoying images that take a long time to load. They also collect statistics on your information accesses.
    # Block servers:
    ad.doubleclick.net
    ad.infoseek.com
    ad.linkexchange.com
    adcount.hollywood.com
    adforce.imgis.com
    focalink.com
    ads.imdb.com
    ...
    # Block paths - this probably should be done more
    # carefully than I show here...
    .com/Ads/
    .com/ad/
    .com/ads/
    .net/Ads/
    .net/ads/
    .net/ad/
    www.geocities.com/images    # Remove annoying GeoCities images
    www.geocities.com/sponsor   # and advertisements...
This makes some web sites somewhat more enjoyable.
This feature can also be useful for blocking access to "inappropriate materials" such as sexually oriented web sites. This requires building up lists of "inappropriate sites," and/or lists of "permissible" sites.
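The matching idea behind such a block list can be sketched in a few lines. This is a simplified illustration, not JunkBuster's actual algorithm: it assumes each pattern is a domain (matched against the end of the host name) optionally followed by a path prefix, and the names `BLOCK_PATTERNS`, `host_matches`, and `is_blocked` are hypothetical.

```python
from urllib.parse import urlsplit

# A few patterns in the spirit of a block list: "domain" or "domain/path-prefix".
BLOCK_PATTERNS = [
    "ad.doubleclick.net",
    "ads.imdb.com",
    ".com/ads/",
    "www.geocities.com/sponsor",
]

def host_matches(host, domain):
    """True if host equals domain or ends with it on a label boundary."""
    if domain.startswith("."):
        return host.endswith(domain)
    return host == domain or host.endswith("." + domain)

def is_blocked(url, patterns=BLOCK_PATTERNS):
    """Return True if some pattern's domain part and path prefix both match the URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for pattern in patterns:
        domain, slash, prefix = pattern.partition("/")
        if host_matches(host, domain.lower()) and path.startswith(slash + prefix):
            return True
    return False

print(is_blocked("http://ad.doubleclick.net/banner.gif"))
print(is_blocked("http://www.example.com/ads/banner.gif"))
print(is_blocked("http://www.example.com/articles/index.html"))
```

A list of "permissible" sites would use the same matching with the test inverted: fetch only when some pattern matches, and refuse everything else.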
Remote web server gives you a cookie
Consists of key and value
Key usually identifies the remote site
Value is commonly a key to identify your "session"
Cookie is held by your browser, and the value is returned on demand.
It simplifies creation of online catalogue systems, and was intended for such. Abuses are unfortunately possible.
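The round trip described above can be sketched with Python's standard `http.cookies` module. The header value and the `session` key are invented for illustration; a real catalogue server would choose its own names.

```python
from http.cookies import SimpleCookie

# What a (hypothetical) catalogue server might send in a Set-Cookie response header.
set_cookie_header = "session=a1b2c3d4; Path=/catalogue"

# The browser stores the cookie...
jar = SimpleCookie()
jar.load(set_cookie_header)

# ...and hands the key/value pair back on later requests to the same site.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)
```

The value is opaque to the browser. Only the server knows which shopping basket, or which browsing history, the session key refers to, which is where both the convenience and the potential for abuse come from.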
Blocking sample:
    # Allow cookies from really trustworthy organization:
    www.nytimes.com
    # Allow cookies to go out, but don't allow them in...
    >send-user-cookies.com
    # Cookies restricted to "outgoing" for this path:
    >wired.com/news
    # Allow cookies in, but forbid their retrieval
    <keep-user-cookies.com
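A rough sketch of how such direction rules might be evaluated follows. It assumes that a leading `>` means "outgoing only" (browser to server), a leading `<` means "incoming only" (server to browser), a bare domain allows both, and anything unlisted is dropped. Path-specific rules are ignored here, and the function names and `.example` domains are hypothetical.

```python
def parse_cookie_rules(lines):
    """Map each domain pattern to the set of directions it permits."""
    rules = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        if line.startswith(">"):
            rules[line[1:]] = {"out"}          # browser -> server only
        elif line.startswith("<"):
            rules[line[1:]] = {"in"}           # server -> browser only
        else:
            rules[line] = {"in", "out"}        # both directions
    return rules

def cookie_allowed(rules, host, direction):
    """Cookies are dropped unless a rule for the host permits this direction."""
    for domain, directions in rules.items():
        if host == domain or host.endswith("." + domain):
            return direction in directions
    return False

rules = parse_cookie_rules([
    "# trusted site: cookies both ways",
    "www.nytimes.com",
    ">outgoing-only.example",
    "<incoming-only.example",
])
print(cookie_allowed(rules, "www.nytimes.com", "in"))
print(cookie_allowed(rules, "incoming-only.example", "out"))
```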
JunkBuster can pass back with every cookie request a second cookie (called a "wafer") with a text along the lines of the following:
TO WHOM IT MAY CONCERN: Do not send me any copyrighted information other than the document that I am requesting or any of its necessary components. In particular, do not send me any cookies that are subject to a claim of copyright by anybody. Take notice that I refuse to be bound by any license condition (copyright or otherwise) applying to any cookie.
They may ignore the wafer; it may give their server indigestion.
Your web browser typically passes out such information as:
User agent (e.g., Lynx 2.7, Mozilla 3.0, Mosaic...)
User name
Email address
Referring URL
JunkBuster can be configured to specify values for most of these on a global or a by-path basis.
Some Linux users have, after using JunkBuster to change their browser ID to indicate that they were running a Linux version of Microsoft Internet Explorer, received irate email messages from people at Microsoft asking how they obtained IE for Linux.
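The effect of such header rewriting can be sketched as follows. The header values are invented, and `scrub` is a hypothetical helper showing the general idea rather than how JunkBuster itself is implemented.

```python
# Headers a late-1990s browser might attach to a request (values are made up).
browser_headers = {
    "User-Agent": "Mozilla/3.0 (X11; I; Linux 2.0.36 i586)",
    "From": "user@example.com",
    "Referer": "http://www.example.com/previous.html",
}

def scrub(headers, overrides):
    """Return a copy of headers with values replaced, or dropped when the override is None."""
    cleaned = dict(headers)
    for name, value in overrides.items():
        if value is None:
            cleaned.pop(name, None)
        else:
            cleaned[name] = value
    return cleaned

# Claim to be a different browser and withhold the email address and referring URL.
filtered = scrub(browser_headers, {"User-Agent": "Lynx/2.7", "From": None, "Referer": None})
print(filtered)
```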
The general idea is that web pages are passed through a filter program that recognizes various critical patterns either in the URL or in the page, and rewrites recognized pages. This allows you to:
Change "framed" pages into unframed pages
Remove advertisements entirely
Replace advertisements with your favorite graphics
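As a minimal sketch of the filtering idea, the following removes or replaces image tags whose source URL looks like an advertisement. The pattern is an assumption chosen for illustration (any `src` containing `/ad/` or `/ads/`), and a regular expression is only a rough stand-in for real HTML parsing.

```python
import re

# Hypothetical pattern: any <img> tag whose src contains /ad/ or /ads/ (case-insensitive).
AD_IMAGE = re.compile(r'<img\b[^>]*\bsrc="[^"]*/ads?/[^"]*"[^>]*>', re.IGNORECASE)

def strip_ads(html, replacement=""):
    """Replace every matching advertisement image tag with `replacement`."""
    # A function is used so that backslashes in `replacement` are taken literally.
    return AD_IMAGE.sub(lambda _match: replacement, html)

page = '<p>News</p><img src="http://www.example.com/ads/banner.gif" width="468"><p>More</p>'

print(strip_ads(page))                                             # remove entirely
print(strip_ads(page, '<img src="file:///home/me/penguin.gif">'))  # substitute your own graphic
```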
WebFilter
The German author did not understand American sensibilities about scatological vocabulary when he first named this package. The URL is somewhat hidden as it includes one of the Ten Words You Can't Say on Television: WebFilter
Various filters are available for common web search engines, and can be written using any Unix tool. Samples use SED and Perl.
Researchers at France's INRIA institute have written a similar and probably more powerful tool called V6, written in Objective Caml (an object-oriented version of the functional language ML).
There are web and news robots that go out to "harvest" email addresses.
This process can be made less useful by putting either "garbage" or "dangerous" addresses on your web pages:
This Perl utility builds randomly-constructed addresses that look realistic, but aren't real.
Addresses of agencies officially responsible for dealing with mail fraud, such as the US FTC.
This puts onto the spammer's mailing list the addresses of people whom they surely don't want to be aware of their activities.
Or, you can always insert addresses of other spammers, so that they start harassing one another...
Include bait from SPAMBAIT in your web pages or news postings, and expect to find spammers spamming other spammers. This is not a horribly inappropriate thing to do; it merely means that you're providing the evil email address harvesters with not-so-valuable catches.
A web CGI tool that is a "poison pot" to trap ill-designed address harvesters into adding enormous quantities of bogus email addresses to their databases.
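The "garbage address" approach can be sketched as below. This is not the Perl utility mentioned above; it is a small illustration that assumes syllable-built names and, to be safe, uses only reserved domains (`example.*` and `.invalid`) so that no real mailbox can ever be hit. All names in the code are hypothetical.

```python
import random

SYLLABLES = ["an", "bo", "ca", "del", "er", "fi", "gra", "ham",
             "in", "jo", "ke", "li", "mar", "son", "ton"]
# example.* and .invalid are reserved names, so these addresses can never be delivered.
DOMAINS = ["example.com", "example.net", "example.org", "mail.invalid"]

def fake_address(rng):
    """Build one plausible-looking but undeliverable address."""
    first = "".join(rng.choice(SYLLABLES) for _ in range(rng.randint(2, 3)))
    last = "".join(rng.choice(SYLLABLES) for _ in range(rng.randint(2, 3)))
    return f"{first}.{last}@{rng.choice(DOMAINS)}"

def bait_html(count, seed=None):
    """Return mailto links for `count` generated addresses, one per line."""
    rng = random.Random(seed)
    addresses = [fake_address(rng) for _ in range(count)]
    return "\n".join(f'<a href="mailto:{a}">{a}</a>' for a in addresses)

print(bait_html(3, seed=1))
```

A "poison pot" CGI script works on the same principle, except that each generated page also links to further generated pages, so a careless harvester keeps collecting junk indefinitely.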
Freely Available Filtering Systems, Information Filtering Resources
Ad blocking via IP addresses
One might dump known "bad" domain names into one's /etc/hosts file, pointing them to a "black hole" IP address.
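Generating the entries is simple enough to sketch. The choice of `127.0.0.1` as the "black hole" is an assumption (any address that serves nothing would do), and `hosts_entries` is a hypothetical helper.

```python
def hosts_entries(domains, sink="127.0.0.1"):
    """Return /etc/hosts lines that point each domain at the sink address."""
    unique = sorted({d.strip().lower() for d in domains if d.strip()})
    return [f"{sink}\t{domain}" for domain in unique]

ad_servers = ["ad.doubleclick.net", "ad.infoseek.com",
              "ad.linkexchange.com", "AD.DoubleClick.net"]

# Print the lines; appending them to /etc/hosts requires root and is left to the reader.
print("\n".join(hosts_entries(ad_servers)))
```

Note that the hosts file matches exact host names only. There are no wildcards, so every ad server has to be listed individually, and path-based blocking is not possible this way.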
There are some web services that use collaborative filtering schemes (not unlike Ifile) to evaluate information on a web search engine for its relevance to individuals.
The approach is to "vote" for or against web pages that are presented to you; the system then returns pages more like what you liked, and less like what you disliked.
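The voting idea can be sketched as a toy user-to-user scheme: people whose past votes agree with yours get more say in what is shown to you next. This is only an illustration under that assumption, not the algorithm of any particular service; the users, pages, and function names are invented.

```python
def similarity(a, b):
    """Agreement between two users' votes: mean product over the pages both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[p] * b[p] for p in shared) / len(shared)

def recommend(votes, user):
    """Rank pages the user has not voted on by similarity-weighted votes of others."""
    mine = votes[user]
    scores = {}
    for other, theirs in votes.items():
        if other == user:
            continue
        weight = similarity(mine, theirs)
        for page, vote in theirs.items():
            if page not in mine:
                scores[page] = scores.get(page, 0.0) + weight * vote
    return sorted(scores, key=scores.get, reverse=True)

# +1 is a vote for a page, -1 a vote against it.
votes = {
    "alice": {"linux-howto": 1, "ad-farm": -1},
    "bob":   {"linux-howto": 1, "ad-farm": -1, "kernel-news": 1, "casino": -1},
    "carol": {"linux-howto": -1, "ad-farm": 1, "casino": 1},
}
print(recommend(votes, "alice"))
```

Here bob agrees with alice on everything they both rated, so his liking for "kernel-news" pushes it to the top, while carol's opposite tastes count against the page she liked.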
The downsides of most of these services include that:
They make extensive use of cookies and/or JavaScript
They collect information about your browsing tendencies to improve the "targeting" of the advertisements that appear onscreen, so that JunkBuster is really handy...
They tend to be compatible only with Netscrape and Internet Exploder
In late-breaking news, there are some new things:
There are ongoing attempts to "spam" people by scattering links to irritating spam sites around the web, in order to get Google to rank web sites for various spam-worthy products highly.
There is also an effort to link widely to (reasonably balanced) Wikipedia entries about those topics. I'm game to help a little with the latter...