Information Overload on the Internet – How Much Can We Handle?
Posted on March 22, 2010
The amount of information on the Internet today is staggering. We are reaching the point where it is too large to be effectively searched, filed, indexed, briefed, organized, or numbered. What can we do? Here is an excellent article on “Internet Overload” that was written by a leading authority on the subject – Andrew Kantor.
The Internet, as we all know, removed previous barriers to publication: the money required to buy a printing press or a transmitter or what have you. And that’s good; more voices are heard. But those barriers served a purpose that is only now being seen. They kept the signal-to-noise ratio high and kept the amount of information out there manageable.
The amount of stuff on the Internet, however, is of a far greater magnitude than any previous collection of any sort. There is too much information to manage, but we still try.
As the Internet grew, better ways of organizing the information on it emerged. In the early days, you had to know exactly what you wanted and where it was, and go directly there. Word of mouth was the name of the game. Then came Gopher, which allowed people with servers to organize their text documents within nested menus.
The Internet Today
Today’s web is so much bigger that it’s impossible to organize it by category in a meaningful way. One of two things happens: you end up either with broad categories that hold too many items to be useful, or with so many narrow categories that the category list itself is too long to browse. That is why search tools like AltaVista and Google became so important. Browsing couldn’t cut it because predefined categories couldn’t cut it. Should Jeep be under “Cars/Chrysler” or “Military equipment/historical”? Answer: It depends.
Search tools dispense with categories and let users define their needs ad hoc. Everything’s in a pool, and the keywords you enter narrow it down.
But even they have become less and less useful because there’s too much stuff to search. Unless you narrow your search down with a long list of carefully chosen search terms, you’ll end up with hundreds, thousands, or even millions of results.
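To make the "pool plus keywords" idea concrete, here is a minimal sketch in Python. It assumes a tiny made-up set of pages (the names and text in PAGES are purely illustrative) and a simple word index; real search engines are vastly more sophisticated, but the narrowing effect of each extra keyword is the same in spirit.

```python
from collections import defaultdict

# Hypothetical mini "web": page name -> page text. Illustrative data only.
PAGES = {
    "jeep-history": "the army jeep served in world war two",
    "jeep-liberty": "the jeep liberty is a compact car",
    "car-reviews": "reviews of every new car this year",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.split():
            index[word].add(name)
    return index

def search(index, *keywords):
    """Return the pages that contain every keyword (no categories involved)."""
    results = None
    for keyword in keywords:
        matches = index.get(keyword.lower(), set())
        # Each additional keyword can only shrink the result set.
        results = set(matches) if results is None else results & matches
    return results or set()

INDEX = build_index(PAGES)
print(search(INDEX, "jeep"))          # both Jeep pages
print(search(INDEX, "jeep", "car"))   # only the Jeep Liberty page
```

With three pages, one keyword is enough. With billions of pages, even a handful of keywords can leave far more matches than anyone will read, which is the problem described above.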
Think about it: there are now search tools that aggregate other search tools, taking results from several search engines and trying to find the most meaningful of them. “Unwieldy” doesn’t begin to describe it.
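As a rough illustration of how such an aggregator might combine results, the sketch below merges several ranked lists by summing reciprocal ranks (1 for first place, 1/2 for second, and so on). This scoring rule is an assumption chosen for simplicity, not a description of how any particular metasearch tool works, and the page names are made up.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Merge ranked result lists into one list, best combined score first."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, page in enumerate(results, start=1):
            # A page near the top of any list earns more credit.
            scores[page] += 1.0 / rank
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda page: (-scores[page], page))

# Hypothetical rankings from three search engines.
engine_a = ["page-a", "page-b", "page-c"]
engine_b = ["page-b", "page-a", "page-d"]
engine_c = ["page-b", "page-c"]

print(merge_results([engine_a, engine_b, engine_c]))
```

Pages that several engines rank highly rise to the top, but the merged list is still only as short as the union of everything the engines returned.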
Playing Tag Doesn’t Work Anymore
One of the best ways to deal with the massive amount of information on the Web is something that’s only really possible with the power of computers: tagging.
Tagging is simply associating keywords with something, in this case web pages. One of the big guns in this space is del.icio.us.
If you’re not familiar with it, it works like this. When you find a web page you want to remember or bookmark, you add it to your del.icio.us account and can give it a list of keywords.
So instead of rigid, predefined categories, you have flexible ones defined by all the users; it’s called a “folksonomy.” With it, a web page about the WWII classic US Army Jeep might end up tagged “military,” while one about DaimlerChrysler’s Jeep Liberty might be tagged simply “car.”
Del.icio.us leverages the power of users to define and clarify categories, and — something you can’t as easily do with books in a library — to put web pages in as many categories as they like. When the information source gets too big, tagging replaces sorting as the best way to organize it. Structures can’t support something that large.
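The difference between sorting and tagging can be sketched in a few lines of Python. This is a toy model, not the del.icio.us API: the TagStore class, its method names, and the example.com addresses are all hypothetical. The point is only that one page can sit under as many tags as users care to give it.

```python
from collections import defaultdict

class TagStore:
    """Toy bookmark store: every page can carry any number of tags."""

    def __init__(self):
        self.pages_by_tag = defaultdict(set)  # tag -> pages carrying it
        self.tags_by_page = defaultdict(set)  # page -> tags it carries

    def bookmark(self, url, tags):
        # Store tags in lowercase so "Car" and "car" count as one tag.
        for tag in tags:
            tag = tag.strip().lower()
            if tag:
                self.pages_by_tag[tag].add(url)
                self.tags_by_page[url].add(tag)

    def pages_tagged(self, tag):
        # .get() avoids creating empty entries for tags nobody has used.
        return set(self.pages_by_tag.get(tag.strip().lower(), set()))

store = TagStore()
store.bookmark("http://example.com/army-jeep", ["military", "jeep", "WWII"])
store.bookmark("http://example.com/jeep-liberty", ["car", "jeep"])

print(store.pages_tagged("jeep"))      # both pages
print(store.pages_tagged("military"))  # only the Army Jeep page
```

Unlike a library shelf, nothing forces the Army Jeep page to live in exactly one place: it shows up under "military," "jeep," and "wwii" at once.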
Even Tagging Can’t Handle the Internet
First of all, a tag-based organizing system is hard to browse; imagine a library where all the books are sorted randomly.
There are other problems. One is “polysemy”: the same word with different meanings, e.g., apple or window.
Then there’s the opposite — different words for the same thing. What if the best article on television has been tagged “tele” by a Brit, but you search on “TV”? Or tagged as “cat” and you search on “cats”?
When you have a lot of people doing the tagging, of course, the aggregate wins out, and while you’ll have plenty of people using “TV” and “tele,” the majority will (hopefully) use “television.” But the majority’s tags will tend to be broader because there’s more overlap in a broad space than a narrow one; that is, there are more people who will think of something as a “dog,” and fewer who will think of it as a “Bulgarian Shepherd Dog.”
Therefore, you are going to have a lot of things tagged “dog” (294 million when Kantor wrote this). But the net is so big that even the narrow category is overloaded. There are 219,000 Google hits on “Bulgarian Shepherd Dog.”
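The "aggregate wins out" idea, and the synonym problem that muddies it, can be sketched as a simple tally. The tag list and the small SYNONYMS table below are made up for illustration; a real system would need a far larger table, and nobody has a complete one.

```python
from collections import Counter

# Hypothetical synonym table mapping variants to one preferred tag.
SYNONYMS = {"tv": "television", "tele": "television", "cats": "cat"}

def tag_counts(tags, synonyms=None):
    """Count how often each tag was used, optionally merging synonyms."""
    synonyms = synonyms or {}
    cleaned = (tag.strip().lower() for tag in tags)
    return Counter(synonyms.get(tag, tag) for tag in cleaned)

# Tags that six imaginary users gave the same article about television.
USER_TAGS = ["television", "TV", "television", "tele", "television", "tv"]

raw = tag_counts(USER_TAGS)
merged = tag_counts(USER_TAGS, SYNONYMS)

print(raw.most_common(1)[0][0])  # the majority tag without any merging
print(merged)                    # all six votes land on one tag
```

Without the synonym table, half the votes are scattered across "tv" and "tele," so a search on either of those misses the pages filed under the majority tag.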
The Web is just too big for any current organization scheme to handle. There are only so many meaningful tags.
So where does it lead? Back where we started — to categories and organization, but in a Balkanized way. It leads to Wikipedia. That site is so popular because it serves as a convenient central repository for, well, everything. It makes the Web a manageable organism, but still one that everyone can contribute to.
The web will continue to grow, but it will also splinter more and more often as it bursts its britches.
Andrew Kantor is a technology writer for the Roanoke Times. You can read more of his work on his website.