Here comes everything: Can technology solve information overload?
Below is the text of an article I wrote for the Australian Financial Review's Bumper Beach Holiday Issue (29 Dec - 3 Jan, pp20-21). There's also a PDF version if you prefer.
(PS: For those of you interested in such things, my title is a play on James Joyce's HCE acronym in Finnegans Wake, which sometimes stands for 'Here Comes Everybody', just the ticket for a financial newspaper headline!)
"Imagine turning on your computer in the morning to find an inbox full of articles, blog posts, audio and video programming gathered from a multitude of sources around the globe. No more sifting through the dross to find the gems, just a neat package tailored for you."
It won’t happen tomorrow, but it’s not science fiction either. Already, people around the world are using an array of mint-fresh technologies to open the door on growing worlds of expertise that help them navigate through the ever-expanding information universe that surrounds us all.
The surprising consequence of these technologies is that their users are relying on people to be their guides more than ever before. Instead of heading towards a fully automated solution, today’s online leaders are replicating human communities and networks as the best way to sort the (informational) wheat from the chaff.
More on that later, but first we need to look at some of the shortcomings of search engines and the quest for a ‘pure technology’ solution.
The search engine dream come true is something geeks call the semantic web. Through the use of intelligent agents (programs that ‘learn’ your preferences) they hope to turn cyberspace from a huge and confronting storehouse of documents into something that is more human in scale and appeal.
In this vision of the future, intelligent agents would replace the functions of an array of people we have traditionally relied on, like journalists, editors, teachers, researchers and, of course, friends and colleagues.
We would no longer have to overcome the frustrating ad hocery of these ‘human’ networks. Instead, we would find the best of what’s available on any subject almost instantaneously.
Cameron Reilly, a Melbourne blogger, podcaster and IT company-owner, would love to have a program that searches the Internet for him and provides daily updates of everything he should read, listen to or view to stay up-to-date in the areas critical to his business and recreational pursuits.
We certainly need help from somewhere because cyberspace is already far too big for any of us to know or explore it through anything as antiquated as ‘surfing the net’.
First up, all that information needs to be sorted in a way that allows us to move through it quickly and productively, finding the best and not wasting time with mountains of junk.
There are basically three ways of organising information on the Internet – hierarchical (as they do in libraries), search and chronological. Currently, the most popular, and most practical, of these methods is, of course, search.
Just about everyone with access to a computer uses search engines (Google users alone now conduct over 55 billion searches annually) because they provide a reasonably easy way of imposing some order on an otherwise chaotic world. Search engines are mind-boggling because of the sheer quantity of material they ‘index’ and the powerful algorithms that lie behind them. Although Google searches over 8 billion pages, its coveted and secret algorithms find many accurate matches for your search queries, and find them fast.
So ubiquitous have search engines become that it is now possible to talk about a “Google generation” of people who expect to find the information they need, when they need it, by simply typing in the right search terms.
Not everyone, however, is sanguine about this growing reliance on search engines. One prominent software developer, Nick Bradbury, creator of FeedDemon, recently asked (about Google): “Are we really just building the next version of TV, one even more powerful because it knows your name and shopping habits? More to the point, are we simply creating a potent tool for controlling the next generation of mass-market sheep?”
This is a somewhat surprising ‘geek’ perspective because it is more usual to criticise old media as encouraging passive, uncritical consumption of information whereas anything online is championed as active, independent and so on.
Yet a closer look at search engines does point to some reasons for concern, and the most important among these is the question of quality. Search engines rely on a pretty crude approximation for quality. Essentially, they use the links between sites to estimate the popularity and authority of those sites when they rank the matches to your search terms.
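Google's own algorithms are secret, so the sketch below does not reproduce them. It only illustrates the general idea described above, treating links as votes, using a simplified PageRank-style iteration in Python. The page names, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by the links pointing at them (simplified PageRank-style iteration).

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its current score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: three blogs and a news site.
web = {
    "blog-a": ["news-site"],
    "blog-b": ["news-site", "blog-a"],
    "blog-c": ["news-site"],
    "news-site": ["blog-a"],
}
scores = link_rank(web)
print(max(scores, key=scores.get))  # news-site
```

Because each page's score is divided among the pages it links to, a link from a well-linked page counts for more than one from an obscure page. Popularity, not quality as such, is what gets measured.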
The commercial importance of link-based ranking has not been lost on the legions of sharp operators that infest the Internet.
As retail empire-builder Gerry Harvey has said: when someone wants to buy a television, for instance, the first step they take is to think of three shops, on average, to visit. If your shop is not on that list you are as good as dead.
The same thing works online: the higher up the rankings your product or service goes, the more hits you will get and the more sales you will eventually make.
So people go to a lot of effort to ‘game’ search engines. One of the more infamous techniques is called ‘Google bombing’, in which a number of websites link to a target site using the same piece of link text, pushing the target site up the Google rankings for that phrase. Bombing can be done very effectively by even a small group of sites.
Besides gaming, there is also the now well-established search engine optimisation (SEO) industry, which offers (“sure-fire”) techniques and tips for pushing sites up those rankings. SEO devotees, for instance, will ensure that their site’s ‘key words’ feature prominently in any new content they create in an effort to own or dominate a particular idea or area of interest.
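To see why a small group of sites can have an outsized effect, consider a toy index that ranks pages for a phrase simply by counting the links that use that phrase as their link text. This is a deliberately simplified, hypothetical model, not how Google actually weighs link text, and the site names are invented.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Count, for each link-text phrase, how many links point at each target page.

    `links` is a list of (source_page, link_text, target_page) tuples.
    """
    index = defaultdict(Counter)
    for _source, text, target in links:
        index[text.lower()][target] += 1
    return index


def rank_for_phrase(index, phrase):
    """Return target pages ordered by how many links use this exact phrase."""
    counts = index.get(phrase.lower(), Counter())  # .get avoids adding empty entries
    return [page for page, _count in counts.most_common()]


links = [
    # Two independent sites recommend a genuine gardening blog.
    ("site-1", "gardening tips", "horticulture-blog"),
    ("site-2", "gardening tips", "horticulture-blog"),
    # Three coordinated sites "bomb" an unrelated page with the same link text.
    ("bomb-1", "gardening tips", "target-site"),
    ("bomb-2", "gardening tips", "target-site"),
    ("bomb-3", "gardening tips", "target-site"),
]
index = build_anchor_index(links)
print(rank_for_phrase(index, "Gardening Tips"))  # ['target-site', 'horticulture-blog']
```

Three coordinated links are enough to outrank two genuine recommendations, because this naive count cannot tell them apart.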
As one prominent SEO company, ‘bigmouthmedia’, says: “There are over 3 billion web sites fighting tooth and claw for the best search position. There are billions of other pages not even listed in search engines. Let's hope they like the obscurity.”
Quality is an ongoing issue for search engine users and it raises the deeper question of whether technological tools, on their own, can ever replace our traditional reliance on human networks for making appropriate judgments in this vexed area.
In recent years, the popularity of blogging has created a way of finding quality information that resembles peer-reviewing. Bloggers create interest-based communities that assess, critique and recommend (or otherwise) relevant information and sources across the web.
The most famous recent example of a blogging community in action was the ‘Rathergate’ incident, where bloggers swapped and aggregated information online, accumulating proof that the documents leaked to CBS were faked, and it all happened within a few hours.
Political punditry and fact-checking are, nevertheless, only a tiny part of what happens in the blogosphere. If you don’t think blogging covers just about anything a human being could be interested in, then try this 60-second test: enter a topic followed by ‘blog OR weblog’ into Google and see how many results it generates.
I did it for gardening and found a great site called “Horticultural”, the work of Jane Perrone, deputy editor (news and politics) at Guardian Unlimited.
Jane blogs because she gets “help and advice from gardeners all over the place, and I feel part of something bigger. When I talk to most people about the joys of mulching, or try to explain why I put comfrey leaves into a bucket every few days, they look at me with mute uncomprehension”.
Finding ‘help and advice’ through blogging networks is not only effective but it feels right because social relationships in the blogosphere approximate what happens offline.
According to two American political scientists, Daniel W. Drezner and Henry Farrell (in a July 2004 conference paper), the pattern of links across blogs is similar to what happens in our ‘real’ lives: most people have a few friends and acquaintances, but a few people have large numbers of them.
These ‘popular’ people (and blogs) are the active nodes that weld broader networks together. These networks replace, or partially displace, searching because just about anything of real value in the relevant subject area that is posted to the web will get reviewed on the blog networks, usually sooner rather than later.
Staying in regular contact with blog networks, however, would be awkward and time-consuming without Really Simple Syndication (RSS), undoubtedly the most important push technology (something sent without being initiated by the recipient's request) to emerge since email. Strictly speaking, a feed reader checks each feed for updates on a schedule, but to the user new material simply arrives. While blogs and RSS are made for each other, many other regularly updated websites (particularly traditional media) also now offer RSS feeds.
A by-product of the RSS phenomenon is podcasting (named after Apple’s amazingly successful MP3 player, the iPod), in which audio files are sent as enclosures in the feed. With another little program in place, these audio files can be loaded automatically onto a docked iPod (or another MP3 player).
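Under the hood, a podcast is just an ordinary RSS item carrying an `<enclosure>` element that points at the audio file. The minimal Python sketch below, using the standard library's XML parser, shows how a podcast client might pick out the audio enclosures from a feed. The feed contents and URL are made up, and the downloading and iPod-syncing steps are left out.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed: one item carries an audio enclosure, one is text only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/episode1.mp3" length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
    </item>
  </channel>
</rss>"""


def audio_enclosures(feed_xml):
    """Return (title, url) pairs for every item whose enclosure is an audio file."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("type", "").startswith("audio/"):
            found.append((item.findtext("title", default=""), enclosure.get("url")))
    return found


print(audio_enclosures(SAMPLE_FEED))
# [('Episode 1', 'http://example.com/episode1.mp3')]
```

A real client would also fetch the feed over the network on a schedule and remember which episodes it had already downloaded.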
Without much fuss, podcasting is now offering some brave pioneers a form of time-shifted radio. Podcasting has really only been going for a few months and already there are over 1,000 ‘programs’ being fed to eager listeners. Most of the programs are amateur, but recently BBC Radio 4 started podcasting Melvyn Bragg’s popular weekly “In Our Time” program. Some major US radio stations are also experimenting with it. We are likely to see many more experiments emerge next year.
Recently we have also seen the emergence of (near) real-time searching. Services like Feedster, Technorati and PubSub continuously scan millions of feed sources and send new material in your subject areas to your computer soon after it is published. These feed-searching services, of course, rely on the booming popularity of RSS.
Feed monitoring can change the way you think about the web. I recently wrote something (mildly) critical of Technorati on my blog, and its founder and owner, David Sifry (based in San Francisco), had posted his response on my site within hours. Dave included the rather tart comment “btw, I found you by keeping a close eye on my Technorati watchlist so it certainly provides me some value.”
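At its simplest, the matching step behind these services can be pictured as checking each newly published item against a list of saved search terms. The Python sketch below assumes items arrive as dictionaries with hypothetical `title`, `body` and `published` fields; it is not how Feedster, Technorati or PubSub actually work, only the general idea.

```python
from datetime import datetime


def watchlist_hits(items, watchlist, since):
    """Return titles of items published after `since` that mention any watched term.

    Matching is a plain case-insensitive substring test.
    """
    terms = [term.lower() for term in watchlist]
    hits = []
    for item in items:
        if item["published"] <= since:
            continue  # already seen on an earlier check
        text = (item["title"] + " " + item["body"]).lower()
        if any(term in text for term in terms):
            hits.append(item["title"])
    return hits


last_check = datetime(2004, 12, 1, 9, 0)
new_items = [
    {"title": "Thoughts on Technorati", "body": "Mildly critical notes.",
     "published": datetime(2004, 12, 1, 10, 30)},
    {"title": "Mulching season", "body": "Comfrey leaves again.",
     "published": datetime(2004, 12, 1, 11, 0)},
    {"title": "Old post about Technorati", "body": "",
     "published": datetime(2004, 11, 30, 8, 0)},
]
print(watchlist_hits(new_items, ["technorati"], last_check))
# ['Thoughts on Technorati']
```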
That’s the sort of immediacy and human interaction that is difficult to find if you rely on search engines. The proponents of feed monitoring services like to speak about a live web, or real-time conversations across the Internet, and the capacity to know what’s being said ‘right now’.
Live feed monitoring also highlights just how much content is being generated these days. PubSub, for instance, regularly clocks well over 1,000 new items every minute.
One of the world’s best-known ‘feedaholics’ is Microsoft’s Robert Scoble, who currently subscribes to about 2,000 feeds (including feeds from over 900 blogs). Each night he spends about 3 to 5 hours ‘reading’ these feeds.
When I recently asked him whether he was getting sick of this regime, Scoble told me he was ‘having a ball’; his only problem was that his feeds still amount to a tiny fraction of the activity on blogs on any one day.
Scoble, appropriately, was the discussion leader for the Overload session at BloggerCon III, held at Stanford University in November, where a hundred or so bloggers discussed ways of managing these information flows more effectively and, in particular, the limits of our ability to keep across all the new information the web generates each day.
Some participants thought intelligent agents might yet provide the solution. Other participants said that they wanted people to be their intelligent agents, and they saw that happening through blogging networks. If you choose your online friends wisely, they suggested, you may not need a lot of them, or a lot of feeds, because the important stuff will always float to prominence.
On the other hand, as one participant said, we may need to adopt a more Zen-like approach to the problem and just accept that we will (shock) miss some good information from time to time. We may need to get off the technology-driven information treadmill and occasionally ‘catch a sunset’.
cool article trevor
i believe my how to manage RSS overload post might help:
http://www.rolandtanglao.com/archives/2004/11/13/my_rss_overload_strategy_that_i_developed_after_the_bloggercon_iii_overload_session
The gist is find the 150 bloggers who are your best filters (and revisit this list once a month at least), add some PubSub and Feedster searches (Technorati is unfortunately still not as good as PubSub and Feedster) and forget about the rest because through triangulation you will find out about the good stuff eventually and anyways everything has a URL so unlike email you can find stuff if you miss it!
really, as you say "just accept that we will (shock) miss some good information from time to time"
and if you really want to subscribe to over 150, then i recommend using the auto-expire feature that NetNewsWire (but not Bloglines AFAIK) and a lot of other aggregators have
happy new year!
Posted by: Roland Tanglao | 04 January 2005 at 09:58 AM
Interesting take on blogging, Trevor. Thanks for sharing it.
For me it's not about being a human sponge for huge volumes of information out there -- how daunting. It's about finding information that you can't find easily anywhere else -- or insights by people who have an unusual or unique perspective -- WHEN you need it.
Therefore, I am reasonably focused about what I read. And what I read changes with the purpose at hand. For instance, if I am looking for information and perspectives about adoption of the Firefox browser versus Internet Explorer, I will go to the blogs of some of the early adopter techies. But those blogs are not part of my normal routine. I don't overwhelm myself with trying to stay current on all the other tech stuff they write about all the time. Nor do I need to stay glued to Technorati or FeedDemon -- after all, those services are there when the need arises.
As in all things, moderation.
That said -- ounce for ounce, I find out some of the most up-to-date, insightful, specific, usable and reader-friendly information from blogs. I wouldn't go back to pre-blog days for anything.
BTW, this hot-off-the-presses Pew Internet study is interesting, about the explosion in blogs and RSS in America: http://www.pewinternet.org/pdfs/PIP_blogging_data.pdf.
Posted by: Anita Campbell | 04 January 2005 at 12:12 PM
Your point on bloggers as human filters will intrigue my lecturers at uni when I pass it on to them - this is what librarians see themselves as, i.e. between the publishing industry/information providers and the library patrons.
If people know how to find quality information quickly they are probably less interested in what they may have missed in the information soup - the Zen aspect may have always been with us. Reference librarians speak sometimes of knowing when to stop looking, of letting clients decide when the information is good enough instead of overwhelming them with options and suggestions.
The niggling question for me perhaps is how widespread the feed-monitoring trend will become - it may remain something only for those who already know a lot about searching on the Internet.
Interesting stuff!
Genevieve
Posted by: genevieve | 04 January 2005 at 10:57 PM
Trevor,
Nicely done. I believe there is still a whole lot to be seen in the areas of finding the personal content aggregator. Right now all these technologies are more or less in their early stages. I doubt even blogs are mainstream although they are certainly getting there. Once people feel as comfortable blogging or catching podcasts as they feel dialing a number in a cell-phone, we will be ready to talk about what can be done to separate low-quality from high-quality content on a per-person basis.
I wonder about a dimension of search that you didn't cover: the economic incentive behind it. I submit the following statement for debate: as long as there is advertising revenue to be extracted from a search result, there will be an incentive to devise better ways of searching. The following extension might apply: as soon as there is clear advertising revenue attached to aggregating all content for each individual, one or more tools will be developed to do it.
Thoughts?
Jaime
Posted by: Jaime Batiz | 06 January 2005 at 09:57 AM
I agree with you Jaime and I doubt that we are far off someone devising a way of turning a search engine into a push engine covering text, audio and video. We probably need better infrastructure (broadband) first but that is happening too. It might even work on a subscription model. I'd pay for something that worked well - but I'm not sure how accurate it would be and people would have to get confidence in the service before they would pay for it (or advertisers pay for it). So maybe there is a chicken and egg thing here. But I can't see more than a few info junkies spending hours looking for stuff everyday, so someone's bound to come up with a better way. Because most of us do want some form of aggregation, which is a key service provided by traditional media, when you think of it.
Posted by: Trevor Cook | 06 January 2005 at 10:07 AM
Great stuff Trevor. One of the back stories to your piece is the (accurate) assumption that the majority of readers/listeners/viewers of both mass and trade media still don't 'get' blogs.
But the irony is that once you do bite the bullet and start subscribing to RSS feeds it doesn't take long to get overwhelmed, particularly if you're still figuring out your information preferences.
So I reckon the overload issue will become more evident this year as the gap narrows between those who 'don't get it' and those who 'get it,' particularly in Australia. Podcasting trials like Triple J's (http://www.abc.net.au/triplej/hack/podcast/) are a good sign of widespread acceptance. The trick will be convincing people that mucking around with PubSub and Feedster will actually save you time in the long run.
Posted by: Mark Jones | 11 January 2005 at 10:46 AM
Yes, this is exactly the space I am in. I think technology is just a tool - it's up to us how we use it.
Posted by: note | 28 March 2006 at 04:40 PM