Needle in the Haystack – Information Overloading 2.0

Posted on April 19, 2007 by peter

Do you also have the feeling that you are drowning in the unbelievable amount of information emitted by the Web today? (Other media add to it greatly, but I would like to focus solely on the Web in this article.) I feel more frustrated day by day, trying to stay on top of my ever-growing heap of unopened e-mails, unread blog entries, unchecked news sites etc., with a constant fear that although I spend a fair amount of time consuming and processing all the information pouring in, I am still missing something very important all the time.

The “problem” is that there are way too many outstanding blogs, aggregators, social news sites, popular links on bookmarking services and other sources of information which you “just can not miss”. I fear I am definitely losing the battle – there are more and more information sources, but no new, more effective methods (at least none that I know of) to handle them. So I guess it’s pretty clear that as time progresses, more and more info will fall through the cracks (or more and more time will be needed to prevent this).

Since there is no way to stop the exponential growth of information (and if there were, I doubt anybody would want to use it – this is just not the way this problem should be approached), we have to influence the other factor: find more effective means of locating, sorting, bookmarking, processing and otherwise handling the most important data.

It is interesting to observe that at the moment, services with this intention are not receiving as much attention as they should – provided that the above reasoning is sound and thus there is a need for more effective handling of existing information. Google is a trivial example of this: it has loads of interesting tricks to refine, specify and narrow your search (for example the synonym operator, ~, or other advanced options) – yet I bet 2 hours of my most precious blog-reading time that most of us can not even tell when we last used advanced search (besides a few trivial operators entered into the search box, like site:.rubyrailways.com). In most cases I never check out more than 2-3 result pages (and just the first page 90% of the time) – which is interesting, given that I am doing 99% of my searches on google!

In my opinion, exactly the opposite is happening: sites like twitter and tumblelogs are immensely popular, flooding you with ever more information, all the time, every minute, as fast as possible. You did not have enough time to read blogs? No problem, here are tumblelogs and twitter messages, which will help you by shooting even more data right into your face much more frequently than ever. Welcome to information overloading 2.0.

Fortunately there is hope on the horizon: some sites are striving to help the situation by providing an interesting view on the data, narrowing down the information to a specific niche, or aggregating and presenting it in a way so that you do not have to hand-pick it from an enormous everything-in-one-bag infosoup. I will try to describe a few of them which I have found interesting recently.

Tools utilizing visual representation of data – People are visual beings. In most cases, a few good, to-the-point pictures or diagrams can tell much more than boring pages of text. Therefore it is quite intuitive that a visual representation of data (typically the results of search engine queries) could help you navigate, refine and finally locate relevant results better than text-only pages.

My current favorite in this category is quintura. Besides working as a normal yahoo search, quintura does a lot of other interesting things: it finds tags related to your query and displays them as a tag cloud. You can further refine the search results or navigate to documents found by any of the related tags. Hovering over a related tag displays the tags related to that tag. For example, when searching for web scraping and hovering over the ‘ruby’ related tag, ‘scrubyt’ is also displayed – it would definitely take more time to find scrubyt on google, even by using the search term combination ‘web scraping ruby’ – so the functionality offers more than just a fancy view: it actually makes searching faster and more effective.

Am I using quintura regularly? Nope. Given that I stated just a few sentences ago that it can make searching faster and more effective, this is strange – but for some reason, if I am after something, I try to find it on google.com. This is rather irrational, don’t you think?

Sites concentrating on a specific niche – I feel that (otherwise great) sites like digg are just too overcrowded for me: with over 5000 submissions a day in a lot of diverse categories, it simply takes too much time to read even just the front page stories. I am mainly interested in technology and development related articles – and while a lot of digg proponents argue that there are both technology and programming categories on digg, they are still too ‘mainstream’ for my taste and rarely cater to a hardcore developer/hacker, in my opinion.

Fortunately dzone and tektag are here to rescue the situation!

The guys over at dzone are really cranking all the time to bring a great digg-like site for developers that helps you stay on top of current development and technology trends. The community (which is crucial for such a site, of course) is really nice and helpful, and in my opinion the site owners have found (and are constantly fine-tuning) the right magic formula to keep the site from being overloaded with redundant information while still delivering the most relevant news. Currently, dzone is my no. 1 source of developer and tech news on the web.

In my opinion, tektag has not reached the maturity level of dzone yet (I think they are currently in public beta), but once that happens, I bet it will be a very important and relevant source of information for developers, too. To put it simply, tektag is to del.icio.us what dzone is to digg. Why is this so great? If you need to bookmark something, you should just use del.icio.us, right? Wrong – at least if you intend to use del.icio.us for anything other than storing your personal bookmarks. The problem with del.icio.us, again, is that people use it to bookmark just anything – therefore it is virtually impossible to track the movers and shakers in a narrow topic (like programming). Visiting del.icio.us/popular will show you what’s being bookmarked the most overall, not inside your category of interest (of course I know there are categories like del.icio.us/popular/programming, but these still do not solve the problem fully, by far).

Tektag has the potential to solve this problem by adding development-specific features and tweaks, but most importantly by the fact that only developer articles will be saved there, and thus interpreting the data will be much easier since the input won’t be cluttered with an enormous amount of information from arbitrary topics. In my opinion the only question for their success is: can they build the critical user mass?

Semantic search – if you hear the words ‘search engine’, most probably google or one of its competitors (yahoo, msn) springs to your mind, and you are right – for the absolute majority of searches, we are using these sites. However, they are not really that powerful in letting you express what exactly you are searching for (and, stemming from this fact, in actually bringing you the right results) because they are not trying to understand the documents on the Web: they just crawl and index them to be searchable by the phrases they contain.

Since the above-mentioned sites are still the absolute market leaders in search, it’s clear that keyword-based indexing is still good enough(tm) – until somebody shows that there is a more sophisticated way of searching, applying natural language processing, ontology extraction and other semantic techniques to actually understand the documents, and delivers usable results with these techniques.
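To make the “crawl and index by phrases” point concrete, here is a minimal sketch of keyword-based indexing in Python. It is only an illustration under simplifying assumptions (whitespace tokenization, three made-up documents, hypothetical function names `build_index` and `search`), not how google or any real engine is implemented.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Made-up sample documents, purely for illustration.
docs = {
    1: "web scraping with ruby",
    2: "extracting data from html pages in ruby",
    3: "scraping paint from a wall",
}
index = build_index(docs)
matches = search(index, "scraping")
```

A query for “scraping” matches documents 1 and 3 but not document 2, even though document 2 is about the same topic and document 3 is not: the index only knows which words appear, not what the documents mean.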

Spock, an upcoming people search engine, is a great example of this principle in action. Spock’s goal is to crawl the whole web and extract information about people – which is far from trivial, since to do this properly, their spiders have to be smart enough to understand human language as much as possible. (A simple example: think of a birth date, e.g. 07/06/05 – does 07 denote a day (the 7th day of the month) or a year (the year 2007)? There are hundreds, maybe thousands of date formats used on the Web – and there are far more complicated problems to solve than this.)

OK, complex problems or not, what’s so cool about a people search engine? After all, you can use ye good olde google as for everything else. Tim O’Reilly has an excellent example against this approach: on google, it’s trivial to find Eric Schmidt, google’s CEO – however it’s much harder to find the other 44 Eric Schmidts returned by spock. It’s not that google does not find them – but to actually locate them among approximately 4,500,000 returned documents (as opposed to spock’s 45) is nearly impossible.

Spock is probably the best example in this article to demonstrate how a service should bring you all the information you need – and not even a bit more!
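The birth date ambiguity mentioned above is easy to reproduce. The following Python sketch tries three common numeric layouts on the same string using the standard library’s `datetime.strptime`; the `FORMATS` table and the `interpretations` helper are hypothetical names chosen for this illustration, and this is certainly not how spock’s spiders work.

```python
from datetime import datetime

# Candidate layouts for a numeric date with two-digit fields (an assumed,
# deliberately tiny subset of the formats found on the Web).
FORMATS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}

def interpretations(text):
    """Return every valid reading of `text` as {layout name: ISO date}."""
    readings = {}
    for name, fmt in FORMATS.items():
        try:
            readings[name] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass  # this layout does not yield a valid calendar date
    return readings

readings = interpretations("07/06/05")
```

For “07/06/05” all three layouts produce a valid but different date (July 6, 2005; June 7, 2005; June 5, 2007), so the string alone cannot tell you which one was meant. A crawler needs context, such as the language or country of the page, to pick the right reading.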

If these services are so promising and they help you fight information overloading, thus helping you find the desired information more easily (so that you will have more time to read other blogs :-)), why are they orders of magnitude less popular than the ones flooding you all the time? Why don’t people use things as simple as advanced google search to fight information overloading? Is information overloading a bad thing at all (since it seems the sites generating the most information at the fastest pace are the most popular)? I can’t really answer these questions at the moment, but even if I could, I have to run now to read some interesting (tumble|b)logs. What!? 20 twitter messages received? Ok, seriously gotta go now…

27 thoughts on “Needle in the Haystack – Information Overloading 2.0”

Tim on April 19, 2007 at 9:18 am said:

Thanks very much for covering TekTag. We generally agree with your assessment, and are working hard on some new features that we hope will make TekTag even easier to use. We’ve been chasing down a few annoying bugs that should soon be fixed, adding some basic things (popular, tektag.com/tagname navigation, comments), and will be adding some better search functionality shortly. After that, we’re thinking about some specific technical tools and networking features to help people find answers and experts. Don’t want to tip our hand, but you get the idea.
Ben on April 19, 2007 at 10:02 am said:

I agree with you on the general information overload feeling. I don’t have any time for sites like Twitter myself, it’s about finding the quality content rather than quantity (which seems to be the problem with most blogs).

Personally, I subscribe to over 300 RSS feeds via (my own Rails application) http://www.trawlr.com and use tags and the “favourite” feeds feature to ensure I keep up-to-date depending upon available time. If I only have a few minutes I’ll quickly skim through important feeds; if I have plenty of time I can just go through the entire list (river of news style view) until I get bored!

As for the alternative search interface you noted, a Google search for “web scraping” ruby returns scRUBYt within the first page. So your visual search example isn’t actually any quicker (in fact I’d guess that displaying a Google search result is much quicker than creating the fancy cloud of results).
Stephan on April 21, 2007 at 2:51 pm said:

Haystack and information overload indeed: I read (part of) this on another machine. Needless to say, I wanted to come back when at home; alas, when I wanted to come back I didn’t remember where to look for it – or what exactly it was that caught my interest. Information overload, indeed.

Now I was looking for a Ruby or Rails blog on the internet. A special one. This one. Needle in the haystack, that’s what it is.

Finally I remembered having found this blog on http://www.rubycorner.com – open all updated blogs from the past few days and here I am (again).
Thanks for sharing those links!
Jobmatchbox on June 13, 2007 at 3:48 am said:

You may also want to check out Mylifebrand.com as it is another tool that may help you cope with your information overload. I found it a few days ago and haven’t tested it out yet, but it promises to let you manage all of your social network accounts from one master account.
Mike on September 7, 2007 at 8:50 am said:

If you like Quintura check out Kartoo, Clusty/Vivisimo (no visuals but good clustering), or Grokker (which has a visual portion). Cheers.

Ruby, Rails, Web2.0

Experiences with Ruby and Rails, Web2.0 and other development technologies
