Random Links from the Web, 10-04-2008

  • RubyFlow – Peter Cooper (of RubyInside fame) created “RubyInside’s sister site” – RubyFlow. Hope it takes off!
  • Thinking Sphinx Reborn – I thought the phoenix was the creature with the rebirth ability – though that doesn't mean a sphinx can't do it too! By the way, "Sphinx is a very fast search engine that indexes data and provides flexible ways of searching it."
  • Rails vs Merb – “The conclusion is simple, I recommended that my client go with Merb”.
  • RubyAMP – a powerful-looking Ruby TextMate bundle.

Compiling Firefox 3 beta 5 on OS X (with JSSH support)

Since I bought my Mac, Safari has been my primary browser. I have been using Firefox sporadically too – you can't do serious web development without FireBug! Safari is light, zippy and renders wonderful OS X look'n'feel pages – however, with the arrival of Firefox 3 I am seriously considering ditching it in favor of FF (still no OS X look in there, tho' ;-)). Firefox 3 beta 5 is just phenomenal – fast, powerful, and it has all the stuff I am missing from Safari (FireBug, the del.icio.us toolbar, DOM Inspector and a ton of other extensions) – so I am really thinking I will remain a Safari user on my iPhone only.

However, FF3 is missing some functionality I depend on – most notably jssh support. The DOM Inspector was removed too, but it can be installed as an add-on, and since beta 4 FireBug has been working, so no major hurdles there. jssh, however, was as usual a harder nut to crack.

Fortunately Angrez, FireWatir's author, pointed me in the right direction – which in this case meant compiling Firefox 3 with jssh support myself! Here's how:
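
The gist of it is below. Treat this as a rough sketch rather than a copy-paste recipe – in particular, where exactly you get the jssh sources from, the directory names and the precise .mozconfig options are the parts you may well have to adapt to your setup:

# 1. Get the Firefox 3 beta 5 source (source tarball or CVS checkout)
#    and unpack it into a 'mozilla' directory.

# 2. Drop the jssh extension sources into mozilla/extensions/jssh.

# 3. Create a .mozconfig in the mozilla directory, something along these lines:
cat > .mozconfig <<'EOF'
. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-ff
ac_add_options --enable-optimize
ac_add_options --disable-debug
ac_add_options --enable-extensions=default,jssh
EOF

# 4. Kick off the build:
make -f client.mk build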

Sit back, enjoy your coffee, and in a few minutes you'll have your own hot new FF build with jssh support!

By the way, FireWatir 1.1 hit the streets just today! Grab it and let the testing commence!

Downloading a File Behind http-basic-auth

There are plenty of ways to do this in Ruby (just to mention a few: the standard net/http, curb, mechanize, rio etc.). I have chosen a semi-standard one (no new syntax to learn, but definitely simpler than net/http): open-uri.

require 'open-uri'

open('target.file', 'wb') do |f|
  # .read is needed - open() returns an IO-like object, not the content itself
  f.write open('http://my.url.com/source.file', :http_basic_authentication => ['user', 'pass']).read
end

I have been using Ruby for more than 2 years now, but its succinctness still keeps amazing me!

EuRuKo 2008 – Favorite Quotes

EuRuKo 2008
Quite a few blogs covered the EuRuKo talks (for example here, here, here or here), so I am not going to do an (n+1)th writeup. Instead, I have collected a few quotes I found interesting and/or funny. Without further ado, here we go:

Matz: Ruby – Past, Present and Future (Keynote)

  • Chad Fowler wrote a book ‘My Job Went to India’ – and yesterday my luggage went to Oslo!
  • Ruby was a hobby that came from not having a job after the end of Japan's bubble economy
  • I consider myself as no great programmer!
  • Then we have this nasty snake language here… (sampling through different languages starting from COBOL)
  • Python people love to be organized and have one true way. Ruby people don’t care.
  • (after going through a ton of different languages and finally arriving at Ruby) Ruby is not perfect either… but it’s close!
  • We will need to keep Ruby alive when Rails will be gone… Hopefully Ruby will be around in 15 years, but Rails… hmm…khm… never mind.
  • 10 years ago – Ruby? What’s That? Language? See This Cool Java!
  • 5 years ago – Ruby? I’ve Heard of It. But I haven’t Used It YET.
  • 2 years ago – Ruby! I Know! It’s for Rails, Right?

Koichi: Merging YARV is not a goal but a start

  • (Running around in a t-shirt ‘No Ruby No Life’) My version: No Ruby no Job!
  • 1st slide: [some completely different slide than his real topic title] – this is just a joke, the presentation is not about this
  • I don’t speak English so I wrote down everything on the slides… unfortunately most of the slides are in Japanese! (LOL)
  • Please ask questions in Japanese/C/Ruby! … or use slow/short English
  • I have a lot of slides (hm, like 50) of different optimization techniques but it’s just too much so I won’t show you here 😉
  • My PhD thesis is in Japanese (Efficient implementation of Ruby VM), so please learn Japanese if you’d like to read it 🙂
  • NT means not Windows NT but Native Thread
  • Ruby thread and native thread: (shows a brutally complicated slide, fully in Japanese) and I would like to point out this part (clicks a button, and a red frame appears around a portion of fully Japanese text)
  • This is complex so I skip it (after the 10th incomprehensible slide)
  • I can’t program in Ruby!
  • Come to Japan/Enter my Lab (please teach me English) – Unfortunately I can’t employ you because there is not enough money, so please bring your own $
  • (response to a question about YARV’s memory need:) Yes, YARV also needs memory to start the VM! (considers the question answered. When asked for further details:) How much? hmm… you need to measure yourself 😉

Charlie Nutter, Tom Enebo: JRuby – Ready for Action

  • Java != a dirty word!
  • Did you see this demo already? No? Good. Everybody has seen this 100 times in the US so they are really sick of it!
  • See? new is on the right side of the class name!
  • I love Swing (after listing 9342923 looooong functions of the Button class)
  • this would take 6-7 lines of Java (after adding a listener to a button in 1 line of Ruby syntax)
  • This is the weirdest error I have ever seen during a demo! I guess Mac OS X is not ready for this stuff yet (after not being able to kill off a process which slowed down his machine)
  • Kill that bird!!!! (after thunderbird jumping around for like 5 minutes after OSX bootup)
  • ColdFusion is not a fine example for anything (Charlie, after a guy proposed ColdFusion as an example of… I don't remember what, but it doesn't matter)

Disclaimer: Please take the above with a grain of salt – based on these sentences (which were taken out of context and possibly squeezed/lost in translation) it might seem that Koichi's goal was to persuade people to learn Japanese (not true, his talk was very interesting and deep) or that Matz was making fun of/mocking Python (not true, he was just joking all the time) etc. If you have some more quotes, drop me a comment!

My EuRuKo 2008 Photos

EuRuKo 2008 is over… I have had a really great time, both as an organizer and an attendee, and can’t wait for next year’s conference!

Until that gets sorted out (currently Spain (Madrid) and Poland (Warsaw vs Krakow) are competing), here are some photos Marianna and I took… They were usually taken in a hurry and/or in the dark, so don't expect too much (I guess I should invest in a better lens and flash :-))

You can check out all the (correctly tagged) EuRuKo 2008 photos here.

Please post your photos to flickr or whatever service you are using, and leave a comment here with the address… Cheers!

Problems of Social Bookmarking Today – Part One

Can you imagine the on-line world without del.icio.us, reddit, digg, dzone and other Web2.0 social bookmarking sites? Sure, you can – they were not always around, and nobody missed them before they appeared. However, since their debut, I guess no serious geek can exist without them anymore. The functionality and information richness these sites offer is unquestionable – however, more and more flaws and problems are popping up as people learn to use, monetize, abuse, trick and tweak them. I would like to present my current compilation of woes and worries, sprinkled with a few suggestions on how to handle them.

DISCLAIMER: this is my subjective view on these matters – I am not claiming the things presented here are objectively true – this is just my personal perception.

General Problems

I read a nice quote recently – unfortunately I cannot find it right now. It goes something like this: "Time is nature's way of preventing everything from happening all at once. It does not seem to work lately…"

Though the notion of a social bookmarking site did not even exist when this quote was thought up, it captures the essential problem of these sites very well: too many things are happening all at once, and it is therefore impossible to process the amount of information pouring in from everywhere…

  • Information overload – I think this fact is not really a jaw-dropping, mind-boggling discovery – but since it is the root of all evil (not just in the context of Web2.0 or social sites, but in general for the whole web today), it deserves to be presented as the first problem in this list. Today it is almost certain that the thing you are looking for is on the Web (whether legally or illegally) – it is a much bigger problem to actually find it! This applies to the social sites as well. A site like digg gets about 5000 article submissions every day – and even if you restrict yourself to the front page stories, it is virtually impossible to keep up with them unless you spend a given (not so short) amount of time every day just browsing the site. O.K., this is not a Web2.0 or social site problem per se, but it is quite a hard one to solve nevertheless.

    Proposed solution: I don’t have the foggiest idea 🙂 Basically an amalgam of the solutions presented in the next points…

  • Articles get pushed down quickly – which is inevitable and not even a terrible problem in itself, since this is how it should work – the worse thing is that the good stuff sinks just as fast as the crap – i.e. every new article hitting the front page makes all the others sink by one place.

    Proposed solution: The articles could be weighted (+ points for more votes, more reads, more comments etc.; – points for thumbs down, spam reports, complaints etc.) and the articles should sink relative to each other at any given moment – i.e. the weight should be recalculated dynamically all the time, and the hottest article should be the most sticky while the least-voted-for should swap places with the upcoming, more interesting ones (a toy sketch of this idea follows this list).

  • Good place, wrong time – if you submitted a very interesting article and the right guys did not see it at the right time, it will inevitably sink and never make it to the front page. It is possible that if you had submitted it half a day later, it would have been noticed by the critical mass needed to get it to the front page – and the worst thing is that you never even know whether this is so.

    Proposed solution: Place a digg/dzone/del.icio.us/whatever button after or before the article – this way, people will have the possibility to vote on your article after reading it, no matter how and when they got to your site. The article will stay on your site forever – whereas on digg it will be visible in a relevant place for just a few hours.

  • URL structure problems – sometimes the same document is represented by various URLs, which confuses most of the systems. The most frequent manifestations of this problem are: URLs with and without www (like http://www.rubyrailways.com and http://rubyrailways.com), a change of URL style (from /?p=4 to /2002/4/5/stuff.html), or redirects, among other things.

    Proposed solution: Decide on a URL scheme and use it forever (generally, /?p=4 is not a recommended style – /2002/4/5/post.html and other semantically meaningful URLs are preferred (see Cool URIs never change); also set your web server to redirect http://www… to http:// (or the other way around)). The sites could also remedy the situation by checking not just the URL but also the content of the document (as digg does just before submission).
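
Coming back to the "articles get pushed down quickly" point for a second: below is a minimal Ruby sketch of what such a dynamically recalculated weight could look like. The coefficients and the age decay are made-up numbers, purely for illustration – the real trick would of course be tuning them.

# Toy dynamic ranking: weights are recalculated all the time, so articles
# sink relative to each other instead of being pushed down one-by-one.
Article = Struct.new(:title, :votes, :reads, :comments, :spam_reports, :posted_at)

def weight(article, now = Time.now)
  raw = article.votes * 5 + article.reads * 0.1 +
        article.comments * 2 - article.spam_reports * 10
  age_in_hours = (now - article.posted_at) / 3600
  raw / (1 + age_in_hours)   # older articles need a higher raw score to stay up
end

def front_page(articles, count = 10)
  articles.sort_by { |a| -weight(a) }.first(count)
end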

Tagging

Tagging is a great way of describing the meaning of an item (in our case, a document) in a concise and easy-to-understand way – from a good set of tags you should immediately know what the article is about, just by reading them. The idea is not really brand new – scientific papers have been using this technique for ages (much like PageRank – long before PageRank was implemented by the Google guys, ranking scientific papers based on the number of times they are cited in other relevant works was an accepted and commonly used technique).

Some sites have a predefined, finite set of tags (like dzone), while others allow custom ones (like del.icio.us – usually with suggestions based on the tags of others or on keywords extracted from the article). The problem with a predefined tag set is that you are restricted to the tags offered by the site – well, this is sometimes good, because it gives you some guidelines about what is accepted on the site. There are much more interesting problems with sites that allow custom tags:

  • No commonly accepted, uniform tagging conventions – some of these sites accept space-separated tags, some quoted ones, and some do not require or recommend any specific format. This is again a source of confusion, even inside the same system. Consider these examples:

    ruby-on-rails
    ruby on rails
    ruby_on_rails
    "ruby on rails"
    RubyOnRails
    ruby rails
    ruby,rails
    ruby+rails
    RUBY-RAILS
    ror
    ROR
    rails
    programming:rails
    

    and I could come up with tons of other ones. The problem is that all these tags are trying to convey the same information – namely that the article is about Ruby on Rails. Of course this is absolutely clear to any human being – however, much less so to a machine.

    Proposed solution: It would be beneficial to agree on one accepted tagging convention (even if you cannot really force people to use it). The sites could use (even more) heuristics to merge tags with the same meaning into one – a naive sketch of such a heuristic follows this list. For example, if a user has a lot of ruby and rails bookmarks and tags something with 'rails', it is very likely that the meaning of the tag is 'ruby on rails', etc.

  • Too many tags and no relations between them – I think everybody has, or at least has seen, a large del.icio.us bookmark farm. The problem with the tags at this point is that there are a lot of them, and they are presented in a flat structure, without any relations between them. (O.K., there is the tag cloud, but it is more of an eye candy in this sense.) With a really large number of tags (say hundreds of them), the whole thing can become really cumbersome.

    Proposed solution: Visualization could help a lot here. Check out this image:



    Example of a Clustered Tag Graph

    I think such a representation would make the whole thing easier, especially if it were interactive (i.e. if you clicked the tag 'ActiveRecord', the graph would change to show the tags related to 'ActiveRecord'). The idea is that all of your tags would be clustered (related ones belonging to one cluster – the above image is an example of a toread-ruby cluster) and the big graph would consist of the clusters, with each cluster's main element highlighted for easy navigation. If you clicked a cluster, it would zoom in, etc.

  • Granularity of tagging – this is a minor issue compared to the others, but I would like to see it nevertheless: it should be possible to mark and tag paragraphs or other smaller portions of a document, not just the whole document itself. Imagine a long tutorial primarily about Ruby metaprogramming. Say there is an exceptionally good paragraph on unit testing, which makes up about 0.1% of the whole text. It would be wrong to tag the document with 'unit testing', since it is not about unit testing – however, I would still like to be able to capture that outstanding paragraph.

    Proposed solution: Again, visual representation could help very much here. I would present a thumbnail of the page, big enough to make distinguishing objects (paragraphs, images, tables) possible, but small enough not to be clumsy. The user would then have the possibility to visually mark the relevant paragraph (with a pen tool) and tag just that.
    This should result in a bookmark tagged like this:



    Example of More Granular Tagging

    On lookup, you will see the relevant lines marked and will be able to orient yourself faster.
    To some people this may look like overkill – however, nobody forces you to use it! If you would like to stick with the good old tag-one-document method, it's up to you – however, if you choose to also tag up some documents like this, you have the possibility.

  • Tagging a lot of things with the same tag is the same as tagging with none – consider that you have 500 items tagged with 'Ruby'. True, you still don't have to search the whole Web, which is much bigger than 500 documents, but it is still a real PITA to find something among 500 documents.

    Proposed solution: the clustered tag graph could help you navigate – usually you are not looking for just 'Ruby' things but, for example, 'Ruby and testing and web scraping'. Advanced search (coming in vol. 2), where you can specify which tags should be looked up and also what the document should contain, could remedy the problem, too.

  • Common ontologies, synonyms, typo corrections – O.K., these might seem like rocket science compared to the other, simpler missing features – however, I think their correct implementation would mean a great leap for the usability of these systems. Take for example web scraping, my present area of interest. People are tagging documents dealing with web scraping with the following tags: web scraping, screen scraping, web mining, web extraction, data extraction, web data extraction, html extraction, html mining, html scraping, scraping, scrape, extract, html data mining – just off the top of my head. I did not think about it really hard – in fact there are many more.
    It would remove a lot of confusion if all these terms were represented by one common expression – say 'web scraping'.

    Proposed solution: this is a really hard nut to crack, stemming from the fact that e.g. screen scraping can mean something different to different people. However, a heuristic could look up all the articles tagged with e.g. web scraping – and find the synonyms by going through all of them. It is not really hard to find out that 'web scraping' and 'ruby' or 'subversion' are not synonyms – however, after scanning enough documents, the link between 'web scraping' and 'html scraping' or 'web data mining' should be found by the system. The synonyms could also be exploited by the clustered tag graph.
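
As promised above, here is a first, very naive Ruby stab at the tag normalization heuristics: unify separators and case, then map known variants onto one canonical tag. The synonym table is a hand-made example – in a real system it would have to be learned, e.g. from tag co-occurrence and the content of the tagged documents.

# Naive tag normalization: "RubyOnRails", "ruby_on_rails", '"ruby on rails"',
# "ror" etc. all end up as "ruby-on-rails".
SYNONYMS = {
  'ruby on rails'       => 'ruby-on-rails',
  'ruby rails'          => 'ruby-on-rails',
  'ror'                 => 'ruby-on-rails',
  'rails'               => 'ruby-on-rails',
  'html scraping'       => 'web-scraping',
  'web data extraction' => 'web-scraping'
}

def normalize_tag(tag)
  t = tag.to_s.strip
  t = t.gsub(/([a-z])([A-Z])/, '\1 \2')        # RubyOnRails -> Ruby On Rails
  t = t.downcase.delete('"\'')                 # drop quotes
  t = t.gsub(/[_\-+,:]+/, ' ').squeeze(' ')    # unify separators
  SYNONYMS.fetch(t, t.tr(' ', '-'))            # map known variants, else hyphenate
end

normalize_tag('RubyOnRails')     # => "ruby-on-rails"
normalize_tag('ruby_on_rails')   # => "ruby-on-rails"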

Voting

The idea of voting articles onto the front page (as opposed to editor-monitored, closed systems) seemed revolutionary from the beginning, and definitely the right, people-centered way to rank articles – after all, it is really simple: people vote on stuff they like and find interesting, which means the most interesting articles get to the front page. Or do they? Let's examine this a bit…

  • Back to the good old web 1.0 – when Tim O'Reilly coined the term Web2.0 in 2005, he presented a few examples of typical web1.0 vs web2.0 solutions, for example: Britannica Online vs Wikipedia, mp3.com vs Napster etc. I wonder why he did not come up with slashdot (content filtered by editors) vs digg (content voted up by people). At that time everybody was so euphoric about Web2.0 that no one would have questioned this claim (neither did I at the time).

    However, it seems to me that now that these sites have evolved a bit, there is basically not that much difference between the two: according to this article, Top 100 Digg Users Control 56% of Digg's HomePage Content. So instead of 10-or-so professionals, 100-or-so amateurs decide about the content of digg. So where is that enormous difference after all? Wisdom of crowds? Maybe wisdom of a few hundred people. Because of the algorithms used, if you don't have the time to submit, digg, comment and hunt for articles all day (read: a few hours a day) like these top diggers do, your vote won't count for much anyway. Digg (and I read that reddit too, and possibly sooner or later this fate awaits more sites (?)) became a place where "Everyone is equal, but some are more equal than others…".

    Proposed solution: None. I guess I will be attacked by a horde of web2.0-IloveDigg fanatics claiming that this is absolutely untrue, and since I have no real proof of this point (and don't have the time/tools to produce any), I am not going to argue here.

  • Too easy or too hard to get to the front page – The consequence of some of the above points (Information overload; Good place, wrong time; Back to the good old web 1.0) is that if the limit to get to the front page is too high, it is virtually impossible to reach it (unless you are part of a digg cartel, or you have a page with a lot of traffic anyway plus a digg button). However, if the required count is too low (and hence it is too easy to get to the front page), people might be tempted to trick the system (by creating more accounts and voting for themselves, for example) just to get there – which will result in a lot of low-quality sites making it to the front page. Though I don't own a social bookmarking site, I bet that finding the right height for the bar is extremely hard – and it even has to change from time to time in response to more and more submissions, SEO tricks etc.

    Proposed solution: A well-balanced mixture of silicon and carbon. Machines can do most of the job by analysing logs, the activities of users on the page, thumbs up/down received from users, articles submitted/voted on/commented on, and other types of usage mining. However, machines alone are definitely not enough (since they don't have the foggiest idea what's in an article) – a lot of input is needed from humans, too: on one side from the users (voting, burying, peer review etc.), and from the editors as well. However, I think all of this is being done already – and the result is not really unquestionably perfect, I guess mainly because of the information overload – 5000 submissions a day (or 150,000 a month) is very hard to deal with…

  • Votes of experts should count more – In my opinion, it is not right that if a 12-year-old script kiddie votes down an article and an expert with 20 years of experience votes it up, their votes are taken into account with equal weight. OK, I know there is peer review, and if the 12-year-old makes a lot of stupid moves he will be modded down – so he will open a new account and begin the whole thing again from scratch. On the other hand, the expert may not have time to hang around on digg and similar sites (because he is hacking up the next big thing instead of browsing) and therefore might not get a lot of recognition from his peers on the given social site – which shows that he is an infrequent digg/dzone/whatever user, but says nothing about his tech abilities.

    Proposed solution: I think it is too late for this with the existing sites, but I would like to see a community of real tech people, developers, entrepreneurs and hackers of all sorts. How could this be done? Well, people should show what they have done so far – their blog, released open source software, mailing list contributions, sites they designed, or any other proof that they are also doing something and not just criticizing others (it seems to me that the most abrasive people on-line are always those who do not have a blog, did not hack up anything relevant, and did not prove their abilities in any other relevant way). This would also ensure that only one account belongs to one physical person. I know this may sound like too much work (both on the site maintainer's side and the users'), but it could lay the foundation for a real tech-focused (or xyz-focused) social site. Of course this would not lock out people without any tangible proof of their skills – however, their votes would count less.

  • Everything can be hot only once – Most of the articles posted to social bookmarking sites are 'seasonal' (i.e. they are interesting only for a given time period, or in conjunction with something hot at the moment) or news (like announcements, which are interesting for just a few days). On the other hand, there are also articles which stay relevant for much longer – maybe months, years or even decades. However, because of the nature of these sites, they are out of luck – they can have their few days of fame only once.
    One could argue that this is fine – however, I am not sure about it. Take for example my popular article on Screen scraping in Ruby/Rails: I am getting a few thousand visitors from google and Wikipedia every month (which proves that the article is still quite relevant) and close to zero from all the social sites, despite the fact that it was quite hot upon its arrival. Moreover, I have updated it with current information since its first appearance, so it is not even the same article anymore, but a newer, more relevant one.

    Proposed solution: Let me demonstrate this with a del.icio.us example, where a certain number of recent bookmarks is needed to get to the 'popular' section (something similar to the notion of the front page on digg-style sites). In my opinion, this count should also depend on the number of bookmarks already received. Let's see an example: suppose a brand new article needs 50 recent bookmarks to get to del.icio.us/popular. After getting there and creating a great stir, it gets bookmarked 300 times. Then, for the next 50 days, it does not receive that much attention – it gets 1 bookmark a day on average, so it has 350 bookmarks altogether. However, after these 50 days, for some reason (e.g. some related topic gets hot), 30 people bookmark it within a few hours. In my opinion, it should become popular again – and moreover, with these 30 (and not 50) bookmarks – because it was already popular once. This metric should then be adjusted after it gets popular again: if this happens and people don't really bookmark it anymore despite it being featured on /popular, it should need 50 (or more) bookmarks again (a toy sketch of this rule follows this list).
    On digg-style pages I would create a 'sticky' section for articles that are informative and interesting over a longer timespan. I would add another counter to the article ('stickiness') which could be voted up by both editors and users, in a similar way as 'hotness' is now. Of course it is very subjective what should be sticky – it is easy to know that news is not sticky, but harder to decide in the case of other material.
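
To make the del.icio.us example a bit more tangible, here is a toy Ruby version of the "it gets cheaper to become popular again, but the bar goes back up after a failed comeback" rule. All the concrete numbers (the base of 50, the 20-bookmark discount, the floor of 10) are of course just illustrative.

# Toy rule for repeated popularity: fewer *recent* bookmarks are needed once an
# item has proven itself, and the bar is raised again after a failed comeback.
BASE_THRESHOLD = 50

def popular_threshold(times_popular, failed_comebacks)
  threshold  = BASE_THRESHOLD - times_popular * 20   # 50 at first, then 30, ...
  threshold += failed_comebacks * 10                 # raise the bar again
  [threshold, 10].max                                # never drop below a floor
end

def popular_again?(recent_bookmarks, times_popular, failed_comebacks = 0)
  recent_bookmarks >= popular_threshold(times_popular, failed_comebacks)
end

popular_again?(30, 1)   # => true  - 30 recent bookmarks, already popular once
popular_again?(30, 0)   # => false - a brand new article still needs 50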

Since I have never had the chance to try these ideas in practice, I can't tell how many of them (and to what extent) would work in real life. I guess there is no better way to find out than to actually implement these features… and the other ones coming in vol. 2!

In the next part I would like to take a look at the remaining problems, connected with searching and navigation, comments and discussion, the human factor, and miscellaneous issues which did not fit into the other categories. Suggestions are warmly welcome, so if there are some interesting ideas, I will try to incorporate them into the next (or this) installment!


Site Updates

I am constantly trying to update rubyrailways.com with useful features and enhancements (feedback is warmly welcome – even the present look was influenced by a lot of your comments). This is the latest set of updates:

  • Ajaxified Comment Preview

    is a handy WordPress plugin which creates a preview of the comment you are currently writing, before you actually post it. Check out any of the posts (e.g. this one :-)) and scroll down to the 'Leave a Reply' section to see it in action (you don't actually have to comment if you just want to play around).

    Markdown should work in the comments, too. For adding a comment which contains code, I recommend using the <pre> tag to preserve whitespace (unfortunately SyntHiHol (see next bullet) does not work there).

  • SyntHiHol

    is a code highlighter – you can see it in action in my last post, for example. All you have to do to get these nicely highlighted code blocks is to put the code in a div with the attribute "lang" (possible values include ruby, python, java, php, c, c++, c#, bash… nearly 70 languages!). Cool, isn't it?

  • Akismet

    though I don't think there is anybody who does not know Akismet, for completeness' sake: it's "a plugin which identifies and blocks comment and trackback spam on blogs with integration to various blogging systems". In the dark pre-Akismet era I received 10-15 spam comments a day on average. Guess how many I have gotten since the installation? Exactly zero. It sounds unbelievable, but it's true – I have just checked, and Akismet caught 370 spam comments during its 1-month reign.

  • Installed trac

    a web-based software project management tool – well, this is not of too much interest to the readers of this blog (yet), but maybe someone is curious about my experience.

    Currently this site is hosted on DreamHost, and fortunately I found the DreamTrac script for installing trac on DH. Well, I think that now, after all the struggling, I would be able to install trac with this script on the first try – but the current installation took me about 3 hours and 8 tries.

    However, this was not the biggest problem – after all, I did manage to install and configure trac. It was much worse that the script (or the trac configuration?) overwrote the .htaccess file in my home directory, which caused the main site (http://rubyrailways.com) to stop working – and even worse, it exposed my whole home directory to the world! 🙂
    Since I did not expect such side effects, it took me a few hours to find and correct the problem.

    As if this were not enough, there is one 'minor' problem with trac on DreamHost: for me it is so slow that it is practically unusable (it takes about 20-30 seconds, sometimes more, to load a page). I don't know what the problem could be – it's fastcgi-enabled, and though not explicitly optimized, I don't think it should behave like this… Any ideas?