Web 2.0 Tutorial

First of all, I have to make a disappointing confession: this is not a Web 2.0 tutorial – but fear not, at least the logical and absolutely valid question to this dilemma (i.e. why the hell is the article entitled ‘Web 2.0 tutorial’ then?) will be provided.

Although this blog’s tagline is ‘Ruby, Rails, Web2.0’ and I am blogging/planning to blog about all these topics in the future, I did not have an exclusively-and-only-about-Web2.0 post yet (as far as I remember). That’s why it strikes me odd that according to google analytics, a lot of people are finding this site via the keyword combination ‘Web2.0 tutorial’. This post was inspired by them and for them!

Since this trend is nearly as old as this blog – and it seems to continue, and even rise as time goes by – I am now really curious what the heck are people imagining behind the term ‘Web2.0 tutorial’. Why? Well, there are more reasons to ponder about:

– Nobody knows what Web 2.0 actually is (or if does, the others don’t agree :-)). Since coined by Tim O’Reilly back in 2005, ‘Web 2.0’ has been redefined, argued about, glorified, despised, parodied, upgraded to Web 3.0, regarded as vapor, bubble etc. (and who knows what else…) countless times – just one thing did not happen: A commonly accepted, concise (or even lengthy) definition with which everybody would agree.
You won’t find anybody interested in the Web today who would not have his own definition associated with Web2.0 – however, these definitions (although more overlapping and similar than ever) will be varying from person to person.

– The conjunction itself is kind of absurd – even if we accept that there is a common understanding of the term ‘Web2.0’, it definitely has more facets: Look (Apple aqua reinvented, round corners galore, reflections of reflections etc), social aspect (digg, del.icio.us, youTube, myspace et al), theoretical backend (ontologies, folksonomies, openAPIs, microformats, mashups etc), standards (XHTML (2.0! :-)), RDF, FOAF, ATOM, SVG, SOAP), innovative ways of communication and catering to the users (WS, REST, Podcasts, Videocasts), typical Web2.0-purpose pages (wikis, blogs), development tools and frameworks (AJAX, Ruby on Rails, …) and other buzzwords 🙂

– Even if we define Web2.0 as a collection of the things from the previous point, the term ‘Web 2.0 tutorial’ is too broad-sense to get you too much relevant results (I believe – maybe some smart webmasters engaged in the ways of SEO tricking found out the carving after a Web2.0 tutorial already and wrote up a few for you :-)). Just as someone would not search a ‘programming language tutorial’ (but a ‘Ruby tutorial’ instead) or a ‘sport tutorial’ (rather a ‘squash tutorial’), searching after a *real* ‘Web2.0 tutorial’ could be ineffective, too. I suggest to look for ’rounded corners tutorial’, ‘mashup tutorial’ or ‘Ruby on Rails tutorial’ etc. instead. Additionally, if you are really keen on Web2.0-ness of these documents, don’t forget to add ‘Web2.0’ to the query – just in case :-).

– Related to the previous point: attack the problem from bottom up rather than the other way around – i.e. try to look for solutions of concrete problems and assemble them into a Web2.0 style whatever once you are done, rather than trying to do something which is Web2.0 in the first place. In my opinion you should think like ‘I would like to create a great mashup in Ruby on Rails with AJAX and a Web2.0 look – how should I go about this?’ rather than ‘Let’s see a good Web 2.0 tutorial and then I will cook something great’. You should strive for creating great looking websites with great content and functionality, and people will like it and use it – whether you call it Web2.0, Web3.0 or whatever – even if the URL of the site will be www.thissiteisnotweb2.0.com :-).

Now that I have mentioned ‘Web2.0’ and ‘Web 2.0 tutorial’ more times in this article, I guess I’ll be receiving even more hits through this query – though this was definitely not the reason for writing this article. However, if you already got this far, please take a few seconds and share with us your thoughts on this. After all Web2.0 is also about collaboration, you know. Heck, I might even write a few Web2.0 tutorials in the future – just tell me what a ‘Web2.0 tutorial’ means… :-).

Implementing ’15 Exercises for Learning a new Programming Language’

A short time ago in a galaxy not so far, far away I came across a nice blog post: 15 Exercises for Learning a new Programming Language.

One could argue if these are *really* the most appropriate 15(+) exercises to learn a new programming language – however, the task of answering this rather complex question is left as an exercise for the reader. Instead of this I will show you their implementation in Ruby – rubyrailways.com style.

Why did I bother to solve these problems (including not really trivial ones, like a scientific calculator with a GUI) ? Well, actually to learn a new programming language! I still consider myself a beginner Ruby apprentice just playing it by ear in my somewhat scarce free time, so I thought that systematically implementing a task list like this will mean great step forward for me compared to just coding random things at random times. Fortunately I was perfectly right!

Before we move onto the code, one last disclaimer: the fact that I am still a Ruby n00b implies that the code can be somewhat hairy/not optimal/[insert any other language than Ruby here]-ish so don’t use these snippets as a textbook solution of the problems or anything like that. I would be glad if someone could suggest a bit of refactoring of the bad parts but I also hope that that there are some nice parts which you can learn from (actually I am quite sure about this since I used some magick formulas from a few Ruby (grand)masters in some cases).

OK, enough talk for now. Let’s see the stuff!

1. Problem: “Display series of numbers (1,2,3,4, 5….etc) in an infinite loop. The program should quit if someone hits a specific key (Say ESCAPE key).”

Solution: Hmm, well, errr…uh-oh… I could not solve this problem fully (what a terrific start :-)). If Henry Ford would sit beside me now, he would say : You can hit any key to exit – so long as it’s ‘C’ – and one more advice: don’t forget to hold CTRL during this action :-). More on this after the code snippet:

i = 0
loop { print "#{i+=1}, " }

Comments :
If anyone knows how to add code which will cause this program to stop with a specific keyhit (say ‘ESC’) please, please, please drop me a note. I have been researching this for at least 10% of the time of solving all the tasks, nearly spitting blood when I gave up :-). It seems (to me) that there is no simple (i.e. no threads and similar) and clean platform-independent solution for this problem. I guess (hope) the author’s idea here was different than to introduce threading or writing platform specific-code…

2. Problem: “Fibonacci series, swapping two variables, finding maximum/minimum among a list of numbers.”


#Fibonacci series
Fib = Hash.new{ |h, n| n < 2 ? h[n] = n : h[n] = h[n - 1] + h[n - 2] }
puts Fib[50]

#Swapping two variables
x,y = y,x

#Finding maximum/minimum among a list of numbers
puts [1,2,3,4,5,6].max
puts [7,8,9,10,11].min

Comments: The Fibonacci code was written by Andrew Johnson (found via Ruby Quiz). I like it so much that I think it would be a shame to present a trivial version here. I guess the rest of the code is self-explanatory.

3. Problem: "Accepting series of numbers, strings from keyboard and sorting them ascending, descending order."


a = []
loop { break if (c = gets.chomp) == 'q'; a << c }
p a.sort
p a.sort { |a,b| b<=>a }

Comments: This version is accepting strings - I think anybody who got to this point can adapt it to work with numbers.

4. Problem: "Reynolds number is calculated using formula (D*v*rho)/mu Where D = Diameter, V= velocity, rho = density mu = viscosity Write a program that will accept all values in appropriate units (Don't worry about unit conversion) If number is < 2100, display Laminar flow, If it’s between 2100 and 4000 display 'Transient flow' and if more than '4000', display 'Turbulent Flow' (If, else, then...)"


vars = %w{D V Rho Mu}

vars.each do |var|
  print "#{var} = "
  val = gets

reynolds = (D*V*Rho)/Mu.to_f

if (reynolds < 2100)
  puts "Laminar Flow"
elsif (reynolds > 4000)
  puts "Turbulent Flow"
  puts "Transient Flow"

Comments: Can you spot the trick in the part which is filling up the variables? They don't go out of scope after the loop ends because they are constants. Other possibility would be to use $global variables but I guess it is usually not a very good programming practice to do that.

5. Problem: "Modify the above program such that it will ask for 'Do you want to calculate again (y/n), if you say 'y', it'll again ask the parameters. If 'n', it'll exit. (Do while loop)
While running the program give value mu = 0. See what happens. Does it give 'DIVIDE BY ZERO' error? Does it give 'Segmentation fault..core dump?'. How to handle this situation. Is there something built in the language itself? (Exception Handling)"


vars = { "d" => nil, "v" => nil, "rho" => nil, "mu" => nil }

  vars.keys.each do |var|
    print "#{var} = "
    val = gets
    vars[var] = val.chomp.to_i

  reynolds = (vars["d"]*vars["v"]*vars["rho"]) / vars["mu"].to_f
  puts reynolds

  if (reynolds < 2100)
    puts "Laminar Flow"
  elsif (reynolds > 4000)
    puts "Turbulent Flow"
    puts "Transient Flow"

  print "Do you want to calculate again (y/n)? "
end while gets.chomp != "n"

Comments: As you can see, I could not use the same trick here when asking for the variables, because when somebody wants to calculate again, Ruby will complain (although by printing a warning only) that the constants have been already set up. Therefore I went for the hash solution. I think the do-you-want-to-calculate-again part is straightforward so I won't analyze that here.
"While running the program give value mu = 0."
Ruby gives a rather interesting result in this case: infinity :-).
"Is there something built in the language itself?"
Sure: exception handling. Division by zero could be caught with a ZeroDivisionError rescue clause.

6. Problem: "Scientific calculator supporting addition, subtraction, multiplication, division, square-root, square, cube, sin, cos, tan, Factorial, inverse, modulus"

Since this code snippet is longer It would look ugly here - you can download it from here instead.


screenshot of the scientific calculator in action

If you would like to try it, you will need the Tk bindings for Ruby (maybe you have them already, here on Ubuntu I did not). Also note that only the regular 0-9 keys (and of course the mouse) work, the numpad ones do not. One more little detail: % stands for modulo, not percent.

Comments: Phew, this was a real challenge, mostly because I never did any GUI in Ruby before. I was amazed that I could code up a relatively feature rich calculator in 100+ lines of code, without any golfing or trying to optimize for shortness. What I wanted to say with this is that the shortness does not praise my programming skills (since I did not eve try to golf) but the superb terseness of Ruby. OK, of course there are some problems (e.g. cube, cos, tan, inverse are not implemented) but the usability/amount of code ratio is unbelievably high.

The GUI is also not the nicest since I have used Tk - wxRuby or qt-ruby would produce much nicer results, but since I did not code any GUI in Ruby previously, I have decided to try the good-old-skool Tk for the first time.

7. Problem: "Printing output in different formats (say rounding up to 5 decimal places, truncating after 4 decimal places, padding zeros to the right and left, right and left justification)(Input output operations)"


#rounding up to 5 decimal pleaces
puts sprintf("%.5f", 124.567896)

#truncating after 4 decimal places
def truncate(number, places)
  (number * (10 ** places)).floor / (10 ** places).to_f

puts truncate(124.56789, 4)

#padding zeroes to the left
puts 'hello'.rjust(10,'0')

#padding zeroes to the right
puts 'hello'.ljust(10,'0')

#right justification
puts ">>#{'hello'.rjust(20)}<<"

#left justification
puts ">>#{'hello'.ljust(20)}<<"

Comments: Amazingly lot of things can be done with sprintf() - I could solve nearly all the problems with it - but that would not really be rubyish, so I have decided for built-in (and one homegrown) functions. However, mastering (s)printf() is a very handy thing, since nearly all big players (C (of course :-)), C++, Java, PHP, ... ) have it so you get a powerful function in more languages for the price of learning one). As you can see, r/ljust is a nice one, too.

8. Problem: "Open a text file and convert it into HTML file. (File operations/Strings)"

Solution: Well, this problem was not specified in a great detail, to say the least - or to put it otherwise, the solvers are given a great freedom to provide a solution spiced up with their fantasy. This is what I came up with:

doc = <strong tag.

final_doc = <
    Text to HTML fun!


FINAL_DOC rules = {'*something*' => 'something', '/something/' => 'something'} rules.each do |k,v| re = Regexp.escape(k).sub(/something/) {"(.+?)"} doc.gsub!(Regexp.new(re)) do content = $1 v.sub(/something/) { content } end end doc.gsub!("\n\n") {"


"} final_doc.sub!(/embed_doc_here/) {doc} puts final_doc

Comments: As you can see, besides that the text is wrapped around with a minimal HTML, every occurrence of words between asterisks is outputted in strong and between slashes in italic. You can add as many such rules as you like, they will be (hopefully) substituted in the final output.

9. Problem: "Time and Date : Get system time and convert it in different formats 'DD-MON-YYYY', 'mm-dd-yyyy', 'dd/mm/yy' etc."

Solution: Well, it was not really clear (for me) what should be the difference between 'yyyy' and 'YYYY' (resp. 'dd' vs 'DD') so again I had to use my imagination. However, I guess it does not matter too much, the solution has to be changed by 1-2 characters only if the original author had something different on his mind.

require 'date'

time = Time.now
#'DD-MON-YYYY', e.g. 12-Nov-2006 in my interpetation
puts time.strftime("%d-%b-%Y")

#'mm-dd-yyyy', e.g. 11-12-2006 in my interpetation
puts time.strftime("%m-%d-%Y")

#'dd/mm/yy', e.g. 12/11/2006 in my interpetation
puts time.strftime("%d/%m/%Y")

10. Problem: "Create files with date and time stamp appended to the name"


#Create files with date and time stamp appended to the name
require 'date'

def file_with_timestamp(name)
  t = Time.now
  open("#{name}-#{t.strftime('%m.%d')}-#{t.strftime('%H.%M')}", 'w')

my_file = file_with_timestamp('test.txt')
my_file.write('This is a test!')

Comments: Maybe a more elegant solution could be to subclass File and override its constructor - but maybe that would be an overkill. I have voted for the latter option in this case :-).

11. Problem: "Input is HTML table. Remove all tags and put data in a comma/tab separated file."

Solution: Since web extraction is both my PhD topic and my everyday job (and even my free-time activity :-)) I will present 3 solutions for this problem. First, the classic old-school regexp way (by Paul Lutus), then with HPricot and finally with scRUBYt!, a simple yet powerful Ruby web extraction framework currently developed by me.

table = <


rows = table.scan(%r{.*?}m)

rows.each do |row|
   fields = row.scan(%r{(.*?)}m)
   puts fields.join(",")

Now for the HPricot solution (in the further examples let's consider that table is initialized as in the previous example):

require 'rubygems'
require 'hpricot'

h_table = Hpricot(table)

rows = h_table/"//tr"
rows.each do |row|
  child_text = (row/"//td").collect {|elem| elem.innerHTML }
  puts child_text.join(',')

and last, but not least scRUBYt!

require 'scrubyt'

table_data = P.table do
               P.cell '1'

table_data.generalize :cell

puts table_data.to_csv

Some explanation: first of all, at the moment scRUBYt! is avaliable on my hard disk (and partially in my head) only - it should be released around XMAS 2006. I am using this solution for a little bit of self-promotion :-).

The example works like this: extract something (in this case a HTML <table>) which has something (in this case <td>) which has '1' as its text (well in reality much more is going on in the background, but roughly along these lines). This little code snippet will extract the first <td>s of ALL <tables> on a HTML page. With the 'generalize' call we tell the extractor that it should not extract just the first <td> in a table (which is the default setting), but all of them.

scRUBYt! can handle much, much, MUCH more complicated examples than this (like an ebay or amazon page) and has loads of sophisticated functions... so stay tuned!

12. Problem: "Extract uppercase words from a file, extract unique words."

Solution: (you can find some_uppercase_words.txt here and some_repeating_words.txt here

open('some_uppercase_words.txt').read.split().each { |word| puts word if word =~ /^[A-Z]+$/ }

words = open('some_repeating_words.txt').read.split()
histogram = words.inject(Hash.new(0)) { |hash, x| hash[x] += 1; hash}
histogram.each { |k,v| puts k if v == 1 }

13. Problem: "Implement word wrapping feature (Observe how word wrap works in windows 'notepad')."

Solution: Unfortunately I am not a Windows user and I have seen notepad a *quite* long time ago - so I am not sure the task and it's implementation are fully in-line - I have tried my best. Here we go:

input = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

def wrap(s, len)
  result = ''
  line_length = 0
  s.split.each do |word|
    if line_length + word.length + 1  < len
      line_length += word.length + 1
      result += (word + ' ')
      result += "\n"
      line_length = 0

puts wrap(input, 30)

14. Problem: "Adding/removing items in the beginning, middle and end of the array."


x = [1,3]

#adding to beginning

#adding to the end
x << 4

#adding to the middle

#removing from the beginning

#removing from the end

#removing from the middle

#we have arrived at the original array!

15. Problem: "Are these features supported by your language: Operator overloading, virtual functions, references, pointers etc."

Solution: Well this is not a real problem (not in Ruby, at least). Ruby is a very high level language ant these things are a must :).

Finally, you can download all the solutions in a single archive from here.
I would like to see the implementation of these tasks in both Ruby (different (more optimal) solutions of course) as well as in anything else. If you set out to do something like that, be sure to drop me a note.

Internet contains huge number of opportunities to earn money online. Simply create a site that you think has the potential to sell hot items using ruby on rails. Register a relevant domain name and purchase a web hosting service through hostgator, one of the better web host out there today. Get a internet connection through one of the wireless internet providers to upload your site. Work on search engine optimization to get a better traffic and also use affiliate marketing program for the same reason. Finally get a free voip phone service to contact customers directly. The pc to phone system is the most effective method of marketing.

How to reach the finish line

Before everything else, getting ready is the secret of success. (Henry Ford)

For the last few months of my professional/geek life I have felt like I am sitting on a huge roller coaster – reaching the sky today just to find myself at the bottom of a cozy swamp tomorrow. Sometimes I have days when I achieve the work of ten people and I am filled with so much energy and enthusiasm that my family is afraid I might blow up in any minute :-). Even after such a very busy day I usually can not sleep too well because I am thinking/dreaming about how this or that unit test or piece of code could be improved and how will I tackle it tomorrow morning.

However, more frequently then I would like it to happen, the very next day everything may turn out quite the opposite: Working may be mixed with browsing, playing video games, watching TV etc. instead of focused and effective work that was fueling me to the boiling point yesterday.To add to the frustration, it is more or less unpredictable in advance when and why are the ups and downs coming and how long will they persist.

My overall average productivity is about equal with a well balanced person: both of us is working 200 hours a month. However, the balanced guy achieves this by working 50 hours a week whereas my pattern is absolutely unpredictable – it may be 30 + 80 + 20 + 70 or 20 + 30 + 40 + 110 or anything that sums up to 200. I guess 200 + 0 + 0 + 0 did not happen for only single reason yet: because a week has just 168 hours.

Someone could argue: why bother then? After all, the job gets done and that is all what matters. Well, for me this is not the only thing that matters: to effectively pursue a wide range of activities, some kind of planning is needed in order to be able to process them all at once, ensuring that each gets the proper weight at due time. While the hectic model was OK during the campus life (i.e. watch a season of 24 (24 hours), do the work of yesterday and today (20 hours), get some sleep at last (20 hours), rinse, repeat etc), the balanced way of doing things is recommended if you have a family and job yet you still would like to stay involved in a diverse array of activities.

My wife used to laugh at me in the mornings when I put on my loser or winner face. With her constant teasing she made me understand that the fact that how I feel the given day and what I am able to do depends only on me and not on certain circumstances.

My biggest problem seems to be that after the first excitement I tend to easily lose my interest in about everything I start. This of course leads to cooling down and eventually drifting off the track right before the finish line. Why? Well, to jump into some “more interesting stuff” of course. The result: constant feeling of failure and no results to show up. Sad but true.

Since I am an optimistic person and I hate to be in unpleasant situations if I can choose not to, after all that struggling I decided to come up with a plan to “visit” the finish line more often 😉 The points I have identified work for me pretty well so far and I am updating and extending them from time to time based on the results in practice. At the moment this is what I have:

  1. Have a clear vision – if you do not know exactly what do you want to reach or where you want to end up, you will never reach that point. (Even if you would, how would you know? 🙂
  2. Make a detailed plan – if you see the progress, it motivates you to continue. You can break down any task to small pieces. By the end of the day, looking at your to-do list you can put up your “I made a great step forward today” smile with reason. I break down even the household chores to small tasks, so after looking at my all-checked to-do list my wife thinks I am the fastest and most effective clean-up guy on the globe ;). “Nothing is particularly hard if you divide it into small jobs.” (Henry Ford)
  3. Find a partner – even the most excited people lose their enthusiasm over time. A good partner can help you to stand up when you feel like falling or loosing interest and makes you go further if you are just about to give up.
  4. Value yourself and the progress you have achieved – do not be too critical but also do not overestimate yourself. It leads to frustration and ruins your motivation step by step.
  5. Always determine your current level correctly and try to expand from there – it is not the biggest problem if you are weak in certain areas (since that can be improved). However, it is much worse if you don’t admit this to yourself (which means you won’t release any effort to improve it). Once you acknowledge your current position (e.g. that you can work only 10 minutes continuously) you can gradually improve it (e.g. by working without break for 1 more minute daily) until you reach the desired goal (e.g. Alt-Tab-less working for 2 hours). Don’t be ashamed or blame yourself for any clumsiness – identify it and get rid of it!
  6. Do not be impatient – be realistic about the amount of work you are capable to do for the given period of time. There are some things that can be done overnight. (I wanted to finish my Ph.D. in a week and I failed miserably 😉
  7. Never give up – do not stop before the finish line. Nothing is worst than a work without results. It consumes too much time. By looking at just the achievements (which is usually the practice in the real life) it is useless. No credits, no recognition. Never stop at 99%, since that is still an unfinished job. Even 10 * 99% = 0 (and not 990%, which is 9 by rounding) – thus 1 * 100% > 10 * 99%.
  8. Do not be sorry for yourself – it does not help and drags you down.
  9. Do not look around too often – too many people fail because they look at their surrounding and can not step over the boundaries of the tiny world around them. Who cares if the guy next door is ten times better at bugfixing or writes his papers ten times faster while you are struggling with every word? Even if you may feel this on your own skin, believe me that staring at the abilities of others and the constant comparison leads to a dead end. After some time it makes you believe you are truly useless and stupid. Do not believe everything – probably those “Supermans” are not half as good as they seem to be, and half of the stories about them are urban legends. And what if they are really so superb? Does it change your abilities in any way? Of course not.
    If there is only one point you remember, this be it. Believe me, it can make your life much easier. At least it made mine.
  10. Do not look for excuses – it is the simplest way to get stuck.
  11. Do the interesting and annoying things side by side – it is easy to fall to the “trap of interesting parts”. Unfortunately the most hated parts are also part of the project (at least I have not met a project with solely “fun” parts) and need to be done as well. Do not leave them to the end if you do not want ot struggle just before the finish line. If you do them aside with the fascinating jobs the suffering will “disappear”.

+1 Believe in yourself“To succeed, we must first believe that we can.” (Michael Korda) or as Henry Ford put it: “If you think you can do a thing or think you can’t do a thing, you’re right.”.

Remember: the work you have done is measured by the final result, not the time and effort you have invested in it. Once you start something, never look back – just from behind the finish line.

Sometimes less is more

Update: A lot of people were disappointed that 10.minutes.ago etc. is not working in pure Ruby. Well, after executing the line require ‘active_support’ it does – I think this is a fairly small thing to do to enable these powerful features.

Every guide published on favorable writing principles emphasizes the power of brief and concise style. This is especially true in the case of technical texts, and in my opinion, in the case of well-designed programming languages as well.

Note the word well-designed. I did not say in the case of any (programming) language, since that would just not be true: conciseness can come at the cost of readability. (If you ever tried to read kanji, you know what I am talking about 😉 . However, I am claiming that in the case of a really well-designed programming language, succinctness helps readability, reduces bloat and leads to easier and faster understanding of the code. In my experience, the amount of boilerplate code to write is decreasing proportionally with the terseness of the programming language, ultimately leading to a coding style where you (nearly) don’t need to write boilerplate at all.

I will demonstrate this on a few Java vs. Ruby code examples. However, this is NOT a Ruby-bashing-Java article, but a few examples of idioms and interesting constructs; C++ vs Haskell or Lisp could serve equally well (sometimes even better), but since I am currently working with Java and Ruby on a daily basis, it is easier for me to use them.

If you are a pro Ruby and/or Java programmer, and/or you think the article is too long for you, please jump to the “Random Code Snippets” section.

Possibly the most straightforward reason why Ruby code is more readable even in shorter form is that really everything is an object [1] in Ruby-land. For example in Java, primitives need wrapper classes to ‘become’ objects., while in Ruby they are first class objects on their own. This makes constructs like

10.times { print "ho" }  #=> "hohohohohohohohohoho"

or (will output the same string)

print "ho" * 10 #=> "hohohohohohohohohoho"


There are a handful of other reasons which make Ruby more readable and elegant, but before I get bogged down in the explanation too much, let’s see the examples!

Whetting your appetite

In the first part I will describe some basic constructs which would make the life of any Java developer much easier. These techniques are neat, but they are not using any really sophisticated stuff yet: I will try to take a look at those in the next bigger section.

The empty program


class Test
    public static void main(String args[]) {}



I did not forget the Ruby snippet; You can not see anything there because actually a Ruby program doing nothing is exactly 0 characters long.
On the other hand, the Java version is slightly longer. I is kind of weird to explain to a newcomer what do ‘class’, ‘public’, ‘static’, ‘void’, ‘String’, the [] operator and several braces here and there mean, and why are they needed if the program does literally nothing

Fun with numbers

Note:For some of the next examples you will need to use Rails Active Support.

if ( 1 % 2 == 1 ) System.err.println("Odd!") #=> Odd!


if 11.odd? print "Odd!" #=> Odd!

Does not the first example make more sense (even for a non-programmer)?. I believe it does. More of this type:


102 * 1024 * 1024 + 24 * 1024 + 10 #=> 106979338


102.megabytes + 24.kilobytes + 10.bytes #=> 106979338

OK, maybe this is an unfair comparison since Java does not have (?) those functions. However, the point is that even if it had,
the best I could come up with would look like:

Util.megaBytes(102) + Util.kiloBytes(24) + Util.bytes(10) #=> 106979338

Which is far from the elegance and readability of the Ruby example.
In the next example we will assume that we have a Java function similar to ordinalize in Ruby.


System.err.println("Currently in the" + Util.ordinalize(2) + "trimester");


 print "Currently in the #{2.ordinalize} trimester"    #=> "Currently in the 2nd trimester"

In this example we can observe variable interpolation: anything wrapped in #{} inside double quotes gets evaluated
and substituted in the string, providing a more readable form without a lot of + + Java constructs (which is cool mainly if you have more variables inside the double quotes).


In my opinion, handling dates and times is a great PITA in Java, especially if you are implementing some complex code.

System.out.println("Running time: " + (3600 + 15 * 60 + 10) + "seconds");


puts "Running time: #{1.hour + 15.minutes + 10.seconds} seconds"


new Date(new Date().getTime() - 20 * 60 * 1000)




Date d1 = new GregorianCalendar(2006,9,6,11,00).getTime();
Date d2 = new Date(d1.getTime() - (20 * 60 * 1000));


20.minutes.until("2006-10-9 11:00:00".to_time)

I think you do not have to be biased towards Ruby at all to admit which code makes more sense instantly…

I have recently found a very cool way of parsing dates in Ruby: using Chronic. However, I would not like to present it here since it is not a feature of the language, ‘just’ a nifty natural-language date parser [2].

A little bit more advanced stuff



Class Circle
  private Coordinate center, float radius;

  public void setCenter(Coordinate center)
    this.center = center;

  public Coordinate getCenter()
    return center;

  public void setRadius(float radius)
    this.radius = radius;

  public Coordinate getRadius()
    return radius;


class Circle
  attr_accessor :center, :radius

Believe it or not, the two code snippets are absolutely equal; The getter and setter methods in Ruby code are generated automatically, so not only you do not have to write them, but they are not even there to clutter the code.

I have seen argumentation from Java guys that stuff like this (i.e. the public static void main … thing, getters/setters and other boilerplate code) can be generated with any decent GUI like Eclipse (or by tools like XDoclet etc) is a non-issue. Well, as for their generation, let us say this is true. But for the readability of code it is absolutely not!

For example. take getters/setters: Every variable in Java ads 8 more lines of code (not counting the lines between the function declarations) compared to the Ruby :attr_accessor idiom. That is, a simple class definition having 10 fields in Java will have 80+ lines of code compared to 1 lines of the same code in Ruby. For me, this definitely means a big difference.

Arrays (and other containers)

This section was inspired by a blog entry by Steve Yegge.

Arrays are interesting citizens of Java: They are not really objects in the “classical” sense , so they have very limited functionality compared to first-class Java objects. On the other hand, they are offering a huge advantage over the other container classes: they can be easily initialized.


String languages[] = new String[] {"Java", "Ruby", "Python", "Perl"};

instead of

List<String> languages = new LinkedList<String>();

which is kind of lame when you quickly need to hack up some testing data.

However, they have also some serious problems: you have to define the number of the elements upon construction time, like so:


String someOtherLanguages<String>[] = new String[15];

which sometimes really cripples their functionality. [3]

How does this work in Ruby? Let’s see on three different examples (All three code snippets provide the same result):

stuff = [] #An empty array - as you can see there is no need to define the size
stuff << "Java", "Ruby", "Python" #Add some elements
#Initialize the array with the values
stuff = ["Java", "Ruby", "Python"]
#Yet another method yielding the same result
stuff = %w(Java Ruby Python)

In my opinion, these forms (especially the last one) are more straightforward and can save a lot of typing.

Another major shortcoming of Java arrays is that besides the [] operator you have only the methods inherited from Object and a single instance variable : length [4]. This means that even essential functionality like sorting, selecting elements based on something etc. has to be done via a ‘third party’ function, like this:



which seemed quite normal to me when I have been learning Java and have had no previous experience with dynamic languages, but now it looks kind of annoying.

Another Java-container-woe compared to Ruby is that in Java, an array is an array. A list is a list. A stack is a stack.
If you are wondering what the hell I am talking about, check out these Ruby code snippets:


stuff = %w(Java Ruby Python)
#Add the string "Perl" to the array
stuff << "Perl"
#Prepend the string "Ocaml" 
stuff.unshift "Ocaml"  
=> ["OCaml", "Java", "Ruby", "Python", "Perl"]
#Use the array as a stack
=> "Perl"  #stuff is now ["OCaml", "Java", "Ruby", "Python"] 
stuff.push "C", "LISP"
=> ["OCaml", "Java", "Ruby", "Python", "C", "LISP"]
#Update C to C++ 
stuff[4] = "C++"
=> ["OCaml", "Java", "Ruby", "Python", "C++", "LISP"]
#Remove the fisrt element
=> "OCaml" #stuff is now ["Java", "Ruby", "Python", "C++", "LISP"] 
#Let's just stick with Java and Ruby - slice out the  rest!
=> ["Python", "C++", "LISP"] #stuff is now ["Java", "Ruby"]

As you can see, the Ruby Array class offers functionality that could be achieved only by mixing up several Java containers into one (to my knowledge, at least) [5]. In practice, this usually speeds things up a lot.

Another thing that really annoys me when using containers in Java is the lack of this functionality:

stuff = ["OCaml", "Java", "Ruby", "Python", "C++", "LISP"]
#Lua is just gaining steam, add it to the 7th place
stuff[7] = "Lua"
=> ["OCaml", "Java", "Ruby", "Python", "C", "LISP", nil, "Lua"]

i.e. that if I am adding an element to an index which is bigger than the size of the array, the empty space inbetween is filled with nils. Now seriously, who would not exchange this for the Java behaviour (an IndexOutOfBoundsException is thrown) – after all, if I would need this functionality (which is VERY seldomly the case) in Ruby, I could check it myself and raise an exception if I don’t like what I see.

I wanted to write a bit about differences between hashes and files in Ruby and Java, but the post is already longer now then I wanted it to be so I guess I will just show some concrete code snippets to conclude.

Random Code Snippets

In this section I would like to present some real cases I have been solving with Ruby recently. Since I am still new to Ruby, I was totally amazed just how much more simpler, shorter yet much more understandable the code can be in Ruby compared to Java.

Files and Regular Expressions

As Bruce Eckel once put it, In Java, it’s a research project to open a file. Well, I have to agree. Maybe I am the only one Java programmer (besides Bruce) who – even after using Java professionally for five years – still can not write to a file without using google first. Maybe I should learn it one day?

Regular expression support in java is OK (at least one does not have to use external packages as in the pre-1.4 era), however, compared to Ruby the syntax is quite heavy.

Let’s see a demonstration on the following task: Open the file ‘test.txt’ and write all the sentences to the console (one sentence per line) which contain the word ‘Ruby’. First, the Java solution:


import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test 
	public static void main(String[] args)
	    try {
	        BufferedReader in = new BufferedReader(new FileReader("test.txt"));
	        StringBuffer sb = new StringBuffer();
	        String str;
	        while ((str = in.readLine()) != null) 
	          { sb.append(str + "\n"); }	        
	        String result = sb.toString();
	        Pattern sentencePattern = Pattern.compile("(.*?\\.)\\s+?");
	        Pattern javaPattern = Pattern.compile("Ruby");
	        Matcher matcher = sentencePattern.matcher(result);
	        while (matcher.find()) {
	            String match = matcher.group();
	            Matcher matcher2 = javaPattern.matcher(match);
	            if (matcher2.find())
	    } catch (IOException e) 

It is quite straightforward what this relatively simple code snippet doing – but if this is straightforward, what should I say about the Ruby version?

File.read('test.txt').scan(/.*?\. /).each { |s| puts s if s =~ /Ruby/ }

Well, umm… I guess this example quite much expresses the point I am talking about from the beginning: sometimes less is more, a.k.a. Succinctness is Power!

Again, I wanted to show much more examples, but I have the feeling that since the article is already too long, no one would read it 🙂 It is a big pity since I did not even talk about hashes, blocks, closures, metaprogramming (well, I will mention it briefly in the last (really :-)) example) and other goodies – maybe in part 2?

If this is still not enough…

Although I find it very easy and natural to express a lots of things in Ruby, the language can not offer anything I would ever need. However, there is a powerful concept to invoke in such situations, called metaprogramming.

A few days ago I needed to test some algorithms on trees, so I needed to hack up a lot of tree test data. Here is how I would go about this in Java using the example below (let’s assume in both languages that we have a simple data structure Tree):

            /      \
          b         c
         /  \      / | \
        d    e    f  g  h


Tree a = new Tree("a");

Tree b = new Tree("b");
Tree c = new Tree("c");

Tree d = new Tree("d");
Tree e = new Tree("e");

Tree f = new Tree("f");
Tree g = new Tree("g");
Tree h = new Tree("h");

Another possibility would be to create an XML file with the description of the tree and parse it from there. This solution is even more convenient since though you have to write the parsing code, you just have to edit an XML file once it is written. One possibility how the tree of this example could look something like


  <node name="a">
      <node name="b">
          <node name="d"/>
          <node name="e"/>
      <node name="c">
          <node name="f"/>
          <node name="g"/>
          <node name="h"/>

The latter solution is quite cool. After all you do not need to write any code, just alter the XML file and that’s it.

Now let’s see the Ruby solution I came up with:

tree = a {
            b { d e }
            c { f g h }

Well… suddenly even the XML file seems too heavy, does not it? 🙂 Not to mention the fact that the latter example is pure Ruby code – there is no need to open an external file and parse it – you just run it and the variable tree will contain your tree. That’s it.

Of course Ruby can not handle this code as it is – for this we need to invoke some metaprogramming magic.

Metaprogramming is a way to drive Ruby with Ruby. Java (especially J2EE) is usually driven by XML (which is not always really a good thing in my opinion) As you could see, Ruby is driven by Ruby instead 🙂

This example merely scratched the surface of Ruby’s possibilities through metaprogramming. However, as with the other examples, my goal was not to advocate a concrete pattern/method over a different one, but rather to show how a specific toolset can change the way of thinking about the task at hand, and the way of code design/implementation in general.

Final thoughts

When I was a child, I spoke as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things. – The Bible, I Corinthians 13:11

This thought pretty well expresses how I felt about Java/C++/(substitute any non-dynamic language here) when I came to know (some of) Ruby’s true dynamism and expressive power through terse yet powerful idioms which transformed my whole thinking about programming. Of course I do not claim that I ‘became a man’ because that’s still a very long way to go, but still, even with my very limited knowledge of Ruby, the way to express things in Java/C++ now seems… well… childish ;-). [7]

Creating a site on online tutorials is not a completely bad idea. One should include courses for 642-054, 642-054, 642-176, ruby on rails, cissp certification. If you have the knowledge for creating the tutorials, the rest is very simple. By using dot5hosting company’s site you can purchase a web hosting package that is economical. Even dedicated servers can be found for low prices. Then through the use of ip telephony one can directly reach its potential clients. Other internet marketing methods should also be considered to create awareness the site.


[1] I wonder whether this déjà vu will happen to me in the future once again: I have had this ‘Wow, everything is an object’ feeling when coming from C++ to Java; Then after coming from Java to Python; and most recently, after coming from Python to Ruby. Back

[2] Some examples that Chronic can handle:

  6 in the morning
  this tuesday
  last winter
  this morning
  3 years ago
  1 week hence
  7 hours before tomorrow at noon
  3rd wednesday in november
  4th day last week


[3] In the previous array initialization example this was not needed since it is trivial when all the elements are stated beforehand.Back

[4]which somewhat confusing given that all the other containers use the method size() (and not a field!) to determine the count of elements. To nicely mesh with the confusion, the String object provides the method length() (and not a field, as with array) to query the number of characters… Back

[5] I mean it is not possible to construct any container – other than an array – with literals, you can use the [] operator on arrays only, you can not get the i-th element of a stack etc. Back

[6] The code I have been using to accomplish this task relies on the method_missing idiom:

class Object
  @stack = []
  @parent = nil

  def method_missing(method_name, *args, &block)
    tree = Tree.new(method_name)
    @parent.add_child(tree) if @parent != nil
    if block_given?
      @stack ||=[]
      @parent = tree
      @stack.push @parent
      yield block
      @parent = @stack.last


[7] This does not necessarily mean that Java is bad and Ruby is good – just that it was the ‘Ruby way’ that struck a chord in me after trying/playing around with programming in many programming languages. Many of the features I adore in Ruby are there in Java as well, but they did not ‘came through’ whereas with Ruby there was a point of enlightenment when I really understood a lot of generic, non-language specific principles. Back

Evaluating XPaths with indices in HPricot

Currently I am working on a Ruby web-extraction framework (more on this in a future post) and I have chosen Hpricot for the HTML parsing and XPath evaluation. So far, it seems to be the perfect choice – it is lightning fast (nothing I have encountered so far – REXML, Htree, RubyfulSoup – was even near in terms of speed), very intuitive to use, and it simply works for nearly everything you will need for day-to-day tasks.

However, since my Web-extractor is relying on XPath very heavily, sometimes I need things not included in the ‘nearly-everything’ bag of goodies – but so far I have always managed to come up with a solution. Last time it was the evaluation of XPaths with indices. This is what I hacked up:

def to_hpricot(xpath)
  "#{'(' * (xpath.split(/\d+/).size-1)}" +                 
  xpath.gsub(/\d+/) { |num| (num.to_i - 1).to_s }. 
        gsub(/\/(.*?)\[/) { |p| "/'#{$1}')[" }            

Whew, that’s ugly – but it works quite well. Let’s see a simple example:

doc = Hpricot("<p>A<b>very</b>simple<b>
  <i>small</i>test<i>document</i>.</b>Very cool.</p>")

=> {elem <i> {text "document"} </i>}

OK, it is really just a hack – it does not consider special cases, does not handle anything besides indices (i.e. no axes, @’s or anything) etc. – but anyway this is hopefully just a temporary solution and _why will add evaluation of indexed XPaths to Hpricot soon. There would be at least one HPricot user who would be really delighted about it 🙂

The Sadly Neglected Pickaxe-killer

I have just finished reading Ruby for Rails: Ruby techniques for rails developers from David A. Black. Here is my (WARNING: highly opiniated) review…

I have been a Python fanatic for quite some time, and decided to give Ruby a shot. After some googling, I found most references pointing to a book called the ‘Pickaxe’. Quite a strange name for a programming book, thought to myself, but picked it up nevertheless. I have been instantly converted after a few dozen pages – mining Ruby with the pickaxe was an awesome experience! Since then, I have finished reading the second edition and became a Ruby enthusiast.

After lurking around a bit, I have learned that the common standpoint is that every newcomer/beginner should grab a copy of the Pickaxe to get started. Based on my previous, positive experience I could not agree more – until I came across R4R.

Ruby for Rails is awesome: The technical depth is just right to not distract beginners, yet detailed enough for even the more advanced readers. I did not skip a single page (though years of programming experience and tons of similar programming books I came across during that time could allow me) and finished reading it in no time.

I could write some more about how cool this book is (and it would deserve every bit of it), but I think you can read about that just anywhere (a nice review can be found here), so I would like to point out something different: If we consider the Pickaxe THE book for newcomers, then IMHO R4R is a Pickaxe killer.

Don’t get me wrong: I am a great fan of the Pickaxe, which is another very high-quality technical book – but if someone wants to apply the ‘right tools for the right job’ principle, I think newcomers who already decided to learn Ruby should grab Ruby for Rails. Programming Ruby’s Part I is absolutely well suited to get the ‘feeling’ of Ruby, and it’s next chapters are great to learn the advanced stuff – however in my opinion, the leap between the first and the next chapters is too big for an absolute beginner. Ruby for Rails is there to fill this gap.

Maybe someone might not advice this book to a newbie eager to learn Ruby, since it has ‘Rails’ in it’s title. However, R4R is still primarily a Ruby book, and while I found the Rails parts to be very helpful, I can recommend it to anyone who would not like to learn Rails at all – though the full potential of the book comes through if one would like to learn both.

Conclusion: Ruby for Rails is an awesome book on Ruby. If you are beginner, would like to get a solid understanding of the Ruby principles, or your goal is to polish up your Ruby knowledge to grasp the Rails framework – R4R was made just for you! Check it out – you won’t be disappointed.

Gems or Snakes?

After being a very happy Python programmer for 2 years, i have switched to Ruby a few months ago, and though Python is still
my 2nd favourite language, i have never thought of going back to it for a second. In fact, this feeling was so
natural that i did not even think about it’s reason for some time.

If someone compares these two languages just from the technical point of view, the difference is de facto non-existent.
Both languages are built on similar principles, both of them serve essentially the same purpose. What is the secret
sauce of Ruby then? Why did i get attracted to it immediately, past the point of no return? Here are a few points that came to my mind:

‘He is the ONE

  • If a beginner stumbles onto Ruby, there is ONE book he will be pointed to. The PickAxe.
  • If somebody asks which web framework should he use in Ruby, he will be pointed to a specific ONE: Ruby on Rails.
  • If he asks for a starter book on RoR, he will be advised to buy the coolest ONE: Agile Web Development with Rails.
  • If he asks for a discussion list/newsgroup, he will be pointed to the only ONE: ruby-talk.
  • If he looks for an XML processing library, he will be pointed to the standard ONE: REXML.

The list could go on and on…
A Rubyist with no previous Python experience may ask ‘Well, what’s so cool about this? It’s normal’. Well, i am glad that in Ruby is, but Python is a different story. I think it lacks the books like PickAxe and Agile Web Development with Rails, and also the community is divided up between Django, Turbogears, Pylons, Subway, … and the other dozen of web frameworks.

nice application of the DRY principle 🙂

Rolling on Rails

If you would ask random people to summarize in one point why Ruby is so popular today, i am quite sure most of them would say ‘because of Ruby on Rails’. This framework is really that cool, believe it or not. Some people are already apostrophing it ‘the language/framework of web2.0’, pointing out that Rails is the next big thing in the web space.

Spread the word

A programming language is essentially a bunch of boring definitions: Some grammar, rules, constructs etc. Even if it is very very cool, no one will notice it unless it is evangelized. That’s why great stuff needs great evangelizators: Perl+Larry Wall. Microsoft+Bill Gates. Ruby+DHH.
To follow the logic, i should have written Ruby+Matz. But i would not write Matz, just as i would not pair Python with Guido van Rossum in this sense. These smart gentlemen are really good at language crafting, but the analogy with Perl/Larry Wall stops here. Fortunately Ruby has a great evangelizator, too, although an ‘indirect’ one: David Heinemeier Hansson, who, in my opinion made Ruby really popular through the Rails framework.


After a few dozen of mails, i have much much better experience with the ruby-talk ML than with python-tutor. Of course one should not judge based on a few dozen mails, but the Ruby community feels to me like a big family, whereas the Python community is more like a bunch of engineers in white suits. Matz’s ‘Why does Ruby suck’ kind of style appeals me much much more than Python’s agnostic approach – ‘Maybe it is not even sure that there is a problem – first you should define what do you think the term ‘problem’ means, anyway’ etc. Of course this rigorous style may appeal to some – but not for me.

Integration [with Java]

From the JRuby page:
[JRuby is] A 1.8.2 compatible Ruby interpreter written in 100% pure Java
On the Jython page, i could not even find the compatibility with java – but according to the page, “The final release of Jython-2.1 occurred on 31-dec-2001”
For comparison: Ruby 1.8.2 is almost the latest stable, and JRuby’s last release was on 27-march-2006. JRuby makes also some Rails integration possible already, and the authors are focusing on other J2EE issues like calling EJBs etc.
I think in a world where Java is the king of the hill (at the moment), Java integration can be a deciding factor.


T-shirts. Cofee Mugs. Baseball caps. Other kind of good-for-nothing junk – must haves for all geeks! Of course with their favourite stuff on it. Well, after looking on cafepress.com (and on google in general) Ruby is a winner again when compared to Python.

The list could continue on, but since this entry is already too long i am going to stop here 😉 Of course, as everything on this blog,
this article reflects my opinion, my perception of Ruby/Python. If you think Python is better suited for you, i am not arguing or anything – it would not make sense. However, i think Ruby has much more potential to become widely accepted as a mainstream language right now than Python – and this, besides that i like to code in Ruby much more, will keep me in the Ruby camp for a long-long time…

W3C Mozilla DOM Connector was released today

I have announced the upcoming release of the W3C Mozilla DOM Connector in one if my previous posts, and now it has finally arrived. You can view it at


or check it out with svn:

svn co http://svn.rubyrailways.com/W3CConnector/

For a description about the connector, please refer to my previous post. If you would like to try it out, here is how:

#this code snippet gives you a DOM document of the currently loaded page:
  nsIWebBrowser brow = getWebBrowser();
  nsIWebNavigation nav =
  nsIDOMDocument doc = (nsIDOMDocument) nav.getDocument();
  Document mozDoc = (Document)

From now on, you can use all the existing java/dom libraries such as an XPath2 engine like saxon, xalan, whatever you want working on mozilla documents.
This means tremendous power compared to (in their category outstanding, but still limited) tools like RubyfulSoup or Mechanize, stemming from the power of XPath to query XML documents.
A simple example – dumping DOM of the html document to stdout:

public static void writeDOM(Node n)
      throws IOException
      try {
          StreamResult sr = new StreamResult(System.out);
          TransformerFactory trf = TransformerFactory.newInstance();
          Transformer tr = trf.newTransformer();
          tr.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
          tr.transform(new DOMSource(n), sr);
      catch (TransformerException e) {
          throw new IOException();

Cool, isn’t it?
At the moment, I am discussing different integration issues with the Mozilla guys, since the connector should be the part of Mozilla and the Eclipse editor in the future.

W3C Mozilla DOM Connector will be released soon!

I am happy to announce that the much anticipated W3C Connector, after lots of coding, testing, bug fixing and several months of successful usage in a commercial product was proven worthy to be released to the public. If everything goes well, it will hit the streets next week.

OK, but what the heck is the W3C Connector?

The W3C Connector is a Java package which can be used to access the Mozilla DOM tree from Java, while implementing the standard org.w3c.* interfaces. This means you can use it with any standard Java package that is expecting org.w3c.* interfaces ( Xerces, Saxon, Jaxen, … ) to execute effective queries on the Mozilla DOM (XML/XSLT/XPath/XQuery operations for example).

Technically,the W3C Connector is an implementation of the standard org.w3c.* interfaces. The implementing classes are calling Javier Pedemonte’s JavaXPCOM package, which in turn wraps the Mozilla XPCOM in order to gain access to the Mozilla DOM. See the image for an illustration:

This is very nice and all, but why should I care about it?

If you ever wanted to do (or have done) a screen scraping application, where you needed to understand the underlying document to some extent (regular expressions were not sufficient) you should know that there are many pitfalls along the way:

  • First of all, malformed HTML code: Despite the continuous efforts of the W3C and other organizations/individuals to remedy this problem by promoting X(HT)ML and other machine parsable formats, a lots of web pages still have malformed code in them. Based on the level of non-standardness, parsing such a page can be more than a moderate technical problem: in practice there are pages which can not be parsed to produce an usable input.
  • You can not use a standard query language like XPath or XQuery – these languages require a XML input, which you can not ensure because of the previous point, so you are left to roll your own code to process the parsed data.

Of course this is not a big problem for a crafted programmer, mainly if he is equipped with tools like HTMLTidy to address the first point, RubyfulSoup or similar to tackle the second. However, even these (and other) tools and a cool programming language are still just easing up the pain of effective screen scraping, but not offering a generic solution. If you want to scrap a lot and different pages, these problems in practice will cripple your efforts (or at least make it last very long time in practice).

How does the W3C Connector solve this problem?

By solving both points: The Mozilla DOM is a structure reflecting how gecko (the mozilla rendering engine) renders the page, and it always translates to valid XML (no unclosed tags or otherwise malformed code), and because of implementing the org.w3c.* interfaces you can use very robust and effective XPath packages (like Saxon) to query the document for effective HTML extraction.

There are of course a lot of other possible uses – the connector is not a tool itself, but a gateway to the world of W3C compliant XML tools – it is up to you how to leverage the power it gives you.

The Big Brother

The W3C Connector will be released officially as the part of theATF project. The code is under the last review at the moment, it is possible that I will come out with a preview release before the official one.

Ruby on Rails and Ubuntu

Although i have nice Ruby on Rails support at dreamhost, i have decided to install RoR to my local machine. IMHO a local RoR, combined with Radrails and WEBrick is the simplest and quickest solution, usually fully enough for development purposes.

To install Ruby on Rails, you just need to do 3 things:

  1. Install Ruby
  2. Install RubyGems, a package management system for Ruby
  3. Once you have RubyGems installed, it is a piece of cake to install new packages. In this concrete case:

gem install rails –include-dependencies

And you are on rails! This was easy, wasn’t it?

Well, yes, unless you happen to use Ubuntu .

Under Ubuntu, the MySQL-Ruby binding was outdated. No problem, just get the newest mysql-ruby gem:

gem install libmysql-ruby
At this point the trouble begun. Instead of installing the mysql binding, i got every kind of nasty error messages, including:

ERROR:  While executing gem ... (RuntimeError)
ERROR: Failed to build gem native extension.
Gem files will remain installed in
/usr/local/lib/ruby/gems/1.8/gems/mysql-2.7 for inspection.
ruby extconf.rb install mysqlnchecking for mysql_query() in
-lmysqlclient... no
checking for main() in -lm... yes
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lz... yes
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lsocket... no
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lnsl... yes
checking for mysql_query() in -lmysqlclient... no

WTH?! I have googled a day to find the problem, then also wrote to the RoR and Ruby mailing lists, but got no answer. As i came to know later, this happened because my problem had nothing to do with Ruby, Rails or MySQL.
After several hours of trying every possibility, i have been checking the last one: Do i have gcc and make on my machine? Well, i did not! I have thought a decent linux distribution should have make, gcc et al – and i was wrong (considering that Ubuntu is a decent distro).

Ubuntu is a very nice distribution, for desktop usage. However, if you are planning to use it for development, compiling etc., double check whether you have basic development tools (apt-get install build-essential), gcc and whatever you may need for compiling/deploying your applications.

Or install Rubuntu

rubyrailways.com up and running!

I have registered to host my website at dreamhost yesterday and have to say i am quite impressed. The package contained also a free domain registration – the address was up and running in 3 hours. The registration took about 3 minutes, after which i had e-mail account, MySQL databases, ftp, WordPress, MediaWiki, Gallery – just to name the most crucial ones – up and running (yes, this was all done during the 3 minute setup). The support and the control center (the ‘panel’) is feature-rich and very user friendly. Rails is rolling on fastcgi. Kudos dreamhost! I can’t wait to deploy my first Rails application…