Rails *is* (still) a Ghetto

nice_ass.png

While I know the title is both asking for trouble (because of the now-famous original article with a similar title) and flamebaity, please read on. My goal is not to get great stats, but rather to hear your opinion about the situation and to discuss possible solutions to the problem.

How it all started…

I don’t want to reiterate what has been said on several blogs, so just to summarize: Matt Aimonetti, member of the Rails Activists, gave a presentation at GoGaRuCo which contained sexually explicit images (according to some; I am not here to judge whether that’s true, and it doesn’t matter anyway, as you’ll see in the rest of the post).

I am not really discussing whether it’s appropriate to have images of nude chicks in your presentation at a Ruby conference (I think it’s not, it’s unprofessional etc., but that would be a matter for a different post), because sadly, I think there are far bigger problems here than that. Shedding light on them is the real purpose of this article, not talking about pr0n at GoGaRuCo again.

Update: Someone summed up the professionalism part nicely in the article’s reddit thread:

“If you’re a Rails programmer, or a Ruby programmer, and you don’t decry this sort of thing, you have no business calling yourself a professional. It doesn’t matter how large your website is, how easy it was to write, how much better it is over PHP or ASP.NET or J2EE; by definition, you do not belong to a professional community. That’s all there is to it.
It’s incumbent on every Ruby programmer to either reject this sort of misogynistic sewage, or accept that you’re never going to advance the promotion of Rails in the public perception because members of the community still think it’s edgy or cool to put pictures of strippers in their public presentations.
And here’s a hint: if your decided reaction is to talk about how unimportant this is, how much it doesn’t matter, or how much it doesn’t offend you personally, you probably don’t understand professionalism at all.”

Would You Walk Into a Hindu Temple with Your Shoes on?

hindu_temple.png
I lived in India for two months last summer, working on a Rails startup. Maybe I am odd or something, but I knew that I had to remove my shoes when entering a Hindu temple, and _no one had to convince me (what’s more, I didn’t even think about it for a second) whether this is the right thing to do, why it is so, whether I should do otherwise etc_. This is a similar situation: I just don’t do X when speaking at a conference if I suspect that X makes even one person in the room feel uncomfortable, whether because of his gender, race, nationality, Ruby/Rails skills, penis size or what have you, _regardless of whether I think it’s fine for me, my wife, other members of the community and/or the majority of the room_.

The trick is that what matters is how a *Hindu* feels when I enter a temple in footwear (even if that is perfectly acceptable in my country, culture, family and circle of friends); how *I* feel in the given situation is perfectly irrelevant. Now try to apply the same reasoning to a Ruby/Rails conference.

Shit happens…

Until this point in the story, I see no problem at all, and could even agree with the guys asking “what’s wrong with you, don’t make a fuss out of nothing”: the pictures Matt used are unproblematic in my book, and he had no idea they were problematic in anyone’s book. Theoretically it could have worked, but the point is, *it did not*. Some members of the Ruby community got offended, and here our story begins.

…and hits the fan

One of the real problems is that after this has been pointed out, Matt still keeps answering “As mentioned many times earlier, I don’t think my presentation is inappropriate.” As I argued above, it doesn’t matter what you think, unless, of course, you don’t care about offending some members of the community. In that case you should not try to apologize at all. However, if you are trying, reciting “I don’t think my presentation is inappropriate” will not put an end to the discussion. It just doesn’t work. Why not simply apologize, admitting that this was a bad move (because it offended some, not because porn, sexual images or whatever in presentations are bad per se), and finish the discussion?

Rails is Still a Ghetto

However, in my opinion that’s still not the worst part of the story; or, to put it differently, some members of the Rails community still found a way to make things worse, by applauding all this:


dhh_pr0n_is_great.png

OK, you say, we are all used to DHH’s style, this is just how the guy is. That’s (kind of) cool, but I have heard that most of the Rails core team (and obviously Matt himself) share the same opinion, and that’s a much more serious problem: it means that a Rails Activist, backed by DHH and other Rails core members, finds all this OK, despite the fact that numerous people in the community voiced the opposite opinion.

This is not about being a closed-minded prude, shouting for police and suing everyone using sexually explicit images in a presentation. This is not even about women, as I have seen both males and females on either side of the fence. This is about mutual respect – I don’t agree with you, but respect your feelings. Or not, as demonstrated in this case.

So Rails continues to be the most socially unacceptable framework: associated with arrogance, elitism and whatnot in the past, and now add pr0n images in presentations. Thankfully RailsConf is held in Las Vegas, and that should calm down all the people who associate Rails with all this crap :-). The real problem is that people associate you with the tools you are using (think Cobol, PHP, Java… or Rails). Because I am part of the Rails community, people automatically associate me with Railsy stereotypes, which aren’t nice at all right now.

I hear you, dear crème-de-la-crème Rails (core) member: I know you don’t give a shit and think this is all prude babbling, because your hourly rate is more than some of us earn in a day, and you’ll be sought after even if Rails ends up with a much worse image than it has now. But 99.9% of us are not in the ‘circle of trust’ and would be happier if Rails were not constantly associated with a ghetto.

MINASWUBN

In case you are wondering what the acronym stands for, it’s “Matz is Nice And So We Used to Be Nice”. Unfortunately, it seems the stuff I don’t like about the Rails community is sneaking into Ruby too, as the above case demonstrates. Besides this, the number of aggressive comments and reactions on various blog posts is really disturbing to me. Please (Rubyists at least) try to avoid being contaminated by all this shit, and stop thinking you are cool because you can swear on a forum (anonymously, of course). You don’t have to be a douchebag just because you are a Rubyist / Rails coder, as surprising as this might sound to some.

Conclusion

I think “incidents” like this, and getting more and more antisocial members, are inevitable by-products of a growing community. The question is whether we stop them, and if so, how. The problem is that it seems to me the Rails “top management” doesn’t want to stop them in the first place (what’s more, it even encourages them). Please prove me wrong; maybe I don’t see the full story, and I’ll be happy to admit that I am talking bullshit.

I have to admit I have no clue what the right move would be, but burying our heads in the sand and pretending everything is fine is not it. Please leave a comment if you have an idea or anything to add.

Rails Rumble Observations, part II – trends in gem/plugin usage

rails_rumble.png
In part I, I wrote about the hows and whys of gathering gem/plugin usage data from the information the Rails Rumble teams submitted, and in this part I would like to present my findings. So without further ado, here we go:

Prototype/jQuery

I already covered this in part I, but for completeness’ sake, here is the chart again:


prototype_jquery.png

It seems that jQuery is (not so) slowly replacing Prototype as the javascript framework of Rails, which is still better (from the Prototype POV) than the situation with Merb, where jQuery is the default framework (oh yeah, I know, Merb is everything-agnostic etc. etc., but I think the vast majority of Merbists are using DataMapper, jQuery etc.).

Skeleton Applications

Well… this chart is rather dull:


bort.png

One in every three teams used a skeleton application (which in this context means ‘Bort’).
The dominance of Bort is a bit surprising given that it’s by far not the only player in the field: there are definitely others, like ThoughtBot’s suspenders, Blank by James Golick, starter-app by Pat Maddox and appstarter by Lattice Purple, just to name a few.

I am not sure about the others, but the absence of suspenders from the chart probably has more to do with the fact that it had not been publicly released before Rails Rumble than with a lack of interest. I am basing this claim on the fact that a lot of people used the gems/plugins which, combined together, are basically suspenders.

However, this doesn’t alter the fact that Bort is immensely popular – great stuff, Jim.

Testing Frameworks

I think there are (at least) 2 things to note here:

  1. Testing in Ruby/Rails is not considered optional, even in the face of a very tight deadline. Even if we assume that the 49% of teams who didn’t list a testing framework didn’t test at all (which surely doesn’t sound too realistic; they probably just went with Test::Unit), more than half of the teams did!
  2. Testing tools are a much debated topic nowadays, and the winner is not clear (yet). Based on these results I would guess there is roughly a 1:1:1 ratio between Test::Unit, rspec and shoulda *currently*, so there are definitely interesting alternatives to Test::Unit.


testins.png

Mocking


mocking.png

Not much to add here. Though the above chart says nothing about how many people are using e.g. Mocha with rSpec (vs. using rSpec’s built-in mocking tools), one thing is clear: as a stand-alone mocking framework, Mocha reigns supreme.

Exception Notification


ex_notification.png

Another point for ThoughtBot (not the last one in this list): Hoptoad has no disadvantage compared to the more traditional Exception Notifier (if we don’t count getting an API key, which takes about a minute), and on the upside, you get a beautiful and user-friendly web GUI.

Full-text Search


full_text.png

I found the above chart interesting for two reasons:

  1. I thought that Ferret and/or acts_as_solr were still somewhat popular; it turns out they are not.
  2. I also thought Thinking Sphinx was the de facto full-text search plugin, and didn’t know about Xapian. Well, I learned something new again.

Uploading


uploading.png

ThoughtBot did it again: Paperclip is already more popular than the old-school attachment-fu. I am always a bit cautious when someone challenges the status quo (like Nokogiri vs. Hpricot, Authlogic vs. Restful Authentication, Paperclip vs. attachment-fu etc.), but it seems Paperclip is ripe to take over. You can find some interesting tutorials here and here.

User Authentication

Another dull graph for you:


user_auth.png

I am wondering how homogeneous this chart would be if Authlogic had appeared earlier. It seems like a strong challenger (already watched by around 260 people on GitHub), and I am sure it will take a nice slice of the pie in the future.

What’s more interesting is the OpenID support: more than one third of the apps offered OpenID authentication, and quite a few of them *solely* OpenID.

Misc

  • factory_girl was used to replace traditional fixtures in one out of every six apps!
  • HAML/SASS is quite popular, used in about 20% of the applications
  • Hpricot was the only HTML/XML parser used (in 7 apps altogether)

What I am happiest about is that there is still a lot of innovation going on in the Rails world: as you can see, new plugins/gems keep appearing and in some (in fact, a lot of) cases are dethroning their good ol’ competitors. There is a lot of competition going on in almost every major area of Rails web development, and this is always a good thing.

Rails Rumble Observations, part I – jQuery on the Heels of Prototype

rails_rumble.png
As a Rails Rumble judge, I spent quite some time reviewing the applications and I noticed several patterns regarding the gems/plugins used during the 48-hour contest. The participants were asked to submit whatever tools they were using to build their app. With a few exceptions they complied, creating an interesting data set to observe the current trends in the Rails world.

Collecting the Data

Unfortunately it was not possible to gather the information automatically using screen scraping or other mechanical methods, since the input varied from free text (stating details like ‘we used Rails, macs, TextMate, cocaine (the drink!)’) to the output of _gem list_, and everything in between, not following any guideline (perhaps because none was given). So I hacked up a small app with a single form and harvested the info manually. I only collected data for the first 100 entries, for two reasons: the stuff used in the rest of the apps was pretty much the same, and, more importantly, the task was rather daunting 🙂
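Once the data is harvested, the counting step is easy to sketch in Ruby. The snippet below is a hypothetical illustration, not the app I actually used: it assumes each submission has already been reduced to a list of tool names, normalizes spelling variants through a made-up `ALIASES` table, and reports the percentage of entries that mention each tool (a tool listed twice in one entry counts once).

```ruby
# Hypothetical alias table mapping spelling variants seen in free-text
# submissions to one canonical name (keys are already downcased).
ALIASES = {
  "attachment_fu"          => "attachment-fu",
  "restful_authentication" => "restful-authentication"
}.freeze

# Canonical form of one tool name: trimmed, downcased, aliases applied.
def normalize_tool(name)
  key = name.strip.downcase
  ALIASES.fetch(key, key)
end

# Percentage of entries mentioning each tool; a tool listed several
# times within the same entry is counted once for that entry.
def tool_usage(entries)
  return {} if entries.empty?

  counts = Hash.new(0)
  entries.each do |tools|
    tools.map { |t| normalize_tool(t) }.uniq.each { |t| counts[t] += 1 }
  end
  counts.transform_values { |c| (100.0 * c / entries.size).round(1) }
end

# Made-up sample entries, for illustration only.
entries = [["Rails", "Bort", "RSpec"], ["rails", "attachment_fu"], ["Rails", "rspec", "rspec"]]
tool_usage(entries)
# rails: 100.0, rspec: 66.7, bort and attachment-fu: 33.3
```

The alias table is where most of the real work would go: free-text answers spell the same plugin in many ways, and counting them separately would split a tool's share across several slices.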

Why Does this Matter?

I believe that because of the rules (mostly the 48-hour deadline) the findings are quite representative: I am sure that every team reached for the most productive/easy-to-use/effective tools they could grab, since the deadline was tight. Rails Rumble is not about experimentation or showing off some shiny new toys, but about lightning-fast hacking aided by state-of-the-art gems and plugins, so I think it’s safe to assume that the tools used here are pretty much the crème de la crème of the Ruby/Rails world.

Prototype vs. jQuery

In the first exhibit, I’d like to check out Prototype vs. jQuery usage. To prepare this chart, I went the extra mile and didn’t rely on the user-supplied data, but opened the pages by hand and checked their HTML source for Prototype/jQuery javascript includes. Here is what I found:


prototype_jquery.png

One team was using MooTools; the rest of the pie is divided between Prototype and jQuery.
Most probably the real result is even more in favor of jQuery (I would guess well above 60%): all the teams that added jQuery to their application.html.erb were actually using it (why would they bother adding it otherwise?), while this is not necessarily true for Prototype, which is included by default; maybe some teams didn’t even use it and just didn’t care to delete it (as you will learn in the next part, every third team used Bort, which includes the Prototype/script.aculo.us files by default).
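Checking the includes by hand boils down to looking at the `src` attributes of the `<script>` tags. Here is a minimal sketch of that check, assuming the page HTML is already in a string; `detect_js_libraries` and its file-name patterns are hypothetical, and it is a regexp heuristic rather than a real HTML parser. Just like the manual check, it only detects that a library is *included*, not that it is actually *used*:

```ruby
# Hypothetical file-name patterns used to recognize each library in a
# <script src="..."> attribute.
LIBRARIES = {
  prototype: /prototype/i,
  jquery:    /jquery/i,
  mootools:  /mootools/i
}.freeze

# Returns the libraries whose file names appear in script src attributes.
# A regexp heuristic, not an HTML parser: inline or bundled scripts are missed.
def detect_js_libraries(html)
  srcs = html.scan(/<script[^>]*\bsrc\s*=\s*["']([^"']+)["']/i).flatten
  LIBRARIES.select { |_name, pattern| srcs.any? { |src| src =~ pattern } }.keys
end

page = '<script src="/javascripts/prototype.js?1220000000" type="text/javascript"></script>' \
       '<script type="text/javascript" src="/javascripts/jquery.js"></script>'
detect_js_libraries(page)  # → [:prototype, :jquery]
```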

This is not the first indicator of jQuery’s rising popularity in the Rails world: Hampton Catlin’s Ruby Survey found the same (i.e. jQuery is currently more popular than Prototype). Merb also uses jQuery by default.

Is Prototype Dead?

My favorite Austrian Ruby-hacker friend told me over lunch a few weeks ago: ‘Prototype is dead!’. I think this statement is questionable at the moment, to say the least, since Prototype is still the default javascript framework of Rails, and this is not likely to change anytime soon, because Prototype is heavily used by 37signals (and is probably entrenched in other older Rails apps as well).
However, the trend seems to be that jQuery is spreading really fast, replacing Prototype in a lot of cases.

So be sure to check jQuery out (it’s dead easy to install and use). I immediately fell in love with it (maybe I was too used to Hpricot-style CSS selectors?) and I am happily using it in my projects now.

The Next Episode

Which testing tools are used by the community? How about rails skeleton apps? OpenID support? exception-notifier or hoptoad? attachment_fu or paperclip? mocha or flexmock? factory-girl or traditional fixtures? Find out in the next installment!

acts_as_state_machine for Dummies, part I

I recently applied this great plugin a few times to tackle different tasks and would like to share with you the joys of thinking in state machines!

Disclaimer: This installment is ‘just’ an intro to the topic (a teaser, if you like); it doesn’t contain actual instructions or code on how to use acts_as_state_machine, which comes in part II. The original intent was to write up a small tutorial with some example code, but I started with an intro and the article grew so long that I decided to split it up, so stay tuned for part deux!

State what…?!?

A finite state machine (FSM for short), a.k.a. finite state automaton (FSA), is a 5-tuple (Σ, S, s₀, δ, F)… OK, just kidding. I doubt too many people are interested in the rigorous definition of the FSM (for the rest, here it is), so let’s see a more down-to-earth description.

According to Wikipedia, a finite state machine is “_a model of behavior composed of a finite number of states, transitions between those states, and actions_”. Somewhat better than those deltas and sigmas and stuff, but if you are not the abstract-thinker type, it might take some time to wrap your brain around it. I believe a good example can help here!

Everybody knows and loves regular expressions, but it’s probably a less widely known fact that regular expression matching can be solved with an FSM (in fact, a lot of implementations use some kind of FSM on steroids). So let’s see a simple example. Suppose we would like to match a string against the following simple regexp:

ab+(a|c)b*
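Before building the FSM, it’s worth checking the pattern with Ruby’s own regexp engine. One subtlety: the walkthrough below asks whether the *whole* string matches, so the Ruby equivalent needs the `\A` and `\z` anchors; without them, `=~` happily finds a matching substring. `full_match?` is just a hypothetical helper name:

```ruby
# The example regexp, anchored with \A and \z so the whole string must
# match, which is the question the FSM walkthrough answers.
PATTERN = /\Aab+(a|c)b*\z/

def full_match?(str)
  PATTERN.match?(str)
end

full_match?("abbbcbbb")  # → true
full_match?("abac")      # → false
"abac" =~ /ab+(a|c)b*/   # → 0 (unanchored: the prefix "aba" matches)
```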

First we have to construct the FSM, which will be fed with the string we would like to match. An FSM for the above regular expression might look like this:

fsm_correct.png

String matching against this FSM is basically answering the question ‘starting from the initial state, can we reach a final state after feeding the whole string to the FSM?’. Pretty easy; the only thing we have to define is ‘feeding’.

So let’s take the string ‘abbbcbbb’ as an example and feed the FSM! The process looks like this:

  1. we are in q0, the initial state (where the ‘start’ label is), starting to consume the string
  2. we receive the first character, it’s an ‘a’. We have an ‘a’ arrow to state q1, so we make a transition there
  3. we receive the next character, ‘b’. We have two b-arrows: to q1 and q2. We choose to go to q1 (in fact, staying in q1). Remember, the question is whether we _can_ reach the finish, not whether all roads lead to the finish, so the choice is ours!
  4. identical to the above
  5. after the two above steps, we are still in q1. We still get a ‘b’, but this time we decide to move to q2.
  6. we are in q2 and the input is ‘c’. We have no analysis-paralysis here, since the only thing we can do is move to q4, so let’s do that!
  7. Whoa! We reached the finish line! (q4 is one of the terminal states.) However, we haven’t consumed the whole string yet, so we can’t yet tell whether the regexp matches or not.
  8. So we eat the rest of the string (the ‘how’ is left as an exercise to the reader) and return ‘match!’

Let’s see a very simple non-matching example on the string ‘abac’

  1. in q0
  2. got an ‘a’, move to q1
  3. in q1, got a ‘b’, move to q2
  4. in q2, got an ‘a’, move to q3: we reached the finish, but still have a character to consume
  5. in q3, got a ‘c’… oops. We have no ‘c’ arrow from q3, so we are stuck. Return ‘no match!’
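The ‘choose the right arrow’ trick in the walkthroughs can be made mechanical by keeping track of *all* states the FSM could be in after each character; the string matches if any of them is a terminal state at the end. A sketch in Ruby, where the transition table is my reconstruction from the two walkthroughs (the `b` loops on q3 and q4 for the trailing `b*` are an assumption, since the diagram is not reproduced here):

```ruby
require 'set'

# Transition relation reconstructed from the walkthroughs:
# state => { input character => [possible next states] }
TRANSITIONS = {
  q0: { "a" => [:q1] },
  q1: { "b" => [:q1, :q2] },
  q2: { "a" => [:q3], "c" => [:q4] },
  q3: { "b" => [:q3] },
  q4: { "b" => [:q4] }
}.freeze

START  = :q0
FINALS = Set[:q3, :q4].freeze

# Instead of guessing which arrow to follow, move every possible state
# forward at once.
def fsm_match?(string)
  current = Set[START]
  string.each_char do |ch|
    current = current.flat_map { |s| TRANSITIONS.fetch(s, {}).fetch(ch, []) }.to_set
    return false if current.empty? # stuck: no arrow for this character
  end
  current.intersect?(FINALS) # whole string consumed; are we at a finish?
end

fsm_match?("abbbcbbb")  # → true
fsm_match?("abac")      # → false
```

Tracking a set of states instead of guessing is the standard way to simulate a nondeterministic FSM: it never has to backtrack, no matter how many arrows share the same label.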

Of course real-life scenarios are much more complicated than this one, and sometimes FSMs are not enough (for example, to my knowledge it’s not possible to tell whether a number is prime with a vanilla FSM, but a regexp doing just that has been floating around for some time), but this example served fine to illustrate the concept.
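A well-known version of that prime-testing regexp works on numbers written in unary (n as n copies of `1`) and relies on a backreference (`\1`). Backreferences go beyond regular languages, which is exactly why a vanilla FSM cannot do the same. A Ruby rendering of the trick (`prime?` is a hypothetical helper name):

```ruby
# n is written in unary as n copies of "1".
# Alternative 1 (\A1?\z) matches 0 and 1, which are not prime.
# Alternative 2 matches when the string splits into two or more equal
# groups of length >= 2, i.e. when n is composite. The backreference \1
# is what takes this beyond what a plain FSM can recognize.
NOT_PRIME = /\A1?\z|\A(11+?)\1+\z/

def prime?(n)
  !NOT_PRIME.match?("1" * n)
end

(1..20).select { |n| prime?(n) }  # → [2, 3, 5, 7, 11, 13, 17, 19]
```

It is a party trick rather than a practical primality test: the engine backtracks through every candidate group length, so it slows down quickly for large n.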

This is cool and all but why should I care?!?

Well, yeah, you are obviously not going to model an FSM the next time you would like to match a regexp; that would be wheel-reinvention at its finest. However, there are some practical scenarios where an FSM can come in handy:

  • sometimes the logic flow is just too complicated to model, and an if-forest is rarely a good solution (on the flip side, don’t model an if-else with an FSM :-))
  • you want to encapsulate complex logic flow into a pattern and not clutter your code with it
  • you are in a stateless world, for example HTTP
  • asynchronous and/or distributed processing, where you explicitly need to maintain your state and act upon it
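To make the ‘encapsulate complex logic flow’ point a bit more concrete without anticipating the acts_as_state_machine API (that’s for part II), here is a minimal hand-rolled sketch in plain Ruby. The `Message` class, its states and its events are hypothetical, loosely inspired by the messaging system planned for the next episode; this is not how acts_as_state_machine works internally:

```ruby
# Hypothetical message lifecycle, kept in one explicit transition table
# instead of if/else checks scattered around controllers and models.
class Message
  # event => { from_state => to_state }
  TRANSITIONS = {
    send_out: { pending: :sent },
    deliver:  { sent: :delivered },
    read:     { delivered: :read },
    fail:     { pending: :failed, sent: :failed }
  }.freeze

  attr_reader :state

  def initialize
    @state = :pending
  end

  # True if the event has a transition out of the current state.
  def can_fire?(event)
    TRANSITIONS.fetch(event, {}).key?(@state)
  end

  # Moves to the next state, or raises if the event is unknown or not
  # allowed from the current state.
  def fire(event)
    table = TRANSITIONS.fetch(event) { raise ArgumentError, "unknown event: #{event}" }
    target = table[@state]
    raise ArgumentError, "cannot #{event} from #{@state}" unless target
    @state = target
  end
end

m = Message.new
m.fire(:send_out)   # state is now :sent
m.can_fire?(:read)  # → false, it has to be delivered first
m.fire(:deliver)
m.fire(:read)
m.state             # → :read
```

Adding a state or an event means touching one table rather than hunting down every conditional; plugins like acts_as_state_machine wrap the same idea in a declarative DSL.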

Some real-life examples of FSM usage in the Ruby/Rails world are why the lucky stiff’s Hpricot (using Ragel) and Rick Olson’s restful authentication plugin (using acts_as_state_machine).

The Next Episode

In the next installment I’d like to focus on the practical usage of the acts_as_state_machine plugin – I’ll attempt to create an asynchronous messaging system in a Rails app using it.

Today Midnight

I have always been uncertain about the exact expression denoting today midnight (or any day’s midnight, for that matter). Is 00:00 on e.g. April 24th the midnight between the 23rd and the 24th, or between the 24th and the 25th? If I want something to happen at today midnight, is that today’s date at 00:00? (For the impatient: no, it isn’t :-).)

Chronic to the rescue! (If you don’t know Chronic, be sure to check it out: it’s a great natural-language date/time parser.) All I had to do was:

>> Chronic.parse('today midnight')
=> Fri Apr 25 00:00:00 +0200 2008

so actually it turns out it’s _tomorrow’s_ date at 00:00.
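If you would rather not pull in a gem for this particular case, the same interpretation (midnight at the start of tomorrow, in the local time zone) can be sketched with the standard library alone. `today_midnight` is a hypothetical helper name, and unlike Chronic it parses nothing; it only computes the date:

```ruby
require 'date'

# Hypothetical helper: "today midnight" in the sense Chronic returned it
# above, i.e. 00:00 at the start of tomorrow, in the local time zone.
# `now` is a parameter so the function is easy to test.
def today_midnight(now = Time.now)
  tomorrow = now.to_date + 1   # Date arithmetic handles month/year rollover
  Time.local(tomorrow.year, tomorrow.month, tomorrow.day)
end

today_midnight(Time.local(2008, 4, 24, 15, 30))  # → 2008-04-25 00:00:00 in local time
```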

I couldn’t find time zone support, though (I am not saying it’s not there, just that I couldn’t find it by looking at the API). So what if I want to meet someone in Madrid today midnight? Why, I install the tzinfo gem and ask Ruby!

>> TzinfoTimezone["Madrid"].utc_to_local(Chronic.parse('today midnight').getutc)
=> Fri Apr 25 00:00:00 UTC 2008

Random Links from the Web, 21-04-2008

The Top 10 Ruby/Rails Blogs

ubuntu
In my quest to whip my feed reader’s Ruby/Rails-related content into shape, I did a little research to find out which Ruby/Rails blogs are the most popular at the moment. I gave up on following most of the blogs systematically a long time ago: it is becoming increasingly hard to keep track of even the aggregators, let alone the blogs themselves. There are hundreds of Ruby/Rails blogs out there right now (I am talking about the ones found on the few most popular aggregators; in reality there must be many more of them), so it is clear that you need to pick carefully, unless you happen to be a well-paid, full-time Ruby/Rails blog reader (in which case you would still have to work hard to do your job properly).

OK, enough nonsense for today; let’s see the results, counting down from 10th place! If you are interested in the method used to compile the list, or in a longer top 30 list based on Technorati and Alexa, check out this blog entry.

10. http://weblog.jamisbuck.org/ by Jamis Buck.

jamisbuck

Jamis Buck “is a software developer who has the good fortune to be both employed by 37signals and counted among those who maintain the Ruby on Rails web framework”. He is mostly blogging about (surprise, surprise!) Rails – of course on a very high level, which could be expected from a Rails core developer. Very insightful posts on ActiveRecord, Capistrano and other essential Rails topics delivered in a professional way.

9. http://weblog.rubyonrails.org by the Rails core team

weblog_rubyonrails

This is the “default” Ruby on Rails blog, used for announcements, sightings, manuals and whatever else the RoR team finds interesting :-).

8. http://www.slash7.com by Amy Hoy.

slash7

This is a really cool little site – Amy is a very gifted writer and designer, publishing very insightful articles as well as the nicest (hands down!) cheat sheets about different Web2.0, Ajax, Rails and that sort of stuff. Definitely worth checking out!

7. http://errtheblog.com by PJ Hyett and Chris Wanstrath.

err_the_blog

A very serious blog of two Rails-geeks about advanced topics (but very well explained – so if you are not totally green (#00FF00) you should do fine). Among other things, they have contributed Sexy Migrations to Rails recently.

6. http://nubyonrails.com/ by Geoffrey Grosenbach

nubyonrails

Geoffrey is the author of more than twenty Rails plugins (including gruff, my favorite graph-drawing gem), a horde of professional-quality articles and the PeepCode screencast site. Do I need to say more?!

5. http://redhanded.hobix.com/ by _why the lucky stiff.

redhanded

_why is probably the most interesting guy in the Ruby community. He is the author of (among tons of other things) Why’s (Poignant) Guide to Ruby, Hpricot, the coolest Ruby HTML parser, Try Ruby! (a must-see!) and Hackety Hack, for aspiring programmers who want to hack like in the movies! The list goes on and on… This guy never stops. If anyone ever invents the perpetuum mobile, it will be him (in Ruby, of course).

4. http://hivelogic.com/ by Dan Benjamin.

hivelogic

Dan’s recent work includes Cork’d, a Web2.0 wine community site, and the A List Apart publishing system. He also does great podcasts with various guests.

3. http://mephistoblog.com/ by Rick Olson and Justin Palmer

mephisto

Personally I was quite surprised that a blog concentrating on such a narrow topic (in this case the Mephisto blogging system) could grab 3rd place, so I checked both Alexa and Technorati by hand just to be sure, and it seems that everything is OK: mephistoblog is ranked very high on both of them, justifying this position. After all, Mephisto is the leading blog system for Rails!

2. http://www.rubyinside.com/ by Peter Cooper.

rubyinside

This blog is my absolute favorite from this top 10 list (actually, from all the Ruby blogs I have encountered so far). I am definitely with Amy Hoy, who said: “If you had to subscribe to just one Ruby blog, it should be this one.” If you would like to know what’s happening in the Ruby/Rails community, rubyinside is the place to check. If there is no new post here, it’s most probably because nothing happened!

And the winner is: http://www.loudthinking.com/ by David Heinemeier Hansson.

loudthinking

Well, what should I add? David is the creator of Ruby on Rails, so no wonder his blog topped the list!


Conclusion
It’s interesting to note that nearly all the blogs listed here are mostly pure Rails ones, rubyinside (mixed Ruby/Rails) and redhanded (pure Ruby) being the two exceptions. It would be interesting to generate such a list for Ruby blogs, though I am not sure how: the sources I used (most notably rubycorner) aggregate both Ruby and Rails blogs. So it seems there are many more Rails bloggers out there (or they are much better than the Ruby bloggers, with the exception of _why).

I would really like to hear your opinion on this little experiment: whether you think it makes sense or is completely off, how it could be improved in the future, what features could be added etc. If I receive some positive feedback, I think I will work on the algorithm a bit more and run it, say, every 3 months to see what’s happening in the Ruby/Rails blogosphere. Let me know what you think!


Book Review: Build Your Own Ruby on Rails Web Applications

Build Your Own Ruby on Rails Web Applications
Author: Patrick Lenz
Publisher: SitePoint
Pages: 447
Intended Audience: Beginners/Pre-intermediate
Rating: 5/5

I would like to begin with a few words about SitePoint. According to their own definition, ‘SitePoint specializes in publishing fun, practical, and easy-to-understand content for web professionals.’ So far I have had the pleasure of reading three of their books: (obviously) Build Your Own Ruby on Rails Web Applications, The CSS Anthology: 101 Essential Tips, Tricks & Hacks and The Principles of Beautiful Web Design. Judging the publisher by these three books, I could not agree more: I have found all their claims (fun, practical and easy-to-understand) to be unquestionably true.

After a brief overview of the book I would like to concentrate on the question that has probably popped into most of your heads: Why should I prefer this book over Agile Web Development with Rails or other Rails books available? We’ll look into that in a minute, but first things first: let’s see what Build Your Own Ruby on Rails Web Applications has to offer!

The book starts off with installing Ruby, RubyGems, Rails and even MySQL on different operating systems, presented in painstaking detail. This is very good in my opinion, since advanced users will skip this section anyway, and it offers a great step-by-step walkthrough for novices.

The second chapter is the compulsory ‘introduction to Ruby’. I have to admit I did not read it, but judging from the contents and a quick skim-through, it offers at least the same knowledge as similar Rails books, which is more than enough to get you started. If you would like to go deeper into both Ruby and Rails, I suggest checking out David A. Black’s excellent Ruby for Rails.

Chapter 4, ‘Rails Revealed’ is the only more-or-less theoretical chapter, discussing the architecture, components and conventions used in Ruby on Rails.

The real action starts in Chapter 5, in the form of building a digg clone from scratch. You will learn how to build a Rails application, beginning with generating the necessary files and ending up with a nicely working, (relatively) feature-rich digg-like site. It deals with user management (even showing a user view with submitted stories), allows you to submit and vote on stories (just as you would expect from an application like this), and is sprinkled with a lot of tasty tidbits like tagging (also introducing polymorphic associations in a very easy-to-understand way) or (of course) AJAX.

The book finishes with some advanced topics: Debugging, Testing and Benchmarking, followed by Deploying and Production use, providing instructions to deploy your application on Apache with Mongrel.

If the review ended right now, you could (rightfully) ask: ‘So what? These are exactly the things I would expect from a Rails book’, and you would be perfectly right. So let’s see why this book is different from all the other ones available on the market!

First of all, it is written in a very understandable and easy-to-digest way: it explains everything as simply as possible, making even the more complicated topics clear right away. I don’t remember reading anything twice, no matter how advanced the topic was. I think this alone makes Build Your Own Ruby on Rails Web Applications one of the best hands-on RoR books today (definitely the best one I have seen so far, but since I have not read all the competitors, I cannot unambiguously claim it is the best one).

What I also like about this book is that it requires almost no prerequisites: the bare minimum that is needed is explained along the way while building the application, or can be learned from the book.

A big difference compared to Agile Web Development with Rails, which is the de facto Rails book today, is that testing of the created components is described in great detail. The usual workflow is thus problem statement, solution, and unit tests to verify the code, explaining the whys and hows as well. I am not aware of any RoR book currently available that explains and demonstrates testing your code to this extent.

One could argue that Build Your Own Ruby on Rails Web Applications is not deep enough, which is more or less true (compared to e.g. Agile Web Development with Rails) – but I think this is perfectly fine, since going deep is not the purpose of the book at all! If you need in-depth coverage of Rails internals, or would like to explore advanced topics like caching, scaling or deployment in great detail, then this is not the book to get. However, if you would like to try Ruby on Rails right away, without having to google for blog posts that help you install the prerequisites or get this and that right, be sure to check it out!

Ruby’s Growth Comes to an End?!

According to O’Reilly’s latest report on the state of the computer book market, focusing on programming books, Ruby has a clear lead. Check out this treemap view – I believe it does not need much additional explanation (the percentages reflect relative book sales compared to 2006/Q1):

Now, I would not like to start a language war here at all – there is no need either for the Ruby camp to draw zealous conclusions or for proponents of other languages to come up with explanations. The diagram shows that, compared to the same period of 2006, demand has grown the most for Ruby (and other Ruby-based/related) books – and nothing more. It does not tell anything about the number of people using a given language or related frameworks, job opportunities or absolute market share – it is just a relative indicator based on the programming book market.

However, if you take a peek at the TIOBE index for May – entitled ‘Ruby’s growth comes to an end’ – you can see that Ruby is the fastest-growing language at the moment (again, compared to the same period of 2006). If this is the ‘end of the growth’, then what does growth look like?!

It is also interesting to check out this graph from TIOBE:

It tells me that since July 2006, no programming language has shown growth as big (and as steady) as Ruby’s.

I don’t know what the TIOBE guys based their ‘Ruby is losing steam’ conclusion on… I have talked to a few Ruby on Rails freelancers recently, and each of them independently confirmed that there is a bigger need for Ruby/Rails programmers than ever. Based on this data (and not only this), I would say quite the opposite is true: my personal feeling is that Ruby/Rails is going to get a *lot* bigger than it is now!

Partitioning Sets in Ruby

While hacking on various tasks, I have needed to partition a set of elements quite a few times. I attacked the problem with different homegrown implementations, mostly involving select-ing, in turn, every element belonging in the same basket. Fortunately, I recently ran across Set#divide, which does exactly this… No more reinventing the wheel! Let’s see a concrete example.

I have an input file like this:

a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34

The goal is to sum up all the numbers in rows beginning with the same character (e.g. to sum up all the numbers that are in a row beginning with ‘a’). The result should look like:

[{"a"=>241}, {"b"=>246}, {"c"=>145}]

This is an ideal task for divide! Let’s see one possible solution for the problem:

require 'set'

input = Set.new(File.readlines('input.txt').map(&:chomp))  # one set element per line
groups = input.divide { |x, y| x[0] == y[0] }                # same first character => same subset
# Build the array of hashes, one hash per subset
result = groups.map do |g|
  # For every line in the subset, sum the numbers following the leading letter
  g.inject(Hash.new(0)) do |h, line|
    h[line[0..0]] += line[2..-1].split(' ').inject(0) { |sum, x| sum + x.to_i }
    h
  end
end
p result

The output is:

[{"a"=>241}, {"b"=>246}, {"c"=>145}]

Great - it works! Now let's take a look at the code...

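The 3rd line loads the lines into a set like this:

#<Set: {"a 53 2 3", "b 8 62 1 23", "a 9 0 31", "b 4 45 4 16 7", "b 1 23", "c 3 42 2 31 4 6", "a 1 3 22", "a 7 83 1 23 3", "b 1 14 4 15 16 2", "c 5 16 2 34"}>

The real thing happens on line 4. After its execution, groups looks like this:

#<Set: {#<Set: {"a 53 2 3", "a 9 0 31", "a 1 3 22", "a 7 83 1 23 3"}>, #<Set: {"b 8 62 1 23", "b 4 45 4 16 7", "b 1 23", "b 1 14 4 15 16 2"}>, #<Set: {"c 3 42 2 31 4 6", "c 5 16 2 34"}>}>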

As you can see, the set is correctly partitioned now - with almost no effort! We did not even need to require an external library...
The rest of the code is beyond the scope of this article (everybody always complains about the long articles here, so I am trying to keep them short) - and anyway, the remaining snippet is just a bunch of calls to inject. If inject does not feel natural to you, don't worry - it took me months to get used to it, and some people never reach for it even though they fully understand it and are able to use it - I guess it's a matter of taste...
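For the record, Set#divide also accepts a one-argument block: elements for which the block returns the same value end up in the same subset, so there is no need to compare every pair of lines. A minimal sketch on the same data, inlined here instead of being read from input.txt; each_with_object is swapped in for the inner inject calls purely as a matter of taste:

```ruby
require 'set'

# The sample rows from input.txt, inlined so the example is self-contained
lines = [
  'a 53 2 3', 'b 8 62 1 23', 'a 9 0 31', 'b 4 45 4 16 7', 'b 1 23',
  'c 3 42 2 31 4 6', 'a 1 3 22', 'a 7 83 1 23 3', 'b 1 14 4 15 16 2', 'c 5 16 2 34'
]

# One-argument block: lines with the same leading character share a subset
groups = Set.new(lines).divide { |line| line[0] }

# For every subset, sum the numbers that follow the leading letter
result = groups.map do |group|
  group.each_with_object(Hash.new(0)) do |line, sums|
    letter, *numbers = line.split(' ')
    sums[letter] += numbers.sum(&:to_i)
  end
end

# Sort by letter so the printed order does not depend on the set's ordering
p result.sort_by { |h| h.keys.first }
```

The two-argument form is the more general one (it can express relations that are not a simple "same key" test), but when a key function exists, the one-argument form is both shorter and cheaper.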

Getting Beast up and Running on Dreamhost (for the Truly Lazy)

Though Dreamhost offers phpBB as one of their one-click install goodies (ergo it is the easiest forum to install, since you barely have to do anything), I had been looking for something different. To me, phpBB’s interface always felt quite unintuitive and too heavy – I wanted something smaller, simpler, more compact. The problem was that I did not know what I should search for – until I came across Beast, a lightweight forum written in Ruby on Rails. It was love at first sight!

When it comes to the tools I use, I am really language-agnostic – this very blog runs on WordPress (PHP), I use Trac (Python) to track my projects, MediaWiki (PHP) is my preferred wiki etc. – so even if it may seem otherwise, I did not choose Beast because it is written in Rails (although +1 for that :-)), but because of its design and ease of use. My first thought after trying it was ‘wow, this is as easy to use as a 37signals app’ – it’s really that intuitive and well designed!

Well, this all sounds fine, but installing it on Dreamhost was a different story. Thank God I found a superb, step-by-step HOWTO here. However, even after following all the steps, I got ‘incomplete headers’ and other errors, which I managed to fix – here are some additional comments on the HOWTO:

6. You can skip this point; as the HOWTO says, it is already installed on DH and works without any problems.
7. Forget about ‘development’ and ‘test’, however be sure to get ‘production’ right, as the next step will not work otherwise. It should look something like this:

production:
  adapter: mysql
  database: beast_prod
  host: mysql.myhost.com
  username: us3r
  password: p4ss
  port: 3306

8. For me it worked only *with* the RAILS_ENV=production parameter specified.
9. You can change the salt to anything – it just must not stay the same as the default. The easiest way is to add or remove a random character from the string.
12. The shebang should be updated to #!/usr/bin/ruby
13. The || should be removed, i.e. it should read:

ENV['RAILS_ENV'] = 'production'

14. Make sure you change the permissions of those directories only – I changed everything recursively, wiping out the executable flag of dispatch.fcgi :-).
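If you would rather not invent a new salt by hand (step 9), Ruby's standard library can generate a random one for you. A minimal sketch; where exactly the value goes in Beast's configuration is whatever the HOWTO's step 9 points at, and the 20-byte length is an arbitrary choice:

```ruby
require 'securerandom'

# 20 random bytes, hex-encoded: a 40-character string to paste in place of the default salt
salt = SecureRandom.hex(20)
puts salt
```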

Now you should apply the ‘GetText patch’ – it can be found later in the thread. After that, you should be up and running!

After playing around, I found that the user listing was not working – fortunately, the fix was posted in the forum as well. The solution:
line 3 of app/views/users/index.rhtml should be changed to

<% form_tag '', :method => 'get' do -%>

Enjoy this great forum!

Data Extraction for Web 2.0: Screen Scraping in Ruby/Rails, Episode 1

This article is a follow-up to the quite popular first part on web scraping – well, sort of. The relation is closer to that between Star Wars I and IV – i.e., in chronological order, the 4th comes first. To continue the analogy, I am probably in the same shoes as George Lucas was after creating the original trilogy: the series became immensely popular and there was demand for more – in both quantity and depth.

After I realized – partly, though not exclusively, through the success of the first article – that there is a need for this sort of stuff, I began to work on the second part. As stated at the end of the previous installment, I wanted to create a demo web scraping application to show some advanced concepts. However, I left out a major coefficient from my future-plan equation: the power of Ruby.

Basically, this web scraping code was my first serious Ruby program: I had come to know Ruby just a few weeks earlier and decided to try it out on a real-life problem. After hacking on this app for a few weeks, a reusable web scraping toolkit, scRUBYt!, suddenly began to materialize, which changed the plan completely: instead of writing a follow-up, I decided to finish the toolkit, sketch the big picture of the topic, place scRUBYt! within that frame, and use it to illustrate the theoretical concepts described here.

The Big Picture: Web Information Acquisition

The whole art of systematically getting information from the Web is called ‘Web information acquisition’ in the literature. The process consists of 4 parts (see the illustration), which are executed in this order: Information Retrieval (IR), Information Extraction (IE), Information Integration (II) and Information Delivery (ID).

Information Retrieval

Navigate to and download the input documents which are the subject of the next steps. This is probably the most intuitive step – clearly, the information acquisition system first has to be pointed to the document containing the data before it can perform the actual extraction.

The vast majority of the information on the Web resides in the so-called deep web – backend databases and different legacy data stores which are not contained in static web documents. This data is accessible via interaction with web pages (which serve as a frontend to these databases) – by filling in and submitting forms, clicking links, stepping through wizards etc. A typical example could be an airport web page: an airport has the schedules of all its flights in its databases, yet you can access this information only on the fly, by submitting a form containing your concrete request.
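Under the hood, ‘submitting a form’ boils down to an HTTP request whose body carries the field values. A minimal sketch with Net::HTTP from the standard library; the URL and the field names (from, to, date) are hypothetical stand-ins for an airport's search form, and the request is only built here, not sent:

```ruby
require 'net/http'
require 'uri'

# Hypothetical flight-search endpoint and form fields
uri = URI('http://airport.example.com/flights/search')
request = Net::HTTP::Post.new(uri)

# Encodes the fields as application/x-www-form-urlencoded, like a browser would
request.set_form_data('from' => 'BUD', 'to' => 'JFK', 'date' => '2007-05-10')

puts request.body          # the encoded form fields
puts request.content_type  # the header a browser would send

# Actually sending it would look like this (commented out: it needs a real server)
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

Tools like Mechanize essentially automate this step: they read the form's fields from the page, let you fill them, and build such a request for you.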

The opposite of the deep web is the surface web – static pages with a ‘constant’ URL, like the very page you are reading. In such a case, the information retrieval step consists of just downloading the URL. Not a really tough task.

However, as I said two paragraphs earlier, most of the information is stored in the deep web – different actions, like filling in input fields, setting checkboxes and radio buttons, clicking links etc. are needed to get to the actual page of interest, which can then be downloaded as the result of the navigation.

Apart from the task being inherently hard to automate from a programming language, there are a lot of pitfalls along the way, stemming from the fact that the HTTP protocol is stateless: the information provided with a request is lost when making the next request. To remedy this problem, sessions, cookies, authorization, navigation history and other mechanisms were introduced – so a decent information retrieval module has to take care of these as well.
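To make the statelessness problem concrete: a server ‘remembers’ you only because the client sends back the cookies it was handed. A deliberately minimal sketch of a cookie jar; SimpleCookieJar is a hypothetical class made up for illustration, and it ignores Path, Domain, Expires and the other attributes a real retrieval module (Mechanize, for instance) has to honor:

```ruby
require 'net/http'

# Hypothetical, minimal cookie jar: remembers name=value pairs from
# Set-Cookie headers and replays them on the next request.
class SimpleCookieJar
  def initialize
    @cookies = {}
  end

  # Each header looks like "session=abc123; Path=/; HttpOnly";
  # only the leading name=value pair is kept.
  def store(set_cookie_headers)
    set_cookie_headers.each do |header|
      name, value = header.split(';').first.strip.split('=', 2)
      @cookies[name] = value
    end
  end

  # Adds a Cookie header carrying every remembered cookie
  def apply(request)
    return request if @cookies.empty?
    request['Cookie'] = @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
    request
  end
end

jar = SimpleCookieJar.new
# In real code these would come from response.get_fields('Set-Cookie')
jar.store(['session=abc123; Path=/; HttpOnly', 'lang=en'])

next_request = jar.apply(Net::HTTP::Get.new('/flights/results'))
puts next_request['Cookie']
```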

Fortunately, in Ruby there are packages that offer exactly this functionality. Probably the most well-known is WWW::Mechanize, which is able to navigate through Web pages automatically as a result of interaction (filling in forms etc.) while keeping cookies, automatically following redirects and simulating everything else a real user (or the browser in response to that) would do. Mechanize is awesome, but from my perspective it has one major flaw: you cannot interact with JavaScript websites. Hopefully this feature will be added soon.

Until that happy day, if someone wants to navigate through JS-powered pages, there is a solution: (Fire)Watir. Watir is capable of doing similar things to Mechanize (I never did a head-to-head comparison, though it would be interesting), with the added benefit of JavaScript handling.

scRUBYt! comes with a navigation module, which is built upon Mechanize. In future releases I am planning to add FireWatir, too (just because of the JavaScript issue). scRUBYt! is basically a DSL for web scraping with a lot of heavy lifting behind the scenes. Though the real power lies in the extraction module, there are some goodies in the navigation module, too. Let’s see an example!

Goal: Go to amazon.com. Type ‘Ruby’ into the search text field. To narrow down the results, click ‘Books’, then for further narrowing ‘Computers & Internet’ in the left sidebar.

Realization:

  fetch           'http://www.amazon.com/'
  fill_textfield  'field-keywords', 'ruby'
  submit
  click_link      'Books'
  click_link      'Computers & Internet'

Result: This document.

As you can see, scRUBYt’s DSL hides all the implementation details, making the description of the navigation as easy as possible. The result of the above few lines is a document – which is automatically fed into the scraping module, but this is already the topic of the next section.
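How can a handful of bare method calls describe a navigation? One common Ruby technique is to evaluate the block with instance_eval against an object that records each call. The sketch below is hypothetical and not scRUBYt!'s actual implementation: NavigationRecorder and its methods are made up, and a real engine would replay the recorded steps with an agent such as Mechanize instead of just collecting them:

```ruby
# Hypothetical sketch: NOT scRUBYt!'s real implementation.
# Each DSL keyword is an ordinary method that records a navigation step.
class NavigationRecorder
  attr_reader :steps

  # Evaluate the block with the recorder as self, so bare calls such as
  # `fetch 'http://...'` resolve to the methods below
  def self.record(&block)
    recorder = new
    recorder.instance_eval(&block)
    recorder.steps
  end

  def initialize
    @steps = []
  end

  def fetch(url)
    @steps << [:fetch, url]
  end

  def fill_textfield(name, value)
    @steps << [:fill_textfield, name, value]
  end

  def submit
    @steps << [:submit]
  end

  def click_link(text)
    @steps << [:click_link, text]
  end
end

steps = NavigationRecorder.record do
  fetch          'http://www.amazon.com/'
  fill_textfield 'field-keywords', 'ruby'
  submit
  click_link     'Books'
  click_link     'Computers & Internet'
end

# A real engine would now replay these steps with an HTTP agent
steps.each { |step| p step }
```

The design choice is what makes the DSL read like a description rather than a program: the user's block never mentions the agent, the session or the current page, because the recorder (or a real engine) keeps that state.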

Information Extraction

I think there is no need to write about why one needs to extract information from the Web today – the ‘how’ is a much more interesting question.

Why is Web extraction such a tedious task? Because the data of interest is stored in HTML documents (after navigating to them, that is), mixed with other stuff like formatting elements, scripts or comments. Because the data lacks any semantic description, a machine has no idea what a web shop record is or what a news article might look like – it just perceives the whole document as a soup of tags and text.

Querying objects in systems which are formally defined, and thus understandable for a machine, is easy: for instance, if I want to get the first element of an array in Ruby, I can do it like this:

my_array.first

Another example for a machine-queryable structure could be an SQL table: to pull out the elements matching the given criteria, all that needs to be done is to execute an SQL query like this:

SELECT name FROM students WHERE age > 25

Now, try to do similar queries for a Web page. For example, suppose that you already navigated to an ebay page by searching for the term ‘Notebook’. Say you would like to execute the following query: ‘give me all the records with price lower than $400’ (and get the results into a data structure of course – not rendered inside your browser, since that works naturally without any problems).

The query was definitely an easy one, yet without implementing a custom script that extracts the needed information and saves it to a data structure (or using something like scRUBYt!, which does exactly this for you), you have no chance of getting this information out of the source code.

There are ongoing efforts to change this situation – most notably the semantic Web, common ontologies, and different Web 2.0 technologies like taxonomies, folksonomies, microformats or tagging. The goal of these techniques is to make documents understandable for machines, eliminating the problems stated above. While there are some promising results in this area already, there is a long way to go until the whole Web is such a friendly place – my guess is that this will happen around Web 88.0 in the optimistic case.

However, at the moment we are only at version 2.0 (at most), so if we would like to scrape a web page for whatever reason *today*, we need to cope with the difficulties we face. I wrote an overview of how to do this with the tools available in Ruby (update: there is a new kid on the block – Hpricot – which is not mentioned there).

The rough idea of those packages is to parse the Web page source into some meaningful structure (usually a tree), then provide a querying mechanism (like XPath, CSS selectors or some other tree navigation model). You might think now: ‘A-ha! So actually a web page *can* be turned into something meaningful for machines, and there *is* a formal model to query this structure – so where is the problem described in the previous paragraphs? You just write queries like you would in the case of a database, evaluate them against the tree or whatever and you are done’.

The problem is that the machine’s understanding of the page and the human way of thinking about querying this information are entirely different, and there is no formal model (yet) to bridge this gap. Humans want to scrape ‘web shop records with Canon cameras with a maximum price of $1000’, while the machine sees this as ‘the third <td> tag inside the eighth <tr> tag inside the fifth <table> … (lots of other tags) inside the <body> tag inside the <html> tag, where the text of the seventh <td> tag contains the string Canon and the text of the ninth <td> is not bigger than 1000’ (and to even get the value 1000, you have to use a regular expression or something similar to get rid of the currency symbol that is most probably present, and of any other additional information).
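To see the gap in practice, here is the ‘Canon cameras for at most $1000’ query written by hand against a tree-parsed page, using REXML, which ships with Ruby (as a bundled gem in recent versions). The markup is a made-up, well-formed stand-in for a shop page (real pages are rarely this tidy, which is exactly the point), and the column positions are assumptions baked into the code:

```ruby
require 'rexml/document'

# Made-up, well-formed markup standing in for a web shop listing
html = <<~HTML
  <html><body>
    <table>
      <tr><td>Canon PowerShot A570</td><td>$249.99</td></tr>
      <tr><td>Nikon Coolpix L12</td><td>$179.00</td></tr>
      <tr><td>Canon EOS 400D</td><td>$1,099.00</td></tr>
    </table>
  </body></html>
HTML

doc = REXML::Document.new(html)

# The machine's view: every <tr> in the table, cell 1 = name, cell 2 = price
rows = REXML::XPath.match(doc, '/html/body/table/tr').map do |tr|
  cells = REXML::XPath.match(tr, 'td').map(&:text)
  # Strip the currency symbol and thousands separator before converting
  { name: cells[0], price: cells[1].gsub(/[^\d.]/, '').to_f }
end

# The human's query: Canon records costing at most $1000
cheap_canons = rows.select { |r| r[:name].include?('Canon') && r[:price] <= 1000 }
p cheap_canons.map { |r| r[:name] }
```

Every assumption about positions, cleanup and meaning lives in the script; move one column on the page and the query silently breaks.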

So why is this so easy with a database? Because the data stored there has a formal model (specified by the CREATE TABLE statement). Both you and the computer know *exactly* what a Student or a Camera looks like, and both of you speak the same language (most probably an SQL dialect).

This is totally different in the case of a Web page. A web shop record, a camera detail page or a news item can look like just about anything, and your only chance to find out for the concrete Web page of interest is to examine its structure. This is a very tedious task on its own (as I said earlier, a Web page is a mess of real data, formatting, scripts, stylesheet information…). Moreover, there are further problems: for example, web shop records need not be uniform even within the same page – certain records can lack some cells which others have, or keep part of their information on a detail page while others do not, and vice versa – so in some cases, identifying a data model is impossible or very complicated – and I have not even talked about scraping the records yet!

So what could be the solution?

Intuitively, there is a need for an interpreter which understands the human query and translates it to XPaths (or any other querying mechanism a machine understands). This is more or less what scRUBYt! does. Let me explain how – it is easiest through a concrete example.

Suppose you would like to monitor stock information on finance.yahoo.com! This is how I would do it with scRUBYt!:

#Navigate to the page
fetch 'http://finance.yahoo.com/'

#Grab the data!
stockinfo do
  symbol 'Dow'
  value '31.16'
end

output:

  <root>
    <stockinfo>
      <symbol>Dow</symbol>
      <value>31.16</value>
    </stockinfo>
    <stockinfo>
      <symbol>Nasdaq</symbol>
      <value>4.95</value>
    </stockinfo>
    <stockinfo>
      <symbol>S&P 500</symbol>
      <value>2.89</value>
    </stockinfo>
    <stockinfo>
      <symbol>10-Yr Bond</symbol>
      <value>0.0100</value>
    </stockinfo>
  </root>

Explanation: I think the navigation step does not require any further explanation – we fetched the page of interest and fed it into the scraping module.

The scraping part is more interesting at the moment. Two things happened here: we have defined a hierarchical structure of the output data (like we would define an object – we are scraping StockInfos which have Symbol and Value fields, or children), and showed scRUBYt! what to look for on the page in order to fill the defined structure with relevant data.

How did I know I had to specify ‘Dow’ and ‘31.16’ to get these nice results? Well, by manually pointing my browser to ‘http://finance.yahoo.com/’, observing an example of the stuff I wanted to scrape, and leaving the rest to scRUBYt!. What actually happens under the hood is that scRUBYt! finds the XPaths of these examples, figures out how to extract similar ones and arranges the data nicely into a result XML (well, there is much more going on, but this is the rough idea). If anyone is interested, I can explain this in a further post.
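For the curious, here is a rough sketch of the example-based idea under heavy simplifying assumptions; it is not scRUBYt!'s actual algorithm. It locates the elements whose text equals the examples, computes their indexed paths, treats their deepest common ancestor as the record, and drops that ancestor's index so the path matches every sibling record. The markup is a made-up stand-in for the finance page, and find_by_text / indexed_steps are hypothetical helpers:

```ruby
require 'rexml/document'

# Made-up stand-in for the quotes table on the finance page
html = <<~HTML
  <html><body><table>
    <tr><td><a>Dow</a></td><td><b>31.16</b></td></tr>
    <tr><td><a>Nasdaq</a></td><td><b>4.95</b></td></tr>
    <tr><td><a>S&amp;P 500</a></td><td><b>2.89</b></td></tr>
  </table></body></html>
HTML
doc = REXML::Document.new(html)

# Hypothetical helper: first element whose own text equals the example
def find_by_text(doc, text)
  REXML::XPath.match(doc, '//*').find { |e| e.text == text }
end

# Hypothetical helper: indexed steps from the root, e.g. ["html", "body[1]", ...]
def indexed_steps(element)
  steps = []
  node = element
  until node.parent.is_a?(REXML::Document)
    same_name = node.parent.elements.to_a.select { |e| e.name == node.name }
    steps.unshift("#{node.name}[#{same_name.index { |e| e.equal?(node) } + 1}]")
    node = node.parent
  end
  steps.unshift(node.name) # the root element needs no index
end

symbol_steps = indexed_steps(find_by_text(doc, 'Dow'))
value_steps  = indexed_steps(find_by_text(doc, '31.16'))

# The record is the deepest common ancestor of the two examples (here tr[1])
common = symbol_steps.zip(value_steps).take_while { |a, b| a == b }.map(&:first)
# Generalize: drop the record's index so the path matches every row
record_path = '/' + (common[0..-2] + [common.last.sub(/\[\d+\]\z/, '')]).join('/')
symbol_rel  = symbol_steps.drop(common.size).join('/')
value_rel   = value_steps.drop(common.size).join('/')

records = REXML::XPath.match(doc, record_path).map do |row|
  { symbol: REXML::XPath.first(row, symbol_rel).text,
    value:  REXML::XPath.first(row, value_rel).text }
end

puts record_path
records.each { |r| p r }
```

Two examples from the same record are enough here; a real tool has to cope with examples that match several elements, irregular records and much messier markup.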

You might think now: ‘O.K., this is very nice and all, but you have been talking about *monitoring* and I don’t really see how – the value 31.16 will change sooner or later, and then you have to go to the page and specify the example again – I would not call this monitoring’.

Great observation. It’s true that scRUBYt! would not be of much use if changing examples were not handled (unless you would like to get the data only once, that is) – fortunately, this situation is dealt with in a powerful way!

Once you run the extractor and are satisfied that the data it scrapes is correct, you can export it. Let’s see what the exported finance.yahoo.com extractor looks like:

#Navigate to the page
fetch 'http://finance.yahoo.com/'

#Construct the wrapper
stockinfo "/html/body/div/div/div/div/div/div/table/tbody/tr" do
  symbol "/td[1]/a[1]"
  value "/td[3]/span[1]/b[1]"
end

As you can see, there are no concrete examples any more – the system generalized the information, and now you can use this extractor to scrape the information automatically whenever you like, until the moment the guys at Yahoo change the structure of the page, which fortunately does not happen every other day. In that case the extractor should be regenerated with up-to-date examples (in the future I am planning to add automatic regeneration in such cases) and the fun can begin all over again.
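Running an exported extractor then amounts to evaluating the record XPath and, relative to each match, the child XPaths. A minimal sketch with REXML. The markup and the shortened record path are made up (the real export walks through several nested divs), and two further points are assumptions, not a description of scRUBYt!'s behavior: reading the leading slash of a child path as ‘relative to the record’, and treating an empty result as the signal that the layout changed and the extractor needs regenerating:

```ruby
require 'rexml/document'

# Exported-style patterns: an absolute record path plus child paths
# (shortened, hypothetical versions of the Yahoo extractor)
RECORD = '/html/body/div/table/tbody/tr'
FIELDS = { symbol: '/td[1]/a[1]', value: '/td[3]/span[1]/b[1]' }

def run_extractor(doc)
  rows = REXML::XPath.match(doc, RECORD)
  # No matching records usually means the page structure changed
  raise 'no records matched: regenerate the extractor' if rows.empty?

  rows.map do |row|
    # Prefix "." so the child path is evaluated relative to the record
    FIELDS.transform_values { |path| REXML::XPath.first(row, ".#{path}")&.text }
  end
end

page = REXML::Document.new(<<~HTML)
  <html><body><div><table><tbody>
    <tr><td><a>Dow</a></td><td>-</td><td><span><b>31.16</b></span></td></tr>
    <tr><td><a>Nasdaq</a></td><td>-</td><td><span><b>4.95</b></span></td></tr>
  </tbody></table></div></body></html>
HTML

run_extractor(page).each { |record| p record }

# A redesigned page no longer matches the exported paths
redesigned = REXML::Document.new('<html><body><ul><li>Dow 31.16</li></ul></body></html>')
begin
  run_extractor(redesigned)
rescue RuntimeError => e
  puts e.message
end
```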

This example just scratched the surface of what scRUBYt is capable of – there are tons of advanced stuff to fine-tune the scraping process and get the data you need. If you are interested, check out http://scrubyt.org for more information!

Conclusion

The first two steps of information acquisition (retrieval and extraction) deal with the question ‘How do I get the data I am interested in (querying)?’. Up to the present version (0.2.0), scRUBYt! contains just these two steps – however, even to do these properly, it will need a lot of testing, feedback, bug fixing, stabilization, and heaps of new features and enhancements – because, as you have seen, web scraping is not a straightforward thing to do at all.

The last two steps (integration and delivery) address the question ‘what to do with the data once it is collected, and how to do that (orchestration)’. These facets will be covered in a future installment – most probably once scRUBYt! contains these features as well.

If you liked this article and are interested in web scraping in practice, be sure to install scRUBYt! and check out the community page for further instructions – the site is just taking off, so there is not much there yet, but hopefully enough to get you started. I am counting on your feedback, suggestions, bug reports, the extractors you have created etc. to enhance both scrubyt.org and the scRUBYt! user experience in general. Be sure to share your experience and opinion!


Book Review: Ruby Cookbook

Since I am relatively new to Ruby, I have no idea what life was like in the dark ages of the non-Japanese-speaking Ruby community (1995 – 2000), when there was no English Ruby book on the market. The ice was broken by Andy Hunt and Dave Thomas with a pickaxe – err… actually the Pickaxe (a.k.a. “Programming Ruby”), which has undoubtedly become a famous Ruby classic since then.

In the foreword, Matz, the author of Ruby, explains that since he is much better at coding than at writing documentation, the authors probably did not have an easy job – whatever they could not find in the (rather scant) documentation, they had to figure out directly from the Ruby source code.

The Ruby book scene looks radically different today. In fact, we are facing the opposite problem: there are so many books on Ruby that it can sometimes be hard to choose which ones to read and in which order. It probably won’t be any easier to answer these questions in the future: judging from the blogs and announcements, most of the books are yet to come. If you are new to Ruby, you will most probably have a hard time figuring out how to spend your money wisely [1] – so what’s the solution?

Of course there is no definitive answer for this question – I can only tell you what worked for me.

First, I would definitely recommend David A. Black’s Ruby for Rails [2]. It is perfectly suited for newcomers (and for advanced hackers, too), no matter whether you are new to Ruby and/or coming from a different programming language [3]. I was also a Python enthusiast (though doing most of my everyday work in Java) when I discovered Ruby – and David’s book was a perfect choice for switching very fast.

Currently I am undecided between the 2nd and the 3rd place, so let’s say you should check them out in parallel – they are (of course) the Pickaxe and Hal Fulton’s “The Ruby Way”. Both are time-tested Ruby classics, hence must-reads. However, if you have the time and/or money to read only one of the above books, in my opinion it should be “Ruby for Rails”.

Although these three masterpieces are – in my opinion – among the best-written and most informative tech books available today, you have to remember the good old rule: no matter how many books you read or how good they are, you will never become a true Ruby hacker until you actually begin to use the acquired knowledge and put it into practice.

After reading these books I wanted to jump into writing some cool stuff – Ruby seemed so elegant, easy and succinct – and to my greatest surprise, I could not write much sensible code 🙂 (at least not without referring to these books and/or Google and/or ruby-talk more frequently than I considered o.k. to call it programming on my own).

This is exactly where the Ruby Cookbook should enter the scene. The first three books give you a hint about *what* can be done with Ruby[4]. The cookbook offers well-organized content in the form of recipes to show you *how* it can be done elegantly, quickly and effectively in a Ruby-esque way.

Probably the most frequent answer on the ruby-talk mailing list to the question ‘How should I improve my Ruby skills?’ is: start your own project. Since I put this advice into practice myself and it worked for me, I have to agree: armed with the goodies from Learning Ruby, The Pickaxe and The Ruby Way, the best thing to do is to grab a copy of the Cookbook and jump into your own project. When I started mine, a web extraction framework, I had no idea about documenting Ruby code, packaging the whole program into a gem, logging, writing unit tests (in Ruby) and automating these tasks (and a lot of other things – this post would be considerably longer if I listed everything). However, with the Ruby Cookbook by my side, learning and putting things into practice, from writing the first line to packaging the whole framework into a gem, was a piece of cake.

If you are unfamiliar with the O’Reilly cookbook series format, it is a set of ‘recipes’ (problem statement, solution, discussion) divided into categories (like Strings, Arrays, Hashes… in this case) for easy lookup of the problem at hand. While it would be possible, and certainly edifying, to read the book cover to cover (in which case you should also consider that it has 873 pages), I found that it really shines when you are stuck with a problem: you look up the relevant category and the relevant problem, apply the solution, read the discussion to understand what’s going on under the hood, rinse, repeat, and after the 3rd or so cycle you will find that you are not reaching for the book anymore (at least not because of this problem).

OK, time to take a more detailed look at the content.

I would divide the book into five categories: Essentials, Ruby Specific Constructs, Advanced Techniques, Internet and networking and Software Management/Distribution. I will review them one by one briefly.

  • Essentials include Strings, Numbers, Arrays, Hashes, Date and Time, Files and Directories. For a beginner Ruby journeyman, these chapters are a real gold mine. Though the cookbook is not really intended for total beginners (it assumes a fair amount of Ruby knowledge), it certainly would not be impossible for a skilled (non-Ruby) programmer to understand most of the recipes, since they progress from simple to complicated (e.g. the String chapter begins with concatenating strings and closes by showing off text classification with a Bayesian classifier).

    In this category I probably learned the most Ruby best practices from the chapters Arrays and Hashes [5]. As a constant lurker on the ruby-talk mailing list, I had a hard time figuring out all those inject()s and collect()s and each_slice()s and each_cons()s and other enumerator/iterator things – just when I thought I understood them, somebody came up with an even more complicated example and I was not so sure once again – until the moment I bought the book, that is.

    The cookbook is very good at eliminating the kind of vague, wobbly understanding I had: you will not only understand what’s going on, but actually get comfortable using the idioms so typical of Ruby. That’s what is so great about it.

  • Ruby Specific Constructs feature Objects and Classes, Modules and Namespaces, and Reflection and Metaprogramming. Every newcomer to Ruby encounters the wonders that (not exclusively, but most characteristically) make the language so beautiful: code blocks, closures, mixins, and the vast possibilities offered by metaprogramming and reflection, just to mention some of them. These chapters are written exactly to examine and discuss these constructs.

    While I probably learned the most new things from this section, I have to say that I missed a meta-level here: the chapters (especially the one on metaprogramming) presented a lot of fancy LEGO bricks but did not show how to build a Statue of Liberty or an Eiffel Tower out of them (well, not even a simple medieval castle in my opinion :-). Of course this does not need to be a problem – metaprogramming techniques deserve a book of their own, and anyway a cookbook is not intended to solve concrete problems but rather recurring/frequent ones. Probably I am just too curious about the ways of the meta :-).

    To sum it up, this and the previous section (Essentials) together enormously improved my Rubyish programming style in practice – nearly all the information you need is in the other books as well, but reading them does not make you comfortable with these techniques.

  • Advanced Techniques include XML and HTML, Graphics and Other File Formats, Databases and Persistence, Multitasking and Multithreading, User Interface, Extending Ruby with Other Languages, and System Administration. I was somewhat unsure about this category – pairing UI with databases or system administration, for example, seemed odd at first glance – but since I did not want to create even more categories, I decided to put everything here that did not fit into the other ones, so it can be viewed as a ‘miscellaneous’ section as well.

    I would like to review two chapters here – HTML/XML, and Databases and Persistence – since these are the closest to my field of expertise and I also believe these two are the most in-depth in this category. Again, this does not mean that the other chapters were not good, but in my opinion they just scratched the surface compared to the above two.

    The HTML/XML chapter really has it all: parsing, validating, transforming, extracting data from XML documents, encoding and XPath handling, to highlight some interesting topics. The coverage is surprisingly thorough for a language which promotes YAML (YAML Ain’t Markup Language) over XML. The HTML recipes, though there are just a few of them, are also very useful: downloading content from Web pages, extracting data from HTML, converting plain text to HTML and vice versa. My only concern here is that I missed coverage of some third-party packages (like RedCloth, BlueCloth, Hpricot or Mechanize) – but this is really nitpicking: if the authors took all my wishes into account, the book would have several thousand pages 🙂

    Databases and Persistence starts off with serialization recipes (using YAML, Marshal and Madeleine). Recipes on indexing unstructured as well as structured text (SimpleSearch, Ferret) are a pleasant surprise before the must-have topics take off: connecting to and using different kinds of databases (MySQL, PostgreSQL, Berkeley DB) as well as Object Relational Mapping frameworks (Rails ActiveRecord and Nitro Og) and, of course, doing every kind of SQL voodoo magic. What should I add? Probably nothing.[6]

    I would really like to write something about the other chapters in this category, too, but since I am constantly bashed for the length of my posts, just believe me that they are great as well :-).

  • Internet and Networking consists of Web Services and Distributed Programming, Internet Services and (surprise! surprise!) Web Development: Ruby on Rails. It would be a real cliché to write about how important the Internet is nowadays, how much Web 2.0 rocks, how SOA and WS and REST and FOO and BAR rule etc., so I won't do that ;-). However, it is a fact that Web application development has never mattered this much before, so these chapters were basically compulsory.

    I would divide the category into two subcategories – Internet/Web stuff and distributed programming.

    There is really not much to add about the first subcategory. An unbelievable amount of information is crammed into two chapters: 'abstract' techniques (HTTP headers and requests, DNS lookups etc.), using all kinds of protocols (HTTP(S), POP, IMAP, FTP, telnet, SSH…), servlet, client/server and CGI programming, as well as talking to Web APIs (Amazon, Flickr, Google) and, of course, Web services (XML-RPC, SOAP). In my opinion, this subcategory offers more than enough information to get started and/or explore advanced techniques.

    It's a shame that Distributed Programming got only half a chapter. O.K., I admit I am somewhat partial to these techniques, and they are maybe not used by that many people. The action revolves mostly around DRb and Rinda, with the exception of two memcached recipes. The chapter closes with a nice 'putting things together' recipe that creates a remote-controlled jukebox.

    I did not get too deep into the Ruby on Rails chapter, since I had previously read Agile Web Development with Rails as well as Ruby for Rails and a lot of more advanced Rails material, but judging from the recipe titles and skimming through some of them, the chapter looks very informative and unquestionably helpful if you have no prior experience with Rails.

  • Last but not least, Managing and Distributing Software includes Testing, Debugging, Optimizing, and Documenting, Packaging and Distributing Software, and Automating Tasks with Rake. If you plan to use Ruby for anything other than system administration (or writing very short scripts/one-liners for whatever reason), documenting, testing, debugging and automating tasks is absolutely crucial. I know that a lot of coders do not like to hear this, since they want to code and not write tests, documentation etc., but I think that nowadays a serious programmer, no matter how much she would like to concentrate on hacking up feature MyNextCoolStuffWhichWillShakeTheEarth, has to master these things. In the long run, any software that is undocumented, untested and not continuously refactored will turn into spaghetti quite easily.

    That said, these chapters were excellent for me. I have experience with these tasks in Java; however, the toolset is radically different in some cases (like Ant vs. Rake), and even where it is similar (unit tests, RDoc vs. Javadoc) re-learning was inevitable. Fortunately, with the help of these recipes it was a breeze to learn them in Ruby (well, I have to add that these things, like nearly everything else, are considerably easier to do in Ruby, so the ease of learning stems from this fact as well).

    Rake absolutely rocks. Maybe I am biased because I have been working with Apache Ant a lot, but if the ratio between Ruby and Java code is, say, 1:10, then the ratio between Rake and Ant files is 1:50 if we also consider simplicity, maintainability and understandability.

    Finally, if you also plan to release your software, the chapter Packaging and Distributing Software can come in handy. I think if you would like to distribute your stuff to the masses, packaging it into a gem is inevitable: RubyGems is so cool that it has made Rubyists too lazy to download anything from a website instead of running 'gem install my_cool_software'.

Conclusion

If you would like to become a serious Ruby hacker, don't hesitate to buy this book. In my opinion it is absolutely worth every cent, and even more. My only problem is that there are no more recipes, but this is not a critique, rather a compliment: you simply cannot get enough, not even from nearly 900 pages. One could argue that some things are missing or that he would rather see this instead of that (I believe the authors themselves had a tough time deciding these matters), but I guess everyone agrees that the material which made it into the book is absolutely top-notch. 5 out of 5 stars: a great addition to anyone's Ruby bookshelf.

Notes

[1] It is absolutely possible to learn Ruby without spending a nickel: there are excellent Ruby tutorials out there, like Why's (Poignant) Guide to Ruby (with cartoon foxes and chunky bacon :-)), the first edition of the Pickaxe book, which is available online for free, or Learning Ruby by Satish Talim, and a lot of other ones, too. For some beginner Ruby exercises you can also check out my earlier post, 15 exercises for learning a new programming language, or just use Google…

[2] I am not sure whether it was the best move to include 'Rails' in the title: it may turn away some who would like to learn Ruby but not Rails. However, I can assure you that this book is a true Ruby masterpiece. Though there are some interesting Rails techniques included, the primary focus is unquestionably Ruby.

[3] There is one possible exception: if you are new not only to Ruby but also to programming, you should probably check out Chris Pine's Learn to Program first.

[4] Of course there will always be some overlap, and not every book can be categorized correctly in every case (for example, The Ruby Way also has some cookbook-like chapters).

[5] Of course this does not mean that the rest of the chapters were not as helpful; coming from Python, I just did not have as many 'wow' moments. Nevertheless, they also teach a lot of idioms and are in no way less informative than the other two.

[6] Devil's advocate(tm) says: maybe some recipes on SQLite and Oracle, as well as advanced SQL stuff, would be cool, but this is really mega-über nitpicking, since then the title should be 'Ruby and SQL Cookbook' 🙂

Implementing ’15 Exercises for Learning a new Programming Language’

A short time ago in a galaxy not so far, far away I came across a nice blog post: 15 Exercises for Learning a new Programming Language.

One could argue whether these are *really* the most appropriate 15(+) exercises for learning a new programming language; however, the task of answering this rather complex question is left as an exercise for the reader. Instead, I will show you their implementation in Ruby, rubyrailways.com style.

Why did I bother to solve these problems (including some not-so-trivial ones, like a scientific calculator with a GUI)? Well, actually to learn a new programming language! I still consider myself a Ruby apprentice, playing it by ear in my somewhat scarce free time, so I thought that systematically implementing a task list like this would be a great step forward for me compared to just coding random things at random times. Fortunately I was perfectly right!

Before we move on to the code, one last disclaimer: the fact that I am still a Ruby n00b implies that the code can be somewhat hairy/not optimal/[insert any other language than Ruby here]-ish, so don't use these snippets as textbook solutions to the problems or anything like that. I would be glad if someone could suggest a bit of refactoring of the bad parts, but I also hope that there are some nice parts you can learn from (actually I am quite sure about this, since in some cases I used some magick formulas from a few Ruby (grand)masters).

OK, enough talk for now. Let’s see the stuff!

1. Problem: “Display series of numbers (1,2,3,4, 5….etc) in an infinite loop. The program should quit if someone hits a specific key (Say ESCAPE key).”

Solution: Hmm, well, errr…uh-oh… I could not solve this problem fully (what a terrific start :-)). If Henry Ford were sitting beside me now, he would say: You can hit any key to exit, so long as it's 'C'. And one more piece of advice: don't forget to hold CTRL while doing so :-). More on this after the code snippet:

i = 0
# prints 1, 2, 3, ... forever; Ctrl-C (SIGINT) stops it by raising Interrupt
loop { print "#{i+=1}, " }

Comments:
If anyone knows how to add code that will make this program stop on a specific keypress (say 'ESC'), please, please, please drop me a note. I spent at least 10% of the total time of solving all the tasks researching this, nearly spitting blood when I gave up :-). It seems (to me) that there is no simple (i.e. without threads and the like), clean, platform-independent solution to this problem. I guess (hope) the author's idea here was something other than introducing threading or writing platform-specific code…
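One possible approach, sketched under stated assumptions: it needs Ruby 2.3 or later (for the exception: false option of read_nonblock) and, for the interactive usage below, the io/console standard library on a Unix-like terminal, so it is not the platform-independent answer wished for above. Instead of a thread, it polls the input once per iteration; the method name count_until_escape is hypothetical.

```ruby
require 'io/console' # provides IO#raw, used when running this from a terminal

ESC = "\e"

# Prints 1, 2, 3, ... to +output+ until an ESC byte arrives on +input+.
# read_nonblock returns :wait_readable when no key is pending, so the
# counter keeps running without a separate thread.
# Returns the last number printed.
def count_until_escape(input = $stdin, output = $stdout)
  i = 0
  loop do
    key = input.read_nonblock(1, exception: false)
    return i if key.nil? || key == ESC # nil means the input was closed
    output.print "#{i += 1}, "
  end
end
```

To run it interactively, wrap the call in raw mode so each keystroke arrives immediately, without waiting for Enter: $stdin.raw { count_until_escape }. Raw mode typically also turns Ctrl-C into an ordinary character, so ESC really becomes the way out.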

2. Problem: “Fibonacci series, swapping two variables, finding maximum/minimum among a list of numbers.”

Solution:

#Fibonacci series
Fib = Hash.new{ |h, n| n < 2 ? h[n] = n : h[n] = h[n - 1] + h[n - 2] }
puts Fib[50]

#Swapping two variables (parallel assignment, no temporary variable needed)
x, y = 1, 2
x, y = y, x
puts x, y

#Finding maximum/minimum among a list of numbers
puts [1,2,3,4,5,6].max
puts [7,8,9,10,11].min

Comments: The Fibonacci code was written by Andrew Johnson (found via Ruby Quiz). I like it so much that I think it would be a shame to present a trivial version here. I guess the rest of the code is self-explanatory.
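Why the one-liner works so well: the block given to Hash.new runs only when a key is missing, and because it stores its result with h[n] = ..., each Fibonacci number is computed once and afterwards simply read from the hash (memoization). The sketch below repeats the idea with a lowercase local variable, fib, to show the cache filling up; note that asking a fresh hash for a very large n at once still recurses n levels deep.

```ruby
# The default block runs only for missing keys and stores what it computes,
# so every value is calculated once and later lookups are plain hash reads.
fib = Hash.new { |h, n| h[n] = n < 2 ? n : h[n - 1] + h[n - 2] }

puts fib[10]     # fills the cache with fib[0] .. fib[10]
puts fib.size    # number of cached entries
p fib.keys.sort
```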

3. Problem: "Accepting series of numbers, strings from keyboard and sorting them ascending, descending order."

Solution:

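a = []
# read lines until the user types 'q' (or the input ends, when gets returns nil)
while (line = gets) && (line = line.chomp) != 'q'
  a << line
end
p a.sort                    # ascending
p a.sort { |x, y| y <=> x } # descending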

Comments: This version accepts strings; I think anybody who got this far can adapt it to work with numbers.
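For numbers, one minimal adaptation (a sketch; read_numbers is a hypothetical name) converts each line with Kernel#Float, which raises ArgumentError on anything non-numeric. This matters because sorting strings compares characters: '10' sorts before '9' as a string, but not as a number.

```ruby
# Reads numbers from +io+, one per line, until 'q' or the end of input.
# Float() raises ArgumentError for anything that is not a number.
def read_numbers(io = $stdin)
  numbers = []
  while (line = io.gets) && (line = line.chomp) != 'q'
    numbers << Float(line)
  end
  numbers
end
```

Calling nums = read_numbers and then p nums.sort and p nums.sort.reverse prints both orders.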

4. Problem: "Reynolds number is calculated using formula (D*v*rho)/mu Where D = Diameter, V= velocity, rho = density mu = viscosity Write a program that will accept all values in appropriate units (Don't worry about unit conversion) If number is < 2100, display Laminar flow, If it’s between 2100 and 4000 display 'Transient flow' and if more than '4000', display 'Turbulent Flow' (If, else, then...)"

Solution:

vars = %w{D V Rho Mu}

vars.each do |var|
  print "#{var} = "
  val = gets
  # defines a constant named after var (e.g. D = 5); never eval untrusted input
  eval("#{var}=#{val.chomp}")
end

reynolds = (D*V*Rho)/Mu.to_f

if (reynolds < 2100)
  puts "Laminar Flow"
elsif (reynolds > 4000)
  puts "Turbulent Flow"
else
  puts "Transient Flow"
end

Comments: Can you spot the trick in the part that fills up the variables? They don't go out of scope after the loop ends because they are constants (a local variable created inside eval would not even be visible outside that eval). Another possibility would be to use $global variables, but I guess that is usually not a very good programming practice.
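To avoid both eval and constants, the calculation can also live in two small methods that take the values as arguments. This is a sketch: the names reynolds and flow_regime are hypothetical, and the thresholds follow the problem statement (below 2100 laminar, above 4000 turbulent, otherwise transient).

```ruby
# Reynolds number Re = (D * v * rho) / mu; to_f forces floating-point division.
def reynolds(d, v, rho, mu)
  (d * v * rho) / mu.to_f
end

# Classifies a Reynolds number using the thresholds from the problem statement.
def flow_regime(re)
  if re < 2100
    'Laminar Flow'
  elsif re > 4000
    'Turbulent Flow'
  else
    'Transient Flow'
  end
end

puts flow_regime(reynolds(0.05, 1.0, 1000, 0.001))
```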

5. Problem: "Modify the above program such that it will ask for 'Do you want to calculate again (y/n), if you say 'y', it'll again ask the parameters. If 'n', it'll exit. (Do while loop)
While running the program give value mu = 0. See what happens. Does it give 'DIVIDE BY ZERO' error? Does it give 'Segmentation fault..core dump?'. How to handle this situation. Is there something built in the language itself? (Exception Handling)"

Solution:

vars = { "d" => nil, "v" => nil, "rho" => nil, "mu" => nil }

begin
  vars.keys.each do |var|
    print "#{var} = "
    val = gets
    vars[var] = val.chomp.to_f   # to_f keeps decimals such as mu = 0.001 (to_i would turn it into 0)
  end

  reynolds = (vars["d"]*vars["v"]*vars["rho"]) / vars["mu"].to_f
  puts reynolds

  if (reynolds < 2100)
    puts "Laminar Flow"
  elsif (reynolds > 4000)
    puts "Turbulent Flow"
  else
    puts "Transient Flow"
  end

  print "Do you want to calculate again (y/n)? "
end while gets.chomp != "n"

Comments: As you can see, I could not use the same trick here when asking for the variables, because when somebody wants to calculate again, Ruby will complain (although only with a warning) that the constants have already been initialized. Therefore I went for the hash solution. I think the do-you-want-to-calculate-again part is straightforward, so I won't analyze it here.
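"While running the program give value mu = 0."
Ruby gives a rather interesting result in this case: Infinity :-). That is because mu is converted with to_f, and Float division by zero returns Infinity (or NaN for 0/0.0) instead of raising an error.
"Is there something built in the language itself?"
Sure: exception handling. Integer division by zero raises ZeroDivisionError, which can be caught with a rescue clause; with Float division you have to check for a zero divisor (or an infinite result) yourself.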
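The difference can be seen in a few lines: Integer division by zero raises, Float division does not. A guard that raises explicitly lets begin/rescue handle mu = 0 even though the calculation itself uses Float division; safe_reynolds is a hypothetical name for this sketch.

```ruby
# Integer division by zero raises ZeroDivisionError ...
begin
  puts 1 / 0
rescue ZeroDivisionError => e
  puts "Integer division: #{e.class}"
end

# ... while Float division by zero returns Infinity (or NaN for 0/0.0).
puts 1 / 0.0
puts((0 / 0.0).nan?)

# Raises ZeroDivisionError for mu = 0 instead of silently returning Infinity.
def safe_reynolds(d, v, rho, mu)
  raise ZeroDivisionError, 'mu (viscosity) must not be zero' if mu.zero?
  (d * v * rho) / mu.to_f
end

begin
  puts safe_reynolds(1, 2, 3, 0)
rescue ZeroDivisionError => e
  puts "Cannot calculate: #{e.message}"
end
```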

6. Problem: "Scientific calculator supporting addition, subtraction, multiplication, division, square-root, square, cube, sin, cos, tan, Factorial, inverse, modulus"

Solution:
Since this code snippet is longer, it would look ugly here; you can download it from here instead.

Screenshot: the scientific calculator in action

If you would like to try it, you will need the Tk bindings for Ruby (you may have them already; here on Ubuntu I did not). Also note that only the regular 0-9 keys (and of course the mouse) work; the numpad keys do not. One more little detail: % stands for modulo, not percent.

Comments: Phew, this was a real challenge, mostly because I had never done any GUI programming in Ruby before. I was amazed that I could code up a relatively feature-rich calculator in 100+ lines of code, without any golfing or trying to optimize for shortness. What I mean is that the shortness is not a credit to my programming skills (since I did not even try to golf) but to the superb terseness of Ruby. OK, of course there are some problems (e.g. cube, cos, tan and inverse are not implemented), but the usability/amount of code ratio is unbelievably high.

The GUI is also not the nicest, since I used Tk: wxRuby or qt-ruby would produce much nicer results, but since I had not coded any GUI in Ruby previously, I decided to try good-old-skool Tk first.
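Stripped of the GUI, the arithmetic of such a calculator can be kept in a table of lambdas that the buttons look up by name. This is a hypothetical sketch, not the downloadable Tk code; it also covers the operations listed as missing (cube, cos, tan, inverse), and it assumes angles are given in radians.

```ruby
# Unary operations: each takes the current display value.
UNARY_OPS = {
  'sqrt'   => ->(x) { Math.sqrt(x) },
  'square' => ->(x) { x * x },
  'cube'   => ->(x) { x**3 },
  'sin'    => ->(x) { Math.sin(x) },   # radians
  'cos'    => ->(x) { Math.cos(x) },
  'tan'    => ->(x) { Math.tan(x) },
  'n!'     => ->(x) { (1..x.to_i).reduce(1, :*) }, # expects a non-negative integer
  '1/x'    => ->(x) { 1.0 / x }
}

# Binary operations: the first operand and the value entered second.
BINARY_OPS = {
  '+' => ->(a, b) { a + b },
  '-' => ->(a, b) { a - b },
  '*' => ->(a, b) { a * b },
  '/' => ->(a, b) { a / b.to_f },
  '%' => ->(a, b) { a % b } # modulo, as in the Tk calculator
}

# Dispatches to a unary operation when b is nil, otherwise to a binary one.
def calculate(op, a, b = nil)
  b.nil? ? UNARY_OPS.fetch(op).call(a) : BINARY_OPS.fetch(op).call(a, b)
end
```

Adding a new button then means adding one line to a table rather than another branch to a case statement.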

7. Problem: "Printing output in different formats (say rounding up to 5 decimal places, truncating after 4 decimal places, padding zeros to the right and left, right and left justification)(Input output operations)"

Solution:

#rounding to 5 decimal places
puts sprintf("%.5f", 124.567896)

#truncating after 4 decimal places (Float#truncate cuts toward zero, also for negative numbers)
def truncate(number, places)
  (number * (10 ** places)).truncate / (10 ** places).to_f
end

puts truncate(124.56789, 4)

#padding zeroes to the left
puts 'hello'.rjust(10,'0')

#padding zeroes to the right
puts 'hello'.ljust(10,'0')

#right justification
puts ">>#{'hello'.rjust(20)}<<"

#left justification
puts ">>#{'hello'.ljust(20)}<<"

Comments: An amazing number of things can be done with sprintf(); I could solve nearly all of these problems with it, but that would not really be Rubyish, so I opted for built-in (and one homegrown) methods. However, mastering (s)printf() is very handy, since nearly all the big players (C (of course :-)), C++, Java, PHP, ...) have it, so you get a powerful function in several languages for the price of learning one. As you can see, rjust/ljust are nice ones, too.
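For comparison, here is roughly how the same tasks look with format strings alone (Kernel#format is the same method as sprintf). Two exceptions: format always rounds, so truncation still needs a helper (or, on Ruby 2.4 and later, Float#floor with a digits argument, which suits positive numbers), and padding a string with zeros on the right still calls for ljust.

```ruby
# Rounding to 5 decimal places
puts format('%.5f', 124.567896)

# Zero padding on the left: total width 10, 2 decimals
puts format('%010.2f', 3.14159)

# Right and left justification in a 20-character field
puts format('>>%20s<<', 'hello')
puts format('>>%-20s<<', 'hello')

# Truncation without rounding (Ruby 2.4+, positive numbers)
puts 124.56789.floor(4)
```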

8. Problem: "Open a text file and convert it into HTML file. (File operations/Strings)"

Solution: Well, this problem was not specified in great detail, to say the least; to put it another way, solvers are given great freedom to spice up their solution with their own imagination. This is what I came up with:

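# sample input (hypothetical text): *word* should become bold, /word/ italic
doc = <<DOC
This is a *simple* text file.

Words between asterisks are *strong*, words between slashes are /italic/.
DOC

final_doc = <<FINAL_DOC
<html>
  <head>
    <title>Text to HTML fun!</title>
  </head>
  <body>
    <p>
embed_doc_here
    </p>
  </body>
</html>
FINAL_DOC

# 'something' marks the captured text; the /.../ rule comes first (hashes keep
# insertion order) so its '/' cannot match the '/' of a closing </strong> tag
rules = {'/something/' => '<em>something</em>', '*something*' => '<strong>something</strong>'}
rules.each do |k,v|
  re = Regexp.escape(k).sub(/something/) {"(.+?)"}
  doc.gsub!(Regexp.new(re)) do
    content = $1
    v.sub(/something/) { content }
  end
end

# a blank line starts a new paragraph
doc.gsub!("\n\n") {"</p>\n<p>"}
final_doc.sub!(/embed_doc_here/) {doc}
puts final_doc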

Comments: As you can see, besides wrapping the text in minimal HTML, the program renders every occurrence of words between asterisks in bold (strong) and between slashes in italics. You can add as many such rules as you like; they will (hopefully) be substituted in the final output.

9. Problem: "Time and Date : Get system time and convert it in different formats 'DD-MON-YYYY', 'mm-dd-yyyy', 'dd/mm/yy' etc."

Solution: Well, it was not really clear (to me) what the difference between 'yyyy' and 'YYYY' (or 'dd' vs. 'DD') should be, so again I had to use my imagination. However, I guess it does not matter too much: if the original author had something different in mind, the solution has to be changed by only 1-2 characters.

require 'date'

time = Time.now
#'DD-MON-YYYY', e.g. 12-Nov-2006 in my interpretation
puts time.strftime("%d-%b-%Y")

#'mm-dd-yyyy', e.g. 11-12-2006 in my interpretation
puts time.strftime("%m-%d-%Y")

#'dd/mm/yy', e.g. 12/11/06 in my interpretation (%y is the two-digit year)
puts time.strftime("%d/%m/%y")
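If 'MON' is meant as an upper-case month abbreviation (as in Oracle's DD-MON-YYYY format; that reading is an assumption about the original intent), Ruby's strftime has a ^ flag that upcases the result. Using a fixed Time instead of Time.now makes the output predictable:

```ruby
t = Time.new(2006, 11, 12, 21, 5) # a fixed date instead of Time.now

puts t.strftime('%d-%^b-%Y') # 12-NOV-2006 (the ^ flag upcases)
puts t.strftime('%m-%d-%Y')  # 11-12-2006
puts t.strftime('%d/%m/%y')  # 12/11/06 (%y is the two-digit year)
```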

10. Problem: "Create files with date and time stamp appended to the name"

Solution:

#Create files with date and time stamp appended to the name

# opens e.g. "test.txt-11.12-21.05" (month.day-hour.minute) for writing;
# the caller is responsible for closing the file
def file_with_timestamp(name)
  t = Time.now
  File.open("#{name}-#{t.strftime('%m.%d-%H.%M')}", 'w')
end

my_file = file_with_timestamp('test.txt')
my_file.write('This is a test!')
my_file.close

Comments: A more elegant solution might be to subclass File and override its constructor, but that would probably be overkill, so I decided against it in this case :-).
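A variant worth considering, sketched with a hypothetical open_with_timestamp method: putting the stamp before the extension keeps test.txt recognizable as a text file, and passing a block to File.open closes the file automatically. The time is a parameter (defaulting to Time.now) so the resulting name can be predicted.

```ruby
# Opens "<base>-<stamp><ext>" (e.g. test-11.12-21.05.txt) in write mode.
# With a block, File.open closes the file when the block returns.
def open_with_timestamp(name, time = Time.now, &block)
  ext  = File.extname(name) # ".txt"
  base = name.chomp(ext)    # "test" (any directory part is kept)
  File.open("#{base}-#{time.strftime('%m.%d-%H.%M')}#{ext}", 'w', &block)
end
```

Usage: open_with_timestamp('test.txt') { |f| f.write('This is a test!') }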

11. Problem: "Input is HTML table. Remove all tags and put data in a comma/tab separated file."

Solution: Since web extraction is my PhD topic and my everyday job (and even my free-time activity :-)), I will present three solutions to this problem: first the classic old-school regexp way (by Paul Lutus), then one with Hpricot, and finally one with scRUBYt!, a simple yet powerful Ruby web extraction framework that I am currently developing.

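table = <<DOC
<table>
  <tr>
    <td>1</td>
    <td>2</td>
  </tr>
  <tr>
    <td>3</td>
    <td>4</td>
    <td>5</td>
  </tr>
  <tr>
    <td>6</td>
  </tr>
</table>
DOC

# each <tr>...</tr> block becomes one line of comma-separated cell contents
rows = table.scan(%r{<tr>.*?</tr>}m)

rows.each do |row|
   fields = row.scan(%r{<td>(.*?)</td>}m)
   puts fields.join(",")
end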

Now for the Hpricot solution (in the following examples, let's assume that table is initialized as in the previous example):

require 'rubygems'
require 'hpricot'

h_table = Hpricot(table)

rows = h_table/"//tr"
rows.each do |row|
  child_text = (row/"//td").collect {|elem| elem.innerHTML }
  puts child_text.join(',')
end

and last, but not least, scRUBYt!:

require 'scrubyt'

table_data = P.table do
               P.cell '1'
             end

table_data.generalize :cell

puts table_data.to_csv

Some explanation: first of all, at the moment scRUBYt! is available only on my hard disk (and partially in my head); it should be released around XMAS 2006. I am using this solution for a little bit of self-promotion :-).

The example works like this: extract something (in this case an HTML <table>) that has something (in this case a <td>) with '1' as its text (well, in reality much more is going on in the background, but roughly along these lines). This little code snippet will extract the first <td>s of ALL <table>s on an HTML page. With the 'generalize' call we tell the extractor that it should not extract just the first <td> in a table (which is the default setting), but all of them.

scRUBYt! can handle much, much, MUCH more complicated examples than this (like an ebay or amazon page) and has loads of sophisticated functions... so stay tuned!
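For markup that is well-formed XML, like the sample table above, a dependency-free sketch is also possible with REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). It will not cope with the sloppy HTML found on most real pages, which is where tools like Hpricot or scRUBYt! come in; the method name table_to_csv is hypothetical, and cells that contain the separator are not quoted.

```ruby
require 'rexml/document'

# Turns each <tr> into one line of +sep+-separated cell texts
# (sep = "\t" gives a tab-separated result).
def table_to_csv(html, sep = ',')
  doc = REXML::Document.new(html)
  lines = REXML::XPath.match(doc, '//tr').map do |row|
    cells = REXML::XPath.match(row, 'td').map { |td| td.text.to_s.strip }
    cells.join(sep) + "\n"
  end
  lines.join
end
```

Writing the result to disk is then a single File.write('table.csv', table_to_csv(table)).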

12. Problem: "Extract uppercase words from a file, extract unique words."

Solution: (you can find some_uppercase_words.txt here and some_repeating_words.txt here)

# print the words that consist of uppercase letters only
File.read('some_uppercase_words.txt').split.each { |word| puts word if word =~ /^[A-Z]+$/ }

# count the occurrences, then print the words that occur exactly once
words = File.read('some_repeating_words.txt').split
histogram = words.inject(Hash.new(0)) { |hash, x| hash[x] += 1; hash }
histogram.each { |k, v| puts k if v == 1 }
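"Unique words" can also be read as "each distinct word once"; both readings are sketched below on a made-up word list. Array#tally requires Ruby 2.7 or later and replaces the inject-based histogram.

```ruby
words = %w[the cat saw the other cat and a dog]

# Each distinct word once, in order of first appearance
distinct = words.uniq

# Words that occur exactly once (the reading used above)
counts = words.tally # word => number of occurrences
once = counts.select { |_word, n| n == 1 }.keys

p distinct
p once
```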

13. Problem: "Implement word wrapping feature (Observe how word wrap works in windows 'notepad')."

Solution: Unfortunately I am not a Windows user and I last saw Notepad *quite* a long time ago, so I am not sure the task and its implementation are fully in line, but I have tried my best. Here we go:

input = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

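def wrap(s, len)
  result = ''
  line_length = 0
  s.split.each do |word|
    # start a new line if the word would not fit (a word longer than len gets its own line)
    if line_length > 0 && line_length + 1 + word.length > len
      result += "\n"
      line_length = 0
    end
    # separate words on the same line with a single space
    if line_length > 0
      result += ' '
      line_length += 1
    end
    result += word
    line_length += word.length
  end
  result
end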

puts wrap(input, 30)

14. Problem: "Adding/removing items in the beginning, middle and end of the array."

Solution:

x = [1,3]

#adding to beginning
x.unshift(0)

#adding to the end
x << 4

#adding to the middle
x.insert(2,2)

#removing from the beginning
x.shift

#removing from the end
x.pop

#removing from the middle (delete removes by value; delete_at(1) would remove by index)
x.delete(2)

#we have arrived at the original array!
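Two related calls are handy when "the middle" means a position rather than a value: Array#insert accepts several items at once, and delete_at and slice! remove by index (whereas delete, used above, removes every element equal to its argument).

```ruby
x = [1, 5]

x.insert(1, 2, 3, 4) # insert several items before index 1
p x

x.delete_at(2)       # remove the element at index 2 (the 3)
p x

x.slice!(1, 2)       # remove 2 elements starting at index 1
p x
```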

15. Problem: "Are these features supported by your language: Operator overloading, virtual functions, references, pointers etc."

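Solution: Well, this is not a real problem (not in Ruby, at least). Ruby is a very high-level language, so the answer is mostly yes: most operators, such as + and ==, are ordinary methods that any class can define, every method is dispatched dynamically at run time (so in C++ terms every method is "virtual"), and variables hold references to objects. Raw pointers, on the other hand, do not exist in Ruby :).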
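A small illustration of those points, using made-up Vector2, Shape and Circle classes: + and == are just methods, the method that runs is chosen by the receiver's class at call time, and assigning an object to another variable copies the reference, not the object.

```ruby
# A hypothetical 2D vector: operators are ordinary methods.
class Vector2
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def +(other)
    Vector2.new(x + other.x, y + other.y)
  end

  def ==(other)
    other.is_a?(Vector2) && x == other.x && y == other.y
  end

  def to_s
    "(#{x}, #{y})"
  end
end

# Dynamic dispatch: Shape#describe calls whichever #name the receiver has.
class Shape
  def describe
    "I am a #{name}"
  end

  def name
    'shape'
  end
end

class Circle < Shape
  def name
    'circle'
  end
end

puts Vector2.new(1, 2) + Vector2.new(3, 4) # uses Vector2#+ and Vector2#to_s
puts Circle.new.describe                   # Shape#describe calls Circle#name

# References: both variables point to the same Array object.
a = [1, 2]
b = a
b << 3
p a # a sees the change: [1, 2, 3]
```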

Finally, you can download all the solutions in a single archive from here.
I would like to see these tasks implemented both in Ruby (different, more optimal solutions, of course) and in any other language. If you set out to do something like that, be sure to drop me a note.
