Archive for the 'Ruby' Category

acts_as_state_machine for Dummies, part I

Monday, May 5th, 2008

I recently applied this great plugin a few times to tackle different tasks and would like to share with you the joys of thinking in state machines!

Disclaimer: This installment is ‘just’ an intro to the topic (a teaser if you like); it doesn’t contain actual instructions or code on how to use acts_as_state_machine - that comes in part II. Though the original intent was to write up a small tutorial with some example code, I started with an intro and the article grew so long that I decided to split it up - so stay tuned for part deux!

State what…?!?

A finite state machine (FSM for short), a.k.a. finite state automaton (FSA), is a 5-tuple (Σ,S,s0,δ,F)… OK, just kidding. I doubt too many people are interested in the rigorous definition of the FSM (for the rest, here it is), so let’s see a more down-to-earth description.

According to Wikipedia, a finite state machine is “a model of behavior composed of a finite number of states, transitions between those states, and actions”. Somewhat better than those gammas and sigmas and stuff, but if you are not the abstract thinker type, it might take some time to wrap your brain around it. I believe a good example can help here!

Everybody knows and loves regular expressions - but it’s probably not a widely known fact that regular expression matching can be solved with an FSM (and in fact, a lot of implementations use some kind of FSM on steroids). So let’s see a simple example. Suppose we would like to match a string against the following simple regexp:

ab+(a|c)b*

First we have to construct the FSM, which will be fed with the string we would like to match. An FSM for the above regular expression might look like this:

fsm_correct.png

String matching against this FSM is basically answering the question ‘starting from the initial state, can we reach the finish after feeding the FSM the whole string?’. Pretty easy - the only thing we have to define is ‘feeding’.

So let’s take the string ‘abbbcbbb’ as an example and feed the FSM! The process looks like this:

  1. we are in q0, the initial state (where the ‘start’ label is). Starting to consume the string
  2. we receive the first character, it’s an ‘a‘. We have an ‘a‘ arrow to state q1, so we make a transition there
  3. we receive the next character, ‘b‘. We have two b-arrows: to q1 and q2. We choose to go to q1 (in fact, staying in q1) - remember, the question is whether we _can_ reach the finish, not whether all roads lead to the finish - so the choice is ours!
  4. identical to the above
  5. after the two above steps, we are still in q1. We still get a ‘b‘ but this time we decide to move to q2.
  6. we are in q2 and the input is ‘c‘. We have no analysis-paralysis here since the only thing we can do is to move to q4 - so let’s do that!
  7. Whoa! We reached the finish line! (q4 is one of the terminal states). However, we didn’t consume the whole string yet, so we can’t yet tell whether the regexp matches or not.
  8. So we eat the rest of the string (the ‘how’ is left as an exercise to the reader) and return ‘match!’

Let’s see a very simple non-matching example on the string ‘abac’

  1. in q0
  2. got an ‘a‘, move to q1
  3. in q1, got a ‘b‘, move to q2
  4. in q2, got an ‘a‘, move to q3 - we reached the finish, but still have a character to consume
  5. in q3, got a ‘c‘… oops. We have no ‘c’ arrow from q3 so we are stuck. return ‘no match!’
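The two walkthroughs above can be sketched in a few lines of Ruby. This is only a minimal sketch: the transition table is reconstructed from the steps described in the text rather than from the figure, so the ‘b’ self-loop on q3 is an assumption, and the names TRANSITIONS, FINAL_STATES and fsm_match? are made up for illustration. Instead of ‘choosing’ an arrow, the code keeps track of every state we could be in, which answers the same ‘can we reach the finish?’ question.

```ruby
require 'set'

# Transition table reconstructed from the walkthroughs in the text.
# Assumption: q3 loops on 'b' just like q4 does.
TRANSITIONS = {
  ['q0', 'a'] => ['q1'],
  ['q1', 'b'] => ['q1', 'q2'],   # the non-deterministic choice from step 3
  ['q2', 'a'] => ['q3'],
  ['q2', 'c'] => ['q4'],
  ['q3', 'b'] => ['q3'],
  ['q4', 'b'] => ['q4']
}
FINAL_STATES = Set.new(['q3', 'q4'])

# Feed the string to the FSM one character at a time, tracking the set of
# all states reachable so far. The string matches if, after consuming all
# of it, at least one reachable state is a terminal state.
def fsm_match?(string)
  states = Set.new(['q0'])
  string.each_char do |char|
    states = Set.new(states.flat_map { |state| TRANSITIONS.fetch([state, char], []) })
    return false if states.empty?   # stuck: no arrow for this character
  end
  !(states & FINAL_STATES).empty?
end

puts fsm_match?('abbbcbbb')
puts fsm_match?('abac')
```

Tracking a set of states avoids backtracking: a wrong ‘choice’ simply drops out of the set once it has no arrow for the next character.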

Of course real-life scenarios are much more complicated than the one above, and sometimes FSMs are not enough (for example, to my knowledge it’s not possible to tell whether a number is prime with a vanilla FSM - though a regexp doing just that was floating around some time ago), but this example served fine to illustrate the concept.

This is cool and all but why should I care?!?

Well, yeah, you are obviously not going to model an FSM the next time you would like to match a regexp - that would be wheel-reinvention at its finest. However, there are some practical scenarios where an FSM can come in handy:

  • sometimes the logic flow is just too complicated to model - an if-forest is rarely a good solution (on the flip side, don’t model an if-else with an FSM :-) )
  • encapsulate complex logic flow into a pattern and not clutter your code with it.
  • you are in a stateless world - for example HTTP
  • asynchronous and/or distributed processing where you explicitly need to maintain your state and act upon it

Some real-life examples of FSM usage in the Ruby/Rails world are why the lucky stiff’s Hpricot (using Ragel) and Rick Olson’s restful authentication plugin (using acts_as_state_machine).

The Next Episode

In the next installment I’d like to focus on the practical usage of the acts_as_state_machine plugin - I’ll attempt to create an asynchronous messaging system in a Rails app using it.

Today Midnight

Thursday, April 24th, 2008

I have always been uncertain about the exact expression denoting today midnight (or any day midnight, for that matter). Is 00:00 on e.g. April 24th the midnight between the 23rd and 24th or between the 24th and 25th? If I want something to happen at today midnight, is that today’s date at 00:00? (for the impatient: no, it isn’t :-) ).

Chronic to the rescue! (If you don’t know chronic, be sure to check it out - it’s a great natural language date/time parser). All I had to do is:

  >> Chronic.parse('today midnight')
  => Fri Apr 25 00:00:00 +0200 2008

so actually it turns out it’s tomorrow’s date at 00:00.
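For comparison, the same convention (‘today midnight’ meaning 00:00 at the start of tomorrow) can be written with the standard library alone. This is just a sketch of the convention, not how Chronic computes it internally; today_midnight is a made-up helper name, and it assumes the UTC offset does not change overnight (i.e. no DST switch).

```ruby
require 'date'

# 'Today midnight' as the midnight that ends the current day,
# i.e. 00:00:00 on tomorrow's date, in the same UTC offset as `now`.
# Assumption: the offset stays the same overnight (no DST change).
def today_midnight(now = Time.now)
  tomorrow = now.to_date + 1
  Time.new(tomorrow.year, tomorrow.month, tomorrow.day, 0, 0, 0, now.utc_offset)
end

puts today_midnight(Time.new(2008, 4, 24, 15, 30, 0, '+02:00'))
```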

I couldn’t find time zone support though (I am not saying it’s not there, just that I couldn’t find it by looking at the API) - so what if I want to meet someone in Madrid today midnight? Why, I install the tzinfo gem and ask Ruby!

  >> TzinfoTimezone["Madrid"].utc_to_local(Chronic.parse('today midnight').getutc)
  => Fri Apr 25 00:00:00 UTC 2008

Random Links from the Web, 22-04-2008 (AI edition)

Tuesday, April 22nd, 2008

Random Links from the Web, 21-04-2008

Monday, April 21st, 2008

The Top 10 Ruby/Rails Blogs

Sunday, June 10th, 2007

In my quest to whip my feed reader’s Ruby/Rails related content into shape a bit, I did a little research to find out which Ruby/Rails blogs are the most popular at the moment. I gave up on following most of the blogs systematically a long time ago - it is becoming increasingly hard to keep track of even the aggregators, not to mention the blogs themselves. There are hundreds of Ruby/Rails blogs out there right now (I am talking about the ones found on the few most popular aggregators - in reality there must be many more of them), so it is clear that you need to pick carefully - unless you happen to be a well-paid, full-time Ruby/Rails blog reader (in which case you would still have to crank a lot to do your work properly).

OK, enough nonsense for today - let’s see the results counting down from the 10th place! If you are interested in the method they were created with, or a longer, top 30 list from technorati and alexa, check out this blog entry.

10. http://weblog.jamisbuck.org/ by Jamis Buck.

Jamis Buck “is a software developer who has the good fortune to be both employed by 37signals and counted among those who maintain the Ruby on Rails web framework”. He is mostly blogging about (surprise, surprise!) Rails - of course on a very high level, which could be expected from a Rails core developer. Very insightful posts on ActiveRecord, Capistrano and other essential Rails topics delivered in a professional way.

9. http://weblog.rubyonrails.org by the Rails core team

This is the “default” Ruby on Rails blog, used for announcements, sightings, manuals and whatever else the RoR team finds interesting :-) .

8. http://www.slash7.com by Amy Hoy.

This is a really cool little site - Amy is a very gifted writer and designer, publishing very insightful articles as well as the nicest (hands down!) cheat sheets about different Web2.0, Ajax, Rails and that sort of stuff. Definitely worth checking out!

7. http://errtheblog.com by PJ Hyett and Chris Wanstrath.

A very serious blog of two Rails-geeks about advanced topics (but very well explained - so if you are not totally green (#00FF00) you should do fine). Among other things, they have contributed Sexy Migrations to Rails recently.

6. http://nubyonrails.com/ by Geoffrey Grosenbach

Geoffrey is the author of more than twenty Rails plugins (including gruff, my favorite graph drawing gem), a horde of professional-quality articles and the PeepCode screencast site. Do I need to say more?!

5. http://redhanded.hobix.com/ by _why the lucky stiff.

_why is probably the most interesting guy in the Ruby community. He is the author of (among tons of other things) Why’s Poignant Guide to Ruby, Hpricot, the coolest Ruby HTML parser, Try Ruby! (a must see!) and Hackety Hack, for aspiring wannabe programmers who want to hack like in the movies! The list goes on and on… This guy never stops. If someone ever invents the perpetuum mobile, he will be it (in Ruby, of course).

4. http://hivelogic.com/ by Dan Benjamin.

Dan’s recent work includes Cork’d, a web2.0 wine community site, and the A List Apart publishing system. He does great podcasts with various guys.

3. http://mephistoblog.com/ by Rick Olson and Justin Palmer

Personally I was quite surprised that a blog concentrating on such a narrow topic (in this case the mephisto blogging system) could grab the 3rd place - so I have checked both alexa and technorati by hand just to be sure, and it seems that everything is OK - mephistoblog is ranked very high on both of them, justifying this position. After all, mephisto is the leading blog system of Rails!

2. http://www.rubyinside.com/ by Peter Cooper.

This blog is my absolute favorite from this top 10 list (actually, from all the Ruby blogs I have encountered so far). I am definitely with Amy Hoy, who said: “If you had to subscribe to just one Ruby blog, it should be this one.” If you would like to know what’s happening in the Ruby/Rails community, rubyinside is the place to check. If there is no new post here, it’s most probably because nothing happened!

And the winner is: http://www.loudthinking.com/ by David Heinemeier Hansson.

Well, what should I add? David is the author of Ruby on Rails, so no wonder his blog topped the list!


Conclusion
It’s interesting to note that nearly all the blogs listed here are mostly pure Rails ones - rubyinside (mixed Ruby/Rails) and redhanded (pure Ruby) being the two exceptions. It would be interesting to generate such a list for Ruby blogs - though I am not sure how. The sources I have used (most notably rubycorner) aggregate both Ruby and Rails blogs - so it seems there are many more Rails bloggers out there (or they are much better than the Ruby bloggers, with the exception of _why).

I would really like to hear your opinion on this little experiment - whether you think it makes sense or it is completely off, how it could be improved in the future, what features could be added etc. If I receive some positive feedback, I think I will work on the algorithm a bit more, and run it once every 3 months or so to see what’s happening around the Ruby/Rails blogosphere. Let me know what you think!



Book Review: Build Your Own Ruby on Rails Applications

Wednesday, May 23rd, 2007

Build Your Own Ruby on Rails Web Applications
Author: Patrick Lenz
Publisher: SitePoint
Pages: 447
Intended Audience: Beginners/Pre-intermediate
Rating: 5/5

I would like to begin with a few words about SitePoint. According to their definition, ‘SitePoint specializes in publishing fun, practical, and easy-to-understand content for web professionals’. So far I have had the pleasure of reading three of their books: (obviously) Build Your Own Ruby on Rails Web Applications, The CSS Anthology: 101 Essential Tips, Tricks & Hacks and The Principles of Beautiful Web Design. If I had to judge the publisher based on these three books, I could not agree more: I have found all their claims (fun, practical and easy-to-understand) to be unquestionably true.

After a brief overview of the book I would like to concentrate on the question that, I guess, popped up in most of your minds: Why should I prefer this book over Agile Web Development with Rails or other Rails books available? We’ll look into that in a minute, but first things first: let’s see what Build Your Own Ruby on Rails Web Applications has to offer!

The book starts off with installing Ruby, RubyGems, Rails and even MySQL on different operating systems, presented in painstaking detail - which is very good in my opinion, since advanced users will skip this section anyway, and it offers a great step-by-step walkthrough for novices.

The second chapter is the compulsory ‘introduction to Ruby’. I have to admit I did not read it - but judging from the contents and a quick skim-through, it offers at least the same knowledge as the other similar Rails books, which is more than enough to get you started. If you would like to go deeper into both Ruby and Rails, I suggest checking out David A. Black’s excellent Ruby for Rails.

Chapter 4, ‘Rails Revealed’ is the only more-or-less theoretical chapter, discussing the architecture, components and conventions used in Ruby on Rails.

The real action starts in Chapter 5 in the form of building a digg-clone from scratch. You will learn how to build a Rails application, beginning with generating the necessary files and ending up with a nicely working, (relatively) feature-rich digg-like site, dealing with user management (even showing a user view with submitted stories), allowing you to submit and vote on stories (just as you would expect from an application like this), sprinkled with a lot of tasty tidbits like tagging (also introducing polymorphic associations in a very easy-to-understand way) or (of course) AJAX.

The book finishes with some advanced topics: Debugging, Testing and Benchmarking, followed by Deploying and Production use, providing instructions to deploy your application on Apache with Mongrel.

If the review ended right now, you could (rightfully) ask: ‘So what? These are exactly the things I would expect from a Rails book’ - and you would be perfectly right. So let’s see why this book is different from all the other ones available on the market!

First of all, it is written in a very understandable and easy-to-digest way: it explains everything as simply as possible, making even the more complicated topics clear right away. I don’t remember reading anything twice, no matter how advanced the topic was. I think this alone makes Build Your Own Ruby on Rails Web Applications one of the best hands-on RoR books today (definitely the best one I have seen so far, but since I did not read all the competitors, I can not unambiguously claim this is the best one).

What I also like about this book is that it requires almost no prerequisites at all - the bare minimum that is needed is explained on the side during the application creation, or can be learned from the book.

A big difference compared to Agile Web Development with Rails - which is the de facto Rails book today - is that testing of the created components is described in great detail. The usual workflow is thus problem statement, solution and creating unit tests to verify the code - explaining the why’s and how’s as well. I am not aware of any RoR book currently available that would explain and demonstrate testing your code to this extent.

One could argue that Build Your Own Ruby on Rails Web Applications is not deep enough, which is more-or-less true (compared to e.g. Agile Web Development with Rails) - but I think this is perfectly fine, since going too deep is not the purpose of the book at all! If you need in-depth coverage of Rails internals, or would like to go into advanced topics like caching, scaling or deployment in great detail, then this is not the book to get. However, if you would like to try Ruby on Rails right away, without the need to google for blogs helping you to install the prerequisites or get this and that right, be sure to check it out!

Ruby’s Growth Comes to an End?!

Thursday, May 17th, 2007

According to O’Reilly’s latest report on the state of the computer book market focusing on programming books, Ruby has the definitive lead. Check out this treemap view - I believe it does not need too much additional explanation (The percentages reflect the relative book sale compared to 2006/Q1):

Now, I would not like to start a language war here at all - there is neither a need to draw zealous conclusions in the Ruby camp nor to come up with explanations from proponents of other languages. The diagram shows that compared to the same period of 2006, there is currently the biggest demand for Ruby (and other Ruby-based/related) books - and nothing more. It does not tell anything about the number of people using the given language or related frameworks, job opportunities or the absolute market share - this is just a relative indicator based on the programming book market.

However, if you take a peek at the TIOBE index for May - entitled ‘Ruby’s growth comes to an end’ - you can see that Ruby is the fastest growing language at the moment (again, compared to the same period of 2006). If this is the ‘end of the growth’, then what does growth look like?!

It is also interesting to check out this graph from TIOBE:

It tells me that starting from July 2006, no other programming language shows growth as big (and steady) as Ruby’s.

I don’t know on what basis the TIOBE guys came to the conclusion that Ruby is losing steam… I have talked to a few Ruby on Rails freelancers recently, and each of them confirmed independently that there is a bigger need for Ruby/Rails programmers than ever. Based on (not only) these data I would say quite the opposite is true: my personal feeling is that Ruby/Rails is going to be a *lot* bigger than it is currently!

Partitioning Sets in Ruby

Thursday, April 26th, 2007

While hacking on various tasks, I needed to partition a set of elements quite a few times. I have attacked the problem with different homegrown implementations, mostly involving select-ing every element belonging to the same basket in turn. Fortunately I ran across divide recently, which does exactly this… No more wheel reinvention! Let’s see a concrete example.

I have an input file like this:

a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34

The goal is to sum up all the numbers in rows beginning with the same character (e.g. to sum up all the numbers that are in a row beginning with ‘a’). The result should look like:

[{"a"=>241}, {"b"=>246}, {"c"=>145}]

This is an ideal task for divide! Let’s see one possible solution for the problem:

  1. require 'set'
  2.
  3. input = Set.new File.readlines('input.txt').map { |e| e.chomp }
  4. groups = input.divide { |x, y| x[0] == y[0] }
  5. # build the array of hashes
  6. p groups.inject([]) { |a, g|
  7.   # build the hash for the number sequences with the same letter
  8.   a << g.inject(Hash.new(0)) { |h, v|
  9.     # for every sequence, sum the numbers it contains
  10.     h[v[0..0]] += v[2..-1].split(' ').inject(0) { |c, x|
  11.       c + x.to_i }; h
  12.   }; a
  13. }

The output is:

  [{"a"=>241}, {"b"=>246}, {"c"=>145}]

Great - it works! Now let’s take a look into the code…

The 3rd line loads the lines into a set like this:

  #<Set: {"b 1 23 ", "c 5 16 2 34", "a 9 0 31", "a 7 83 1 23 3", "b 1 14 4 15 16 2", "a 53 2 3", "c 3 42 2 31 4 6", "b 4 45 4 16 7", "b 8 62 1 23", "a 1 3 22 "}>

The real thing happens on line 4. After its execution, groups looks like:

  #<Set: {#<Set: {"a 9 0 31", "a 7 83 1 23 3", "a 53 2 3", "a 1 3 22 "}>, #<Set: {"b 1 23 ", "b 1 14 4 15 16 2", "b 8 62 1 23", "b 4 45 4 16 7"}>, #<Set: {"c 5 16 2 34", "c 3 42 2 31 4 6"}>}>

As you can see, the set is correctly partitioned now - with almost no effort! We did not even need to require an external library…
The rest of the code is out of the scope of this article (everybody is always complaining about the long articles here, so I am trying to keep them short) - and anyway, the remaining snippet is just a bunch of calls to inject. If inject does not feel too natural to you, don’t worry - it took me months until I got used to it, and some people (despite the fact that they fully understand it and are able to use it) never reach for it - I guess it’s a matter of taste…
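For those who would rather avoid the inject chain, here is a sketch of the same computation that leans on divide with a one-argument block (which groups elements by the block’s return value) and on Enumerable#sum from newer Rubies. The rows are inlined to keep the example self-contained; ROWS and sum_by_letter are made-up names, and the final sort is only there to make the output order predictable.

```ruby
require 'set'

# The sample input from above, inlined so the sketch runs on its own.
ROWS = <<~DATA.lines.map(&:chomp)
  a 53 2 3
  b 8 62 1 23
  a 9 0 31
  b 4 45 4 16 7
  b 1 23
  c 3 42 2 31 4 6
  a 1 3 22
  a 7 83 1 23 3
  b 1 14 4 15 16 2
  c 5 16 2 34
DATA

# Partition the rows by their first character, then sum the numbers in
# each partition. Note: because the rows go into a Set, two identical
# rows would be counted only once (the same is true of the code above).
def sum_by_letter(rows)
  groups = Set.new(rows).divide { |row| row[0] }
  groups.map { |group|
    letter = group.first[0]
    total  = group.sum { |row| row.split.drop(1).sum(&:to_i) }
    { letter => total }
  }.sort_by { |hash| hash.keys.first }
end

p sum_by_letter(ROWS)
```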

Getting Beast up and Running on Dreamhost (for the Truly Lazy)

Thursday, March 22nd, 2007

Though dreamhost offers phpBB as one of their one-click install goodies (ergo it is the easiest to install of all forums since you almost don’t have to do anything), I have been looking for something different. To me, phpBB’s interface was always quite unintuitive and too heavy - I wanted something smaller, easier, more compact. The problem was I did not know what I should search for - until I came across beast, a lightweight forum written in Ruby on Rails. It was love at first sight!

When it comes to tools I am using, I am really language agnostic - this very blog uses WordPress (PHP), I am using Trac (Python) to track my projects, mediaWiki (PHP) is my preferred wiki etc - so even if it may seem so, I did not choose beast because it is written in Rails (although +1 for that :-) ), but because of the design and ease of use. My first thought after trying it was ‘wow, this is as easy to use as a 37signals app’ - it’s really that intuitive and well designed!

Well, this sounds fine and all, but installation on dreamhost was a different story. Thank God I have found a superb, step-by-step HOWTO here. However, even after following all the steps, I got ‘incomplete headers’ and other problems, which I have managed to fix - here are some additional comments to the HOWTO:

6. You can forget about this point; as the HOWTO says, it is already installed on DH and it will work without any problems.
7. Forget about ‘development’ and ‘test’, however be sure to get ‘production’ right, as the next step will not work otherwise. It should look something like this:

production:
  adapter: mysql
  database: beast_prod
  host: mysql.myhost.com
  username: us3r
  password: p4ss
  port: 3306
8. For me it worked only *with* the RAILS_ENV=production parameter specified.
9. You can change the salt to anything - it just must not stay the same. The easiest thing is to add or remove a random character from the string.
12. The shebang should be updated to #!/usr/bin/ruby
13. The || should be removed, i.e. it should read:
ENV['RAILS_ENV'] = 'production'
14. Make sure you change the permission of those directories only - I have changed everything recursively, destroying the executable flag of dispatch.fcgi :-) .
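Since step 7 is where things went wrong for me, a quick sanity check of the ‘production’ entry can save some head-scratching. The sketch below is not part of the HOWTO: the list of required keys is my assumption based on the example entry above, and missing_production_keys is a made-up helper name.

```ruby
require 'yaml'

# Keys the 'production' entry is assumed to need (taken from the
# example entry in step 7, not from any official list).
REQUIRED_KEYS = %w[adapter database host username password]

# Returns the required keys that are absent from the 'production'
# section of the given database.yml text.
def missing_production_keys(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  production = config['production'] || {}
  REQUIRED_KEYS.reject { |key| production.key?(key) }
end

sample = <<~CONFIG
  production:
    adapter: mysql
    database: beast_prod
    host: mysql.myhost.com
    username: us3r
    password: p4ss
    port: 3306
CONFIG

p missing_production_keys(sample)
```

To check a real file, pass in File.read('config/database.yml') instead of the sample text.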

Now you should apply the ‘GetText patch’ - it can be found later in the thread. After that you should be up and running!

After playing around, I have found that the user listing is not working - fortunately I have found this as well in the forum. The solution is:
app/views/users/index.rhtml line 3 should be modified to

<% form_tag '', :method => 'get' do -%>
Enjoy this great forum!

Data Extraction for Web 2.0: Screen Scraping in Ruby/Rails, Episode 1

Sunday, February 4th, 2007

This article is a follow-up to the quite popular first part on web scraping - well, sort of. The relation is closer to that between Star Wars I and IV - i.e., in chronological order, the 4th comes first. To continue the analogy, I am probably in the same shoes as George Lucas was after creating the original trilogy: the series became immensely popular and there was demand for more - in both quantity and depth.

After I realized - not exclusively, but also - through the success of the first article that there is a need for this sort of stuff, I began to work on the second part. As stated at the end of the previous installment, I wanted to create a demo web scraping application to show some advanced concepts. However, I left out a major coefficient from my future-plan-equation: the power of Ruby.

Basically this web scraping code was my first serious Ruby program: I came to know Ruby just a few weeks earlier, and I decided to try it out on some real-life problem. After hacking on this app for a few weeks, suddenly a reusable web scraping toolkit - scRUBYt! - began to materialize, which caused a total change of plan: instead of writing a follow-up, I decided to finish the toolkit, sketch a big picture of the topic, place scRUBYt! inside this frame and use it to illustrate the theory described here.

The Big Picture: Web Information Acquisition

The whole art of systematically getting information from the Web is called ‘Web information acquisition’ in the literature. The process consists of 4 parts (see the illustration), which are executed in this order: Information Retrieval (IR), Information Extraction (IE), Information Integration (II) and Information Delivery (ID).

Information Retrieval

Navigate to and download the input documents which are the subject of the next steps. This is probably the most intuitive step to make - clearly, the information acquisition system has to be pointed to the document which contains the data first, before it can perform the actual extraction.

The absolute majority of the information on the Web resides in the so-called deep web - backend databases and different legacy data stores which are not contained in static web documents. This data is accessible via interaction with web pages (which serve as a frontend to these databases) - by filling and submitting forms, clicking links, stepping through wizards etc. A typical example could be an airport web page: an airport has all the schedules of the flights they offer in their databases, yet you can access this information only on the fly by submitting a form containing your concrete request.

The opposite of the deep web is the surface web - static pages with a ‘constant’ URL, like the very page you are reading. In such a case, the information retrieval step consists of just downloading the URL. Not a really tough task.

However, as I said two paragraphs earlier, most of the information is stored in the deep web - different actions, like filling input fields, setting checkboxes and radio buttons, clicking links etc. are needed to get to the actual page of interest which can be then downloaded as the result of navigation.

Doing this automatically from a programming language is not trivial because of the nature of the task, and there are a lot of pitfalls along the way as well, stemming from the fact that the HTTP protocol is stateless: the information provided to a request is lost when making the next request. To remedy this problem, sessions, cookies, authorizations, navigation history and other mechanisms were introduced - so a decent information retrieval module has to take care of these as well.
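To make the statelessness problem a bit more tangible, here is a toy simulation - no real HTTP involved, and ToyServer, ToyClient and the ‘sid’ cookie are all invented for this sketch. The ‘server’ only knows what each request carries, so a client that throws away its cookies is a stranger again on the very next request.

```ruby
# A toy server: every request is handled in isolation, so the only way to
# recognize a returning client is the session id cookie it sends along.
class ToyServer
  def initialize
    @sessions = {}
    @next_id = 0
  end

  # Returns [body, cookies_to_set].
  def handle(path, cookies)
    case path
    when '/login'
      sid = "s#{@next_id += 1}"
      @sessions[sid] = { user: 'guest' }
      ['logged in', { 'sid' => sid }]
    when '/account'
      session = @sessions[cookies['sid']]
      session ? ["hello #{session[:user]}", {}] : ['please log in', {}]
    else
      ['not found', {}]
    end
  end
end

# A toy client that optionally remembers cookies between requests,
# which is what a retrieval module has to do for you.
class ToyClient
  def initialize(server, keep_cookies:)
    @server = server
    @keep_cookies = keep_cookies
    @cookies = {}
  end

  def get(path)
    body, set_cookies = @server.handle(path, @cookies)
    @cookies.merge!(set_cookies) if @keep_cookies
    body
  end
end

forgetful = ToyClient.new(ToyServer.new, keep_cookies: false)
forgetful.get('/login')
puts forgetful.get('/account')

careful = ToyClient.new(ToyServer.new, keep_cookies: true)
careful.get('/login')
puts careful.get('/account')
```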

Fortunately, in Ruby there are packages which offer exactly this functionality. Probably the most well-known is WWW::Mechanize, which is able to automatically navigate through Web pages as a result of interaction (filling forms etc.) while keeping cookies, automatically following redirects and simulating everything else that a real user (or the browser in response to that) would do. Mechanize is awesome - though from my perspective it has one major flaw: you can not interact with JavaScript websites. Hopefully this feature will be added soon.

Until that happy day, if someone wants to navigate through JS powered pages, there is a solution: (Fire)Watir. Watir is capable of doing similar things as Mechanize (I never did a head-to-head comparison, though it would be interesting) with the added benefit of JavaScript handling.

scRUBYt! comes with a navigation module, which is built upon Mechanize. In future releases I am planning to add FireWatir, too (just because of the JavaScript issue). scRUBYt! is basically a DSL for web scraping with a lot of heavy lifting behind the scenes. Though the real power lies in the extraction module, there are some goodies here in the navigation module, too. Let’s see an example!

Goal: Go to amazon.com. Type ‘Ruby’ into the search text field. To narrow down the results, click ‘Books’, then for further narrowing ‘Computers & Internet’ in the left sidebar.

Realization:

  fetch           'http://www.amazon.com/'
  fill_textfield  'field-keywords', 'ruby'
  submit
  click_link      'Books'
  click_link      'Computers & Internet'

Result: This document.

As you can see, scRUBYt’s DSL hides all the implementation details, making the description of the navigation as easy as possible. The result of the above few lines is a document - which is automatically fed into the scraping module, but this is already the topic of the next section.

Information Extraction

I think there is no need to write about why one needs to extract information from the Web today - the ‘how’ is a much more interesting question.

Why is Web extraction such a tedious task? Because the data of interest is stored in HTML documents (after navigating to them, that is), mixed with other stuff like formatting elements, scripts or comments. Because the data is missing any semantic description, a machine has no idea what a web shop record is or what a news article might look like - it just perceives the whole document as a soup of tags and text.

Querying objects in systems which are formally defined and thus understandable for a machine is easy. For instance, if I want to get the first element of an array in Ruby, I can do it easily like this:

  my_array.first

Another example for a machine-queryable structure could be an SQL table: to pull out the elements matching the given criteria, all that needs to be done is to execute an SQL query like this:

  SELECT name FROM students WHERE age > 25
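The same query is just as easy over plain Ruby objects, because the shape of the data is known up front. A small sketch, with a made-up Student struct and sample data invented for illustration:

```ruby
# A formally defined record: both we and the machine know its fields.
Student = Struct.new(:name, :age)

# Ruby counterpart of: SELECT name FROM students WHERE age > 25
def names_older_than(students, age)
  students.select { |student| student.age > age }.map(&:name)
end

students = [
  Student.new('Alice', 31),
  Student.new('Bob', 22),
  Student.new('Carol', 26)
]

p names_older_than(students, 25)
```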

Now, try to do similar queries for a Web page. For example, suppose that you already navigated to an ebay page by searching for the term ‘Notebook’. Say you would like to execute the following query: ‘give me all the records with price lower than $400’ (and get the results into a data structure of course - not rendered inside your browser, since that works naturally without any problems).

The query was definitely an easy one, yet without implementing a custom script extracting the needed information and saving it to a data structure (or using stuff like scRUBYt! - which does exactly this instead of you) you have no chance to get this information from the source code.

There are ongoing efforts to change this situation - most notably the semantic Web, common ontologies, different Web2.0 technologies like taxonomies, folksonomies, microformats or tagging. The goal of these techniques is to make the documents understandable for machines to eliminate the problems stated above. While there are some promising results in this area already, there is a long way to go until the whole Web will be such a friendly place - my guess is that this will happen around Web88.0 in the optimistic case.

However, at the moment we are only at version 2.0 (at most), so if we would like to scrape a web page for whatever reason today, we need to cope with the difficulties we are facing. I wrote an overview on how to do this with the tools available in Ruby (update: there is a new kid on the block - HPricot - which is not mentioned there).

The rough idea of those packages is to parse the Web page source into some meaningful structure (usually a tree) then provide a querying mechanism (like XPaths, CSS selectors or some other tree navigation model). You could think now: ‘A-ha! So actually a web page can be turned into something meaningful for machines, and there is a formal model to query this structure - so where is the problem described in the previous paragraphs? You just write queries like you would in a case of a database, evaluate them against the tree or whatever and you are done’.

The problem is that the machine’s understanding of the page and human thinking about querying this information are entirely different, and there is no formal model (yet) to eliminate this discrepancy. Humans want to scrape ‘webshop records with Canon cameras with maximal price $1000’, while the machine sees this as ‘the third <td> tag inside the eighth <tr> tag inside the fifth <table> … (lot of other tags) inside the <body> tag inside the <html> tag, where the text of the seventh <td> tag contains the string ‘Canon’ and the text of the ninth <td> is not bigger than 1000’ (to even get the value 1000 you have to use a regular expression or something similar to get rid of the most probably present currency symbol and other possible additional information).
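To make the last remark concrete, here is a minimal sketch of the ‘get rid of the currency symbol’ step. The record data, the parse_price helper and the cheap_canons method are all made up for illustration, and the code assumes prices that use a dot as the decimal separator:

```ruby
# Hypothetical helper: strip everything except digits and the decimal
# point, then convert what is left to a number.
def parse_price(text)
  text.gsub(/[^\d.]/, '').to_f
end

# Hypothetical query: Canon records costing at most max_price.
def cheap_canons(records, max_price)
  records.select do |record|
    record[:name].include?('Canon') && parse_price(record[:price]) <= max_price
  end
end

# Made-up records, as they might come out of a parsed page:
# every cell is plain text, prices included.
records = [
  { name: 'Canon PowerShot', price: '$349.99' },
  { name: 'Canon EOS 30D',   price: 'USD 1,299.00' },
  { name: 'Nikon D80',       price: '$999' }
]

cheap_canons(records, 1000).each { |record| puts record[:name] }
```

Even this tiny query needs knowledge that is nowhere on the page itself: which cell is the name, which one is the price, and how the price is written.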

So why is this so easy with a database? Because the data stored in there has a formal model (specified by the CREATE TABLE statement). Both you and the computer know exactly what a Student or a Camera looks like, and both of you are speaking the same language (most probably an SQL dialect).

This is totally different in the case of a Web page. A web shop record, a camera detail page or a news item can look like just about anything, and your only chance to find out for the concrete Web page of interest is to exploit its structure. This is a very tedious task on its own (as I have said earlier, a Web page is a mess of real data, formatting, scripts, stylesheet information…). Moreover there are further problems: for example, web shop records need not be uniform even inside the same page - certain records can miss some cells which others have, or may contain the information on a detail page while others do not, and vice versa - so in some cases, identifying a data model is impossible or very complicated - and I did not even talk about scraping the records yet!

So what could be the solution?

Intuitively, there is a need for an interpreter which understands the human query and translates it to XPaths (or any querying mechanism a machine understands). This is more or less what scRUBYt! does. Let me explain how - it will be the easiest through a concrete example.

Suppose you would like to monitor stock information on finance.yahoo.com! This is how I would do it with scRUBYt!:

#Navigate to the page
fetch 'http://finance.yahoo.com/'

#Grab the data!
stockinfo do
  symbol  'Dow'
  value   '31.16'
end

output:

  <root>
    <stockinfo>
      <symbol>Dow</symbol>
      <value>31.16</value>
    </stockinfo>
    <stockinfo>
      <symbol>Nasdaq</symbol>
      <value>4.95</value>
    </stockinfo>
    <stockinfo>
      <symbol>S&P 500</symbol>
      <value>2.89</value>
    </stockinfo>
    <stockinfo>
      <symbol>10-Yr Bond</symbol>
      <value>0.0100</value>
    </stockinfo>
  </root>

Explanation: I think the navigation step does not require any further explanation - we fetched the page of interest and fed it into the scraping module.

The scraping part is more interesting at the moment. Two things happened here: we have defined a hierarchical structure of the output data (like we would define an object - we are scraping StockInfos which have Symbol and Value fields, or children), and showed scRUBYt! what to look for on the page in order to fill the defined structure with relevant data.

How did I know I had to specify ‘Dow’ and ‘31.16’ to get these nice results? Well, by manually pointing my browser to ‘http://finance.yahoo.com/’, observing an example of the stuff I wanted to scrape, and leaving the rest to scRUBYt!. What actually happens under the hood is that scRUBYt! finds the XPath of these examples, figures out how to extract the similar ones and arranges the data nicely into a result XML (well, there is much more going on, but this is the rough idea). If anyone is interested, I can explain this in a further post.
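The ‘figures out how to extract the similar ones’ step can be pictured with a toy sketch. This is not scRUBYt!’s actual algorithm, just the rough idea: take the XPath of one example and drop the positional index of the repeating element, so that the expression matches every row instead of a single one. The generalize_xpath helper and the sample path are made up for illustration:

```ruby
# Hypothetical helper (NOT scRUBYt!'s real implementation): removes the
# positional index from every step named step_name, e.g. "tr[3]" -> "tr".
def generalize_xpath(xpath, step_name)
  pattern = /\A(#{Regexp.escape(step_name)})\[\d+\]\z/
  xpath.split('/').map { |step| step.sub(pattern, '\1') }.join('/')
end

# XPath of a single example cell, as it might be found for the text 'Dow'.
example = '/html/body/table/tr[3]/td[1]'

puts generalize_xpath(example, 'tr')
```

The generalized expression no longer points at the third row only, so evaluating it returns the matching cell of every row.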

You could think now ‘O.K., this is very nice and all, but you have been talking about monitoring and I don’t really see how - the value 31.16 will change sooner or later and then you have to go to the page and re-specify the example again - I would not call this monitoring’.

Great observation. It’s true scRUBYt! would not be of much use if the situation of changing examples would not be handled (unless you would like to get the data only once, that is) - fortunately, the situation is dealt with in a powerful way!

Once you run the extractor and you think the data it scrapes is correct, you can export it. Let’s see what the exported finance.yahoo.com extractor looks like:

#Navigate to the page
fetch 'http://finance.yahoo.com/'

#Construct the wrapper
stockinfo "/html/body/div/div/div/div/div/div/table/tbody/tr" do
  symbol "/td[1]/a[1]"
  value "/td[3]/span[1]/b[1]"
end

As you can see, there are no concrete examples any more - the system generalized the information and now you can use this extractor to scrape the information automatically whenever you like - until the moment the guys at Yahoo change the structure of the page, which fortunately does not happen every other day. In this case the extractor should be regenerated with up-to-date examples (in the future I am planning to add automatic regeneration in such cases) and the fun can begin from the start once again.

This example just scratched the surface of what scRUBYt! is capable of - there is a ton of advanced stuff to fine-tune the scraping process and get the data you need. If you are interested, check out http://scrubyt.org for more information!

Conclusion

The first two steps of information acquisition (retrieval and extraction) are dealing with the question ‘How to get the data I am interested in (querying)’. Up to the present version (0.2.0) scRUBYt! contains just these two steps - however, to do even these properly, I will need a lot of testing, feedback, bug fixing, stabilization, adding heaps of new features and enhancements - because as you have seen, web scraping is not a straightforward thing to do at all.

The last two steps (integration and delivery) are addressing the question ‘what to do with the data once it is collected, and how to do that (orchestration)’. These facets will be covered in a future installment - most probably when scRUBYt! contains these features as well.

If you liked this article and you are interested in web scraping in practice, be sure to install scRUBYt! and check out the community page for further instructions - the site is just taking off, so there is not too much yet - but hopefully enough to get you started. I am counting on your feedback, suggestions, bug reports, extractors you have created etc. to enhance both scrubyt.org and scRUBYt! user experience in general. Be sure to share your experience and opinion!


Book Review: Ruby Cookbook

Thursday, December 21st, 2006

Since I am relatively new to Ruby, I have no idea how life could have been in the dark ages of the non-Japanese-speaking Ruby community (1995 - 2000), when there was no English Ruby book on the market. The ice was broken by Andy Hunt and Dave Thomas with a pickaxe - err… actually the Pickaxe (a.k.a. “Programming Ruby”), which has undoubtedly become an all-famous Ruby-classic since then.

In the foreword, Matz, the author of Ruby, explains that since he is much better at coding than at writing documentation, the authors probably did not have an easy job - what they could not find in the (rather scant) documentation, they had to figure out directly from the Ruby source code.

The Ruby book scene looks radically different today. In fact we are facing the opposite problem: there are so many books on Ruby that sometimes it can be hard to choose which ones to read and in which order. Probably it won’t be any easier to find the answers to these questions in the future: judging from the blogs and announcements, the bigger part of the books is yet to come. If you are new to Ruby you will most probably have a hard time figuring out how to spend your money wisely [1] - so what’s the solution?

Of course there is no definitive answer for this question - I can only tell you what worked for me.

First I would definitely recommend David A. Black’s Ruby for Rails [2]. It is absolutely suited for newcomers (and for advanced hackers, too), no matter if you are new to Ruby and/or coming from a different programming language [3]. I was a Python enthusiast (though doing most of my everyday work in Java) when I discovered Ruby - and David’s book was a perfect choice to switch very fast.

Currently I am undecided between the 2nd and the 3rd place, so let’s say you should check them out in parallel - they are (of course) the Pickaxe and Hal Fulton’s “The Ruby Way”. They are both time-tested Ruby classics, hence a must-read. However, if you have time and/or money to read only one of the above books, in my opinion it should be “Ruby for Rails”.

Although these three masterpieces are - in my opinion - among the most well-written and informative tech books available today, you have to remember the good old rule: no matter how many books you read or how good they are - you will never become a true Ruby hacker until you actually begin to use the acquired knowledge and put it into practice.

After reading these books I wanted to jump into writing some cool stuff - Ruby seemed to be so elegant, easy, succinct - and to my greatest surprise, I could not write too much sensible code :-) (at least not without referring to these books and/or google and/or ruby-talk more frequently than I considered o.k. to call it programming on my own).

This is exactly the situation where the Ruby Cookbook should enter the scene. The first three books give you a hint about what can be done with Ruby[4]. The cookbook offers you well organized content in forms of recipes to show you how it can be done elegantly, quickly and effectively in a ruby-esque way.

Probably the most frequent answer to the question ‘How should I improve my Ruby skills’ on the ruby-talk mailing list is: by starting your own project. Since I put this advice into practice myself and it worked for me, I have to agree: armed with the goodies from Learning Ruby, the Pickaxe and The Ruby Way, the best thing to do is to grab a copy of the Cookbook and jump into your own project. When I started my own, a web extraction framework, I had no idea about documenting Ruby code, packaging the whole program into a gem, logging, writing unit tests (in Ruby) and automating these tasks (and a lot of other things - this post would be considerably longer if I wanted to list everything). However, with the Ruby Cookbook by my side, learning and putting things into practice from writing the first line until packaging the whole framework into a gem was a piece of cake.

If you are unfamiliar with the O’Reilly cookbook series format, it is a set of ‘recipes’ (problem statement, solution, discussion) divided into categories (like Strings, Arrays, Hashes… in this case) for easy lookup of the problem at hand. While it would be possible and certainly edifying to read the book cover to cover from the start (in this case you should also consider that it has 873 pages), I found that it really shines when you are stuck with a problem: you search for the relevant category and the relevant problem, apply the solution, read the discussion to understand what’s going on under the hood, rinse, repeat and after the 3rd or so cycle you will find out that you are not reaching for the book anymore (at least not because of this problem).

OK, time to take a more detailed look at the content.

I would divide the book into five categories: Essentials, Ruby Specific Constructs, Advanced Techniques, Internet and networking and Software Management/Distribution. I will review them one by one briefly.

  • Essentials include Strings, Numbers, Arrays, Hashes, Date and Time, Files and Directories. For a beginner Ruby journeyman, these chapters are a real gold mine. Though the cookbook is not really intended for total beginners (it assumes a fair amount of Ruby knowledge), it certainly would not be impossible for a skilled (non-Ruby) programmer to understand most of the recipes since they go from simple to complicated (e.g. the String chapter begins with concatenating strings and closes with showing off text classification with a Bayesian classifier).

    In this category I have probably learned the most Ruby best practices from the chapters Arrays and Hashes [5]. As a constant lurker on the ruby-talk mailing list, I had a hard time figuring out all those inject()s and collect()s and each_slice()s and each_cons()s and other enumerator/iterator things - whenever I thought I already understood them, somebody came with an even more complicated example and I was not so sure once again - until the moment I bought the book, that is.

    The cookbook is very good at eliminating vague and wobbly understanding like mine was: you will not only understand what’s going on, but actually get comfortable using the idioms so typical for Ruby. That’s what is so great about it.

  • Ruby Specific Constructs feature Objects and Classes, Modules and Namespaces, and Reflection and Metaprogramming. Every newcomer to Ruby encounters the wonders that (not exclusively but most characteristically) make the language so beautiful: code blocks, closures, mixins, the vast possibilities offered by metaprogramming and reflection just to mention some of them. These chapters are written exactly to examine and discuss these constructs.

    While I probably learned the most new things from this section, I have to say that I have been missing a meta-level here: the chapters (especially about metaprogramming) presented a lot of fancy LEGO bricks but did not show how to build a Statue of Liberty or Eiffel tower out of them (well, not even a simple medieval castle in my opinion :-) . Of course this does not need to be a problem - metaprogramming techniques should have a book of their own, and anyway a cookbook is not intended to solve concrete problems but rather recurring/frequent ones. Probably I am just too curious about the ways of the meta :-) .

    To sum it up, this and the previous section (Essentials) together helped to beef up my rubyish programming style by an enormous magnitude in practice - nearly all the information you need is there in the other books as well, but reading them does not make you comfortable with these techniques.

  • Advanced Techniques include XML and HTML, Graphics and Other File Formats, Databases and Persistence, Multitasking and Multithreading, User Interface, Extending Ruby with Other Languages, and System Administration. I was kind of unsure about this category - pairing UI with databases or system administration for example seemed odd at first glance - but since I did not want to create even more categories, I have decided to put everything here which did not fit into the other ones, thus it can be viewed as a ‘miscellaneous’ section as well.

    I would like to review two chapters here - HTML/XML and Databases and Persistence - since these are the closest to my field of expertise and I also believe these two went the deepest in this category. Again, this does not mean that the other chapters were not good, but in my opinion they just scratched the surface compared to the above two.

    The HTML/XML chapter really has it all: parsing, validating, transforming, extracting data from XML documents, encoding and XPath handling to highlight some interesting topics. The coverage is surprisingly thorough for a language which is promoting YAML (YAML Ain’t Markup Language) over XML. The HTML chapters, though there are just a few of them, are also very useful: downloading content from Web pages, extracting data from HTML, converting plain text to HTML and vice versa. My only concern here is that I missed some third party package coverage (like RedCloth, BlueCloth, Hpricot or Mechanize) - but this is really nitpicking: if the authors took all my wishes into account, the book would have several thousand pages :-)

    Databases and Persistence starts off with serialization recipes (using YAML, Marshal and Madeleine). Chapters on indexing unstructured as well as structured text (SimpleSearch, Ferret) are a pleasant surprise before the must-have topics take off: connecting and using different kinds of databases (MySQL, PostgreSQL, Berkeley DB) as well as Object Relational Mapping frameworks (Rails ActiveRecord and Nitro Og) and doing every kind of SQL voodoo magic of course. What should I add? Probably nothing.[6]

    I would really like to write something about the other chapters in this category, too, but since I am constantly bashed for the length of my posts, just believe me that they are great as well :-) .

  • Internet and networking consists of Web Services and Distributed Programming, Internet Services and (surprise! surprise!) Web Development: Ruby on Rails. It would be really a cliché to write about why the Internet is so important nowadays, how much Web 2.0 rocks, how SOA and WS and REST and FOO and BAR rule etc. so I won’t do that ;-) . However, it is a fact that Web application development has never mattered as much as it does now - so these chapters were basically compulsory.

    I would divide the category into two subcategories - Internet/Web stuff and distributed programming.

    There is really not too much to add to the first category - there is an unbelievable amount of information crammed into two chapters: ‘abstract’ techniques (HTTP headers and requests, DNS lookup etc.), using all kinds of protocols (HTTP(S), POP, IMAP, FTP, telnet, SSH…), servlet, client/server and CGI programming as well as talking to Web APIs (amazon, flickr, google) and Web services of course (XML-RPC, SOAP). In my opinion, the category offers more than enough information to get started and/or explore advanced techniques.

    It’s a shame that Distributed Programming got only half a chapter - O.K., I admit I am somewhat inclined to these techniques and they are maybe not used by that many people. The action revolves mostly around DRb and Rinda, with the exception of 2 MemCached recipes. The chapter closes with a nice ‘putting things together’ recipe by creating a remote-controlled Jukebox.

    I did not get too deep into the Ruby on Rails chapter, since I read Agile Web Development with Rails as well as Ruby for Rails and a lot of much more advanced Rails stuff previously - but judging from the recipe titles and skimming through some of them, the chapter looks very informative and unquestionably helpful if you have had no prior experience with Rails.

  • Last but not least, Managing and Distributing Software includes Testing, Debugging, Optimizing, and Documenting, Packaging and Distributing Software and Automating Tasks with Rake. If you plan to use Ruby for any task other than system administration (or writing very short scripts/one-liners for whatever reason), documenting, testing, debugging and automating tasks are absolutely crucial. I know that a lot of coders do not like to hear this - since they want to code and not write tests, documentation etc. - but I think nowadays a serious programmer, no matter how much she would like to concentrate on hacking up feature MyNextCoolStuffWhichWillShakeTheEarth, has to master these things. In the long run, any software that is undocumented, untested and not continuously refactored will turn into spaghetti quite easily.

    That said, these chapters were excellent for me. I have experience with these tasks in Java - however, the toolset is radically different in some cases (like Ant vs. Rake) and even if it is similar (Unit tests, rdoc vs. JavaDoc) the re-learning of them was inevitable. Fortunately, with the help of these recipes it was a breeze to learn them in Ruby (well, I have to add that actually these things (as nearly everything else) are considerably easier to do in Ruby, so the ease of learning stems from this fact as well).

    Rake absolutely rocks. Maybe I am also concerned because I have been working with Apache Ant a lot - well, if the ratio between Ruby and Java code is say 1:10, then the ratio between Rake and Ant files is 1:50 if we also consider simplicity, maintainability and understandability.

    Finally, if you also plan to release your software, the chapter Packaging and Distributing Software can come in handy. I think if you would like to distribute your stuff to the masses, packaging it into a gem is inevitable - rubygems are so cool that they made Rubyists too lazy to download something from a site instead of launching ‘gem install my_cool_software’.
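To give a taste of the enumerator idioms mentioned in the Essentials part (inject, collect, each_slice and each_cons), here is a small sketch. It is not a recipe taken from the book; the helper names moving_sums and rows_of are made up for illustration:

```ruby
# Sum of every run of `window` neighbouring values:
# each_cons yields overlapping groups, inject folds each group into a sum.
def moving_sums(values, window)
  values.each_cons(window).collect do |group|
    group.inject(0) { |sum, n| sum + n }
  end
end

# Cut a flat list into rows of at most `width` items:
# each_slice yields non-overlapping groups.
def rows_of(values, width)
  values.each_slice(width).to_a
end

p moving_sums([1, 2, 3, 4, 5], 2)
p rows_of([1, 2, 3, 4, 5], 2)
```

The difference between the two iterators is the part that is easy to mix up: each_cons slides over the list one element at a time, each_slice chops it into separate pieces.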

Conclusion


If you would like to become a serious Ruby hacker, don’t hesitate to buy this book. In my opinion it is absolutely worth every cent - and even more. My only problem is that there are no more recipes - however this is not a critique but rather a compliment: you simply can not get enough - not even from nearly 900 pages. One could argue that some things are missing or he would rather see this instead of that (I believe the authors themselves have had some tough time deciding these matters) - but I guess everyone agrees that the material which made it to the book is absolutely top-notch. 5 out of 5 stars - a great addition to anyone’s Ruby bookshelf.

Notes


[1] It is absolutely possible to learn Ruby without spending a nickel - there are excellent Ruby tutorials out there, like Why’s poignant guide to Ruby (with cartoon foxes and chunky bacon :-) ) or the first edition of the Pickaxe book which is available online for free, or Learning Ruby by Satish Talim, and a lot of other ones, too. For some beginner Ruby exercises you can also check out my earlier post: 15 exercises for learning a new programming language - or just use google…


[2] I am not sure whether it was the best move to include ‘Rails’ in the title - it may turn down some who would like to learn Ruby but not Rails. However, I can assure you that this book is a true Ruby masterpiece. Though there are some interesting Rails techniques included, the primary focus is unquestionably Ruby.


[3] There is one possible exception: If you are new not only to Ruby but also to programming, you should probably check out Chris Pine’s Learning to program first.


[4] Of course there will always be some overlap and not every book can be absolutely correctly categorized in every case (for example, The Ruby Way also has cookbook-like chapters).


[5] Of course this does not mean that the rest of the chapters were not that helpful - just coming from Python, I did not have so many ‘wow’ moments. Nevertheless, they also teach a lot of idioms and are in no way less informative than the other two.


[6] Devil’s advocate(tm) says: maybe some chapters on SQLite and Oracle, as well as advanced SQL stuff would be cool - however, this is really mega-über nitpicking since then the title should be ‘Ruby and SQL cookbook’ :-)



Implementing ‘15 Exercises for Learning a new Programming Language’

Thursday, November 16th, 2006

A short time ago in a galaxy not so far, far away I came across a nice blog post: 15 Exercises for Learning a new Programming Language.

One could argue if these are *really* the most appropriate 15(+) exercises to learn a new programming language - however, the task of answering this rather complex question is left as an exercise for the reader. Instead of this I will show you their implementation in Ruby - rubyrailways.com style.

Why did I bother to solve these problems (including not really trivial ones, like a scientific calculator with a GUI)? Well, actually to learn a new programming language! I still consider myself a beginner Ruby apprentice just playing it by ear in my somewhat scarce free time, so I thought that systematically implementing a task list like this would mean a great step forward for me compared to just coding random things at random times. Fortunately I was perfectly right!

Before we move on to the code, one last disclaimer: the fact that I am still a Ruby n00b implies that the code can be somewhat hairy/not optimal/[insert any other language than Ruby here]-ish so don’t use these snippets as a textbook solution of the problems or anything like that. I would be glad if someone could suggest a bit of refactoring of the bad parts but I also hope that there are some nice parts which you can learn from (actually I am quite sure about this since I used some magick formulas from a few Ruby (grand)masters in some cases).

OK, enough talk for now. Let’s see the stuff!

1. Problem: “Display series of numbers (1,2,3,4, 5….etc) in an infinite loop. The program should quit if someone hits a specific key (Say ESCAPE key).”

Solution: Hmm, well, errr…uh-oh… I could not solve this problem fully (what a terrific start :-) ). If Henry Ford were sitting beside me now, he would say: you can hit any key to exit - so long as it’s ‘C’ - and one more piece of advice: don’t forget to hold CTRL during this action :-) . More on this after the code snippet:

  i = 0
  loop { print "#{i+=1}, " }

Comments : If anyone knows how to add code which will cause this program to stop with a specific keyhit (say ‘ESC’) please, please, please drop me a note. I have been researching this for at least 10% of the time of solving all the tasks, nearly spitting blood when I gave up :-) . It seems (to me) that there is no simple (i.e. no threads and similar) and clean platform-independent solution for this problem. I guess (hope) the author’s idea here was different than to introduce threading or writing platform specific-code…
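One possible answer, offered as a sketch rather than a definitive solution: Ruby versions newer than the one this post was written for ship the io/console standard library, which can switch the terminal to raw mode so that single key presses arrive without waiting for Enter. Combined with a non-blocking read, the loop can poll for ESC without threads. The helper names key_pressed? and count_until_key (and the limit parameter, which exists only to make the loop testable) are made up for this example, and I have not checked how it behaves on Windows consoles:

```ruby
require 'io/console'

ESC = "\e"

# Non-blocking check: true only if a byte is waiting on io and it equals key.
# Any other waiting key is read and thrown away.
def key_pressed?(io, key)
  return false unless IO.select([io], nil, nil, 0)
  io.read_nonblock(1) == key
rescue IO::WaitReadable, EOFError
  false
end

# Prints 1, 2, 3, ... until key arrives on input (or limit is reached).
# Returns the last number printed.
def count_until_key(input: $stdin, output: $stdout, key: ESC, limit: nil)
  i = 0
  loop do
    i += 1
    output.print "#{i}, "
    break if key_pressed?(input, key) || (limit && i >= limit)
  end
  i
end

# Only start the endless loop when run directly from a real terminal.
if __FILE__ == $PROGRAM_NAME && $stdin.tty?
  $stdin.raw { |tty| count_until_key(input: tty) }
end
```

Raw mode is restored automatically when the block passed to raw returns, so the terminal is usable again after ESC is pressed.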

2. Problem: “Fibonacci series, swapping two variables, finding maximum/minimum among a list of numbers.”

Solution:

  # Fibonacci series (the Hash remembers every value it has computed)
  Fib = Hash.new{ |h, n| n < 2 ? h[n] = n : h[n] = h[n - 1] + h[n - 2] }
  puts Fib[50]

  # Swapping two variables
  x, y = 1, 2
  x, y = y, x

  # Finding maximum/minimum among a list of numbers
  puts [1, 2, 3, 4, 5, 6].max
  puts [7, 8, 9, 10, 11].min

Comments: The Fibonacci code was written by Andrew Johnson (found via Ruby Quiz). I like it so much that I think it would be a shame to present a trivial version here. I guess the rest of the code is self-explanatory.

3. Problem: “Accepting series of numbers, strings from keyboard and sorting them ascending, descending order.”

Solution:

  a = []
  loop { break if (c = gets.chomp) == 'q'; a << c }
  p a.sort
  p a.sort { |x, y| y <=> x }

Comments: This version is accepting strings - I think anybody who got to this point can adapt it to work with numbers.
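For completeness, a minimal sketch of the numeric adaptation. The helper name sort_numbers is made up; Float() is used instead of to_f so that input like ‘abc’ raises an error instead of silently becoming 0.0:

```ruby
# Hypothetical helper: converts the collected strings to numbers,
# then returns both sort orders.
def sort_numbers(strings)
  numbers = strings.map { |s| Float(s) }
  { ascending: numbers.sort, descending: numbers.sort.reverse }
end

p %w[10 9 2.5].sort           # sorted as strings, character by character
p sort_numbers(%w[10 9 2.5])  # sorted as numbers
```

The first line shows why the conversion matters: compared as strings, ‘10’ comes before ‘9’.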

4. Problem: “Reynolds number is calculated using formula (D*v*rho)/mu Where D = Diameter, V= velocity, rho = density mu = viscosity Write a program that will accept all values in appropriate units (Don’t worry about unit conversion) If number is < 2100, display Laminar flow, If it’s between 2100 and 4000 display 'Transient flow' and if more than '4000', display 'Turbulent Flow' (If, else, then...)"

Solution:

  vars = %w{D V Rho Mu}

  vars.each do |var|
    print "#{var} = "
    val = gets
    eval("#{var}=#{val.chomp}")
  end

  reynolds = (D*V*Rho)/Mu.to_f

  if (reynolds < 2100)
    puts "Laminar Flow"
  elsif (reynolds > 4000)
    puts "Turbulent Flow"
  else
    puts "Transient Flow"
  end

Comments: Can you spot the trick in the part which is filling up the variables? They don’t go out of scope after the loop ends because they are constants. Another possibility would be to use $global variables but I guess it is usually not a very good programming practice to do that.
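The scoping point can be checked in isolation. In this sketch the names read_constants, Diameter, Velocity and local_value are made up for the illustration, a StringIO stands in for the keyboard, and Object.const_set is swapped in for eval so that the typed text is never executed as code:

```ruby
require 'stringio'

# Defines one top-level constant per name, reading each value from input.
def read_constants(names, input)
  names.each do |name|
    print "#{name} = "
    Object.const_set(name, Float(input.gets.chomp))
  end
end

# StringIO plays the role of the keyboard; pass $stdin for interactive use.
read_constants(%w[Diameter Velocity], StringIO.new("2\n3.5\n"))
puts
puts Diameter * Velocity             # the constants outlive the each block

[1].each { local_value = 42 }        # a local created inside a block...
puts defined?(local_value).inspect   # ...is not visible afterwards
```

Constants are looked up globally, while a variable first assigned inside a block lives only as long as that block runs.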

5. Problem: “Modify the above program such that it will ask for ‘Do you want to calculate again (y/n), if you say ‘y’, it’ll again ask the parameters. If ‘n’, it’ll exit. (Do while loop) While running the program give value mu = 0. See what happens. Does it give ‘DIVIDE BY ZERO’ error? Does it give ‘Segmentation fault..core dump?’. How to handle this situation. Is there something built in the language itself? (Exception Handling)”

Solution:

  vars = { "d" => nil, "v" => nil, "rho" => nil, "mu" => nil }

  begin
    vars.keys.each do |var|
      print "#{var} = "
      val = gets
      # to_f rather than to_i, so that values like 0.001 are not cut down to 0
      vars[var] = val.chomp.to_f
    end

    reynolds = (vars["d"]*vars["v"]*vars["rho"]) / vars["mu"]
    puts reynolds

    if (reynolds < 2100)
      puts "Laminar Flow"
    elsif (reynolds > 4000)
      puts "Turbulent Flow"
    else
      puts "Transient Flow"
    end

    print "Do you want to calculate again (y/n)? "
  end while gets.chomp != "n"

Comments: As you can see, I could not use the same trick here when asking for the variables, because when somebody wants to calculate again, Ruby will complain (although only by printing a warning) that the constants have already been initialized. Therefore I went for the hash solution. I think the do-you-want-to-calculate-again part is straightforward so I won’t analyze that here.
“While running the program give value mu = 0.”
Ruby gives a rather interesting result in this case: Infinity :-) . This is because the division is done on floats; dividing an integer by an integer zero raises ZeroDivisionError instead.
“Is there something built in the language itself?”
Sure: exception handling. An integer division by zero can be caught with a rescue ZeroDivisionError clause.
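A minimal sketch of how the exception handling could look. Because a float division by zero does not raise anything in Ruby (only an integer divided by an integer zero does), the sketch raises ZeroDivisionError itself when mu is zero; the helper names reynolds and describe_flow are made up for this example:

```ruby
# Hypothetical helper: Reynolds number, refusing a zero viscosity.
def reynolds(d, v, rho, mu)
  raise ZeroDivisionError, 'mu (viscosity) must not be zero' if mu.zero?
  (d * v * rho) / mu.to_f
end

# Classifies the flow, turning the exception into a readable message.
def describe_flow(d, v, rho, mu)
  re = reynolds(d, v, rho, mu)
  if re < 2100
    'Laminar Flow'
  elsif re > 4000
    'Turbulent Flow'
  else
    'Transient Flow'
  end
rescue ZeroDivisionError => e
  "Cannot calculate: #{e.message}"
end

puts describe_flow(1, 1, 3000, 1)
puts describe_flow(1, 1, 3000, 0)
```

The rescue clause attached to the method body catches the error wherever it is raised inside the method, so the calling loop can simply print the result and ask again.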

6. Problem: “Scientific calculator supporting addition, subtraction, multiplication, division, square-root, square, cube, sin, cos, tan, Factorial, inverse, modulus”

Solution:
Since this code snippet is longer, it would look ugly here - you can download it from here instead.

Screenshot:

screenshot of the scientific calculator in action

If you would like to try it, you will need the Tk bindings for Ruby (maybe you have them already, here on Ubuntu I did not). Also note that only the regular 0-9 keys (and of course the mouse) work, the numpad ones do not. One more little detail: % stands for modulo, not percent.

Comments: Phew, this was a real challenge, mostly because I never did any GUI in Ruby before. I was amazed that I could code up a relatively feature-rich calculator in 100+ lines of code, without any golfing or trying to optimize for shortness. What I want to say with this is that the shortness is not a credit to my programming skills (since I did not even try to golf) but to the superb terseness of Ruby. OK, of course there are some problems (e.g. cube, cos, tan, inverse are not implemented) but the usability/amount of code ratio is unbelievably high.

The GUI is also not the nicest since I have used Tk - wxRuby or qt-ruby would produce much nicer results, but since I did not code any GUI in Ruby previously, I have decided to try the good-old-skool Tk for the first time.

7. Problem: “Printing output in different formats (say rounding up to 5 decimal places, truncating after 4 decimal places, padding zeros to the right and left, right and left justification)(Input output operations)”

Solution:

  # rounding to 5 decimal places
  puts sprintf("%.5f", 124.567896)

  # truncating after 4 decimal places
  def truncate(number, places)
    (number * (10 ** places)).floor / (10 ** places).to_f
  end

  puts truncate(124.56789, 4)

  # padding zeroes to the left
  puts 'hello'.rjust(10, '0')

  # padding zeroes to the right
  puts 'hello'.ljust(10, '0')

  # right justification
  puts ">>#{'hello'.rjust(20)}<<"

  # left justification
  puts ">>#{'hello'.ljust(20)}<<"

Comments: An amazing number of things can be done with sprintf() - I could solve nearly all of these problems with it - but that would not really be rubyish, so I have decided on built-in (and one homegrown) functions. However, mastering (s)printf() is very handy, since nearly all the big players (C (of course :-) ), C++, Java, PHP, …) have it, so you get a powerful function in several languages for the price of learning one. As you can see, r/ljust is a nice one, too.
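For comparison, this is roughly what the sprintf-only route could look like (format is simply another name for sprintf in Ruby; the method name formatting_demo is made up for the illustration):

```ruby
# sprintf-style counterparts of some of the snippets above.
def formatting_demo
  [
    format('%.5f', 124.567896),   # round to 5 decimal places
    format('%010d', 42),          # pad a number with zeroes on the left, width 10
    format('>>%20s<<', 'hello'),  # right-justify in 20 columns
    format('>>%-20s<<', 'hello')  # left-justify in 20 columns
  ]
end

puts formatting_demo
```

As far as I know there is no single directive for truncating instead of rounding, or for padding zeroes on the right, which is why it is only ‘nearly all’ of the problems.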

8. Problem: “Open a text file and convert it into HTML file. (File operations/Strings)”

Solution: Well, this problem was not specified in a great detail, to say the least - or to put it otherwise, the solvers are given a great freedom to provide a solution spiced up with their fantasy. This is what I came up with: