Sometimes less is more

Update: A lot of people were disappointed that 10.minutes.ago etc. is not working in pure Ruby. Well, after executing the line require ‘active_support’ it does – I think this is a fairly small thing to do to enable these powerful features.

Every guide published on favorable writing principles emphasizes the power of brief and concise style. This is especially true in the case of technical texts, and in my opinion, in the case of well-designed programming languages as well.

Note the word well-designed. I did not say in the case of any (programming) language, since that would just not be true: conciseness can come at the cost of readability. (If you ever tried to read kanji, you know what I am talking about 😉 . However, I am claiming that in the case of a really well-designed programming language, succinctness helps readability, reduces bloat and leads to easier and faster understanding of the code. In my experience, the amount of boilerplate code to write is decreasing proportionally with the terseness of the programming language, ultimately leading to a coding style where you (nearly) don’t need to write boilerplate at all.

I will demonstrate this on a few Java vs. Ruby code examples. However, this is NOT a Ruby-bashing-Java article, but a few examples of idioms and interesting constructs; C++ vs Haskell or Lisp could serve equally well (sometimes even better), but since I am currently working with Java and Ruby on a daily basis, it is easier for me to use them.

If you are a pro Ruby and/or Java programmer, and/or you think the article is too long for you, please jump to the “Random Code Snippets” section.

Possibly the most straightforward reason why Ruby code is more readable even in shorter form is that really everything is an object [1] in Ruby-land. For example in Java, primitives need wrapper classes to ‘become’ objects., while in Ruby they are first class objects on their own. This makes constructs like

10.times { print "ho" }  #=> "hohohohohohohohohoho"

or (will output the same string)

print "ho" * 10 #=> "hohohohohohohohohoho"

possible.

There are a handful of other reasons which make Ruby more readable and elegant, but before I get bogged down in the explanation too much, let’s see the examples!

Whetting your appetite

In the first part I will describe some basic constructs which would make the life of any Java developer much easier. These techniques are neat, but they are not using any really sophisticated stuff yet: I will try to take a look at those in the next bigger section.

The empty program

Java:

class Test
{
    public static void main(String args[]) {}
}

Ruby:

   

I did not forget the Ruby snippet; You can not see anything there because actually a Ruby program doing nothing is exactly 0 characters long.
On the other hand, the Java version is slightly longer. I is kind of weird to explain to a newcomer what do ‘class’, ‘public’, ‘static’, ‘void’, ‘String’, the [] operator and several braces here and there mean, and why are they needed if the program does literally nothing

Fun with numbers

Note:For some of the next examples you will need to use Rails Active Support.
Java:

if ( 1 % 2 == 1 ) System.err.println("Odd!") #=> Odd!

Ruby:

if 11.odd? print "Odd!" #=> Odd!

Does not the first example make more sense (even for a non-programmer)?. I believe it does. More of this type:

Java:

102 * 1024 * 1024 + 24 * 1024 + 10 #=> 106979338

Ruby:

102.megabytes + 24.kilobytes + 10.bytes #=> 106979338

OK, maybe this is an unfair comparison since Java does not have (?) those functions. However, the point is that even if it had,
the best I could come up with would look like:

Util.megaBytes(102) + Util.kiloBytes(24) + Util.bytes(10) #=> 106979338

Which is far from the elegance and readability of the Ruby example.
In the next example we will assume that we have a Java function similar to ordinalize in Ruby.

Java:

System.err.println("Currently in the" + Util.ordinalize(2) + "trimester");

Ruby:

 print "Currently in the #{2.ordinalize} trimester"    #=> "Currently in the 2nd trimester"

In this example we can observe variable interpolation: anything wrapped in #{} inside double quotes gets evaluated
and substituted in the string, providing a more readable form without a lot of + + Java constructs (which is cool mainly if you have more variables inside the double quotes).

Dates

In my opinion, handling dates and times is a great PITA in Java, especially if you are implementing some complex code.
Java:

System.out.println("Running time: " + (3600 + 15 * 60 + 10) + "seconds");

Ruby:

puts "Running time: #{1.hour + 15.minutes + 10.seconds} seconds"

Java:

new Date(new Date().getTime() - 20 * 60 * 1000)

Ruby:

20.minutes.ago

Java:

Date d1 = new GregorianCalendar(2006,9,6,11,00).getTime();
Date d2 = new Date(d1.getTime() - (20 * 60 * 1000));

Ruby:

20.minutes.until("2006-10-9 11:00:00".to_time)

I think you do not have to be biased towards Ruby at all to admit which code makes more sense instantly…

I have recently found a very cool way of parsing dates in Ruby: using Chronic. However, I would not like to present it here since it is not a feature of the language, ‘just’ a nifty natural-language date parser [2].

A little bit more advanced stuff

Classes

Java:

Class Circle
  private Coordinate center, float radius;

  public void setCenter(Coordinate center)
  {
    this.center = center;
  }

  public Coordinate getCenter()
  {
    return center;
  }

  public void setRadius(float radius)
  {
    this.radius = radius;
  }

  public Coordinate getRadius()
  {
    return radius;
  }
end;

Ruby:

class Circle
  attr_accessor :center, :radius
end

Believe it or not, the two code snippets are absolutely equal; The getter and setter methods in Ruby code are generated automatically, so not only you do not have to write them, but they are not even there to clutter the code.

I have seen argumentation from Java guys that stuff like this (i.e. the public static void main … thing, getters/setters and other boilerplate code) can be generated with any decent GUI like Eclipse (or by tools like XDoclet etc) is a non-issue. Well, as for their generation, let us say this is true. But for the readability of code it is absolutely not!

For example. take getters/setters: Every variable in Java ads 8 more lines of code (not counting the lines between the function declarations) compared to the Ruby :attr_accessor idiom. That is, a simple class definition having 10 fields in Java will have 80+ lines of code compared to 1 lines of the same code in Ruby. For me, this definitely means a big difference.

Arrays (and other containers)

This section was inspired by a blog entry by Steve Yegge.

Arrays are interesting citizens of Java: They are not really objects in the “classical” sense , so they have very limited functionality compared to first-class Java objects. On the other hand, they are offering a huge advantage over the other container classes: they can be easily initialized.

Java:

String languages[] = new String[] {"Java", "Ruby", "Python", "Perl"};

instead of

List<String> languages = new LinkedList<String>();
languages.add("Java");
languages.add("Ruby");
languages.add("Python");
languages.add("Perl");

which is kind of lame when you quickly need to hack up some testing data.

However, they have also some serious problems: you have to define the number of the elements upon construction time, like so:

Java:

String someOtherLanguages<String>[] = new String[15];

which sometimes really cripples their functionality. [3]

How does this work in Ruby? Let’s see on three different examples (All three code snippets provide the same result):
Ruby:

stuff = [] #An empty array - as you can see there is no need to define the size
stuff << "Java", "Ruby", "Python" #Add some elements
#Initialize the array with the values
stuff = ["Java", "Ruby", "Python"]
#Yet another method yielding the same result
stuff = %w(Java Ruby Python)

In my opinion, these forms (especially the last one) are more straightforward and can save a lot of typing.

Another major shortcoming of Java arrays is that besides the [] operator you have only the methods inherited from Object and a single instance variable : length [4]. This means that even essential functionality like sorting, selecting elements based on something etc. has to be done via a ‘third party’ function, like this:

Java:

Arrays.sort(languages);

which seemed quite normal to me when I have been learning Java and have had no previous experience with dynamic languages, but now it looks kind of annoying.

Another Java-container-woe compared to Ruby is that in Java, an array is an array. A list is a list. A stack is a stack.
If you are wondering what the hell I am talking about, check out these Ruby code snippets:

Ruby:

stuff = %w(Java Ruby Python)
#Add the string "Perl" to the array
stuff << "Perl"
#Prepend the string "Ocaml" 
stuff.unshift "Ocaml"  
=> ["OCaml", "Java", "Ruby", "Python", "Perl"]
#Use the array as a stack
stuff.pop 
=> "Perl"  #stuff is now ["OCaml", "Java", "Ruby", "Python"] 
stuff.push "C", "LISP"
=> ["OCaml", "Java", "Ruby", "Python", "C", "LISP"]
#Update C to C++ 
stuff[4] = "C++"
=> ["OCaml", "Java", "Ruby", "Python", "C++", "LISP"]
#Remove the fisrt element
stuff.shift
=> "OCaml" #stuff is now ["Java", "Ruby", "Python", "C++", "LISP"] 
#Let's just stick with Java and Ruby - slice out the  rest!
stuff.slice!(2..4)
=> ["Python", "C++", "LISP"] #stuff is now ["Java", "Ruby"]

As you can see, the Ruby Array class offers functionality that could be achieved only by mixing up several Java containers into one (to my knowledge, at least) [5]. In practice, this usually speeds things up a lot.

Another thing that really annoys me when using containers in Java is the lack of this functionality:
Ruby:

stuff = ["OCaml", "Java", "Ruby", "Python", "C++", "LISP"]
#Lua is just gaining steam, add it to the 7th place
stuff[7] = "Lua"
=> ["OCaml", "Java", "Ruby", "Python", "C", "LISP", nil, "Lua"]

i.e. that if I am adding an element to an index which is bigger than the size of the array, the empty space inbetween is filled with nils. Now seriously, who would not exchange this for the Java behaviour (an IndexOutOfBoundsException is thrown) – after all, if I would need this functionality (which is VERY seldomly the case) in Ruby, I could check it myself and raise an exception if I don’t like what I see.

I wanted to write a bit about differences between hashes and files in Ruby and Java, but the post is already longer now then I wanted it to be so I guess I will just show some concrete code snippets to conclude.

Random Code Snippets

In this section I would like to present some real cases I have been solving with Ruby recently. Since I am still new to Ruby, I was totally amazed just how much more simpler, shorter yet much more understandable the code can be in Ruby compared to Java.

Files and Regular Expressions

As Bruce Eckel once put it, In Java, it’s a research project to open a file. Well, I have to agree. Maybe I am the only one Java programmer (besides Bruce) who – even after using Java professionally for five years – still can not write to a file without using google first. Maybe I should learn it one day?

Regular expression support in java is OK (at least one does not have to use external packages as in the pre-1.4 era), however, compared to Ruby the syntax is quite heavy.

Let’s see a demonstration on the following task: Open the file ‘test.txt’ and write all the sentences to the console (one sentence per line) which contain the word ‘Ruby’. First, the Java solution:

Java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test 
{
	public static void main(String[] args)
	{
	    try {
	        BufferedReader in = new BufferedReader(new FileReader("test.txt"));
	        StringBuffer sb = new StringBuffer();
	        String str;
	        while ((str = in.readLine()) != null) 
	          { sb.append(str + "\n"); }	        
	        in.close();
	        String result = sb.toString();
	        Pattern sentencePattern = Pattern.compile("(.*?\\.)\\s+?");
	        Pattern javaPattern = Pattern.compile("Ruby");
	        Matcher matcher = sentencePattern.matcher(result);
	        while (matcher.find()) {
	            String match = matcher.group();
	            Matcher matcher2 = javaPattern.matcher(match);
	            if (matcher2.find())
	            	System.err.println(match);
	        }
	    } catch (IOException e) 
	      {
	    	e.printStackTrace();
	      }		
	}
}

It is quite straightforward what this relatively simple code snippet doing – but if this is straightforward, what should I say about the Ruby version?
Ruby

File.read('test.txt').scan(/.*?\. /).each { |s| puts s if s =~ /Ruby/ }

Well, umm… I guess this example quite much expresses the point I am talking about from the beginning: sometimes less is more, a.k.a. Succinctness is Power!

Again, I wanted to show much more examples, but I have the feeling that since the article is already too long, no one would read it 🙂 It is a big pity since I did not even talk about hashes, blocks, closures, metaprogramming (well, I will mention it briefly in the last (really :-)) example) and other goodies – maybe in part 2?

If this is still not enough…

Although I find it very easy and natural to express a lots of things in Ruby, the language can not offer anything I would ever need. However, there is a powerful concept to invoke in such situations, called metaprogramming.

A few days ago I needed to test some algorithms on trees, so I needed to hack up a lot of tree test data. Here is how I would go about this in Java using the example below (let’s assume in both languages that we have a simple data structure Tree):

               a
            /      \
          b         c
         /  \      / | \
        d    e    f  g  h

Java

Tree a = new Tree("a");

Tree b = new Tree("b");
Tree c = new Tree("c");
a.addChild(b);
a.addChild(c);

Tree d = new Tree("d");
Tree e = new Tree("e");
b.addChild(d);
b.addchild(e);

Tree f = new Tree("f");
Tree g = new Tree("g");
Tree h = new Tree("h");
c.addChild(f);
c.addChild(g);
c.addChild(h);

Another possibility would be to create an XML file with the description of the tree and parse it from there. This solution is even more convenient since though you have to write the parsing code, you just have to edit an XML file once it is written. One possibility how the tree of this example could look something like

XML

  <node name="a">
      <node name="b">
          <node name="d"/>
          <node name="e"/>
      </node>
      <node name="c">
          <node name="f"/>
          <node name="g"/>
          <node name="h"/>
      </node>
  </node>

The latter solution is quite cool. After all you do not need to write any code, just alter the XML file and that’s it.

Now let’s see the Ruby solution I came up with:
Ruby

tree = a {
            b { d e }
            c { f g h }
          }

Well… suddenly even the XML file seems too heavy, does not it? 🙂 Not to mention the fact that the latter example is pure Ruby code – there is no need to open an external file and parse it – you just run it and the variable tree will contain your tree. That’s it.

Of course Ruby can not handle this code as it is – for this we need to invoke some metaprogramming magic.
[6]

Metaprogramming is a way to drive Ruby with Ruby. Java (especially J2EE) is usually driven by XML (which is not always really a good thing in my opinion) As you could see, Ruby is driven by Ruby instead 🙂

This example merely scratched the surface of Ruby’s possibilities through metaprogramming. However, as with the other examples, my goal was not to advocate a concrete pattern/method over a different one, but rather to show how a specific toolset can change the way of thinking about the task at hand, and the way of code design/implementation in general.

Final thoughts

When I was a child, I spoke as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things. – The Bible, I Corinthians 13:11

This thought pretty well expresses how I felt about Java/C++/(substitute any non-dynamic language here) when I came to know (some of) Ruby’s true dynamism and expressive power through terse yet powerful idioms which transformed my whole thinking about programming. Of course I do not claim that I ‘became a man’ because that’s still a very long way to go, but still, even with my very limited knowledge of Ruby, the way to express things in Java/C++ now seems… well… childish ;-). [7]

Creating a site on online tutorials is not a completely bad idea. One should include courses for 642-054, 642-054, 642-176, ruby on rails, cissp certification. If you have the knowledge for creating the tutorials, the rest is very simple. By using dot5hosting company’s site you can purchase a web hosting package that is economical. Even dedicated servers can be found for low prices. Then through the use of ip telephony one can directly reach its potential clients. Other internet marketing methods should also be considered to create awareness the site.

Notes

[1] I wonder whether this déjà vu will happen to me in the future once again: I have had this ‘Wow, everything is an object’ feeling when coming from C++ to Java; Then after coming from Java to Python; and most recently, after coming from Python to Ruby. Back

[2] Some examples that Chronic can handle:

  summer
  6 in the morning
  tomorrow
  this tuesday
  last winter
  this morning
  3 years ago
  1 week hence
  7 hours before tomorrow at noon
  3rd wednesday in november
  4th day last week

Kudos…
Back

[3] In the previous array initialization example this was not needed since it is trivial when all the elements are stated beforehand.Back

[4]which somewhat confusing given that all the other containers use the method size() (and not a field!) to determine the count of elements. To nicely mesh with the confusion, the String object provides the method length() (and not a field, as with array) to query the number of characters… Back

[5] I mean it is not possible to construct any container – other than an array – with literals, you can use the [] operator on arrays only, you can not get the i-th element of a stack etc. Back

[6] The code I have been using to accomplish this task relies on the method_missing idiom:

class Object
  @stack = []
  @parent = nil

  def method_missing(method_name, *args, &block)
    tree = Tree.new(method_name)
    @parent.add_child(tree) if @parent != nil
    if block_given?
      @stack ||=[]
      @parent = tree
      @stack.push @parent
      yield block
      @stack.pop
      @parent = @stack.last
    end
    tree
  end
end

Back

[7] This does not necessarily mean that Java is bad and Ruby is good – just that it was the ‘Ruby way’ that struck a chord in me after trying/playing around with programming in many programming languages. Many of the features I adore in Ruby are there in Java as well, but they did not ‘came through’ whereas with Ruby there was a point of enlightenment when I really understood a lot of generic, non-language specific principles. Back

Evaluating XPaths with indices in HPricot

Currently I am working on a Ruby web-extraction framework (more on this in a future post) and I have chosen Hpricot for the HTML parsing and XPath evaluation. So far, it seems to be the perfect choice – it is lightning fast (nothing I have encountered so far – REXML, Htree, RubyfulSoup – was even near in terms of speed), very intuitive to use, and it simply works for nearly everything you will need for day-to-day tasks.

However, since my Web-extractor is relying on XPath very heavily, sometimes I need things not included in the ‘nearly-everything’ bag of goodies – but so far I have always managed to come up with a solution. Last time it was the evaluation of XPaths with indices. This is what I hacked up:

def to_hpricot(xpath)
  "#{'(' * (xpath.split(/\d+/).size-1)}" +                 
  xpath.gsub(/\d+/) { |num| (num.to_i - 1).to_s }. 
        gsub(/\/(.*?)\[/) { |p| "/'#{$1}')[" }            
end

Whew, that’s ugly – but it works quite well. Let’s see a simple example:

doc = Hpricot("<p>A<b>very</b>simple<b>
  <i>small</i>test<i>document</i>.</b>Very cool.</p>")

eval(to_hpricot('doc/p[1]/b[2]/i[2]')) 
=> {elem <i> {text "document"} </i>}

OK, it is really just a hack – it does not consider special cases, does not handle anything besides indices (i.e. no axes, @’s or anything) etc. – but anyway this is hopefully just a temporary solution and _why will add evaluation of indexed XPaths to Hpricot soon. There would be at least one HPricot user who would be really delighted about it 🙂