
I am working on a brand new release of scRUBYt!, with the intent of bringing AJAX/JavaScript scraping to the masses (along with other great stuff; I will announce the release soon).

Scraping JavaScript-heavy pages is not trivial, partly because of the asynchronous nature of JavaScript. Quite frequently you click on a link (or perform some other action that triggers an AJAX call) which inserts, fills, or pops up a div on the page, and you then want to navigate the new content or extract data from it. However, it is hard, sometimes impossible, to determine when the browser has finished displaying the new data. The easiest solution is to wait a few seconds after an AJAX update, until the data is properly loaded.

In practice this means that all scRUBYt! navigation methods (click_link, fill_textfield, check_checkbox, select_item, …) need a decorated version that waits a given amount of time after executing the navigation step. For example, given the original method:

    def click_link(xpath)
    end

we want a decorated version:

    def click_link_and_wait(xpath, seconds)
      click_link xpath # the original method
      sleep seconds if seconds > 0
    end

We want this for each and every method of the NavigationAction class.

Fortunately, Ruby makes this really easy! You can decorate the existing methods explicitly, when the class is created:

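    # Run inside the class body, after the navigation methods are defined.
    (instance_methods - Object.instance_methods).each do |old_method|
      define_method "#{old_method}_and_wait" do |*args|
        seconds = args.pop       # the last argument is the wait time
        send old_method, *args   # call the original with the remaining arguments
        sleep seconds if seconds > 0
      end
    end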
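To try the explicit approach without a browser, here is a minimal, self-contained sketch. The NavigationAction class below is a hypothetical stand-in, not the real scRUBYt! class: its methods only record what was called, and the `log` helper is an assumption added for illustration. Unlike the snippet above, the decorated methods here also return the original method's result.

```ruby
# Hypothetical stand-in for scRUBYt!'s NavigationAction: the methods only
# record their calls, so the example runs without a browser.
class NavigationAction
  def click_link(xpath)
    log << [:click_link, xpath]
  end

  def fill_textfield(name, value)
    log << [:fill_textfield, name, value]
  end

  # Decorate every public method defined so far. The list is computed once,
  # so methods defined below this loop (such as `log`) are not decorated.
  (instance_methods - Object.instance_methods).each do |old_method|
    define_method "#{old_method}_and_wait" do |*args|
      seconds = args.pop                # last argument is the wait time
      result = send(old_method, *args)  # forward the rest to the original
      sleep seconds if seconds > 0
      result
    end
  end

  # Illustration-only helper that stores the recorded calls.
  def log
    @log ||= []
  end
end

nav = NavigationAction.new
nav.click_link_and_wait("//a[@id='more']", 0.01)
nav.fill_textfield_and_wait("q", "ruby", 0)
p nav.log
```

Note that the placement of the loop matters: anything defined after it is left undecorated, which is why the loop has to come after all the navigation methods.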

Alternatively, you can decorate them implicitly, at runtime:

    alias_method :throw_method_missing, :method_missing

    def method_missing(method_name, *args, &block)
      # nil unless the name ends in _and_wait
      original_method_name = method_name.to_s[/\A(.+)_and_wait\z/, 1]
      if original_method_name && respond_to?(original_method_name)
        seconds = args.pop # the last argument is the wait time
        send original_method_name, *args, &block
        sleep seconds if seconds > 0
      else
        throw_method_missing(method_name, *args, &block)
      end
    end

As you can see, we execute the decorated method only if the requested name ends in _and_wait and the original method is defined on our class. In all other cases we fall back to the normal method_missing behavior.
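The runtime variant can be tried out in the same way. This is a sketch under a few assumptions: NavigationAction is again a hypothetical stand-in that only records calls, `super` is used for the fallback instead of the alias_method trick, and a `respond_to_missing?` override is added (not part of the approach described above) so that `respond_to?` agrees with what method_missing accepts.

```ruby
# Hypothetical stand-in for scRUBYt!'s NavigationAction.
class NavigationAction
  def click_link(xpath)
    log << [:click_link, xpath]
  end

  # Illustration-only helper that stores the recorded calls.
  def log
    @log ||= []
  end

  def method_missing(method_name, *args, &block)
    # nil unless the name ends in _and_wait
    original = method_name.to_s[/\A(.+)_and_wait\z/, 1]
    if original && respond_to?(original)
      seconds = args.pop                      # last argument is the wait time
      result = send(original, *args, &block)  # call the undecorated method
      sleep seconds if seconds > 0
      result
    else
      super # normal behavior: raises NoMethodError
    end
  end

  # Keeps respond_to? consistent with the names method_missing handles.
  def respond_to_missing?(method_name, include_private = false)
    original = method_name.to_s[/\A(.+)_and_wait\z/, 1]
    (!original.nil? && respond_to?(original)) || super
  end
end

nav = NavigationAction.new
nav.click_link_and_wait("//a[@id='more']", 0.01)
p nav.log

begin
  nav.no_such_method
rescue NoMethodError => e
  puts "unknown methods still fail: #{e.class}"
end
```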

Since we know in advance that we want to decorate all the methods of the class, the second way doesn’t make much sense in this case: it’s slower, because every decorated call has to go through method_missing, and it has no advantages. However, the technique is interesting and applicable in other scenarios, e.g. when you don’t know in advance which methods you are going to decorate (for example, when adding a constraint to a filter in scRUBYt!, since filters are also created dynamically at runtime).
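If you want to check the speed difference on your own machine, a rough timing sketch could look like the following. The Explicit and Implicit classes, the `ping` method, and the `time_calls` helper are all hypothetical names made up for this comparison; the numbers depend on your Ruby version and hardware, so none are shown here.

```ruby
# Decorated up front with define_method.
class Explicit
  def ping
    :pong
  end

  define_method("ping_and_wait") do |seconds|
    result = ping
    sleep seconds if seconds > 0
    result
  end
end

# Decorated lazily through method_missing.
class Implicit
  def ping
    :pong
  end

  def method_missing(name, *args, &block)
    original = name.to_s[/\A(.+)_and_wait\z/, 1]
    if original && respond_to?(original)
      seconds = args.pop
      result = send(original, *args, &block)
      sleep seconds if seconds > 0
      result
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    original = name.to_s[/\A(.+)_and_wait\z/, 1]
    (!original.nil? && respond_to?(original)) || super
  end
end

# Returns the elapsed seconds for n calls of ping_and_wait(0),
# i.e. the decoration overhead without any actual sleeping.
def time_calls(object, n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { object.ping_and_wait(0) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

n = 100_000
puts "define_method:  #{time_calls(Explicit.new, n)} s"
puts "method_missing: #{time_calls(Implicit.new, n)} s"
```

With a wait time of 0 the sleep is skipped, so the measurement covers only the dispatch cost of each approach.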




2 Responses to “Decorating Instance Methods of a Class”

  1. Glenn Says:

    You might want to check out how the clickAndWait methods are implemented in Selenium. We’ve also got some custom code kicking around somewhere to wait for all AJAX requests to finish, there’s a method available (name I can’t recall) which tells you how many waiting/active connections there are.

