ActiveResource is a great concept for consuming Rails-style REST APIs, but unfortunately most REST APIs are not Rails-style. This means you will frequently end up modifying ActiveResource to consume them. This article is about understanding ActiveResource and how to tweak or extend it to consume non-Rails-style REST APIs. We will mainly concentrate on reading data, i.e. the GET method.
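To make "Rails-style" concrete, here is a minimal sketch of the URL convention ActiveResource assumes: collection name, optional id, format extension, then query string. The helper name `rails_style_path` and the sample values are hypothetical and not part of ActiveResource itself.

```ruby
require "uri"

# Hypothetical helper (not an ActiveResource method): builds the kind of path
# a Rails-style REST API exposes, e.g. /videos/42.json or /videos.json?page=2
def rails_style_path(collection, id = nil, params = {}, format = "json")
  path = "/#{collection}"
  path += "/#{id}" if id                  # member resource when an id is given
  path += ".#{format}"                    # format extension
  path += "?#{URI.encode_www_form(params)}" unless params.empty?
  path
end

puts rails_style_path("videos", 42)
puts rails_style_path("videos", nil, { "page" => 2 })
```

An API that lays out its URLs differently will not match the paths ActiveResource generates, which is why its path and parsing methods usually need to be overridden.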
Active Youtube 11
- Namespace in class names: The Video, User, StandardFeed and Playlist classes have been moved into the "Youtube" module to prevent conflicts with your ActiveRecord models.
- CustomMethods-related change: In the last version, only the response from "find" converted an "entry" object into an array of "entry" objects. Now the same behavior is implemented for custom HTTP calls like Video.find().get(:comments)
- Small patch for better namespacing: It is basically some code from the Rails trunk for ActiveResource, for better handling of namespaces while creating ActiveResource objects.
Gem Installation:
sudo gem install active_youtube
Example Usage:
#### Video
## search for videos on 'ruby'
search = Youtube::Video.find(:first, :params => {:vq => 'ruby', :"max-results" => '5'})
puts search.entry.length

## video information of id = ZTUVgYoeN_o
vid = Youtube::Video.find("ZTUVgYoeN_o")
puts vid.group.content[0].url

## video comments
comments = Youtube::Video.find_custom("ZTUVgYoeN_o").get(:comments)
puts comments.entry[0].link[2].href

## searching with category/tags
results = Youtube::Video.search_by_tags("Comedy")
puts results[0].entry[0].title

#### STANDARDFEED
## retrieving standard feeds
most_viewed = Youtube::Standardfeed.find(:most_viewed, :params => {:time => 'today'})
puts most_viewed.entry[0].group.content[0].url

#### USER
## user's profile - guthrie
user_profile = Youtube::User.find("guthrie")
puts user_profile.link[1].href

#### PLAYLIST
## get playlist - multiple elements in playlist
playlist = Youtube::Playlist.find("EBF5D6DC4589D7B7")
puts playlist.entry[0].group.content[0].url
Scrapi enhancements 2
We have been using the Ruby library Scrapi quite a lot for HTML scraping in QuarkRank and other projects. Most of the time, I want to extract specific information from a page and dump it directly into the database. A few steps were repeated regularly in my code, so to keep it DRY I have enhanced Scrapi to make manipulating the extracted information easier.
Let's consider an example: for each of the top 250 movies at IMDB, I want to extract and store in the DB the following properties :
- imdb_id
- name
- release date
- rating
- tagline
- runtime
- director
ActiveResource and YouTube 8
This article is about consuming the YouTube API in your Ruby/Rails project using ActiveResource. Moreover, it is an example of how to extend ActiveResource to consume a non-REST-style API.
Benefits of using our extension to ActiveResource :
- ActiveResource provides an ActiveRecord-style interface.
- You can modify our extension according to your interface requirements.
- No need to use and rely on a separate Ruby library for the YouTube REST API.
Writing a web widget 6
Web widgets are widely used across the Internet but still lack good documentation. From online advertisements to videos to blogs, widgets are everywhere. Some of the popular widgets are Google AdSense, YouTube, MyBlogLog widgets and Twitter badges.
Note: This page will be slow to load because of many widget examples.
Table of Contents
To define it formally: a web widget is a portable chunk of code that can be installed and executed within any separate HTML-based web page by an end user without requiring additional compilation. A widget adds some content to that page that is (mostly) not static. Generally widgets are third-party originated, though they can be home-made. Widgets are also known as modules, snippets, and plug-ins.
This article is my journey of understanding and making a widget myself. I have tried to make things look simple and insightful by taking a lot of examples.
Quarkshop : next-generation shopping 2
QuarkShop is a next-generation shopping experience for finding the product of your choice based on opinions across the Internet. You give your preferences and QuarkShop will fetch the best matched products for YOU.
At launch, we cover the following products :
- Camera : http://www.quarkshop.com/cameras
- Mp3 player : http://www.quarkshop.com/mp3players
- Flat panel TV : http://www.quarkshop.com/flatpaneltvs
- Cell phone : http://www.quarkshop.com/cellphones
Search products
You can give your preference by selecting features you LIKE and we will find the best matched products. Selection of preference can be used along with other navigation parameters like brand, price, etc. These features are automatically extracted from reviews.
Then click submit, and you will see the three best matched products with their most relevant information. There is also an option to see more products, but we think the top 3 products are what consumers care about once they have given their preferences. This makes searching for products simple and fast.
Choose your features
Top 3 products
Comparison : QuarkGraph
You can compare the top three products on the spot! When you click on the Compare Top Three button, you will see QuarkGraph, a graphical comparison showing the feature scores for each product. The graph is not just an image; you can play with it.
Features & Snippets : Voice of consumer
Features are keywords that are discussed in a review. When you read a review you are always looking for such keywords and people's comments about them. QuarkShop lets you choose the features/keywords that you like and gives you the best matched products. You can also read sentences from actual reviews that give an opinion on a keyword (Snippets).
Feedback
We would really appreciate your feedback. It would help us know where we stand and what is required to make a useful shopping experience.
introducing QuarkRank 0

We're back to blogging after taking a break of more than a month. We have been very busy developing QuarkRank, a repository of summarized reviews. It is the result of more than 18 months of dedicated research on natural language processing, HTML scraping and user interfaces. Finally, we are happy to make this product live!!!
Currently, the repository is accessible through a RESTful API or a widget. Moreover, it's absolutely free!
About QuarkRank
"From product reviews, restaurant reviews, hotel reviews, to others, QuarkRank provides the information for making decisions at the point of purchase. Proprietary technology lies at the core of QuarkRank's ability to automatically summarize the opinions of millions of consumer reviews on the internet."
QuarkRank is an intelligent engine which crawls the web for opinions on various products/services and automatically summarizes them feature-by-feature using its natural language processing technique.

QuarkRank will help consumers quickly educate themselves, based on the most unbiased information possible, without spending hours reading reviews online.
If you use QuarkRank data, your customers will feel confident making purchase decisions at your site without going to competitors, while the return rate of impulse purchases goes down.
QuarkRank provides
- Reviews gathered and combined in one place for various products and services!
- QuarkRank and the number of reviews for a product/service.
- Top 5 features, worst 3 features, and all features that people talk about for a product/service. For example : sound quality, design and screen of an Mp3 player.
- Feature score, buzz and SNIPPETS from reviews which give an opinion about a feature.
No need to waste time analyzing reviews at Cnet and Amazon anymore!
Where can you use it?
- Boost online shopping experience of your users.
- Best and Worst about a product.
- Graphical feature-by-feature comparison.
- Power your navigation by giving feature as an option to choose.
- Summarized form of reviews.
- Add product widget to your blog/article.
- On your social profile, add widget of your favourite or owned products.
- Show feature-by-feature opinions and comparison for products at your retail store.
API
QuarkRank provides a RESTful API to access our huge repository of summarized reviews. Send us simple HTTP requests and we will send back basic XML responses, which means you can interact with our API from any language.
It provides data in XML and JSON formats. There is no limit on using the API. For detailed information, visit :
ActiveResource can be used to access QuarkRank's RESTful API in Ruby on Rails. Note : You need to apply this tiny patch to ActiveResource.
ActiveResource code:
class Product < ActiveResource::Base
  self.site = "http://username:password@quarkrank.com"

  def self.list options={:category=>"camera"}
    find(:all, :params=>options)
  end

  def self.show sku, site=nil, all_features=false
    params = {}
    params[:site] = site unless site.nil?
    params[:all_features]="true" if all_features
    find(sku, :params=>params)
  end

  def self.search query
    find(:all,:params=>{:search=>query})
  end
end

class Snippet < ActiveResource::Base
  self.site = "http://username:password@www.quarkrank.com/products/:product_id"

  def self.snippets product, name
    find name.gsub(" ","%20"),:params=>{:product_id=>product}
  end
end
Widget
QuarkRank provides two kinds of customizable widgets.
- Top 5 features
- Interactive widget for features and review snippets of a product
More information at:
Technologies and tools used
A lot of them ....
- HTML scraping : NLP, Scrapi, Firequark, CSS Selectors. Implemented in Ruby.
- Text-mining : Statistics, Text parsing, chunking, cleaning and many enhancements.
- API and QuarkRank site : Ruby on Rails, REST, Acts_as_solr, Request routing.
Coming next
QuarkShop : a mashup of QuarkRank, Cnet, Amazon, Shopping.com and Yahoo.com.
Why I moved from Prototype to jQuery 31
jQuery is a JavaScript library which follows the unobtrusive paradigm for application development using JavaScript. jQuery inherently supports behavior-driven development and is based on traversing HTML documents using CSS selectors. Prototype, on the other hand, is a JavaScript library for class-driven development which makes life easier when working with JavaScript. The Prototype library has good support in Ruby on Rails via helper functions.
I had always used the Prototype library for most of my projects until I was introduced to jQuery three months back ... and it enchanted me.
Sessions and cookies in Ruby on Rails 16
Table of Contents
Introduction
HTTP is a stateless protocol, which makes it a problem to uniquely track a visitor to a web application. State between browser and server is managed through session IDs, each of which uniquely identifies a client browser.
Session IDs can be stored and communicated in one of the following ways :
- Embedded in URL
- In form field
- Using cookies.
Information kept across multiple requests from a client browser is called session data. Session data for each visitor can be stored on the server or in cookies. When a client sends a request to the server, the session data is looked up in session storage using the session ID sent by the client browser. A common example of session data is the user information used for authentication.
Nowadays, it's hard to imagine a good web application that does not use sessions.
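The lookup described above can be sketched in plain Ruby. This is only an illustration of server-side session storage keyed by session ID, not how Rails implements its session stores; the `SessionStore` class and its methods are hypothetical names.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for server-side session storage.
class SessionStore
  def initialize
    @sessions = {}
  end

  # Creates an empty session and returns its new, hard-to-guess ID.
  def create
    id = SecureRandom.hex(16)
    @sessions[id] = {}
    id
  end

  # Returns the session data for an ID, or nil if the ID is unknown.
  def fetch(id)
    @sessions[id]
  end
end

store = SessionStore.new
session_id = store.create                # would be sent to the browser, e.g. in a cookie
store.fetch(session_id)[:user_id] = 7    # first request: remember who logged in
puts store.fetch(session_id)[:user_id]   # later request: same ID, same data
```

Only the session ID travels with each request; the data itself stays on the server in this variant.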
A wonderful article on implementation techniques of Session ID.
Ruby on Rails Security Guide 129
Ruby on Rails does a decent job of handling security concerns in the background. You will have to configure your application to avoid a few security attacks, while plugins are required for many security concerns which Rails handles poorly or not at all.
In this article I describe the security issues related to a Ruby on Rails web application. I have followed DRY by linking to articles with good explanations and solutions to security concerns wherever required. This guide can also be used as a quick security check for your current web application.
Table of Contents
Advanced acts_as_solr 7
This article extends our acts_as_solr : search and faceting tutorial and talks about how to manage rails associations, solr indexes and more with acts_as_solr.
Table Of Contents
- Rebuild Solr index
- Import existing Solr index or your custom Solr schema.xml
- Highlight search term
- Rails associations and acts_as_solr
- Tips
Rebuild Solr Index
rebuild_solr_index is a class method for rebuilding your model indexes after importing external data. For large tables, rebuilding the Solr index is a time-consuming process. The index optimization call (the fifth line of the pseudo code below) is what makes rebuild_solr_index slow: for large tables, you do not want optimization to take place for each object added to the table. On the other hand, removing the optimization calls entirely slows down the process of updating the Solr index.
## pseudo code
def rebuild_solr_index
  for_each_row_in_table do |doc|
    doc.save_to_solr_index
    index.optimize
  end
end
The solution to the problem is to use batch_size in #rebuild_solr_index. With a batch size of, say, 100, the index optimization call is executed once after every 100 rows are indexed.
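The effect of batching can be sketched in plain Ruby. This is not the acts_as_solr implementation: `FakeIndex` and `rebuild_index` are hypothetical stand-ins that only count how often the index would be optimized.

```ruby
# Hypothetical stand-in for a Solr index; it records how often it is optimized.
class FakeIndex
  attr_reader :docs, :optimize_calls

  def initialize
    @docs = []
    @optimize_calls = 0
  end

  def add(doc)
    @docs << doc
  end

  def optimize
    @optimize_calls += 1
  end
end

# Indexes rows in batches and optimizes once per batch instead of once per row.
def rebuild_index(rows, index, batch_size = 100)
  rows.each_slice(batch_size) do |batch|
    batch.each { |row| index.add(row) }
    index.optimize
  end
  index
end

index = rebuild_index((1..250).to_a, FakeIndex.new, 100)
puts index.docs.size        # 250 rows indexed
puts index.optimize_calls   # 3 optimize calls (batches of 100, 100 and 50)
```

With a batch size of 1 the same code degenerates to the per-row optimization shown in the pseudo code.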
Firequark : quick html screen scraping 4
Table of Contents
Firequark is an extension to Firebug that aids the process of HTML screen scraping. Firequark automatically extracts the CSS selector for one or more HTML nodes from a web page using Firebug (a web development plugin for Firefox). The generated CSS selector can be given as input to HTML screen scrapers like Scrapi to extract information. Firequark is built to unleash the power of CSS selectors for HTML screen scraping.
HTML screen scraping is a common technique for extracting specific, useful elements from a web page. Independent of programming language, to extract an element from a web page one needs to know its exact location or a key that uniquely identifies it. There are two approaches for uniquely identifying an element: XPath and CSS selectors.
Firebug has built-in functionality for generating the XPath of an HTML element. Ilya Grigorik has written a good article on using XPath for HTML screen scraping. Firequark, in turn, extends Firebug to generate CSS selectors for elements on a web page.
Example case : Let's take a practical example where you want to scrape Amazon.com. My goal is to get the product name, price and rating for all the products on the Amazon point-and-shoot camera catalog page. I will use this example in the screencast and explanation below.
HTML::Tag class in scrapi 1
While working with scrapi, I found there is no external documentation for the HTML::Tag class. This article is to ensure no one says this again.
In scrapi, HTML::Node represents an HTML node, which can be of two types: HTML::Text for a text node and HTML::Tag for an HTML tag node. Here is a code snippet in which scrapi returns the HTML node as an HTML::Tag object.
Domain Forwarding or URL Redirection 0
Also known as URL Forwarding or Domain Redirection, it is a technique for making a web page available through many URLs.
Check out the Wikipedia article on URL redirection for uses of redirection.
In short,
- Client Side Forwarding : URL in client browser changes.
- Server Side Redirection : URL in client browser does NOT change. User remains on same website/domain.
- Server Side Forwarding or DNS Forwarding : URL in client browser does NOT change. User is moved to NEW website.
All the above methods are explained below in detail. I will be using Ruby on Rails for illustration.
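As a rough, framework-free sketch of the first case, assuming that client side forwarding corresponds to an ordinary HTTP redirect: the server answers with a 3xx status and a Location header, and the browser then requests the new URL itself, so the address bar changes. The `REDIRECTS` table, the `handle` method and the example URLs are hypothetical; in Rails the same response would normally come from `redirect_to`.

```ruby
# Hypothetical mapping of old paths to their new locations.
REDIRECTS = { "/old-page" => "http://www.example.com/new-page" }.freeze

# Returns a Rack-style response triple: [status, headers, body].
def handle(path)
  target = REDIRECTS[path]
  if target
    # 301 tells the browser the page has moved; it will request the Location URL
    [301, { "Location" => target }, []]
  else
    [200, { "Content-Type" => "text/html" }, ["<p>Served #{path}</p>"]]
  end
end

status, headers, _body = handle("/old-page")
puts status
puts headers["Location"]
```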
QScraper : hpricot interface to scrapi
QScraper is a wrapper over scrapi that provides an Hpricot-like interface.
Motivation: The Hpricot interface is simple and easy to use, while scrapi is more powerful because of bundle scraping and anonymous classes. I was using Hpricot for quick testing and checking but scrapi for project implementation. To avoid working with two HTML scrapers, I wrote this wrapper over scrapi.
Bundle Scraping: It refers to extracting multiple attributes of an element from a web page in a single parse. Most screen scraping tools extract multiple elements, but not multiple attributes of an element. Let's take blog scraping as an example: each blog post is an element, and I would like to extract multiple attributes of a blog post such as author info, publication date, title and content. Rather than making individual calls like doc.search(author_selector), doc.search(published_selector) etc., I would like to do doc.find(author_selector, date_selector, title ...).
