Consume non rails-style REST API's  4


ActiveResource is a great concept for consuming Rails-style REST APIs, but unfortunately most REST APIs are not Rails-style. This means that you will frequently end up modifying ActiveResource to consume them. This article is about understanding ActiveResource and how to tweak and extend it to consume non-Rails-style REST APIs. We will mainly concentrate on reading data, i.e. the GET method.

Table of Contents

  1. Introduction
  2. Consume non rails-style REST API
    1. Create URL for remote resources
    2. Make a GET request
    3. Handling (Custom) Response
    4. Parse Response
    5. Create ActiveResource object from parsed response
    6. Other things to keep in mind
  3. Custom HTTP GET method tweaks
  4. Data Format
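
The read path outlined in the table of contents (create a URL, make a GET request, handle and parse the response, build an object) can be sketched in plain Ruby. This is only an illustration of the steps, not ActiveResource's own code: the endpoint, the parameter names, the JSON body and the helper names are all assumptions, and the network call is replaced by a canned response so the sketch runs offline.

```ruby
require "json"
require "uri"

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://api.example.com/feeds/videos"

# Step 1: create the URL for the remote resource.
def build_url(base, params = {})
  uri = URI.parse(base)
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri.to_s
end

# Steps 3 and 4: check the response status, then parse the body.
def parse_response(status, body)
  raise "Request failed with status #{status}" unless status == 200
  JSON.parse(body)
end

# Step 5: wrap the parsed hash in an object with attribute readers.
def build_resource(attributes)
  klass = Struct.new(*attributes.keys.map(&:to_sym))
  klass.new(*attributes.values)
end

url = build_url(BASE_URL, "vq" => "ruby", "max-results" => 5)

# Step 2 would normally be a real GET request, for example
# Net::HTTP.get(URI.parse(url)). A canned body stands in for it here.
canned_body = '{"title": "Ruby videos", "total": 5}'

resource = build_resource(parse_response(200, canned_body))
puts url
puts resource.title
```

Keeping each step in its own method mirrors the idea that a non-Rails-style API usually needs only one or two of these steps replaced.
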
Filed in ruby tutorials
Tagged as activeresource api 
Posted on 11 March
4 comments

Active Youtube  11


ActiveYoutube is a gem for accessing the YouTube API using ActiveResource. This gem wraps the code from our previous post on extending ActiveResource to access YouTube. There have been a few minor changes:
  1. Namespace in class names: the Video, User, StandardFeed and Playlist classes have been moved into the "Youtube" module to prevent conflicts with your ActiveRecord models.
  2. CustomMethods-related change: in the last version, only the response from "find" converted a single "entry" object into an array of "entry" objects. Now the same behavior is implemented for custom HTTP calls like Video.find().get(:comments).
  3. Small patch for better namespacing: it is basically some code from the Rails trunk of ActiveResource, for better handling of namespaces while creating ActiveResource objects.

Gem Installation:


sudo gem install active_youtube

Example Usage:

#### Video
  ## search for videos on 'ruby'
  search = Youtube::Video.find(:first, :params => {:vq => 'ruby', :"max-results" => '5'})
  puts search.entry.length

  ## video information of id = ZTUVgYoeN_o
  vid = Youtube::Video.find("ZTUVgYoeN_o")
  puts vid.group.content[0].url

  ## video comments
  comments = Youtube::Video.find_custom("ZTUVgYoeN_o").get(:comments)
  puts comments.entry[0].link[2].href

  ## searching with category/tags
  results = Youtube::Video.search_by_tags("Comedy")
  puts results[0].entry[0].title

#### STANDARDFEED
  ## retrieving standard feeds
  most_viewed = Youtube::Standardfeed.find(:most_viewed, :params => {:time => 'today'})
  puts most_viewed.entry[0].group.content[0].url

#### USER
  ## user's profile - guthrie
  user_profile = Youtube::User.find("guthrie")
  puts user_profile.link[1].href

#### PLAYLIST
  ## get playlist - multiple elements in playlist
  playlist = Youtube::Playlist.find("EBF5D6DC4589D7B7")
  puts playlist.entry[0].group.content[0].url
Filed in our tools ruby
Tagged as activeresource api plugin 
Posted on 12 February
11 comments

Scrapi enhancements  2


We have been using the Ruby library Scrapi quite a lot for HTML scraping in QuarkRank and other projects. Most of the time, I want to extract specific information from a page and dump it directly into the database. A few steps were repeated regularly in my code, so to make it more DRY I have enhanced Scrapi to make manipulating the extracted information easier.

Let's consider an example: for each of the top 250 movies at IMDB, I want to extract the following properties and store them in the DB :

  1. imdb_id
  2. name
  3. release date
  4. rating
  5. tagline
  6. runtime
  7. director
Filed in ruby on rails
Tagged as html scraping scrapi 
Posted on 30 January
2 comments

ActiveResource and YouTube  8


This article is about consuming the YouTube API in your Ruby/Rails project using ActiveResource. It also serves as an example of how to extend ActiveResource to consume a non rest-style API.

Benefits of using our extension to ActiveResource :

  1. ActiveResource provides an ActiveRecord-style interface.
  2. You can modify our extension according to your interface requirements.
  3. No need to use and rely on a separate Ruby library for the YouTube REST API.

Filed in ruby
Tagged as activeresource api 
Posted on 15 January
8 comments

Writing a web widget  6


Web widgets are widely used across the Internet but still lack good documentation. From online advertisements to videos to blogs, widgets are everywhere. Some of the popular ones are Google Adsense, Youtube, MyBlogLog widgets and Twitter badges.

Note: This page will be slow to load because of many widget examples.

Table of Contents

  1. Before you start
  2. Flash widget
  3. HTML widget
  4. JavaScript widget
    1. Passive widgets
    2. Active widgets
  5. Examples
  6. Key issues

To formally define : The web widget is a portable chunk of code that can be installed and executed within any separate HTML-based web page by an end user without requiring additional compilation. A widget adds some content to that page that is (mostly) not static. Generally widgets are third party originated, though they can be home made. Widgets are also known as modules, snippets, and plug-ins.

This article is my journey of understanding and making a widget myself. I have tried to make things look simple and insightful by taking a lot of examples.

Filed in tutorials
Tagged as javascript widget 
Posted on 07 January
6 comments

Quarkshop : next-generation shopping  2



QuarkShop is a next-generation shopping experience for finding the product of your choice based on opinions from across the Internet. You give your preferences and QuarkShop fetches the best matched products for YOU.

For launch, we currently have the following products :

Search products

You can give your preferences by selecting the features you LIKE, and we will find the best matched products. Feature preferences can be combined with other navigation parameters like brand, price, etc. The features themselves are automatically extracted from reviews.

Then click submit, and you will see the three best matched products with their most relevant information. There is also an option to see more products, but we think the top 3 products are what consumers care about once they have given their preferences. This makes searching for products simple and fast.


Choose your features



Top 3 products

Comparison : QuarkGraph

You can compare the top three products on the spot! When you click on the Compare Top Three button, you will see QuarkGraph, a graphical comparison showing the feature scores for each product. The graph is not just an image; you can play with it.

Features & Snippets : Voice of consumer

Features are the keywords that a review talks about. When you read a review, you are always looking for such keywords and for people's comments about them. QuarkShop lets you choose the features/keywords that you like and gives you the best matched products. You can also read sentences from actual reviews that give an opinion on a keyword (Snippets).

Feedback

We would really appreciate your feedback. It would help us know where we stand and what is required to make a useful shopping experience.
Filed in our tools quarkrank
Tagged as mashup shopping 
Posted on 19 December
2 comments

introducing QuarkRank  0



We're back to blogging after taking a break of more than a month. We have been very busy developing QuarkRank, a repository of summarized reviews. It is the result of more than 18 months of dedicated research on natural language processing, HTML scraping and user interfaces. Finally, we are happy to make this product live!

Currently, the repository is accessible through a RESTful API or a widget. Moreover, it's absolutely free!

About QuarkRank

"From product reviews, restaurant reviews, hotel reviews, to others, QuarkRank provides the information for making decisions at the point of purchase. Proprietary technology lies at the core of QuarkRank's ability to automatically summarize the opinions of millions of consumer reviews on the internet."

QuarkRank is an intelligent engine which crawls the web for opinions on various products/services and automatically summarizes them feature-by-feature using its natural language processing technique.




Best and worst of Apple Ipod Touch 8gb
QuarkGraph: QuarkRank data displayed using amcharts at Quarkshop


QuarkRank will help consumers quickly educate themselves, based on the most unbiased information possible, without spending hours reading reviews online.

If you use QuarkRank data, your customers will feel confident making a purchase decision at your site without going to competitors, and the return rate of impulse purchases will go down at the same time.

QuarkRank provides

  • Reviews for various products and services, gathered and combined in one place!
  • The QuarkRank and the number of reviews for a product/service.
  • The top 5 features, the worst 3 features, and all the features that people talk about for a product/service. For example : sound quality, design, screen of an MP3 player.
  • Feature score, buzz and SNIPPETS from reviews which express an opinion about a feature.

No need to waste time analyzing reviews at Cnet and Amazon anymore!

Where can you use it?

  • Boost online shopping experience of your users.
    • Best and Worst about a product.
    • Graphical feature-by-feature comparison.
    • Power your navigation by giving feature as an option to choose.
    • Summarized form of reviews.
  • Add product widget to your blog/article.
  • On your social profile, add widget of your favourite or owned products.
  • Show feature-by-feature opinions and comparison for products at your retail store.

API

QuarkRank provides a RESTful API to access our huge repository of summarized reviews. Send us simple HTTP requests and we will send back basic XML responses, which means you can interact with our API from any language.

It provides data in XML and JSON formats. There is no limit on using the API. For detailed information, visit :

ActiveResource can be used to access QuarkRank's RESTful API in Ruby on Rails. Note : You need to apply this tiny patch to ActiveResource.
ActiveResource code:
class Product < ActiveResource::Base
  self.site = "http://username:password@quarkrank.com"

  def self.list options={:category=>"camera"}
    find(:all, :params=>options)
  end

  def self.show sku, site=nil, all_features=false
    params = {}
    params[:site] = site unless site.nil?
    params[:all_features]="true" if all_features
    find(sku, :params=>params)
  end

  def self.search query
    find(:all,:params=>{:search=>query})
  end

end

class Snippet < ActiveResource::Base
  self.site = "http://username:password@www.quarkrank.com/products/:product_id"

  def self.snippets product, name
    find name.gsub(" ","%20"),:params=>{:product_id=>product}
  end
end
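
As a rough illustration of what the classes above ask for over HTTP, the sketch below builds the Rails-style URLs that such calls would conventionally map to. It does not use ActiveResource itself; the helper names, the SKU `B000123`, the single host without credentials and the `.xml` extension are assumptions made for the example.

```ruby
require "uri"

# Assumed host; the credentials from the classes above are left out.
HOST = "http://quarkrank.com"

# Build an optional "?key=value" query string from a hash.
def query_string(params)
  params.empty? ? "" : "?" + URI.encode_www_form(params)
end

# URL for a whole collection, e.g. the target of find(:all, ...).
def collection_url(collection, params = {})
  "#{HOST}/#{collection}.xml#{query_string(params)}"
end

# URL for a single element, optionally nested under a prefix.
def element_url(prefix, collection, id, params = {})
  "#{HOST}#{prefix}/#{collection}/#{id}.xml#{query_string(params)}"
end

# Roughly what Product.list would request
puts collection_url("products", category: "camera")

# Roughly what Product.show("B000123", nil, true) would request
puts element_url("", "products", "B000123", all_features: "true")

# Roughly what Snippet.snippets("B000123", "sound quality") would request;
# the space is escaped by hand, as in the Snippet class.
puts element_url("/products/B000123", "snippets", "sound quality".gsub(" ", "%20"))
```
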

Widget

QuarkRank provides two kinds of customizable widget.
  1. Top 5 features
  2. Interactive widget for features and review snippets of a product

More information at:

Technologies and tools used

A lot of them ....
  • HTML scraping : NLP, Scrapi, Firequark, CSS Selectors. Implemented in Ruby.
  • Text-mining : Statistics, Text parsing, chunking, cleaning and many enhancements.
  • API and QuarkRank site : Ruby on Rails, REST, Acts_as_solr, Request routing.

Coming next

QuarkShop : a mashup of QuarkRank, Cnet, Amazon, Shopping.com and Yahoo.com.
Filed in our tools quarkrank
Posted on 17 December
0 comments

Why I moved from Prototype to jQuery  31



jQuery is a JavaScript library which follows the unobtrusive paradigm for application development using JavaScript. jQuery inherently supports behavior-driven development and is based on traversing HTML documents using CSS selectors. Prototype, on the other hand, is a JavaScript library for class-driven development which makes working with JavaScript easier. The Prototype library is well supported in Ruby on Rails via helper functions.

I had always used the Prototype library for most of my projects until I was introduced to jQuery three months back ... and it enchanted me.

Filed in ruby on rails
Tagged as css selector javascript 
Posted on 06 November
31 comments

Sessions and cookies in Ruby on Rails  16


Sessions and cookies are an important topic that is rarely talked about and has little documentation on the Internet. So, here we go ... a guide to sessions and cookies in Rails. Sessions and cookies are an integral part of any good web application, and Rails has good support for them. Continuing with our DRY approach, this guide links to good articles with detailed descriptions wherever necessary.

Table of Contents

  1. Introduction
  2. Sessions
    1. Session in rails
    2. Configure your sessions
    3. Storage options
    4. Session storage limitations
    5. Session and Security
    6. HowTo
      1. Implement session expiration
      2. Delete stale sessions
      3. Find out active users
      4. Access session data using session_id
    7. Miscellaneous
  3. Cookies
    1. Cookie on rails
    2. cookies vs. request.cookies
    3. CookieJar
    4. Miscellaneous

Introduction

HTTP is a stateless protocol, which makes it hard to uniquely track a visitor to a web application. State between browser and server is managed through session IDs, each of which uniquely identifies a client browser.

Session IDs can be stored and communicated in one of the following ways :
  1. Embedded in URL
  2. In form field
  3. Using cookies.

Information stored across multiple requests from a client browser is called session data. Session data for each visitor can be stored on the server or in cookies. When a client makes a request, the session data is extracted from session storage using the session ID sent by the client browser. A common example of session data is the user information used for authentication.
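
The lookup described above can be sketched in a few lines of plain Ruby. This is not how Rails implements sessions; the `SessionStore` class, the `_session_id` cookie name and the `user_id` value are assumptions chosen for the example, and the store is just an in-memory hash.

```ruby
require "securerandom"

# A toy server-side session store: session ID => session data.
class SessionStore
  def initialize
    @sessions = {}
  end

  # Create a new session and return its randomly generated ID.
  def create(data = {})
    session_id = SecureRandom.hex(16)
    @sessions[session_id] = data
    session_id
  end

  # Look up the session data for an ID sent by the client.
  # Returns nil for an unknown ID.
  def fetch(session_id)
    @sessions[session_id]
  end
end

store = SessionStore.new

# First request: the server creates a session and hands the ID
# to the browser, here as a cookie value.
sid = store.create(user_id: 42)
cookie_header = "_session_id=#{sid}"

# Later request: the browser sends the cookie back and the server
# uses the ID to find the stored session data.
incoming_id = cookie_header.split("=", 2).last
puts store.fetch(incoming_id)[:user_id]
```

Only the ID travels with each request in this variant; the data itself stays on the server.
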

These days, it's hard to imagine a good web application that does not use sessions.

A wonderful article on implementation techniques of Session ID.
Tagged as sessions 
Posted on 21 October
16 comments   Updated on 24 October

Ruby on Rails Security Guide  129


Ruby on Rails does a decent job of handling security concerns in the background. You will have to configure your application to avoid a few security attacks, while plugins are required for the many security concerns which Rails handles poorly or not at all.

In this article I describe the security issues related to a Ruby on Rails web application. I have followed DRY by linking, wherever required, to articles with good explanations of and solutions to these security concerns. This guide can also be used as a quick security check for your current web application.

Table of Contents

  1. Authentication
  2. Model
    1. SQL Injection
    2. Activerecord Validation
    3. Creating records directly from parameters
  3. Controller
    1. Exposing methods
    2. Authorize parameters
    3. Filter sensitive logs
    4. Cross Site Reference(or Request) Forgery (CSRF)
    5. Minimize session attacks
    6. Stop spam on your website from DNS Blacklist
    7. Caching authenticated pages
  4. View
    1. Cross site scripting(XSS) attack
    2. Anti-spam form protection
    3. Hide mailto links
    4. Use password strength evaluators
  5. Miscellaneous
    1. Transmission of Sensitive information
    2. File upload
    3. Secure your setup / environment
    4. Mysql configuration
    5. Use good passwords
  6. Security plugins directory
Tagged as security 
Posted on 20 September
129 comments   Updated on 20 October

Advanced acts_as_solr  7


This article extends our acts_as_solr : search and faceting tutorial and talks about how to manage rails associations, solr indexes and more with acts_as_solr.

Table Of Contents

  1. Rebuild Solr index
  2. Import existing Solr index or your custom Solr schema.xml
  3. Highlight search term
  4. Rails associations and acts_as_solr
  5. Tips

Rebuild Solr Index

rebuild_solr_index is a class method that rebuilds your model's index, for example when external data has been imported.

For large tables, rebuilding the Solr index is a time-consuming process. The index optimization call (the fifth line of the pseudo code below) is what makes rebuild_solr_index slow: for large tables, you do not want optimization to take place for each object added to the index. On the other hand, removing the optimization calls altogether slows down the process of updating the Solr index.

## pseudo code
def rebuild_solr_index
  for_each_row_in_table do |doc|
    doc.save_to_solr_index
    index.optimize
  end
end

The solution to the problem is to use batch_size in #rebuild_solr_index. With a batch size of, say, 100, the index optimization call is executed once after every 100 indexed rows.
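
The effect of batching can be seen in a small self-contained sketch. It is not the acts_as_solr implementation: `FakeIndex` is a stand-in that only counts calls, and `rebuild_index` with its `batch_size:` keyword is a hypothetical helper written for this example.

```ruby
# Stand-in for a Solr index; it only counts how often it is called.
class FakeIndex
  attr_reader :added, :optimize_calls

  def initialize
    @added = 0
    @optimize_calls = 0
  end

  def add(_doc)
    @added += 1
  end

  def optimize
    @optimize_calls += 1
  end
end

# Index all rows, optimizing once per batch instead of once per row.
def rebuild_index(rows, index, batch_size: 100)
  rows.each_slice(batch_size) do |batch|
    batch.each { |doc| index.add(doc) }
    index.optimize
  end
end

index = FakeIndex.new
rebuild_index((1..250).to_a, index, batch_size: 100)

puts index.added           # 250 rows indexed
puts index.optimize_calls  # 3 optimize calls (batches of 100, 100 and 50)
```

With a batch size of 1 the same helper degrades to the per-row behavior shown in the pseudo code, which makes the trade-off easy to experiment with.
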

Tagged as plugin solr 
Posted on 14 September
7 comments   Updated on 19 December

Firequark : quick html screen scraping  4



Table of Contents
  1. Introduction
  2. Why Firequark?
    1. XPath vs. CSS Selector
    2. Find CSS Selector manually
    3. Bundle Scraping
  3. Usage - screencast
  4. Installation
  5. Documentation
  6. Todo

Firequark is an extension to Firebug that aids the process of HTML screen scraping. Firequark automatically extracts the CSS selector for one or more HTML nodes from a web page using Firebug (a web development plugin for Firefox). The generated CSS selector can be given as input to HTML screen scrapers like Scrapi to extract information. Firequark is built to unleash the power of CSS selectors for HTML screen scraping.

HTML screen scraping is a common technique for extracting information about specific, useful elements from a web page. Independent of the programming language, to extract an element from a web page one needs to know its exact location or a key that uniquely identifies it. There are two approaches to uniquely identifying an element: XPath and CSS selectors.

Firebug has built-in functionality for generating the XPath of an HTML element, and Ilya Grigorik has written a good article on using XPath for HTML screen scraping. Firequark, in contrast, extends Firebug to generate CSS selectors for elements on a web page.

Example case : Let's take a practical example where you want to scrape Amazon.com. My goal is to get the product name, price and rating for all the products on the Amazon point-and-shoot camera catalog page. I will use this example in the screencast and the explanation below.

Filed in our tools
Posted on 05 September
4 comments   Updated on 06 September

HTML::Tag class in scrapi  1


While working with scrapi, I found that there is no external documentation for the HTML::Tag class. This article is to ensure no one says this again.

In scrapi, HTML::Node represents an HTML node, which can be of 2 types: HTML::Text for a text node and HTML::Tag for an HTML tag node. Here is a code snippet in which scrapi returns the HTML node as an HTML::Tag object.

Filed in ruby
Tagged as html scraping scrapi 
Posted on 28 August
1 comment

Domain Forwarding or URL Redirection  0


Also known as URL Forwarding or Domain Redirection, it's a technique for making a web page available through many URLs.

Check out the Wikipedia article on URL redirection for uses of redirection.

In short,
  • Client Side Forwarding : URL in client browser changes.
  • Server Side Redirection : URL in client browser does NOT change. User remains on same website/domain.
  • Server Side Forwarding or DNS Forwarding : URL in client browser does NOT change. User is moved to NEW website.

All the above methods are explained below in detail. I will be using Ruby on Rails for illustration.
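
As a framework-free sketch of the first case, where the URL in the client browser changes, the snippet below returns an HTTP redirect in the status/headers/body shape used by Rack applications. It does not need Rails or the rack gem to run; the target URL and the `redirect_response` helper are hypothetical.

```ruby
# Hypothetical new location for the page.
NEW_LOCATION = "http://www.example.com/new-home"

# Return a Rack-style response triple: status, headers, body.
# A 301 status plus a Location header tells the browser to load
# the new URL, so the address shown to the user changes.
def redirect_response(_env)
  headers = { "Location" => NEW_LOCATION, "Content-Type" => "text/plain" }
  [301, headers, ["Moved permanently"]]
end

status, headers, _body = redirect_response("PATH_INFO" => "/old-home")
puts status
puts headers["Location"]
```
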

Filed in ruby on rails
Tagged as domain forwarding 
Posted on 25 August
0 comments   Updated on 28 August

QScraper : hpricot interface to scrapi


QScraper is a wrapper over scrapi that provides an Hpricot-like interface.

Motivation: The Hpricot interface is simple and easy to use, while scrapi is more powerful because of bundle scraping and anonymous classes. I was using Hpricot for quick testing and checking but scrapi for project implementation. To avoid working with two HTML scrapers, I wrote this wrapper over scrapi.

Bundle Scraping: This refers to extracting multiple attributes of an element from a web page in a single parse. Most screen scraping tools extract multiple elements but not multiple attributes of an element. Let's take blog scraping as an example: each blog post is an element, and I would like to extract multiple attributes of a blog post, such as the author, the publication date, the title and the content. Rather than making individual calls like doc.search(author_selector), doc.search(published_selector) etc., I would like to do doc.find(author_selector, date_selector, title ...).
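
The idea of bundle scraping can be illustrated without scrapi or QScraper. The sketch below parses a document once and pulls several attributes out of each post element; it uses REXML and XPath instead of CSS selectors, and the sample markup, the `scrape_posts` helper and the field names are all assumptions made for the example.

```ruby
require "rexml/document"

# Hypothetical, well-formed sample markup with two blog posts.
SAMPLE_MARKUP = <<~XHTML
  <div>
    <div class="post"><h2>First post</h2><span class="author">Ann</span><span class="date">23 August</span></div>
    <div class="post"><h2>Second post</h2><span class="author">Bob</span><span class="date">28 August</span></div>
  </div>
XHTML

# Field name => XPath relative to each post element.
FIELDS = {
  title:  "h2",
  author: "span[@class='author']",
  date:   "span[@class='date']"
}

# Parse the markup once, then extract every field from every post.
def scrape_posts(markup, fields)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//div[@class='post']").map do |post|
    fields.each_with_object({}) do |(name, path), record|
      node = REXML::XPath.first(post, path)
      record[name] = node && node.text
    end
  end
end

scrape_posts(SAMPLE_MARKUP, FIELDS).each do |post|
  puts "#{post[:title]} by #{post[:author]} on #{post[:date]}"
end
```

Each post comes back as one hash, which is the shape a call like doc.find(author_selector, date_selector, ...) is meant to give.
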

Filed in our tools ruby
Tagged as html scraping scrapi 
Posted on 23 August
0 comments   Updated on 05 November