Moving from MySQL to Microsoft SQL Server in Ruby on Rails


Recently one of our clients asked us to switch from a MySQL database to Microsoft SQL Server (MSSQL) in a Ruby on Rails project that collects online shopping data. We agreed, thinking that since RoR comes with the ActiveRecord ORM, changing the adapter and configuration in config/database.yml would be all it takes. Right?

Not exactly. Why?
  1. Setting up an MS SQL connection from a Rails application is a serious pain in the a** and it took me days of research to get it right. I have shared my findings in the following two sections.
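
As a rough illustration of the "just change the adapter" expectation, the sketch below prints a MySQL and an SQL Server development entry for config/database.yml. The `sqlserver` adapter name and the `mode`/`dsn` keys assume the activerecord-sqlserver-adapter gem used over ODBC; the database name, DSN and credentials are placeholders, not values from our project.

```ruby
require 'yaml'

# Hypothetical "before" entry: a typical MySQL development configuration.
mysql_config = {
  'development' => {
    'adapter'  => 'mysql',
    'host'     => 'localhost',
    'database' => 'shop_development', # placeholder
    'username' => 'app_user',         # placeholder
    'password' => 'secret'            # placeholder
  }
}

# Hypothetical "after" entry. Assumes the activerecord-sqlserver-adapter gem
# in ODBC mode, where the server and database are described by an ODBC DSN.
sqlserver_config = {
  'development' => {
    'adapter'  => 'sqlserver',
    'mode'     => 'odbc',
    'dsn'      => 'shop_dsn',  # placeholder DSN name
    'username' => 'app_user',  # placeholder
    'password' => 'secret'     # placeholder
  }
}

# Print both entries in the YAML form used by config/database.yml.
puts mysql_config.to_yaml
puts sqlserver_config.to_yaml
```

The configuration change itself is small; the effort goes into everything the DSN points at (drivers, ODBC setup, connectivity).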
Posted on 24 May

UTF-8 and html screen scraping in Ruby on Rails


Before I start: if you have any doubts about character sets, or are unfamiliar with terms like UTF-8 and Unicode, I recommend reading Joel's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

The problem this article addresses is how to handle foreign or accented characters in HTML screen scraping. We encountered it while working on our website information project, Quarkbase.com.
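
A minimal sketch of the underlying idea, written for modern Ruby (1.9+) rather than the library versions this article uses: bytes fetched from a page have to be labelled with the charset they were actually written in and then transcoded to UTF-8. The sample bytes, the charsets and the `to_utf8` helper are assumptions for illustration; a real page announces its charset in the HTTP headers or a meta tag.

```ruby
# Raw bytes as an HTTP client might hand them over, tagged as binary.
latin1_bytes = "caf\xE9".b          # "café" encoded as ISO-8859-1
gbk_bytes    = "\xD6\xD0\xCE\xC4".b # "中文" encoded as GBK

# Hypothetical helper: label the bytes with their real charset, then transcode.
def to_utf8(raw, source_charset)
  raw.dup.force_encoding(source_charset).encode('UTF-8')
end

latin1_text = to_utf8(latin1_bytes, 'ISO-8859-1')
gbk_text    = to_utf8(gbk_bytes, 'GBK')

puts latin1_text
puts gbk_text

# Merely relabelling the Latin-1 bytes as UTF-8, without transcoding,
# produces a string that is not valid UTF-8.
broken = latin1_bytes.dup.force_encoding('UTF-8')
puts broken.valid_encoding?
```

Skipping the transcoding step is the usual source of garbled accented or non-English text in scraped data.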

The configuration and tools we will be using are rails 2.1.2, scrapi 1.2.0, hpricot 0.8.1 and curb 0.4.4.0. The example website we will try to scrape is http://196m.cn

At Quarkbase, when we aggregate information from a website, scraping text that contains non-English characters doesn't always work. For example, when we get the description for http://196m.cn from its HTML page, the extracted information is the first line after 'overview' in the screenshot below:



Filed in ruby on rails
Posted on 22 September
Updated on 25 June

Consume non rails-style REST APIs


ActiveResource is a great concept for consuming rails-style REST APIs, but unfortunately most REST APIs are not rails-style. This means that you will very frequently end up modifying ActiveResource to consume non rails-style REST APIs. This article is about understanding ActiveResource and how to tweak/extend it to consume such APIs. We will mainly concentrate on reading data, i.e. the GET method.

Table of Contents

  1. Introduction
  2. Consume non rails-style REST API
    1. Create URL for remote resources
    2. Make a GET request
    3. Handling (Custom) Response
    4. Parse Response
    5. Create ActiveResource object from parsed response
    6. Other things to keep in mind
  3. Custom HTTP GET method tweaks
  4. Data Format
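
To give a feel for what "rails-style" means at the URL level, here is a plain-Ruby sketch (it does not use ActiveResource itself). The first helper follows the element path convention ActiveResource expects, `/<collection>/<id>.<format>`; the second models a hypothetical API with a fixed prefix, no format extension and the format passed as a query parameter. The helper names, the `/feeds/api` prefix and the `alt` parameter are assumptions for illustration.

```ruby
require 'uri'

# Rails-style element path: /<collection>/<id>.<format>?<query>
def rails_style_path(collection, id, format: 'xml', params: {})
  path = "/#{collection}/#{id}.#{format}"
  params.empty? ? path : "#{path}?#{URI.encode_www_form(params)}"
end

# Hypothetical non rails-style API: prefix, no extension, format in the query.
def custom_path(prefix, collection, id, params: {})
  path = "#{prefix}/#{collection}/#{id}"
  params.empty? ? path : "#{path}?#{URI.encode_www_form(params)}"
end

puts rails_style_path('videos', 42)
puts custom_path('/feeds/api', 'videos', 'ZTUVgYoeN_o', params: { alt: 'atom' })
```

When the remote API looks like the second form, the URL-building step is the first thing that has to be overridden.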
Filed in ruby tutorials
Posted on 11 March
Updated on 25 June

Active Youtube


ActiveYoutube is a gem to access the YouTube API using ActiveResource. This gem wraps the code from our previous post on extending ActiveResource to access YouTube. There have been minor changes:
  1. Namespace in class names: the Video, User, StandardFeed and Playlist classes have been moved to the "Youtube" module, to prevent any conflicts with your ActiveRecord models.
  2. CustomMethods related change: in the last version, only the response from "find" converted an "entry" object into an array of "entry" objects. Now the same behavior is implemented for custom HTTP calls like Video.find().get(:comments)
  3. Small patch for better namespacing: it's basically some code from the Rails trunk on ActiveResource, for better handling of namespaces while creating ActiveResource objects.

Gem Installation:

sudo gem install active_youtube

Example Usage:

#### VIDEO
  ## search for videos on 'ruby'
  search = Youtube::Video.find(:first, :params => {:vq => 'ruby', :"max-results" => '5'})
  puts search.entry.length

  ## video information of id = ZTUVgYoeN_o
  vid = Youtube::Video.find("ZTUVgYoeN_o")
  puts vid.group.content[0].url

  ## video comments
  comments = Youtube::Video.find_custom("ZTUVgYoeN_o").get(:comments)
  puts comments.entry[0].link[2].href

  ## searching with category/tags
  results = Youtube::Video.search_by_tags("Comedy")
  puts results[0].entry[0].title

#### STANDARDFEED
  ## retrieving standard feeds
  most_viewed = Youtube::Standardfeed.find(:most_viewed, :params => {:time => 'today'})
  puts most_viewed.entry[0].group.content[0].url

#### USER
  ## user's profile - guthrie
  user_profile = Youtube::User.find("guthrie")
  puts user_profile.link[1].href

#### PLAYLIST
  ## get playlist - multiple elements in playlist
  playlist = Youtube::Playlist.find("EBF5D6DC4589D7B7")
  puts playlist.entry[0].group.content[0].url
Filed in our tools ruby
Posted on 12 February
Updated on 25 June

Scrapi enhancements


We have been using the Ruby library Scrapi quite a lot for HTML scraping in QuarkRank and other projects. Most of the time, I want to extract specific information from a page and dump it directly into the database. A few steps were repeated regularly in my code, so to make it more DRY I have enhanced Scrapi so that manipulating extracted information becomes easier.

Let's consider an example: for each of the top 250 movies at IMDB, I want to extract and store in the DB the following properties:

  1. imdb_id
  2. name
  3. release date
  4. rating
  5. tagline
  6. runtime
  7. director
Filed in ruby on rails
Posted on 30 January
Updated on 23 February

ActiveResource and YouTube


This article is about consuming the YouTube API in your Ruby/Rails project using ActiveResource. Moreover, it is an example of how to extend ActiveResource to consume a non rest-style API.

Benefits of using our extension to ActiveResource :

  1. ActiveResource provides an ActiveRecord-style interface.
  2. You can modify our extension according to your interface requirements.
  3. No need to use and rely on a Ruby library for the YouTube REST API.

Filed in ruby
Posted on 15 January
Updated on 23 February

Writing a web widget


Web widgets are widely used across the Internet but still lack good documentation. From online advertisements to videos to blogs, widgets are everywhere. Some of the popular ones are Google AdSense, YouTube, MyBlogLog widgets and Twitter badges.

Note: This page will be slow to load because of many widget examples.

Table of Contents

  1. Before you start
  2. Flash widget
  3. HTML widget
  4. JavaScript widget
    1. Passive widgets
    2. Active widgets
  5. Examples
  6. Key issues

To define it formally: a web widget is a portable chunk of code that can be installed and executed within any separate HTML-based web page by an end user without requiring additional compilation. A widget adds some content to that page that is (mostly) not static. Generally widgets originate from third parties, though they can be home made. Widgets are also known as modules, snippets, and plug-ins.

This article is my journey of understanding widgets and making one myself. I have tried to keep things simple and insightful by using a lot of examples.

Filed in tutorials
Posted on 07 January
Updated on 15 June

Quarkshop : next-generation shopping



QuarkShop is a next-generation shopping experience for finding the product of your choice based on opinions from across the Internet. You give your preferences and Quarkshop fetches the best matched products for YOU.

For launch, we currently have the following products :

Search products

You can state your preference by selecting the features you LIKE, and we will find the best matched products. Preference selection can be combined with other navigation parameters such as brand and price. The features are extracted automatically from reviews.

Then click submit, and you will see the three best matched products along with their most relevant information. There is also an option to see more products, but we think the top 3 products are what consumers care about once they have given their preference; this keeps product search simple and fast.


Choose your features



Top 3 products

Comparison : QuarkGraph

You can compare the top three products on the spot! When you click the Compare Top Three button, you will see QuarkGraph, a graphical comparison showing the feature scores for each product. The graph is not just an image; you can play with it.

Features & Snippets : Voice of consumer

Features are the keywords that a review talks about. When you read a review, you are always looking for such keywords and for people's comments about them. Quarkshop lets you choose the features/keywords you like and gives you the best matched products. You can also read sentences from actual reviews that give an opinion on a keyword (Snippets).

Feedback

We would really appreciate your feedback. It would help us know where we stand and what is required to make a useful shopping experience.
Filed in our tools quarkrank
Posted on 19 December
Updated on 23 February

Introducing QuarkRank



We're back to blogging after a break of more than a month. We have been very busy developing QuarkRank, a repository of summarized reviews. It is the result of more than 18 months of dedicated research on natural language processing, HTML scraping and user interfaces. Finally, we are happy to make this product live!

Currently, the repository is accessible through a RESTful API or a widget. Moreover, it's absolutely free!

About QuarkRank

"From product reviews, restaurant reviews, hotel reviews, to others, QuarkRank provides the information for making decisions at the point of purchase. Proprietary technology lies at the core of QuarkRank's ability to automatically summarize the opinions of millions of consumer reviews on the internet."

QuarkRank is an intelligent engine which crawls the web for opinions on various products/services and automatically summarizes them feature by feature using its natural language processing technique.




Best and worst of Apple Ipod Touch 8gb
QuarkGraph: QuarkRank data displayed using amcharts at Quarkshop


QuarkRank will help consumers quickly educate themselves, based on the most unbiased information possible, without spending hours reading reviews online.

If you use QuarkRank data, your customers will feel confident making purchase decisions on your site without going to competitors, while the return rate of impulse purchases goes down.

QuarkRank provides

  • Reviews for various products and services, gathered and combined in one place!
  • QuarkRank and the number of reviews for a product/service.
  • Top 5 features, worst 3 features and all features that people talk about for a product/service. For example: sound quality, design and screen of an MP3 player.
  • Feature score, buzz and SNIPPETS from reviews that express an opinion about a feature.

No need to waste time analyzing reviews at Cnet and Amazon anymore!

Where can you use it?

  • Boost the online shopping experience of your users.
    • Best and worst about a product.
    • Graphical feature-by-feature comparison.
    • Power your navigation by offering features as options to choose.
    • Summarized form of reviews.
  • Add product widget to your blog/article.
  • On your social profile, add widget of your favourite or owned products.
  • Show feature-by-feature opinions and comparison for products at your retail store.

API

QuarkRank provides a RESTful API to access our huge repository of summarized reviews. Send us simple HTTP requests and we will send back basic XML responses, which means you can interact with our API from any language.

It provides data in XML and JSON formats. There is no limit on using the API. For detailed information, visit :

ActiveResource can be used to access QuarkRank's RESTful API in Ruby on Rails. Note: you need to apply this tiny patch to ActiveResource.
ActiveResource code:
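# Not needed inside a Rails app that already loads ActiveResource.
require 'active_resource'

# Products in the QuarkRank repository.
class Product < ActiveResource::Base
  self.site = "http://username:password@quarkrank.com"

  # List products, filtered by the given options (category "camera" by default).
  def self.list(options = { :category => "camera" })
    find(:all, :params => options)
  end

  # Fetch one product by SKU, optionally passing a site and the all_features flag.
  def self.show(sku, site = nil, all_features = false)
    params = {}
    params[:site] = site unless site.nil?
    params[:all_features] = "true" if all_features
    find(sku, :params => params)
  end

  # Search products with a free-text query.
  def self.search(query)
    find(:all, :params => { :search => query })
  end
end

# Review snippets, nested under a product.
class Snippet < ActiveResource::Base
  self.site = "http://username:password@www.quarkrank.com/products/:product_id"

  # Fetch the snippets for `name` under `product`; spaces in the name become %20.
  def self.snippets(product, name)
    find(name.gsub(" ", "%20"), :params => { :product_id => product })
  end
end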

Widget

QuarkRank provides two kinds of customizable widgets.
  1. Top 5 features
  2. Interactive widget for features and review snippets of a product

More information at:

Technologies and tools used

A lot of them ....
  • HTML scraping : NLP, Scrapi, Firequark, CSS Selectors. Implemented in Ruby.
  • Text-mining : Statistics, Text parsing, chunking, cleaning and many enhancements.
  • API and QuarkRank site : Ruby on Rails, REST, Acts_as_solr, Request routing.

Coming next

QuarkShop : a mashup of QuarkRank, Cnet, Amazon, Shopping.com and Yahoo.com.
Filed in our tools quarkrank
Posted on 17 December
Updated on 18 September

Why I moved from Prototype to jQuery



jQuery is a JavaScript library which follows the unobtrusive paradigm for application development using JavaScript. jQuery inherently supports behavior-driven development and is based on traversing HTML documents using CSS selectors. Prototype, on the other hand, is a JavaScript library for class-driven development which makes working with JavaScript easier. The Prototype library has good support in Ruby on Rails via helper functions.

I had always used the Prototype library for most of my projects until I was introduced to jQuery three months back ... and it enchanted me.

Filed in ruby on rails
Posted on 06 November
Updated on 23 February

Sessions and cookies in Ruby on Rails


An important issue that is rarely talked about and has little documentation on the Internet. So, here we go ... a guide to sessions and cookies in Rails. Sessions and cookies are an integral part of any good web application and Rails has good support for them. Continuing with our DRY approach, this guide links to good articles with detailed descriptions wherever necessary.

Table of Contents

  1. Introduction
  2. Sessions
    1. Session in rails
    2. Configure your sessions
    3. Storage options
    4. Session storage limitations
    5. Session and Security
    6. HowTo
      1. Implement session expiration
      2. Delete stale sessions
      3. Find out active users
      4. Access session data using session_id
    7. Miscellaneous
  3. Cookies
    1. Cookie on rails
    2. cookies vs. request.cookies
    3. CookieJar
    4. Miscellaneous

Introduction

HTTP is a stateless protocol, which makes it hard to uniquely track a visitor to a web application. State between browser and server is managed through session IDs, each of which uniquely identifies a client browser.

Session IDs can be stored and communicated in one of the following ways :
  1. Embedded in URL
  2. In form field
  3. Using cookies.

Information stored across multiple requests from a client browser is called session data. Session data for each visitor can be stored at the server or in cookies. When a client sends a request to the server, session data is extracted from session storage using the session ID sent by the client browser. A common example of session data is user information for authentication.
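
The lookup described above can be sketched in plain Ruby. This is an illustrative in-memory store, not how Rails implements sessions; the `SessionStore` class and its method names are hypothetical.

```ruby
require 'securerandom'

# Hypothetical server-side session store: session ID => session data.
class SessionStore
  def initialize
    @sessions = {}
  end

  # Create a new session and return its hard-to-guess ID, which the server
  # would hand to the browser (for example in a cookie).
  def create
    id = SecureRandom.hex(16)
    @sessions[id] = {}
    id
  end

  # Look up session data by the ID the browser sent back; nil if unknown.
  def fetch(id)
    @sessions[id]
  end
end

store = SessionStore.new

# First request: no session yet, so the server creates one and stores data in it.
session_id = store.create
store.fetch(session_id)[:user_id] = 7

# Later request: the browser presents the same ID and the data is found again.
puts store.fetch(session_id)[:user_id]

# An ID the server never issued maps to no session data.
puts store.fetch('unknown-id').inspect
```

Only the ID travels with each request; the data stays on the server. Storing the session data itself in a cookie, the other option mentioned above, trades server memory for cookie size limits.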

Nowadays, it's hard to imagine a good web application that does not use sessions.

A wonderful article on implementation techniques of Session ID.
Posted on 21 October
Updated on 23 February

Ruby on Rails Security Guide


Ruby on Rails does a decent job of handling security concerns in the background. You will have to configure your application to avoid a few security attacks, while plugins are required for many security concerns that Rails handles poorly or not at all.

In this article I describe the security issues related to a Ruby on Rails web application. I have followed DRY by linking to articles with good explanations of and solutions to security concerns wherever required. This guide can also be used as a quick security check for your current web application.

Table of Contents

  1. Authentication
  2. Model
    1. SQL Injection
    2. Activerecord Validation
    3. Creating records directly from parameters
  3. Controller
    1. Exposing methods
    2. Authorize parameters
    3. Filter sensitive logs
    4. Cross Site Reference(or Request) Forgery (CSRF)
    5. Minimize session attacks
    6. Stop spam on your website from DNS Blacklist
    7. Caching authenticated pages
  4. View
    1. Cross site scripting(XSS) attack
    2. Anti-spam form protection
    3. Hide mailto links
    4. Use password strength evaluators
  5. Miscellaneous
    1. Transmission of Sensitive information
    2. File upload
    3. Secure your setup / environment
    4. Mysql configuration
    5. Use good passwords
  6. Security plugins directory
Posted on 20 September
Updated on 23 February

Advanced acts_as_solr


This article extends our acts_as_solr : search and faceting tutorial and talks about how to manage rails associations, solr indexes and more with acts_as_solr.

Table Of Contents

  1. Rebuild Solr index
  2. Import existing Solr index or your custom Solr schema.xml
  3. Highlight search term
  4. Rails associations and acts_as_solr
  5. Tips

Rebuild Solr Index

rebuild_solr_index is a class method that rebuilds your model's index, for example after importing external data.

For large tables, rebuilding the Solr index is a time-consuming process. See the index.optimize call in the pseudo code below: it is what makes rebuild_solr_index slow. For large tables, you do not want optimization to take place for each object added to the index. On the other hand, removing the optimization calls altogether slows down the process of updating the Solr index.

## pseudo code
def rebuild_solr_index
  for_each_row_in_table do |doc|
    doc.save_to_solr_index
    index.optimize
  end
end

The solution to the problem is to use batch_size in #rebuild_solr_index. With a batch size of, say, 100, the index optimization call is executed once after every 100 rows have been indexed.
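
The effect of batching can be sketched with a stand-in index that only counts calls. This is not the acts_as_solr implementation; `FakeIndex` and `rebuild_index` are hypothetical names used to show how the number of optimize calls changes with the batch size.

```ruby
# Hypothetical stand-in for a Solr index that counts expensive optimize calls.
class FakeIndex
  attr_reader :added, :optimize_calls

  def initialize
    @added = 0
    @optimize_calls = 0
  end

  def add(_row)
    @added += 1
  end

  def optimize
    @optimize_calls += 1
  end
end

# Index rows in batches and optimize once per batch instead of once per row.
def rebuild_index(rows, index, batch_size)
  rows.each_slice(batch_size) do |batch|
    batch.each { |row| index.add(row) }
    index.optimize
  end
  index
end

rows = (1..1000).to_a
per_row = rebuild_index(rows, FakeIndex.new, 1)   # optimize after every row
batched = rebuild_index(rows, FakeIndex.new, 100) # optimize after every 100 rows

puts per_row.optimize_calls
puts batched.optimize_calls
```

With 1000 rows, a batch size of 100 reduces the optimize calls from 1000 to 10 while still indexing every row.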

Posted on 14 September
Updated on 23 February

Firequark : quick html screen scraping



Table of Contents
  1. Introduction
  2. Why Firequark?
    1. XPath vs. CSS Selector
    2. Find CSS Selector manually
    3. Bundle Scraping
  3. Usage - screencast
  4. Installation
  5. Documentation
  6. Todo

Firequark is an extension to Firebug that aids the process of HTML screen scraping. Firequark automatically extracts the CSS selector for one or more HTML nodes from a web page using Firebug (a web development plugin for Firefox). The generated CSS selector can be given as input to HTML screen scrapers like Scrapi to extract information. Firequark is built to unleash the power of CSS selectors for HTML screen scraping.

HTML screen scraping is a common technique for extracting information about specific and useful elements from a web page. Independent of programming language, to extract an element from a web page one needs to know its exact location or a key that uniquely identifies the element. There are two approaches for uniquely identifying an element: using XPath or CSS selectors.
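
To make the two approaches concrete, the sketch below locates the same element with an XPath expression, using REXML (bundled with standard Ruby installs), and shows a roughly equivalent CSS selector as a string. The markup, class names and selector are hypothetical; Ruby has no built-in CSS selector engine, so the selector is shown only for comparison and would be passed to a scraper such as Scrapi.

```ruby
require 'rexml/document'

# A tiny, well-formed stand-in for a product page (hypothetical markup).
html = <<~HTML
  <html>
    <body>
      <div id="product">
        <span class="name">Example Camera</span>
        <span class="price">$199</span>
      </div>
    </body>
  </html>
HTML

doc = REXML::Document.new(html)

# XPath: describes the element by its position and attributes in the tree.
xpath = "//div[@id='product']/span[@class='price']"
price = REXML::XPath.first(doc, xpath).text

# A roughly equivalent CSS selector for the same element (not evaluated here).
css_selector = 'div#product > span.price'

puts price
puts css_selector
```

The CSS form is usually shorter and closer to what a web developer already writes in stylesheets.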

Firebug has built-in functionality for generating the XPath of an HTML element. Ilya Grigorik has written a good article on using XPath for HTML screen scraping. Firequark, on the other hand, extends Firebug to generate CSS selectors for elements on a web page.

Example case : Let's take a practical example where you want to scrape Amazon.com. My goal is to get the product name, price and rating for all the products from the Amazon point-and-shoot camera catalog page. I will use this example in the screencast and explanation below.

Filed in our tools
Posted on 05 September
Updated on 15 June

HTML::Tag class in scrapi


While working with scrapi, I found there is no external documentation for the HTML::Tag class. This article is to ensure no one says this again.

In scrapi, HTML::Node represents an HTML node, which can be of two types: HTML::Text for a text node and HTML::Tag for an HTML tag node. Here is a code snippet in which scrapi returns the HTML node as an HTML::Tag object.

Filed in ruby
Posted on 28 August
Updated on 09 September