Some Rails Deployment Errors & Fixes

Error #1:
libssl.so.10: cannot open shared object file: No such file or directory – /efs/www/rprod/sites/digitalreadingroom.nypl.org/shared/bundle/ruby/1.9.1/gems/mysql2-0.3.11/lib/mysql2/mysql2.so

Solution:
The fix was to symlink the old version of the library to the name the gem expects, or to update the MySQL client on the QA box.
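
For the symlink route, the command was along these lines (hypothetical paths – point it at whatever libssl version the box actually has):

-> sudo ln -s /usr/lib64/libssl.so.1.0.0 /usr/lib64/libssl.so.10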

Error #2:
Could not find a JavaScript runtime. See https://github.com/sstephenson/execjs for a list of available runtimes. (ExecJS::RuntimeUnavailable)

Solution:
Adding the gems ‘execjs’ and ‘therubyracer’ works most of the time, as does installing node.js. But in my case it did not work: the ‘therubyracer’ gem depends on the gem ‘libv8’, whose newer version was not working, so I forced ‘therubyracer’ to downgrade to version ‘0.10.2’, which downgraded the ‘libv8’ version as well, and that solved it 🙂
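
The Gemfile change was along these lines (a sketch of the pin described above):

gem 'execjs'
gem 'therubyracer', '0.10.2' # pulls in an older libv8 that works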

Error #3:
We had two asset precompile issues: 1) errors compiling the assets because of CSS import issues, and 2) a rendering issue after deployment.

Solution:
For #1, resolve the CSS import issues. For #2, I added the following to the production environment file and made sure to run precompile in the deploy.rb script:

config.assets.enabled = true

config.assets.compress = true

config.assets.precompile += %w( *.js *.css )
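
For reference, the deploy.rb side looked something like this (a sketch assuming Capistrano 2 conventions; the task name and hook are assumptions, not the actual script):

namespace :assets do
  task :precompile, :roles => :web do
    run "cd #{release_path} && RAILS_ENV=production bundle exec rake assets:precompile"
  end
end
after 'deploy:update_code', 'assets:precompile'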

Error #5:
Mistakenly commented out the line Bundler.require(:default, :assets, Rails.env), and totally forgot to uncomment it :/ And spent a good amount of time investigating why none of the gems were available to the app! It took me a while to realize that without that line an app has no clue to look for its gems in the bundle path!!

Solution:
Uncomment that line!!!!
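
For context, that line sits near the top of config/application.rb in a Rails 3 app and is what tells Bundler to require every gem in the bundle for the given groups:

# config/application.rb
if defined?(Bundler)
  Bundler.require(:default, :assets, Rails.env)
end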

Fragment Caching in Rails App

Did fragment caching (using memcached) in the Rails app today. Here are the steps:
– Added the following in the view:

<% cache "my_cache_partial", :expires_in => 1.hour do %>
  <%= render :partial => "partials/my_partial" %>
<% end %>

– Added the gem ‘dalli’

– Then added the following in development.rb (and all other environments):

config.cache_store = :dalli_store, 'localhost:11218'
DALLI_STORE = 'localhost:11218'
config.action_controller.perform_caching = true

Note: :expires_in will not work without memcached!
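
To expire the fragment before the hour is up (say from a controller or a cache sweeper), expire_fragment can be used:

# expires the fragment cached above
expire_fragment("my_cache_partial")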

rack-throttle to limit API requests

For an API project I am using the rack-throttle gem, which is a Rack middleware that provides logic for rate-limiting incoming HTTP requests to Rack applications.

Now, the usage is pretty straightforward as explained on the gem page, but the requirements I had were different, so I had to extend the gem and customize the code to support them. rack-throttle checks the limit per host; I wanted it per authentication token. Each API call must pass an authentication token to get a result back, and I wanted to set a different limit for each authentication token, or you can say for each user.

I used memcached to keep the counts, and since I wanted the limit per day, I extended Rack::Throttle::Daily.

Here is my code snippet in /lib/api_defender.rb

First, in the initializer I check the database for all the authentication tokens and their limits, and populate memcached with those values:

require 'rack/throttle'

class ApiDefender < Rack::Throttle::Daily
  def initialize(app)
    options = {
      :code    => 403,
      :message => "Rate Limit Exceeded"
    }
    @app, @options = app, options
    # Seed memcached with the per-token limits from the DB so allowed?
    # doesn't have to hit the database on every request.
    api_auth_limits = User.select('authentication_token, api_req_limit')
    api_auth_limits.each do |a_l|
      Rails.cache.write(a_l.authentication_token + 'limit', a_l.api_req_limit, expires_in: TOKEN_LIMIT_EXPIRATION_MIN)
    end
  rescue
    nil
  end

After that, the allowed? method checks whether the request is eligible for this limit check. In my case, if the request comes from a browser I apply the limit per host; if it is not a browser and passes an authentication token, I check the limit for that token. I also keep incrementing the count in memcached to keep track, and once it reaches the final daily limit I return forbidden.

  def allowed?(request)
    if request_is_browser?(request).blank?
      # Non-browser request: throttle per authentication token.
      token_val = get_token_val(request)
      max_per_window = Rails.cache.read(token_val + 'limit')
      count_increase = cache_incr(request, max_per_window)
      if max_per_window.blank?
        # Limit not cached yet, so look it up in the DB and cache it.
        api_auth_limits = User.select('authentication_token, req_limit').where(:authentication_token => token_val)
        Rails.cache.write(api_auth_limits[0].authentication_token + 'limit', api_auth_limits[0].req_limit, expires_in: TOKEN_LIMIT_EXPIRATION_MIN)
        max_per_window = api_auth_limits[0].req_limit
        max_per_window = 0 if max_per_window.blank?
      end
      need_defense?(request) ? count_increase <= max_per_window : true
    else
      # Browser request: throttle per host IP instead.
      if need_defense?(request)
        token_expire_seconds = TOKEN_COUNTER_EXPIRATION_MIN * 60
        token_expire_seconds_for_time = TOKEN_COUNTER_EXPIRATION_MIN * 60 + 120
        host_ip = request.ip.to_s
        host_ip_limit = Rails.cache.read(host_ip + 'limit')
        host_ip_count = Rails.cache.read(host_ip)
        if host_ip_limit.blank?
          Rails.cache.write(host_ip + 'limit', BROWSER_API_REQ_LIMIT, expires_in: TOKEN_LIMIT_EXPIRATION_MIN)
          host_ip_limit = BROWSER_API_REQ_LIMIT
        end
        if host_ip_count.blank?
          # First request in this window: start the counter.
          current_time = Time.now
          Rails.cache.write(host_ip + 'time', current_time, expires_in: token_expire_seconds_for_time)
          Rails.cache.write(host_ip, 1, expires_in: token_expire_seconds)
          host_ip_count = 1
        elsif host_ip_count <= host_ip_limit
          # Bump the counter, preserving the remaining window.
          host_ip_count = host_ip_count + 1
          current_time = Time.now
          last_update_time = Rails.cache.read(host_ip + 'time')
          seconds_left = token_expire_seconds - (current_time - last_update_time)
          Rails.cache.write(host_ip, host_ip_count, expires_in: seconds_left)
        end
        host_ip_count <= host_ip_limit
      else
        true
      end
    end
  end

The cache_incr method reads the current count for the token from memcached, increments it, and returns the new count:

  def cache_incr(request, max_per_window)
    token_val = get_token_val(request)
    token_count = Rails.cache.read(token_val)
    token_expire_seconds = TOKEN_COUNTER_EXPIRATION_MIN * 60
    token_expire_seconds_for_time = TOKEN_COUNTER_EXPIRATION_MIN * 60 + 120
    if token_count.blank?
      # First hit for this token: start the counter and remember the start time.
      current_time = Time.now
      Rails.cache.write(token_val, 1, expires_in: token_expire_seconds)
      Rails.cache.write(token_val + 'time', current_time, expires_in: token_expire_seconds_for_time)
      count = 0
    else
      # Subsequent hit: increment, preserving the remaining window.
      current_time = Time.now
      last_update_time = Rails.cache.read(token_val + 'time')
      seconds_left = token_expire_seconds - (current_time - last_update_time)
      count = token_count
      Rails.cache.write(token_val, count + 1, expires_in: seconds_left)
    end
    count + 1
  rescue
    true
  end

And need_defense? checks whether this path should be validated for rate limiting:

  def need_defense?(request)
    request.fullpath.include? "api/v1/"
  end
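
Two helpers used above, get_token_val and request_is_browser?, aren't shown in this snippet; a minimal sketch of what they could look like (the param name and the User-Agent check are my assumptions, not the actual implementation):

  # Hypothetical: pull the token from the request params.
  def get_token_val(request)
    request.params['auth_token'].to_s
  end

  # Hypothetical: treat anything with a browser-like User-Agent as a browser.
  def request_is_browser?(request)
    request.user_agent.to_s[/Mozilla|Chrome|Safari/]
  end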

The call() method is the one that returns forbidden if the limit is exceeded, or hands the request off to the API application if the limit check passes.

  def call(env)
    request = Rack::Request.new(env)
    token = get_token_val(request)
    existing_time = Rails.cache.read(token + 'time')
    # Fall back to the host IP (or a placeholder) when no token counter exists.
    token = request.ip.to_s if existing_time.blank?
    token = 'Test' if token.blank?
    allowed?(request) ? app.call(env) : rate_limit_exceeded(token)
  end

The rate_limit_exceeded method is customized to return the response the way I want it:

  def rate_limit_exceeded(token)
    existing_time = Rails.cache.read(token + 'time')
    time_left = ((Time.now - existing_time) / 60).round(2) if !existing_time.blank?
    time_attr = "min"
    time_attr = "time" if time_left.blank?
    time_left = "later" if time_left.blank?
    http_error("{'error':[{'message':'Sorry, Rate Limit Exceeded, please try again after #{time_left} #{time_attr}','code':403}]}")
  rescue
    time_left = "later time"
    http_error("{'error':[{'message':'Sorry, Rate Limit Exceeded, please try again after #{time_left}','code':403}]}")
  end
end

Also, I had to register this middleware in the environment files:

  config.middleware.insert_after Rack::Lock, ApiDefender
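
Note: since Rails 3 doesn't autoload /lib by default, the class also has to be loadable; one way (an assumption about the setup, not from my notes) is adding lib to the autoload paths in config/application.rb:

config.autoload_paths += %W(#{config.root}/lib)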

rSolr!

Found a very good and easy article on rSolr at http://websolr.com/guides/clients-rsolr

Here are some points from the guide:

  • RSolr is a Ruby client for Solr. It is designed to be a simple, low-level wrapper that provides direct access to the Solr API.
  • rSolr can be used to connect through the URL as follows:
    rsolr = RSolr.connect :url => 'the_link_to_solr'
  • The add method is used to post docs to Solr. For example:
    rsolr.add([
      { :id => 'a', :name => "Something"},
      { :id => 'b', :name => "Something else"}
    ])
  • rsolr.commit is used to commit the docs.
  • The select method sends requests to the Solr /select handler. It accepts a hash of parameters, which it serializes into the query string of its request. For example:
    search = rsolr.select :params => { :q => "search_string" }
  • NOTE: The query is performed according to the query parser settings defined in solrconfig.xml and potentially your default query field specified in schema.xml.
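
Putting the pieces together, a minimal round trip might look like this (the URL, field values, and response access are placeholders following the guide's API):

require 'rsolr'

rsolr = RSolr.connect :url => 'http://localhost:8983/solr'
rsolr.add([{ :id => 'a', :name => 'Something' }])
rsolr.commit
search = rsolr.select :params => { :q => 'name:Something' }
puts search['response']['docs'].inspect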

More on Solr!

Some Notes From the Book “Solr 1.4 Enterprise Search Server”:

Solr Index:
An index is basically like a single-table database schema. Imagine a massive spreadsheet, if you will. In spite of this limitation, there is nothing to stop you from putting different types of data (say, artists and tracks from MusicBrainz) into a single index, thereby, in effect, mitigating this limitation. All you have to do is use different fields for the different document types, and use a field to discriminate between the types. An identifier field would need to be unique across all documents in this index, no matter the type, so you could easily do this by concatenating the field type and the entity's identifier. This may appear really ugly from a relational database design standpoint, but this isn't a database.

Single Combined Index:
<field name="id" ... />   <!-- example: "artist:534445" -->
<field name="type" ... />
<field name="name" ... />
<!-- track fields: -->
<field name="PUID" ... />
<field name="num" ... />  <!-- i.e. the track # on the release -->

Problems with that?

  • There may be namespace collision problems unless you prefix the field names by type such as: artist_startDate and track_PUID.
  • If you share the same field for different things (like the name field in the example that we have just seen), then there are some problems that can occur when using that field in a query and while filtering documents by document type.
  • Prefix, wildcard, and fuzzy queries will take longer and will be more likely to reach internal scalability thresholds.
  • Committing changes to a Solr index invalidates the caches used to speed up querying. If this happens often, and the changes are usually to one type of entity in the index, then you will get better query performance by using separate indices.

Schema Design:
While doing schema design, a key thing to come to grips with is that a Solr schema strategy is driven by how it is queried and not by a standard third normal form decomposition of the data.

  • First determine which searches are going to be powered by Solr.
  • Second determine the entities returned for each search.
  • For each entity type, find all of the data in the schema that will be needed across all searches of it. By “all searches of it,” I mean that there might actually be multiple search forms, as identified in Step 1. Such data includes any data queried for (that is, criteria to determine whether a document matches or not) and any data that is displayed in the search results.
  • If there is any data shown in the search results that is not queryable, not sorted upon, not faceted on, not used by the highlighter, and, for that matter, not used by any Solr feature except to simply return it in search results, then it is not necessary to include it in the schema for this entity. Let's say, for the sake of argument, that the only information queryable, sortable, and so on is a track's name, when doing a query for tracks. You can opt not to inline the artist name, for example, into the track entity. When your application queries Solr for tracks and needs to render search results with the artist's name, the onus would be on your application to get this data from somewhere—it won't be in the search results from Solr. The application might look these up in a database, or perhaps even query Solr's own artist entity if it's there, or somewhere else.

Field Types:

The first section of the schema is the definition of the field types. In other words, these are the data types. This section is enclosed in the <types/> tag and will consume lots of the file’s content. The field types declare the types of fields, such as booleans, numbers, dates, and various text flavors.

Using copyField:

Closely related to the field definitions are copyField directives, which are specified at some point after the fields element, not within it. A copyField directive looks like this:
<copyField source="r_name" dest="r_name_sort" />
These are really quite simple. At index-time, each copyField is evaluated for each input document.

Solr and Lucene

Started learning Solr recently. Here are some of the key points:

  • Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.
  • It's an information retrieval application.
  • Index/Query Via HTTP
  • Scalability – Efficient Replication To Other Solr Search Servers
  • Highly Configurable And User Extensible Caching

Solr Configuration:

  • schema.xml: Where the data is described
  • solrconfig.xml: Where it is described how people can interact with the data

Loading Data:

  • Documents can be added, deleted or replaced.
  • Message Transport: HTTP POST
  • Message Format: XML
  • Example:

<add>
  <doc>
    <field name="id">SOLR</field>
    <field name="name">Apache Solr</field>
  </doc>
</add>

Querying Data:

  • Transport Protocol: HTTP GET
  • Example:

http://solr/select?q=electronics

schema.xml:

Decides options for the various fields.

  • Is it a number? A string? A date?
  • Is there a default value for documents that don't have one?
  • Is it created by combining the values of other fields?
  • Is it stored for retrieval?
  • Is it indexed? If so is it parsed? If so how?
  • Is it a unique identifier?

Fields:

  • <field> Describes How You Deal With Specific Named Fields
  • Example:

<field name="title" type="text" stored="false" />

Field Type:

  • The Underlying Storage Class (FieldType)
  • The Analyzer To Use For Parsing If It Is A Text Field
  • Example:

<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true" />

Analyzer:

  • ‘Analyzer’ Is A Core Lucene Class For Parsing Text
  • Example:

<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>

Tokenizers And TokenFilters:

  • Analyzers Are Typically Composed Of Tokenizers And TokenFilters
  • Tokenizer: Controls How Your Text Is Tokenized
  • TokenFilter: Mutates And Manipulates The Stream Of Tokens
  • Solr Lets You Mix And Match Tokenizers and TokenFilters In Your schema.xml To Define Analyzers On The Fly
  • Example:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true"/>
  </analyzer>
</fieldType>

solrconfig.xml:

This is where you configure options for how this Solr instance should behave.

  • Low-Level Index Settings
  • Performance Settings (Cache Sizes, etc.)
  • Types of Updates Allowed
  • Types of Queries Allowed

Note:
  • solrconfig.xml depends on schema.xml.
  • schema.xml does not depend on solrconfig.xml.

Delayed Jobs

These days I am working more on infrastructure and background tasks. Recently, I implemented Delayed Job for some background processes in our project.

Delayed Jobs (Steps):

  • Added gem ‘delayed_job’ and gem ‘delayed_job_active_record’ to the Gemfile.
  • Then ran:
  • -> rails generate delayed_job:active_record
  • -> rake db:migrate
  • Ran the following to start or stop the delayed job:
  • -> RAILS_ENV=production script/delayed_job start
  • -> RAILS_ENV=production script/delayed_job stop
  • Also, on a local machine, we can start it by running: rake jobs:work
  • To make any changes to the config, I update the file config/initializers/delayed_job_config.rb; for example, in my case, I increased the max attempts and max run time (see the sketch below).
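
A sketch of that initializer (the values here are illustrative, not the ones I actually used):

# config/initializers/delayed_job_config.rb
Delayed::Worker.max_attempts = 50      # default is 25
Delayed::Worker.max_run_time = 8.hours # default is 4.hours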

Customized Jobs:

  • I needed to run a delayed job every time a record is updated/created in my DB.
  • So, I created a custom task which gets called and enqueued into delayed job.
My Custom Job:

require 'net/http'

class PerformJob < Struct.new(:id)
  def perform
    # API_URL is a constant defined elsewhere in the app.
    Rails.logger.info "#{API_URL}#{id}"
    url = URI.parse("#{API_URL}#{id}")
    response = Net::HTTP.get_response(url)
    Delayed::Worker.logger.add(Logger::INFO, "Response from delayed_job #{response.inspect}")
  end
end

The Way I Called It From Model Observer:

Delayed::Job.enqueue(PerformJob.new(id), 0, 1.minutes.from_now.getutc)
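
For context, the observer hook might look roughly like this (the model and observer names are assumptions; observers also have to be registered via config.active_record.observers in application.rb):

class ItemObserver < ActiveRecord::Observer
  # Called after an Item record is created or updated.
  def after_save(item)
    Delayed::Job.enqueue(PerformJob.new(item.id), 0, 1.minute.from_now.getutc)
  end
end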

Delayed Job Log:

Delayed::Worker.logger.add(Logger::INFO, "Test!")

And update /config/initializers/delayed_job_config.rb to add the following:

Delayed::Worker.logger = Rails.logger
Delayed::Worker.logger.auto_flushing = true

Note: The Capistrano deployment was giving an error for delayed_job stop/start, so I added the gem ‘daemons’ to avoid it!

Projects & Libraries – Beginning Ruby Guide!

  • load and require both load the file mentioned; the difference is that load loads it every time it is called, whereas require loads it only once within the scope. When we use them, Ruby looks in the current directory and some other directories to find the file. The list of those directories lives in $LOAD_PATH. We can check it:
    $:.each {|fname| puts fname}

    To add another directory to it:

     $:.push '/your/directory/here'
     require 'yourfile'

PART 2: Classes/Objects/Modules – Beginning Ruby Guide!

  • Classes: A class is a collection of methods and data that are used as a blueprint to create multiple objects relating to that class.
  • Objects: An object is a single instance of a class. An object of class Person is a single person. An object of class Dog is a single dog. If you think of objects as real-life objects, a class is the classification, whereas an object is the actual object or “thing” itself.
  • Local variable: A variable that can only be accessed and used from the current scope.
  • Instance/object variable: A variable that can be accessed and used from the scope of a single object. An object's methods can all access that object's object variables.
  • Global variable: A variable that can be accessed and used from anywhere within the current program.
  • Class variable: A variable that can be accessed and used within the scope of a class and all of its child objects.
  • Encapsulation: The concept of allowing methods to have differing degrees of visibility outside of their class or associated object.
  • Polymorphism: The concept of methods being able to deal with different classes of data and offering a more generic implementation (as with methods shared across the classes in the book's examples).
  • Module: An organizational element that collects together any number of classes, methods, and constants into a single namespace.
  • Namespace: A named element of organization that keeps classes, methods, and constants from clashing.
  • Mix-in: A module that can mix its methods in to a class to extend that class’s functionality.
  • Enumerable: A mix-in module, provided as standard with Ruby, that implements iterators and list-related methods for other classes, such as sort, min, and max. Ruby uses this module by default with the Array and Hash classes.
  • Comparable: A mix-in module, provided as standard with Ruby, that implements comparison operators (such as <, >, and ==) on classes that implement the generic comparison operator <=> (see the example below).
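
A quick illustration of Comparable (my own example, not one from the book):

class Version
  include Comparable

  attr_reader :number

  def initialize(number)
    @number = number
  end

  # Defining <=> is all Comparable needs to derive <, >, ==, between?, etc.
  def <=>(other)
    number <=> other.number
  end
end

puts Version.new(2) > Version.new(1) # => true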