Caching in development is important

Almost a year ago, I wrote here about how to override caching in development and only turn caching on when testing.  As it turns out, that might not be a great idea.  Too many times I have had strange bugs in production that I could never reproduce locally because of caching.  Only after a few hours of debugging did I realize caching could be the culprit.

There's also a really good reason why nobody's written an easy way to turn off caching in development: it's bad to see different behavior in development vs. production at any time, especially with query and fragment caching.

The kind of caching you do want to turn off is class/controller caching, so you don't have to restart your server just so it will pick up your new code.

config.action_controller.perform_caching = false
config.cache_classes = false

So run memcached, do your fragment caching in development, and you should be good to go.  No reason not to cache queries or view renders.

Posted
 

Setting up Rails with Redis, Resque, and Resque-Scheduler on Dotcloud

frozen fingertips

 

I learned a ton over the past week getting feedtopic.com's somewhat unique setup hosted on Dotcloud.  There are a few small caveats that you really have to pay attention to.

Before I get started, however, I need to give big ups to the Dotcloud team.  They are a fellow Y Combinator 2010 batch company, with amazing skill and genius, as well as incredible support ethics and all around amazing personalities.  I consider these guys my friends more than just tech support and company founders.

Now let's get down to what our backend is like.

Rails 3

Part of feedtopic.com is built on the Rails 3 framework.  This deals with our database, complicated machine learning / natural language processing, and custom algorithms we built on top.

Redis

We have decided to use Redis to deal with our queueing system.  We never need to store crazily marshalled objects, and json strings work great.

Resque and Resque-Scheduler

Naturally with redis, we use the resque gem to handle jobs in the redis queue, as well as resque-scheduler to basically do our cron for us.  It's slick because you can mount a front end on a subpath with Rack::URLMap and see how your queue and schedules are doing from a nice clean interface.  We originally used Redis To Go, so our configuration needed to be adapted when we set up Redis on Dotcloud.

Postgresql

I want to mention we also chose postgres as our database.  We really don't need any fancy nosql setups or anything at this time.

Dotcloud

We chose Dotcloud for the job of hosting everything.  They are currently in "beta" mode, which means not everything is as polished as it could be yet, but the support they provide, like REAL live support, is better than any docs you can get.  Because there are a few things to polish up, I decided to write this blog post with a few things we figured out along the way.

Heroku (I still love Heroku though)

Another honest reason for choosing Dotcloud is that running this on Heroku in its minimum "development" state would cost at least $72 a month.  The reason is that we need to run two workers on Heroku, and each one is about $36 a month.  That's a lot for two small workers (one for resque, and one for resque-scheduler).  It's also a bit hacked up on Heroku, because you need to run two apps, one for the resque job and one for the resque-scheduler job.  Heroku only lets you run rake jobs:work, and that can only be mapped to one rake task.  Thus you need to push a whole new app just to run the scheduler.  It works OK if you connect both to the same Redis server, but the Redis server will again cost money (they use Redis To Go).  With Dotcloud, it's set up nicely and you can even set up your own Redis.  I definitely think Heroku will put something in place for this as well.  You could even get away with it using the cron add-on, but with a little less flexibility.

I also don't want to knock Heroku.  These guys provide amazing service as well.  We've been using them successfully with Fanvibe.com for a year and a half now.  It's this special redis/resque setup that really prompted us to go with Dotcloud.  Fanvibe.com not only powers our web property, but also the backend of our iPhone app and the API that powers all of the NBA's (National Basketball Association) properties (iPhone app, Android app, iPad app, nba.com), so you can imagine how awesome Heroku can be.  Our backend doesn't just serve up pages; it cranks through live stats in real time (sub-second), uses heuristic algorithms to create prediction questions on the fly as well as closing them out, answering them, and awarding people when the predictions are over, and it notifies people about news, stats, live scores, and what friends are watching and saying.  Anyway, I'll save Fanvibe talk for another post.  In a nutshell, I spend very little time ever worrying about Fanvibe on Heroku, so this in no way means Heroku isn't a great service.  It really is a great service.  The team over there is also very responsive, and they're great guys.  I have no doubt they'll polish out a solution for resque/resque-scheduler soon enough.

Rails 3 setup

Going through the usual Dotcloud documentation, it's pretty straightforward.  When you first "deploy" your ruby app, you'll no doubt have the SystemTimer gem in your Gemfile.  Why?  Because the resque gem docs tell you to put it in.  SystemTimer apparently fixes a crucial bug in Ruby 1.8; I've heard the bug no longer exists in Ruby 1.9.  SystemTimer won't work on Ruby 1.9 without a fix being merged in, so you have two options:

  • deploy with dotcloud using the ruby 1.8 configuration, which is called "ree" in the config parameter, i.e. -c '{"ruby-version": "ree"}'
  • deploy without the SystemTimer gem and just let the default ruby 1.9 deal with it (this is what I ended up doing).

Another big problem was that, for some reason, you may need an nginx config fix on Dotcloud for a virtual host problem.  This is where the Dotcloud guys need to step in; I'm pretty sure they'll update the docs about this soon.  But if you're getting 404s on pages that you know work locally or on something like Heroku, ping them about the nginx config file and the vhost stuff.  More specifically, Sam over at Dotcloud put this fix together for me.

Postgresql Setup

I have our database set up on Dotcloud, just because it's easy.  One thing to note is that the password Dotcloud provides is pretty crazy, so wrap it in double quotes; otherwise I believe its special characters can trip up the YAML parser.  Here's ours:
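The actual file isn't reproduced here.  As a hypothetical illustration of why the quotes matter, the Ruby snippet below feeds a few made-up passwords through YAML.safe_load with and without double quotes.  Unquoted, a value starting with # is read as a comment (so the password becomes nil), one starting with @ hits a reserved character and fails to parse, and an all-digit one comes back as an Integer.  Quoted, all three survive as strings.  One caveat: inside double quotes YAML treats backslashes as escapes, so a password containing \ would need escaping (or single quotes) instead.

```ruby
require 'yaml'

# Made-up passwords that each trip a different YAML rule when left unquoted.
SAMPLE_PASSWORDS = ["#x9Lq2", "@x9Lq2", "20110226"].freeze

# Parse a "password: <value>" line and report what YAML actually returns.
def parsed_password(yaml_text)
  YAML.safe_load(yaml_text)["password"].inspect
rescue Psych::SyntaxError
  "parse error"
end

SAMPLE_PASSWORDS.each do |pw|
  unquoted = "password: #{pw}\n"
  quoted   = "password: \"#{pw}\"\n"
  puts "#{pw.ljust(10)} unquoted -> #{parsed_password(unquoted)}, quoted -> #{parsed_password(quoted)}"
end
```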

Redis Setup

The documentation is spot on here.  Note that I had previously set up a Redis To Go hosted Redis, so these are more caveats on how to adapt that to your own Redis setup on Dotcloud.  To connect it with your Rails app and your resque workers, you'll need to know a few things:

  • ENV["REDIS_URL"] doesn't really work on Dotcloud yet, so avoid it.  I would use a config/redis.yml file and load that in instead.  We had used an environment variable set in our development.rb / production.rb files, per the Redis To Go instructions.
  • The password, again, is a bit crazy.  If you previously used Redis To Go, you'll see that their setup parses the password out of a URI, and that won't work with Dotcloud's super-safe password.
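A minimal sketch of the redis.yml approach, assuming a hypothetical config/redis.yml with host, port, and password keys per environment (the layout, key names, and sample values are illustrative, not Dotcloud's).  It loads the settings into an options hash and, for contrast, shows that putting the same password into a Redis To Go-style URL and parsing it back out with URI does not return the original password.

```ruby
require 'yaml'
require 'uri'

# Hypothetical config/redis.yml contents (single-quoted heredoc: no Ruby interpolation).
REDIS_YML = <<~'YML'
  production:
    host: "redis.example.internal"
    port: 6379
    password: "s3cr#t/p@ss"
YML

# Build connection options for one environment from the YAML text.
def redis_options(yaml_text, env)
  settings = YAML.safe_load(yaml_text).fetch(env)
  {
    host: settings.fetch("host"),
    port: Integer(settings.fetch("port")),
    password: settings["password"]
  }
end

# The Redis To Go pattern: recover the password by parsing a URL.
def password_from_url(url)
  URI.parse(url).password
rescue URI::Error
  nil
end

options = redis_options(REDIS_YML, "production")
puts options.inspect
# In the app (redis gem, not loaded here): Resque.redis = Redis.new(**options)

url = "redis://user:#{options[:password]}@#{options[:host]}:#{options[:port]}/"
puts password_from_url(url).inspect
```

The # and / in the password are what break the URL route: URI sees # as the start of a fragment, so the password never comes back intact.  Check the redis gem's docs for the exact options your version accepts.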

Resque and Resque Scheduler Configuration

I'm going to group both of these here because they're set up in related ways.  This part was a bit trickier, but we figured out a nice way of doing it.  I'm not going to go into how to get a resque worker / scheduler working here; I'm going to assume you have it all working locally already.

 

supervisord.conf: This file is required at the root level of your app.  The problem for a Rails 3 app using the resque gems is that you need two different supervisord.conf files for the two kinds of workers: a pure resque worker and a resque-scheduler worker.  You can't put them both in the same file, but you also don't want to restructure the entire Rails app to work from different directories.  So we came up with a cool solution:

  • Each ruby-worker deployment has a unique $HOSTNAME variable.  It's basically whatever namespace.name you decided on when deploying.
  • Create two files, supervisord.conf_namespace.resque, and supervisord.conf_namespace.scheduler for example.
  • Make sure you also set the RAILS_ENV in the environment section of the config file.
  • Create a post install hook file: "postinstall" that creates a link to the correct supervisord config file, based on the hostname.  This will basically ensure the correct config file is used on a certain host!  FREAKIN SWEET

This is the resque worker's supervisord config file.  Notice we need to also set the rails environment to production.  For some reason it kept running on production for me.

This is the resque-scheduler's supervisord config file.  You can ignore the environment setup in there; the queue isn't used, and for some reason I just still have it sitting there.

And lastly, our postinstall file.  It basically just makes a link.  What a sweet hack.  Remember to chmod +x it.
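The original postinstall file isn't reproduced here (it was presumably a small shell script).  Here's a hypothetical Ruby equivalent of the same idea, assuming the supervisord.conf_<hostname> naming from the list above: it looks for the config matching the current host and force-links it to supervisord.conf.  The method name link_supervisord_conf is illustrative.

```ruby
#!/usr/bin/env ruby
require 'fileutils'

# Link supervisord.conf_<hostname> to supervisord.conf so supervisord
# starts the right worker (resque or resque-scheduler) on this host.
def link_supervisord_conf(app_root, hostname)
  source = File.join(app_root, "supervisord.conf_#{hostname}")
  unless File.exist?(source)
    raise ArgumentError, "no supervisord config for host #{hostname.inspect}"
  end

  target = File.join(app_root, "supervisord.conf")
  FileUtils.ln_sf(source, target) # force: replaces any existing link or file
  target
end

# In the real hook (run from the app root on each deploy):
#   link_supervisord_conf(__dir__, ENV.fetch("HOSTNAME"))
```

Raising when no matching config exists means a misnamed host fails loudly at install time instead of quietly starting the wrong worker.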

We also added require 'resque/tasks' to our Rakefile.

That's it

Wow, ok that was a lot.  But it really seemed like a lot more when we were working on it.  The above were just the final conclusions we came to.  I'm pretty sure I've documented everything, but there is a possibility I left a few things out.  Again, these are very specific to our setup on Dotcloud, so what you have may vary.  Dotcloud guys might send me a few clarifications and I'll update as that comes in.

If you guys want any more information about any of these topics, feel free to ask me in the comments or ping the Dotcloud team.

 

Photo: So I like posting photos I've taken with every post I make, regardless of how much sense it makes.  This was one I took of Issa when we visited Lake Tahoe.


Internet Explorer doesn't play nice with Rails 2+ respond_to blocks

 

One of the millions of things wrong with Internet Explorer is that it sends strange Accept headers to the server.  The upshot is that an IE request hits the first format you define in a respond_to block.  I found this funny transcript at http://garrickvanburen.com/archive/workaround-for-ie-overly-accepting-in-rail...

Here’s a conversation between Rails & Firefox

Firefox: “Hey Rails, I want this url”
Rails: “No problem, which format would you like it in?”
Firefox: “HTML, please.”
Rails: “Here you go.”

Here’s the same conversation with Internet Explorer

IE: “Hey Rails, I want this url”
Rails: “No problem, which format would you like it in?”
IE: “Whatcha got?”
Rails: “I’ve got Atom, and…”
IE: (interrupting) “OK THANKS!”
Rails: “…um, what? I wasn’t finished, really? ok, here you go.”

The workarounds are pretty gross.  For every HTML request, pass along :format => :html.  You can also make sure your AJAX calls go to .js by putting format.js blocks before anything else, or by explicitly setting the Accept header on all AJAX calls like so:

xhr.setRequestHeader("Accept", "text/javascript" )
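To make the dialogue concrete, here's a deliberately simplified model of format negotiation in plain Ruby.  It is not Rails' actual algorithm (it ignores q-values and just walks the Accept header left to right), and the IE header string is an illustrative example of the image-heavy, */*-terminated headers older IE versions sent.  What it shows: when the only thing that matches is the */* wildcard, whichever format you declared first wins.

```ruby
# Simplified, hypothetical model of respond_to format selection (not Rails' code).
MIME_TYPES = {
  html: "text/html",
  atom: "application/atom+xml",
  js:   "text/javascript",
  xml:  "application/xml"
}.freeze

# declared: formats in the order they appear in the respond_to block.
def pick_format(accept_header, declared)
  accept_header.split(",").each do |entry|
    type = entry.split(";").first.strip
    # A bare wildcard means "anything", so the first declared format wins.
    return declared.first if type == "*/*"

    match = declared.find { |fmt| MIME_TYPES[fmt] == type }
    return match if match
  end
  nil
end

ie_accept      = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, */*"
firefox_accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

puts pick_format(firefox_accept, [:atom, :html]).inspect    # prints :html
puts pick_format(ie_accept, [:atom, :html]).inspect         # prints :atom
puts pick_format("text/javascript", [:html, :js]).inspect   # prints :js
```

Swapping the order to [:html, :atom] makes the IE request land on HTML in this model, which is the same reason declaring format.html first (or format.js for AJAX-heavy actions) helps.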

This google group thread is also a good quick read: http://groups.google.com/group/wellrailed/browse_thread/thread/619ad1b3c5a487cd

(photo is unrelated to topic! picture I took of my friend Anna)


XML Builder Partials in Rails

The past few days I've been creating an API with Rails, and found that writing XML Builder files is slightly different than writing the usual erb in views.  The biggest problem was keeping things DRY with partials; they just don't work the same way.  The following attempt to render does nothing in an xml.builder file:

render :partial => 'somepartial'

The reason is that the partial receives a new XML Builder object rather than the one already set up.  You can do a few tricks, such as passing the xml builder object in as a local variable, but I found that the cleanest way to do it is to append the rendered partial to the current builder.  The parentheses are required, since Ruby won't parse a paren-less method call as the operand of <<:

xml << render(:partial => 'somepartial')

Hope that helps people looking to make partials with the XML Builder in Rails


Create a simple API with Ruby on Rails

Here are a few easy steps for creating a simple API in a Ruby on Rails project.

  • Start a new rails app with the restful_authentication plugin by technoweenie: http://github.com/technoweenie/restful-authentication/
  • The restful_authentication plugin allows for user accounts, but doesn't have any API authentication built in.  If you want to make publicly available API keys for your users, you'll need to add this so you can track API usage and deter unauthorized use.  So, assuming you have restful_authentication set up with the defaults, follow this tutorial for setting up API authentication: http://www.compulsivoco.com/2009/05/rails-api-authentication-using-restful-au...
  • Once you have the above API authentication applied, protect the actions you want behind it by adding a before filter:

before_filter :login_required, :only => [:index, :show]  # replace with the actions you want protected

  • To render out xml for a certain object, you can simply use a respond_to when you're ready to render xml in the controller.

respond_to do |format|
  format.xml { render :xml => @some_object }
end

  • The above assumes you have an object that you want to return, and will dump the columns as needed.  If you want a prettier or custom return xml, I would recommend using the built in Builder that allows you to specify exactly what xml you want by creating a new view file called action_name.xml.builder and changing the respond_to line to the following:

respond_to do |format|
  format.xml
end

  • In your action_name.xml.builder, use the xml builder syntax to create your own xml file.  Here's a quick example:

xml.instruct!
xml.droplets do
  @droplets.each do |droplet|
    xml.droplet do
      xml.id droplet.id
      xml.name droplet.name
      xml.created_at droplet.created_at
    end
  end
end

  • You should test all of this using curl:

curl "http://localhost:3000/controller/action/param.xml?api_key=SOME_API_KEY"
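The linked tutorial does the real work, but as a rough sketch of what an API-key check boils down to, here's a plain-Ruby model: issue each user a random key, then on every request look up the api_key parameter and either let the request through or answer 401.  ApiKeyRegistry, authorize_api_request, and the 40-character hex key format are hypothetical, not taken from restful_authentication or the tutorial.

```ruby
require 'securerandom'

# Hypothetical stand-in for the users table: maps API keys to user ids.
# With restful_authentication you would store the key on the User model instead.
class ApiKeyRegistry
  def initialize
    @user_ids_by_key = {}
  end

  # Issue a random 40-character hex key for a user.
  def issue(user_id)
    key = SecureRandom.hex(20)
    @user_ids_by_key[key] = user_id
    key
  end

  def revoke(key)
    @user_ids_by_key.delete(key)
  end

  # Returns the owning user id, or nil for a missing or unknown key.
  def user_for(key)
    return nil if key.nil? || key.empty?
    @user_ids_by_key[key]
  end
end

# Roughly what an API-key before_filter decides: a status code plus the user.
def authorize_api_request(params, registry)
  user_id = registry.user_for(params["api_key"])
  user_id ? [200, user_id] : [401, nil]
end

registry = ApiKeyRegistry.new
key = registry.issue(42)
puts authorize_api_request({ "api_key" => key }, registry).inspect
puts authorize_api_request({ "api_key" => "bogus" }, registry).inspect
```

Keeping the key separate from the login password means you can revoke or rotate a user's API access without touching their account.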

 


Snow Leopards and Rails

My turn to put in a few notes on what I did to get Rails to work nice with my newly updated Mac OS: Snow Leopard.  I got most of my information from Riding Rails blog post here: http://weblog.rubyonrails.org/2009/8/30/upgrading-to-snow-leopard.  I would recommend reading that through, but I found a few new things along the way:

  • Save your .bash_profile and your paths before doing the upgrade to Snow Leopard, you'll need them later.
  • Back up your databases, and be ready to reload them after this is all done, since we need to reinstall mysql with the 64-bit version
  • DO NOT use the latest mysql gem 2.8.x, it doesn't work.  Use the 2.7 version instead like this:
    • sudo env ARCHFLAGS="-arch x86_64" gem install mysql --version 2.7 -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
  • You must reinstall MacPorts before running the port commands listed in the blog post, you can now grab the latest Snow Leopard binary package here
  • You will have to run sudo port upgrade --force installed to get your MacPorts libraries updated correctly.  Takes a long time!  If any fail, try deactivating the failed one first, then uninstalling it with -f, then reinstalling it.  Once that works, run the upgrade --force installed once more.  This will help resolve the issues there.
  • Use the script they have at the bottom of the post to see which gems you want to reinstall

After a lengthy MacPorts update, reinstalling gems, figuring out that I need the mysql 2.7 gem, getting git and some other paths like mysql back into my environment, everything fired right back up.  Now I have to reload my databases and I should be good to go.

Hope you guys get your stuff working!  If anything, read the comment thread on the Riding Rails blog, that'll help a ton!


memcached with passenger, ree, and the memcache-client gem

Memcached is pretty easy to setup, but there are a few items that are super strange.  This is how I set it up locally to test and on a production slice:

Production: Passenger 2.2.1, REE, rails 2.3.2, memcached, memcache-client gem, systemtimer gem, slicehost slice

Development: mongrels as usual, rails 2.3.2, ruby 1.8.6, memcached, memcache-client gem, systemtimer gem, Mac OS X Leopard

Development:

  • Install macports if you haven't already
  • sudo port install memcached
  • sudo gem install memcache-client
  • sudo gem install SystemTimer
  • memcached -vv # runs in the foreground with verbose output, for testing
  • I have yet to figure out how to get development working without memcached running when you don't care for it.

Production:

  • sudo apt-get install memcached
  • sudo gem install memcache-client
  • sudo gem install SystemTimer
  • memcached -d -l 127.0.0.1 -p 11211 # daemonizes it, listening on 127.0.0.1 port 11211

Important Notes:

  • There is no configuration needed for apache/passenger.
  • memcache-client 1.5.0 is actually bundled with rails 2.1, but I would highly suggest upgrading to memcache-client 1.7.2.  Read why here.  In a nutshell, it's WAY faster.
  • Check out the maintainer's recent commit about SystemTimer; this is extremely important for those running REE.  It hasn't been released yet, but I've tried it on my slice and it works fine, so go ahead and edit your memcache.rb file with these changes.
  • Make sure you install all your gems with Ruby Enterprise.  Symlink your /usr/bin/ruby to the enterprise one, or make the enterprise ruby the first and only one in your path.
  • Marshal serializes objects into memcached and deserializes them when you pull them out.  This way you can store more than just strings.  By passing true as the third parameter of a fetch, you can do a raw add to memcache, but when you pull the value back out, it will only come out as a string.  More on this later in the post.
  • Objects added to or retrieved from memcached need to be serializable!  Passenger gives you a horrible error message pointing to a line in memcache.rb where it does a Marshal.load or Marshal.dump, but tells you nothing else.  That's a good indication to check the objects you're returning in a fetch/add/get block to see if they are serializable.  If not, write your own serialization; more on that further along in this post.
  • Passenger does smart spawning, which is great, but it also freaks out memcached.  This is where we use the memcache-client gem to do a reset whenever Passenger forks.  In short, smart spawning means that whenever Passenger needs a new worker process, it starts it from an already loaded Rails application/framework, rather than loading the entire app and framework for each worker process.  The loading only happens once.  It's really fast with REE and Passenger lined up.  REE improves this because it is copy-on-write friendly, which means the worker processes share as much memory as possible.  The catch is that each forked worker also inherits the parent's open memcached connection, which is why it needs a reset.  If you want to know more, read the two links in this bullet point that I mentioned.
  • So how, specifically, do you re-establish the connection to memcached in your Rails app?  See below:

Setting up your production.rb to use memcached and to solve the smart spawning issue that Passenger has

# set cache classes to true
config.cache_classes = true
config.action_controller.consider_all_requests_local = false

# of course you want to perform caching
config.action_controller.perform_caching = true

config.cache_store = :mem_cache_store
memcache_options = {
  :c_threshold => 10000,
  :compression => true,
  :debug       => false,
  :namespace   => 'a',
  :readonly    => false,
  :urlencode   => false
}

# require the new gem; this loads 1.7.2 instead of the bundled 1.5.0
require 'memcache'

# make a CACHE global to use in your controllers instead of Rails.cache;
# it uses the new memcache-client 1.7.2
CACHE = MemCache.new memcache_options

# connect to the server you started earlier
CACHE.servers = '127.0.0.1:11211'

# this is where you deal with Passenger's forking
# (skipped when not running under Passenger, e.g. devmode with mongrel)
if defined?(PhusionPassenger)
  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    if forked
      # We're in smart spawning mode, so close the duplicated memcached
      # connections inherited from the parent; they will reopen themselves.
      CACHE.reset
    end
  end
end

And finally, the magical part of caching in your controllers:

CACHE.fetch('cachekey', 1.hour) do
  # expensive work goes here; the value of the last expression is cached and returned
end

The above is using our CACHE global, that uses memcache-client 1.7.2.  memcache-client gem basically helps rails talk to memcached server.  No apache settings needed here at all.

The cache key should be unique enough so that the items in the block will be valid.

The second parameter is the timed expiration.  Be careful, Rails.cache.fetch accepts expiration as a hash parameter, :expires_in => 1.hour.  This is not the case for memcache-client, you must pass it in as a regular parameter.

The block can hold any Ruby code you want, and it has access to any variables normally accessible at that point.  The value of the last expression in the block is what gets cached and returned.  Great!  But something very important:

The returned object (and, if it's a hash, array, or other collection, every object inside it) MUST be serializable.  If not, you're going to get some crazy ass error messages that say nothing about serialization, and they will give you headaches.  When something isn't serializable, Marshal errors out without telling you exactly what's happening.  You can write your own serialization for special objects to get around this, or save things in a custom hash if you don't really need the entire object.  That custom hash would just hold the values you need outside of the fetch block, and that hash can be returned.  Here's an example of the error I got from Passenger when trying to hit an action that returned an unserializable object:

[Sun May 03 23:14:57 2009] [error] [client 76.239.166.13] Premature end of script headers: amazon, referer: [ADDRESS_REMOVED_FOR_THIS_BLOG_POST]
[ pid=5227 file=ext/apache2/Hooks.cpp:546 time=2009-05-03 23:14:57.657 ]:
  Backend process 5255 did not return a valid HTTP response. It returned no data.
/opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/memcache-client-1.7.2/lib/memcache.rb:335: [BUG] non-initialized struct

If you see the above, immediately check the objects returning from the fetch block.
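To see why unserializable return values blow up, here's a toy in-process cache that mimics the two behaviors described above: expiry is a positional argument, and values are stored via Marshal.  ToyCache and Product are hypothetical; this is not memcache-client's code.  In plain Ruby, Marshal raises a readable TypeError, which is a lot friendlier than what Passenger showed above, and the plain-hash workaround from the previous paragraph goes through fine.

```ruby
# Toy, in-process stand-in for memcache-client's fetch (NOT the gem).
# Like the real thing, expiry is positional and values go through Marshal,
# so anything Marshal can't handle fails when it is stored.
class ToyCache
  def initialize
    @store = {} # key => [marshaled_bytes, expires_at_or_nil]
  end

  def fetch(key, expiry = 0)
    bytes, expires_at = @store[key]
    fresh = bytes && (expires_at.nil? || Time.now < expires_at)
    return Marshal.load(bytes) if fresh

    value = yield
    # 0 means "never expire" here, mirroring memcached's convention.
    @store[key] = [Marshal.dump(value), expiry.zero? ? nil : Time.now + expiry]
    value
  end
end

# True if the value survives a round trip through Marshal.
def marshalable?(value)
  Marshal.load(Marshal.dump(value))
  true
rescue TypeError
  false
end

# Hypothetical model that carries a Proc, which Marshal refuses to dump.
Product = Struct.new(:id, :name, :price_formatter)
product = Product.new(7, "Widget", ->(cents) { format("$%.2f", cents / 100.0) })
cache = ToyCache.new

puts marshalable?(product)

begin
  cache.fetch("product/7", 3600) { product }
rescue TypeError => e
  puts "not cacheable: #{e.message}"
end

# Workaround: return a plain hash holding only the values you need.
summary = cache.fetch("product/7/summary", 3600) do
  { "id" => product.id, "name" => product.name }
end
puts summary.inspect
```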

That's all for now, hope that was helpful for those with passenger, ree, and the memcache-client gem!

Big thanks for all the help from Mike Perham, who maintains the memcache-client gem (amongst other amazing feats), Michael Simons, who wrote about the Passenger issues with an emphasis on solutions when using the memcache-client gem, and all the folks at the memcached and Phusion Passenger Google groups.  And big help from Joe, who helped troubleshoot all the code the whole time and dealt with all my crap.


Rails cache store class and time-based expiry support with :expires_in option

Cache Store Class

Rails 2.x has an abstract cache store class, which is great for caching queries in the controller, but there are a few big gotchas that you'll need to figure out.  The fine print of the docs says little, with a lot of links.  The basic point is that you need to worry about your cache store implementation.  The docs recommend MemCacheStore.  I'm not sure what else you could use out of the box.

MemCacheStore uses memcached as cache storage, and is required to use :expires_in

:expires_in

So :expires_in won't work unless you specify in your settings that you're using the MemCacheStore implementation (or something similar), because MemCacheStore supports the :expires_in option on writes.  Otherwise your cache will not expire over time.  In production it's probably a better idea to use memcached with MemCacheStore, which is probably the best solution currently, than to write something that saves cache times and such to the database.
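To illustrate the difference, here are two toy stores in plain Ruby; neither is a real Rails class.  Both accept the Rails-style options hash (:expires_in => seconds) on write, but only one honors it, which is the situation you're in if your configured store doesn't support :expires_in: the call looks fine and the entry simply never goes away.  A fake clock stands in for waiting an hour.

```ruby
# Toy store that ignores :expires_in (stands in for a store without expiry support).
class ForgetfulStore
  def initialize
    @data = {}
  end

  def write(key, value, options = {})
    @data[key] = value # options, including :expires_in, are silently ignored
    true
  end

  def read(key)
    @data[key]
  end
end

# Toy store that honors :expires_in, with an injectable clock for testing.
class ExpiringStore
  def initialize(clock = -> { Time.now })
    @clock = clock
    @data = {} # key => [value, expires_at_or_nil]
  end

  def write(key, value, options = {})
    ttl = options[:expires_in]
    @data[key] = [value, ttl ? @clock.call + ttl : nil]
    true
  end

  def read(key)
    value, expires_at = @data[key]
    return nil if expires_at && @clock.call >= expires_at
    value
  end
end

now = Time.at(0)
clock = -> { now }
stores = { "ExpiringStore" => ExpiringStore.new(clock), "ForgetfulStore" => ForgetfulStore.new }
stores.each_value { |store| store.write("popular_posts", [1, 2, 3], :expires_in => 3600) }

now += 3601 # pretend an hour and a second have passed
stores.each { |name, store| puts "#{name}: #{store.read('popular_posts').inspect}" }
```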

If anyone has suggestions for better solutions than MemCacheStore, please post!  This is the only solution I've found looking through the docs.


Rails and Twitter Signin

For a while I've been looking for a nice Twitter solution for Rails.  Sure, I could've built something on my own, but I have mostly been looking out of curiosity.  The usual suspects were not the greatest, and I couldn't find a lightweight and elegant solution.  Then came TwitterAuth, a plugin written by Michael Bleigh.  It's made for Rails 2.3, mostly because it uses Rails engines, which are pretty slick and a whole other discussion altogether.

The fun part of TwitterAuth is that it uses OAuth but is heavily influenced by restful_authentication, which the Rails community has adopted as a very standard, solid way to do authenticated user accounts.  What does that mean?  It uses controller extensions like "logged_in?" and "current_user", so if you already use restful_authentication, this makes total sense.

Install TwitterAuth as a gem or as a plugin.  Remember: you need the oauth gem installed as well; the gem install takes care of that automatically, but with the plugin install method you'll need to install it yourself.

To quickly get into authenticating your users, go to the gem or plugin directory and check out the app directory that comes with it.  In there you'll see a user.rb model, a sessions_controller.rb, and some view partials!  This is exactly what you'll need, if not the only things you'll need, to get your users immediately working with Twitter OAuth.  No need to write these yourself; grab them from his examples and modify as needed.  Out of the box they worked perfectly for me.

Don't forget to get your consumer key and secret from Twitter.  Remember that if you send direct messages and stuff to twitter, it will come from the user you apply for the twitter key / secret with.  Meaning, if you use your FooBar twitter account to signup for the Twitter API key / secret, all direct messages will come from FooBar.  I have yet to figure out if we can send them from one specific person who we've authenticated in the past.  Should be easy.

To get the key and secret, you'll need to go to http://twitter.com/apps.  This link is so buried that it took me forever to find.  That, and I hadn't had coffee all day and was on my 15th hour of work for the day.

Lastly, the OAuth callback is a bit tricky, because if you're working on localhost as a developer, Twitter won't be able to... well, call back, unless you can give it a visible IP.  Without getting into tricks and sorcery, I just gave it a fake callback, then copied the parameters from the GET callback request and appended them to http://localhost:3000/oauth_callback.  UPDATE: In the API Changeset of April 23rd, 2009, the oauth_callback is deprecated due to security issues, so no more localhost callback.  UPDATE: credit for the cleverness of this sorcery actually goes to Joe.

Anyway, hope that was fun, go and authenticate yourself like crazy with Twitter =)


Rails and processing inbound emails

The TMail library is now included in the latest Rails, and handles inbound emails amazingly well.  No need to go into parsing MIME/SMIME for different email clients anymore, this does it all for you.  I found a lot of resources through searches on the subject, but they were slightly outdated and a bit confusing.  The quick and dirty of it is:

  1. Create a mailer
    • ruby script/generate mailer SomeNameMail
  2. def a receive method that accepts a TMail object; the email parameter is the TMail object in the example below:
    • def receive(email)
        # handle mail here (see next step)
      end

  3. Retrieve to, from, subject, attachments and more with very simple commands.  Note that email.to and email.from return arrays in case there's more than one recipient or sender, so make sure you grab all of them or just the first one:
    • @to = email.to.first
      @from = email.from.first

  4. And to get the body of the email with the 'text/html' content-type (meaning it comes with all the nice html tags), you need a little extra work:
    • @body = body_html(email)
      ...
      def body_html(email)
        result = nil
        if email.multipart?
          email.parts.each do |part|
            if part.multipart?
              part.parts.each do |part2|
                result = part2.unquoted_body if part2.content_type =~ /html/i
              end
            elsif !email.attachment?(part)
              result = part.unquoted_body if part.content_type =~ /html/i
            end
          end
        else
          result = email.unquoted_body if email.content_type =~ /html/i
        end
        result = email.body if result.nil?
        result.strip
      end
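The nested loops in step 4 only look two levels deep and keep the last HTML part they see.  If your mail can nest deeper, here's a recursive sketch of the same search.  Part is a hypothetical stand-in for a TMail part so the example runs without TMail, and unlike the original, this version returns the first HTML part it finds.

```ruby
# Hypothetical stand-in for a TMail part, so this runs without TMail.
Part = Struct.new(:content_type, :body, :parts, :attachment, keyword_init: true) do
  def multipart?
    !parts.nil? && !parts.empty?
  end
end

# Depth-first search for the first non-attachment text/html part.
def find_html(part)
  return nil if part.attachment

  if part.multipart?
    part.parts.each do |child|
      html = find_html(child)
      return html if html
    end
    nil
  elsif part.content_type =~ /html/i
    part.body
  end
end

# Same fallback as the original: use the raw body when no HTML part exists.
def body_html(email)
  (find_html(email) || email.body).to_s.strip
end

text = Part.new(content_type: "text/plain", body: "Hi")
html = Part.new(content_type: "text/html", body: " <p>Hi</p> ")
alternative = Part.new(content_type: "multipart/alternative", parts: [text, html])
attached = Part.new(content_type: "text/html", body: "<b>report</b>", attachment: true)
email = Part.new(content_type: "multipart/mixed", parts: [attached, alternative])

puts body_html(email) # prints <p>Hi</p>
```

With real TMail objects you'd swap in part.unquoted_body and email.attachment?(part) as in step 4.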

TMail RDOC: http://tmail.rubyforge.org/rdoc/index.html

Getting mail into the receive action is another story.  Read #46 in the Advanced Rails Recipes book for more information on how to do this.  The basic idea is to run a daemon that fetches mail from an inbox and feeds it as TMail to your mailer.  The mail_fetcher script mentioned in Advanced Rails Recipes does a good job of it.

Attachments are also very simple: email.attachments returns the array of attachments, which you can then save off to something like Paperclip or file_column.

Cheers!
