Arthur Chang

Using TinyProxy with Rails

I have had quite the learning experience the last few days with a fairly unusual situation.  To stay focused on the core functionality of our app, we are using as many existing resources as possible, which basically means there's probably a startup whose sole purpose is to provide each individual piece of technology that we might need (other than the exact core functionality, hopefully!).  Our setup uses Heroku as our backend web server to help us scale and basically stay up, plus a third-party stats feed provider.  Stats, as you know, can be quite costly, and providers often restrict access to their feeds by whitelisting your specific IP.

While we have a fairly static IP range with Heroku, we're also using Delayed Job as a worker in the backend that grabs live stats from our stats provider.  Unfortunately, since we're running Delayed Job in the all-powerful "cloud," the worker's IP is ridiculously dynamic and comes with zero guarantees.  The only solution is to run a secured proxy that the Delayed Job workers go through, so the stats provider always sees one static IP that can be whitelisted.

In the future, Heroku's Morten says, Delayed Job workers will run through proxies to ensure fixed IPs, but for now I had to set up my own on a separate VPS.  The really simple solution we came up with is TinyProxy running on Ubuntu Karmic on Slicehost, which sits between our Delayed Job workers and the stats server for every call.  The code is written so that we can quickly switch out the proxy and jump onto Heroku's solution at a moment's notice.  The reason for not just staying with our own proxy?  It's just another process to maintain, monitor, and keep running (plus the cost of another VPS).

The setup for this entire process is quite simple once you get all the pieces in line.  Buy a VPS from Slicehost for example, and install Ubuntu.  From there install TinyProxy:

sudo apt-get install tinyproxy

That should also automatically start your default TinyProxy instance.  Check by listing the processes; you will see a few:

 

root:~# ps -ef | grep tinyproxy
nobody    2826     1  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2827  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2828  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2829  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2830  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2831  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2832  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2833  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2834  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2835  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy
nobody    2836  2826  0 03:18 ?        00:00:00 /usr/sbin/tinyproxy

 

With the default values, you will see a number of tinyproxy processes running; the count here is the default number of TinyProxy processes started at once.  You can choose more or fewer depending on your needs, but for now just run everything in default mode for testing.  The next step is setting up your Rails app to use the proxy.  Before that you have to allow your Delayed Job worker access to the proxy, normally by IP, which brings us right back to the problem of the Delayed Job IP being dynamic.  There are other options for authorizing use of the proxy; I'll let you choose for yourself.  Read the docs for the config file for TinyProxy here.  OK, enough, let's get on with it: fire up TextMate or whatever your editor is and open your Delayed Job worker.
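For reference, the settings discussed above look roughly like this in the TinyProxy config file.  This is only a sketch under a few assumptions: the directive names are the ones TinyProxy used in releases from around this time (newer releases may have dropped the server-count settings), the numbers are typical stock values rather than anything tuned, and 203.0.113.10 is a placeholder address, not a real worker IP.

```conf
# Port TinyProxy listens on (8888 is the usual default).
Port 8888

# Only clients matching an Allow line may use the proxy.
Allow 127.0.0.1
# Placeholder address; replace it with a machine you control for testing.
Allow 203.0.113.10

# Child processes started up front, and the cap on simultaneous clients.
StartServers 10
MaxClients 100
```

With StartServers at 10 you would expect one parent process plus ten children, which matches the process listing above.  TinyProxy needs a restart after the config file changes.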

Our worker pulls a stats feed in XML using the (super fast) libxml-ruby gem.  It is built on the C libxml2 library and benchmarks as one of the fastest options, at least as fast as other solutions like Nokogiri.  I would suggest either one; Nokogiri seems to be the more modern of the two, but both are actively maintained and fast.  Avoid Hpricot and REXML, as they tend not to be as fast in many situations.

We first get the XML feed by using the parser: XML::Parser.file('path_to_feed').  The path_to_feed here is the URI to the stats provider, but there's no proxying going on here yet.  For that, we use the rest_client gem, made by Adam Wiggins of Heroku.  This is already installed on Heroku, so you won't need to worry about it (add it to your .gems manifest though, just in case).  The most notable lines of code you will need are:

require 'rest_client'
RestClient.proxy = 'http://proxy.com:port'
RestClient.get 'http://somefeed.com/feed.xml', :accept => 'text/xml'

First require the gem in your worker, then set the proxy.  The proxy is basically your Slicehost IP plus the TinyProxy port, for example: http://204.123.123.123:8888.  After that you can use RestClient to get your feed (how RESTful!).  We pass an extra header parameter to tell the server we're getting the feed from that we accept text/xml.  Our stats provider specifically requires us to tell them we can accept text/xml; otherwise they return the HTTP status code 406.  Other feed providers don't give a flying fart about what you accept, they give it anyway.  But best to be safe.  Now how do we get libxml-ruby to read this darn thing?  Easy, here it is:

 

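require 'rest_client'
require 'xml'  # libxml-ruby; exposes XML::Parser

RestClient.proxy = 'http://123.123.123.123:8888'
feed = 'http://livestats.com/get_feed/nba/boxscore.xml'
# Fetch through the proxy, then hand the response body to libxml-ruby.
parser = XML::Parser.string(RestClient.get(feed, :accept => 'text/xml'))
doc = parser.parse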

Of course the above is the watered-down version of what we do, but you get the idea.  We make sure the proxy is already set, then get the feed with the text/xml accept header, then immediately parse.  If you want to see this in action, plug all of the lines above into your console, and then tail your TinyProxy log file (tail -f /var/log/tinyproxy.log).  It will tell you who is connecting (the requester) and what the request is, then connect you and close the connection.  We get an info message that says No Proxy for BLAH, where BLAH is the request server.  As far as I can tell this just means no upstream proxy is configured for that host, so TinyProxy connects to it directly, but I haven't dug into it any further; for now I just shrugged it off since it seems to be working =)
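Since the plan is to swap this proxy out later, one way to keep the switch painless is to read the proxy URL from the environment instead of hard-coding it in the worker.  The snippet below is only a sketch of that idea, not our actual worker code: STATS_PROXY_URL and stats_proxy_url are made-up names, and it uses nothing but Ruby's standard uri library to sanity-check the value.

```ruby
require 'uri'

# Returns the proxy URL to use, or nil when no proxy is configured.
# STATS_PROXY_URL is a hypothetical variable name chosen for this sketch.
def stats_proxy_url(env = ENV)
  raw = env['STATS_PROXY_URL'].to_s.strip
  return nil if raw.empty?

  uri = URI.parse(raw)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass this check.
  unless uri.is_a?(URI::HTTP) && uri.host
    raise ArgumentError, "STATS_PROXY_URL is not an http(s) URL: #{raw}"
  end
  uri.to_s
end

# In the worker, before fetching the feed (needs the rest_client gem):
#   RestClient.proxy = stats_proxy_url
```

With this in place, moving to a different proxy means changing one config value rather than touching the worker, and leaving the variable unset means no explicit proxy is set at all.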

Anyway, enjoy playing around with RestClient, TinyProxy, libxml-ruby, and whatever else I mentioned.  It's all really cool stuff.

Big thanks to Morten at Heroku for awesome support and suggestions outside of Heroku!  Thanks to Matieu at Slicehost for giving me instantaneous support on our slice, and to Superfeedr's Julien for his insight and suggestions in general on feeds.  Thanks as well to all the others who've helped steer me towards this solution.  It's awesome to have such a great community that supports one another.

 

 