Running R on Amazon’s EC2


This is a note for those who use R, but haven’t yet used Amazon’s EC2 cloud services.

Amazon’s EC2 is a cloud service that provides on-demand computing infrastructure. You launch virtual machines from templates called Amazon Machine Images, or AMIs. In general, this type of cloud provides several benefits:

  • Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it; you don’t need to configure it. This applies not only to applications like R but also to any third-party data that you require.
  • On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider. You don’t need to order any hardware and set it up.
  • Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
  • Pay per use. The cost of 1 AMI for 100 hours and 100 AMIs for 1 hour is the same. With pay-per-use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.
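
The pay-per-use arithmetic can be checked with a few lines of shell. This is only a sketch: the hourly rate below is a made-up placeholder, not an actual EC2 price, and a real bill also depends on things such as instance type and region.

```shell
# Hypothetical rate in cents per instance-hour (an assumption, not a real EC2 price).
rate_cents=10

# Cost is instances x hours x rate, so the two schedules below cost the same.
cost_long=$(( 1 * 100 * rate_cents ))    # 1 instance for 100 hours
cost_wide=$(( 100 * 1 * rate_cents ))    # 100 instances for 1 hour

echo "1 instance for 100 hours: ${cost_long} cents"
echo "100 instances for 1 hour: ${cost_wide} cents"
```

Both lines print the same amount, whatever rate is plugged in.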

Here are the main steps to use R on a pre-configured AMI.

Set up.
The setup needs to be done just once.

  1. Set up an Amazon Web Services (AWS) account by going to:

    aws.amazon.com.

    If you already have an Amazon account for buying books and other items from Amazon, you can also use that account for AWS.

  2. Log in to the AWS console.
  3. Create a “key-pair” by clicking on the link “Key Pairs” in the Configuration section of the Navigation Menu on the left hand side of the AWS console page.
  4. Click on the “Create Key Pair” button, about a quarter of the way down the page.
  5. Name the key pair and save it to your working directory, say /home/rlg/work.
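
Before the key pair is used with ssh, it is worth restricting its permissions, since OpenSSH typically refuses a private key file that other users can read. A minimal sketch, assuming the key was saved as testkp.pem (the name used later in this note) in the working directory above; protect_key is a hypothetical helper name:

```shell
# protect_key is a hypothetical helper: make a private key readable by its owner only.
protect_key() {
  chmod 400 "$1"
}

# Assumed location and name of the downloaded key pair; adjust to your own.
key="/home/rlg/work/testkp.pem"
if [ -f "$key" ]; then
  protect_key "$key"
  ls -l "$key"    # the mode column should now read -r--------
else
  echo "key file not found: $key"
fi
```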

Launching the AMI. These steps are done whenever you want to launch a new AMI.

  1. Log in to the AWS console. Click on the Amazon EC2 tab.
  2. Click the “AMIs” button under the “Images and Instances” section of the left navigation menu of the AWS console.
  3. Enter “opendata” in the search box and select the AMI labeled “opendata-analytic-images-us/odg-analytic-ubuntu-9.10-i386-server-20100119-1.manifest.xml”, which has the AMI ID “ami-bf608dd6”.
  4. Enter the number of instances to launch (1), the name of the key pair that you have previously created, and select “web server” for the security group. Click the launch button to launch the AMI. Be sure to terminate the AMI when you are done.
  5. Wait until the status of the AMI is “running.” This usually takes about 5 minutes.

Accessing the AMI.

  1. Get the public IP address of the new AMI. The easiest way to do this is to select the AMI by checking the box. This provides some additional information about the AMI at the bottom of the window. You can copy the IP address there.
  2. Open a console window and cd to your working directory, which contains the key pair that you previously downloaded.
  3. Type the command:
    ssh -i testkp.pem -X root@ec2-67-202-44-197.compute-1.amazonaws.com

    Here we assume that the name of the key-pair you created is “testkp.pem.” The flag “-X” starts a session that supports X11. If you don’t have X11 on your machine, you can still log in and use R, but the graphics in the example below won’t be displayed on your computer.
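
Once logged in, one way to tell whether the X11 session was set up is to look at the DISPLAY environment variable, which ssh normally sets on the remote side when forwarding is active. A small sketch to run on the instance; x11_status is a hypothetical helper name:

```shell
# x11_status is a hypothetical helper: report whether a display is available.
# It takes the value of DISPLAY as its only argument.
x11_status() {
  if [ -n "$1" ]; then
    echo "X11 display available: $1"
  else
    echo "no X11 display; write plots to a file instead"
  fi
}

x11_status "${DISPLAY:-}"
```

If no display is reported, R still works, and plots can be written to a PDF file as in step 5 of the next section.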

Using R on the AMI.

  1. Change your directory and start R (the leading # is the root shell prompt, not part of the command):

    #cd examples
    #R
  2. Test R by entering an R expression, such as:

    > mean(1:100)
    [1] 50.5
    >
  3. From within R, you can also source one of the example scripts to see some time series computations:


    > source('NYSE.r')

  4. After a minute or so, you should see a graph on your screen. After the graph is finished being drawn, you should see a prompt:

    CR to continue

    Enter a carriage return and you should see another graph. You will need to enter a carriage return 8 times to complete the script (you can also choose to break out of the script if you get bored with all the graphs).
  5. To plot the time series xts.return and write the result to a file called ‘ret-plot.pdf’ use:

    > pdf("ret-plot.pdf")
    > plot(xts.return)
    > dev.off()

    You can then copy the file from the instance to your local machine using the command:

    scp -i testkp.pem root@ec2-67-202-44-197.compute-1.amazonaws.com:/root/examples/ret-plot.pdf ret-plot.pdf
  6. When you are done, exit your R session with Control-D. Exit your ssh session with “exit” and terminate your AMI from the AWS console. You can also choose to leave your AMI running (it is only a few dollars a day).
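
After the scp step, a quick local check can confirm that the copy is really a PDF, since PDF files begin with the characters %PDF. A sketch, assuming the file was saved as ret-plot.pdf in the current directory; is_pdf is a hypothetical helper name:

```shell
# is_pdf is a hypothetical helper: succeed if the file starts with the PDF signature.
is_pdf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "%PDF" ]
}

# Assumed local file name, matching the scp command above.
if is_pdf ret-plot.pdf; then
  echo "ret-plot.pdf looks like a PDF"
else
  echo "ret-plot.pdf is missing or is not a PDF"
fi
```

It is safest to run this before terminating the instance, so the plot can be regenerated or copied again if something went wrong.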

Acknowledgements: Collin Bennett from Open Data Group created the AMI.

Steve Vejcik from Open Data Group wrote the R scripts and configured the initial version of the AMI.

Special thanks to Uri Hasson who helped us improve the AMI image.


  1. #1 by rgrossman on May 28, 2009 - 7:23 pm

    For those interested in parallel R, you may want to consider some of the products offered by REvolution Computing, which can be used easily with Amazon’s EC2 instances.

  2. #2 by fhamilton on June 11, 2009 - 2:22 pm

    Robert, this was a great find being new to R and very helpful. Worked fine from my Mac but, we don’t have (and don’t want) X11 running locally on one of our Windows machines so, we thought we would run your scripts and then save the files as PDFs and then download them from the server configuration. However, it doesn’t seem to work with shell commands so, is there a way you would recommend to retrieve the pdf?

    Thanks, much.

  3. #3 by Robert Grossman on July 5, 2009 - 10:25 am

    fhamilton,

    Thanks for your comment. I have updated the post to show how to do this by writing the plot to a file and then retrieving the file using scp. Here are the commands:

    To plot the time series xts.return and write the result to a file called ‘ret-plot.pdf’ use:

    > pdf("ret-plot.pdf")
    > plot(xts.return)
    > dev.off()

    You can then copy the file from the Instance to your local machine using the command:

    scp -i testkp.pem root@ec2-67-202-44-197.compute-1.amazonaws.com:/root/examples/ret-plot.pdf ret-plot.pdf

    –Bob

  4. #4 by mkhayter on July 30, 2009 - 6:02 am

    Robert,
    Thanks for great information.
    The version of R installed as an AMI is 2.8.0
    Are there any AMI public images with R release 2.9.0 or 2.9.1?

  5. #5 by francescamoyse on September 1, 2009 - 9:07 pm

    Hi Robert – agree with other commenters this is fun stuff :-) .

    I’m biased, but would nevertheless love to know what you and readers of this blog think of our recently launched service Monkey Analytics (http://www.monkeyanalytics.com) which abstracts the AMI management and generation detailed here and delivers Octave, Python, and now R computation in the cloud on EC2 servers.

    At fhamilton – not sure if we solve your problem just yet, as we aren’t R experts (we spent more time in Matlab / Python / PV Wave in the past), but our approach with GNU Octave and Python is to wrap image / figure generation commands, spit out images, and deliver those in the browser via our web app.

    (R was the number one feature request post launch, and we just got it working a few days ago).

    We’re pretty excited about what we’re up to, and love being part of the discussion about how best to use cloud computing to get science computation done.


    Francesca Moyse | Founder, Monkey Analytics | francesca@monkeyanalytics.com

  6. #6 by rceed on February 12, 2010 - 7:44 am

    there’s a resource (they call it r-workbench or r-cloud) that is supposed to provide nice gui and cluster / cloud capabilities..
    http://wwwdev.ebi.ac.uk/Tools/rcloud

  7. #7 by rceed on February 12, 2010 - 7:50 am

    They demoed it and said the R version is updated regularly. A pretty nice tool, I must say: lots of GUI-like features such as drag-and-drop, built-in graphics, development of packages, etc., whereas command-line R is stone age compared to this. Highly recommended.

  8. #8 by Tal Galili on February 17, 2010 - 5:32 am

    Hi there,

    I couldn’t find a “contact us” page, so I am writing this message to you here:

    I run the service R-bloggers:
    http://www.r-bloggers.com/about/
    An aggregator of R related articles, from blogs.

    And wanted to encourage you to join R-bloggers.com at:
    http://www.r-bloggers.com/add-your-blog/

    The idea behind the project is to share readers in order to gain readers: R-bloggers already has over 800 RSS subscribers (that are growing everyday).

    I built it in order to find all the R bloggers out there. So far I have found over 45 bloggers, who also agreed to add their feed (and some to give a link back and post about it).

    And would love it if you might agree to join as well.

    Feel free to erase this comment if it clutters the blog too much.

    All the best,
    Tal

  9. #9 by Ernesto Armijo on May 13, 2010 - 11:12 am

    Hi Robert:
    This post is fantastic. R in the cloud! It would be really nice if you guys could add RapidMiner to the list of software available for use under your plan. RM is extremely friendly and now it includes an add-on to do parallel processing. It would shine in Amazon ECS.
