Emisonian: Joe Emison's Blog


Automatically Mirror Your S3 Data on Rackspace Cloud Files (or Vice Versa) for under $7.00/month

Hosting data in the cloud has many advantages, but it usually has the distinct disadvantage of leaving you beholden to a single vendor. Outages do happen, even with S3 (see July 2008). Until now, it was a huge pain, and a significant cost, to establish any kind of mirroring between S3 and any other major cloud storage vendor. This is because you need to compare the file lists at both locations and then copy whatever files are missing, which means you need a server that is at least partly dedicated to the mirroring task.

However, as I have intimated, mirroring between Amazon S3 and Rackspace Cloud Files (“CF”) is now much easier, and very cheap.  Read on for the gory details.

To mirror between S3 and CF, one needs a running server, but only for as long as it takes to do the comparing and copying. The server also doesn’t need much power, since almost all it does is transfer files from one service to another. In fact, the cheapest Amazon EC2 instance type, the t1.micro, is perfect for this job, especially when launched at spot pricing, which means the server costs can reliably be as low as one cent per hour. So, in line with the philosophy I articulated in my “Servers are Software” article, I have created a model “multi-cloud-mirroring” server that launches periodically, runs until the files are synchronized between an S3 bucket and a CF container, and then self-terminates.
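
For illustration, here is a minimal sketch of the “run, then self-terminate” idea: the instance asks the EC2 metadata service for its own instance ID and then terminates itself with boto once the sync is complete. This is only a sketch (it assumes boto can find AWS credentials in its usual locations) and is not necessarily how the ServerTemplate described below handles termination.

    import boto
    import boto.utils

    # Ask the EC2 metadata service which instance this code is running on.
    instance_id = boto.utils.get_instance_metadata()['instance-id']

    # Terminate ourselves; boto picks up AWS credentials from the environment
    # or from /etc/boto.cfg / ~/.boto.
    ec2 = boto.connect_ec2()
    ec2.terminate_instances(instance_ids=[instance_id])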

Ordinarily, this kind of flexible, auto-launching, self-configuring, self-terminating server would be time-consuming to create and would require another server of mine to launch it, but RightScale’s advanced cloud management platform makes it very easy and removes the need for a separate launch server. I have created a public ServerTemplate for fellow RightScale users called the “Multi-Cloud Mirroring Manager”. You can use that template in a RightScale autoscaling server array so that the server launches periodically (say, twice per day) and quits once it is done. See my step-by-step tutorial on setting up the autoscaling array.

In addition to RightScale, I have leaned heavily on Python and on the boto and python-cloudfiles libraries. I wrote a standalone script, multi-cloud-mirror, that handles mirroring from S3 to CF and vice versa, and I created a separate Google Code page for it. Please feel free to do whatever you would like with the script; I have released it under the MPL.
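
To give a flavor of what the script does, here is a stripped-down, single-process sketch of the S3-to-CF direction. The bucket, container, and Rackspace credentials below are placeholders, and it assumes boto can find your AWS credentials in the usual places; the real multi-cloud-mirror script adds the reverse direction, parallelism, and error handling.

    import tempfile

    import boto        # S3 access; credentials come from /etc/boto.cfg or ~/.boto
    import cloudfiles  # the python-cloudfiles library

    S3_BUCKET    = 'my-bucket'       # placeholder names
    CF_CONTAINER = 'my-container'

    s3        = boto.connect_s3()
    cf        = cloudfiles.get_connection('cf-username', 'cf-api-key')
    bucket    = s3.get_bucket(S3_BUCKET)
    container = cf.create_container(CF_CONTAINER)   # creates the container if needed

    already_there = set(container.list_objects())

    for key in bucket.list():
        if key.name in already_there:
            continue   # a fuller script would also compare sizes or checksums
        with tempfile.NamedTemporaryFile() as tmp:
            key.get_contents_to_filename(tmp.name)   # download from S3
            obj = container.create_object(key.name)
            obj.load_from_filename(tmp.name)         # upload to Cloud Files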

Finally, let me explain my estimate of $7.00/month. Assume that you are already paying for storage on Amazon S3, that you copy 5GB of data to Cloud Files each month, that you keep a rotating six months’ worth of data on Cloud Files, and that you check twice per day, running for two hours each time at the $0.01/hour spot price. Based on an average file size of 10MB, this works out to less than an additional $7.00/month in combined S3 and CF storage, request, and bandwidth costs.
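
For anyone who wants to check the arithmetic, a rough sketch follows. Only the usage figures come from the paragraph above; the per-GB and per-request rates are placeholders (marked ASSUMED_*), not actual S3 or Cloud Files prices, so substitute current pricing before relying on the result.

    # Usage figures from the estimate above
    RUNS_PER_DAY    = 2
    HOURS_PER_RUN   = 2
    SPOT_PRICE      = 0.01       # $/hour
    DAYS_PER_MONTH  = 30
    GB_PER_MONTH    = 5
    MONTHS_RETAINED = 6          # rotating window kept on Cloud Files
    AVG_FILE_MB     = 10

    # Placeholder rates -- substitute real S3 / Cloud Files pricing
    ASSUMED_STORAGE_PER_GB   = 0.15   # $/GB-month
    ASSUMED_BANDWIDTH_PER_GB = 0.15   # $/GB transferred
    ASSUMED_PER_1K_REQUESTS  = 0.01   # $

    compute   = RUNS_PER_DAY * HOURS_PER_RUN * DAYS_PER_MONTH * SPOT_PRICE   # $1.20
    storage   = GB_PER_MONTH * MONTHS_RETAINED * ASSUMED_STORAGE_PER_GB
    bandwidth = GB_PER_MONTH * ASSUMED_BANDWIDTH_PER_GB
    requests  = (GB_PER_MONTH * 1024.0 / AVG_FILE_MB) / 1000 * ASSUMED_PER_1K_REQUESTS

    print("estimated additional cost: $%.2f/month" % (compute + storage + bandwidth + requests))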


12 responses to “Automatically Mirror Your S3 Data on Rackspace Cloud Files (or Vice Versa) for under $7.00/month”

  1. Ryan Geyer (@rjgeyer) October 14, 2011 at 1:02 pm

    Joe,

    This is an awesome automated solution for mirroring data between the S3 and CloudFiles object stores.

    I’d started to dabble with something similar but couldn’t find the time to get it completed; I’ll be putting this to work in my own deployment very soon!

    Thanks!
    -Ryan J. Geyer-
    Sales Engineer – RightScale

  2. Thorsten October 14, 2011 at 1:19 pm

    Pretty cool! Did you test whether a larger instance, such as a c1.medium, is worth it for the presumably increased bandwidth? I haven’t looked at whether you copy files sequentially or whether you can copy several in parallel; that may make a difference. This only matters if the speed of copying is a bottleneck, of course…

    • joemastersemison October 14, 2011 at 1:42 pm

      My script uses the multiprocessing library for Python, so you can run as many simultaneous copies as you want. You would likely see better performance with a bigger instance, but since my ServerTemplate is tied only to a 64-bit instance, you would want to start with m1.large, and I would anticipate that you would want to use a c1.xlarge to see significant performance improvement. I’ve been shuttling around 10GB per day with the t1.micro and 5 simultaneous processes, so I think that will handle most use cases.
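
      For anyone curious, the parallelism boils down to something like the sketch below; copy_one here is a hypothetical stand-in for the per-file copy routine in multi-cloud-mirror, not its actual code.

          from multiprocessing import Pool

          def copy_one(key_name):
              """Placeholder for the real per-file copy (S3 <-> Cloud Files)."""
              pass

          if __name__ == '__main__':
              keys_to_copy = ['a.jpg', 'b.jpg', 'c.jpg']   # hypothetical file list
              pool = Pool(processes=5)    # five simultaneous copies, as above
              pool.map(copy_one, keys_to_copy)
              pool.close()
              pool.join()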

  3. fishhookben September 14, 2012 at 11:45 am

    I’ve got this up and running, but after it launches your script, I get the following error:

    14:19:07: File "/usr/local/bin/multi-cloud-mirror.py", line 299, in run
    [srcService, srcBucketName, destService, destBucketName, serviceError] = self.getScenarioDetails(scenario)
    File "/usr/local/bin/multi-cloud-mirror.py", line 152, in getScenarioDetails
    [fromBucket, toBucket] = scenario.split('->')
    ValueError: need more than 1 value to unpack
    14:19:08: Script exit status: 1
    14:19:08: Script duration: 2.939942
    *ERROR> Execution failed
    *ERROR> External command error: RightScript exited with 1
    *ERROR> Subprocess exited with 1
    *RS> failed: Set Up and Run Multi-Cloud Mirroring v2

    Any suggestions for getting the mirror script to run successfully?

  4. Pingback: PiCloud and Princeton Consultants Win the First Amazon EC2 Spotathon : Cloud Computing

  5. Pingback: SmartCloud

  6. mdeora January 11, 2013 at 11:19 am

    Is there any script to mirror a Cloud Files container onto S3?

  7. Diego Cosalter January 8, 2014 at 12:11 pm

    Would it be possible to run this script to mirror between 2 different OpenStack-based providers? For instance, mirror a bucket from Rackspace Cloud Files to SoftLayer Object Storage? I assume it’s possible, but I don’t know how it would handle the associations with /etc/boto.cfg and /etc/cloudfiles.cfg…
