Hosting data in the cloud has many advantages, but it usually has the distinct disadvantage of being beholden to a single vendor. Outages do happen, even with S3 (see July 2008). Until now, it was a huge pain, and a significant cost, to establish any kind of mirroring between S3 and any other major cloud storage vendor. This is because you need to compare the file lists at both locations and then copy whatever files are needed, which means you need a server that is at least partly dedicated to this mirroring task.
However, as I have intimated, mirroring between Amazon S3 and Rackspace Cloud Files (“CF”) is now much easier, and very cheap. Read on for the gory details.
To mirror between S3 and CF, one needs a running server, but only for as long as it takes to do the comparing and copying. Also, the server doesn’t need much power, since almost all of its work is transferring files from one service to another. In fact, the cheapest Amazon EC2 server, the t1.micro, is perfect for this job, and the job is also a good fit for spot-priced instances. This means that the server costs can reliably be as low as 1 cent per hour. So, in line with the philosophy I articulated in my “Servers are Software” article, I have created a model “multi-cloud-mirroring” server that launches periodically, runs until the files are synchronized between an S3 bucket and a CF container, and then self-terminates.
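The comparing step boils down to a difference between two file listings. The following is a minimal Python sketch of that idea, not code taken from multi-cloud-mirror. It assumes each side’s listing has already been fetched into a dict mapping object names to a checksum (for example an MD5 digest); `files_to_copy` is a hypothetical helper name.

```python
from typing import Dict, List


def files_to_copy(src: Dict[str, str], dest: Dict[str, str]) -> List[str]:
    """Return names missing from dest, or present with a different checksum.

    Both arguments map object names to a checksum string (assumed here to be
    something like an MD5 hex digest). Fetching the listings from S3 or
    Cloud Files is deliberately left out of this sketch.
    """
    return sorted(
        name for name, checksum in src.items()
        if dest.get(name) != checksum  # missing (None) or changed
    )


if __name__ == "__main__":
    s3_listing = {"a.txt": "111", "b.txt": "222", "c.txt": "333"}
    cf_listing = {"a.txt": "111", "b.txt": "999"}
    print(files_to_copy(s3_listing, cf_listing))  # ['b.txt', 'c.txt']
```

Comparing checksums rather than names alone means a file that changed at the source gets copied again; only the resulting list needs to go through the (slow) copy step.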
Ordinarily, this type of flexible, auto-launching, self-configuring, and self-terminating server would be time-consuming to create and would require another server of mine to launch it, but RightScale’s advanced cloud management platform makes it very easy and removes the need for a separate launching server. I have created a public ServerTemplate for fellow RightScale users called the “Multi-Cloud Mirroring Manager”. You can use that template in a RightScale autoscaling server array so that the server launches periodically (say, twice per day) and quits once done. See my step-by-step tutorial on setting up the autoscaling array.
In addition to RightScale, I have leaned heavily on Python and both the boto and python-cloudfiles libraries. I wrote a standalone script called multi-cloud-mirror to handle mirroring from S3 to CF and vice versa, and I created a separate Google Code page for it. Please feel free to do whatever you would like with the script; I have released it under the MPL.
Finally, let me explain my estimate of $7.00/month. It assumes that you are already paying for storage on Amazon S3, that you copy 5GB of data to Cloud Files each month and keep a rotating 6 months’ worth of data there, and that the mirror checks twice per day, running two hours each time at the $0.01/hour spot price. With an average file size of 10MB, this would result in less than an additional $7.00/month in combined S3 and CF storage, request, and bandwidth costs.
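The usage figures behind that estimate can be checked with a few lines of arithmetic. This sketch only uses the assumptions stated above; the 30-day month and 1 GB = 1000 MB are simplifications of my own, and no per-GB or per-request prices are hard-coded, since those depend on each vendor’s current price list. The function names are hypothetical.

```python
# Inputs from the post: 2 runs/day, 2 hours/run, $0.01/hour spot price,
# 5 GB copied per month, 10 MB average file size, 6 months retained.
# Assumed for this sketch: a 30-day month and 1 GB = 1000 MB.


def monthly_server_cost(runs_per_day, hours_per_run, price_per_hour, days=30):
    """Instance cost for one month, in dollars, rounded to cents."""
    return round(runs_per_day * hours_per_run * price_per_hour * days, 2)


def files_per_month(gb_per_month, avg_file_mb):
    """Approximate number of new files (roughly, upload requests) per month."""
    return round(gb_per_month * 1000 / avg_file_mb)


def retained_gb(gb_per_month, months_retained):
    """Steady-state data stored on Cloud Files with a rotating window."""
    return gb_per_month * months_retained


if __name__ == "__main__":
    print(monthly_server_cost(2, 2, 0.01))  # 1.2
    print(files_per_month(5, 10))          # 500
    print(retained_gb(5, 6))               # 30
```

Multiplying the last two figures by the current S3 and Cloud Files rates gives the storage, request, and bandwidth side of the estimate.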
Joe,
This is an awesome automated solution for mirroring data between the S3 and CloudFiles object stores.
I’d started to dabble with something similar but couldn’t find the time to finish it; I’ll be putting this to work in my own deployment very soon!
Thanks!
-Ryan J. Geyer-
Sales Engineer – RightScale
Pretty cool! Did you test whether a larger instance, such as a c1.medium, is worth it for the presumably greater bandwidth? I haven’t checked whether you copy files sequentially or several in parallel; that may make a difference. This only matters if copying speed is the bottleneck, of course…
My script uses Python’s multiprocessing library, so you can run as many simultaneous copies as you want. You would likely see better performance with a bigger instance, but since my ServerTemplate supports only 64-bit instances, you would need to start with an m1.large, and I would expect that you’d need a c1.xlarge to see a significant performance improvement. I’ve been shuttling around 10GB per day with the t1.micro and 5 simultaneous processes, so I think that will handle most use cases.
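For readers curious about the parallel-copy pattern, here is a minimal sketch using `multiprocessing.Pool` with 5 worker processes, matching the setup described above. It is not the script’s actual code: `copy_one` and `copy_all` are hypothetical names, and `copy_one` is a stub that always reports success, standing in for a real download-from-source and upload-to-destination step.

```python
from multiprocessing import Pool


def copy_one(name):
    """Stub for copying one object between clouds.

    A real version would download `name` from the source (e.g. with boto)
    and upload it to the destination (e.g. with python-cloudfiles). Here it
    only reports success so the pool logic can run on its own.
    """
    return name, True


def copy_all(names, processes=5):
    """Copy all names with a fixed-size process pool; return the failures."""
    with Pool(processes=processes) as pool:
        results = pool.map(copy_one, names)
    return [name for name, ok in results if not ok]


if __name__ == "__main__":
    failures = copy_all(["a.txt", "b.txt", "c.txt"])
    print("failed:", failures)
```

Since each worker spends most of its time waiting on the network, raising `processes` can help even on a small instance, up to the point where bandwidth or memory runs out.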
I’ve got this up and running, but after it launches your script, I get the following error:
14:19:07: File "/usr/local/bin/multi-cloud-mirror.py", line 299, in run
[srcService, srcBucketName, destService, destBucketName, serviceError] = self.getScenarioDetails(scenario)
File "/usr/local/bin/multi-cloud-mirror.py", line 152, in getScenarioDetails
[fromBucket, toBucket] = scenario.split('->')
ValueError: need more than 1 value to unpack
14:19:08: Script exit status: 1
14:19:08: Script duration: 2.939942
*ERROR> Execution failed
*ERROR> External command error: RightScript exited with 1
*ERROR> Subprocess exited with 1
*RS> failed: Set Up and Run Multi-Cloud Mirroring v2
Any suggestions for getting the mirror script to run successfully?
It sounds like you’re entering the parameters incorrectly. PARAM_SYNC_BUCKETS needs to look something like this:
s3://mys3bucket->cf://mycfbucket
Check that you haven’t left out the arrow (->), and don’t use any spaces!
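The traceback above comes from `scenario.split('->')` returning a single element when the arrow is missing. As an illustration only, here is a hypothetical `parse_scenario` validator that fails with a readable message instead; it is not the script’s `getScenarioDetails`, and restricting the prefixes to `s3` and `cf` is an assumption of this sketch.

```python
def parse_scenario(scenario):
    """Split 'src://name->dest://name' into (src, src_name, dest, dest_name).

    Hypothetical validator: accepts only the s3:// and cf:// prefixes and
    rejects spaces around the arrow, raising ValueError with a clear message.
    """
    parts = scenario.split("->")
    if len(parts) != 2:
        raise ValueError("expected exactly one '->' in %r" % scenario)
    result = []
    for part in parts:
        if part != part.strip():
            raise ValueError("remove spaces around '->' in %r" % scenario)
        service, sep, name = part.partition("://")
        if not sep or service not in ("s3", "cf") or not name:
            raise ValueError("expected s3://name or cf://name, got %r" % part)
        result.extend([service, name])
    return tuple(result)


if __name__ == "__main__":
    print(parse_scenario("s3://mys3bucket->cf://mycfbucket"))
    # ('s3', 'mys3bucket', 'cf', 'mycfbucket')
```

The same function also accepts the reverse direction, e.g. `cf://mycfbucket->s3://mys3bucket`.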
Never mind, I got it! (I was missing a “-us” at the end of my S3 bucket name.) Works like a charm!
P.S. I am using the ServerTemplate “Multi-Cloud Storage Mirroring Manager v2” by Mark Smith, which is based on your original template, if anyone else wants to try this.
https://my.rightscale.com/library/server_templates/Multi-Cloud-Storage-Mirroring-/lineage/16255
Is there a script to mirror a Cloud Files container to S3?
Yes, this script will do that: it mirrors from Cloud Files to S3 as well as from S3 to Cloud Files.