By now you’ve probably read the news that Dropbox has migrated the majority of its data from the Amazon S3 cloud to its own data centers. In a blog post published on Monday, Akhil Gupta, Dropbox’s Vice President of Engineering, explained that Dropbox now stores about 90% of its data in-house. “As the needs of our users and customers kept growing, we decided to invest seriously in building our own in-house storage system,” he wrote.
Word on the street is that “about 90%” of Dropbox data amounts to 500 petabytes (PB) from 500 million customers. That’s a big chunk of data, and Gupta wrote that the company knew from the beginning that it would have to build everything from scratch, because nothing available had been proven to work reliably at that scale. With an in-house cloud of nearly 200 PB, Datto knows a bit about building and maintaining a scalable cloud as well. We are also nearing the end of a project to migrate a massive amount of data from Amazon S3 to the Datto cloud. At about 12 PB, our migration is nowhere near as large as Dropbox’s, but the two projects share a lot of similarities.
The Dropbox migration took nearly three years. Datto engineers have been working on migrating Backupify data for over a year, cycling teams on and off the project as appropriate to avoid fatigue and spread knowledge. For both companies, testing and auditing were critical. Dropbox first built a prototype as a proof of concept to get a sense of workloads and file distributions. Datto engineers took a similar approach, prototyping and testing the entire migration path at the start of the project. Gupta wrote that Dropbox tested and audited the reliability of the system for the highest levels of data durability and availability. Datto performed audits throughout the process as well, using Amazon EC2 compute resources.
Like Dropbox, Backupify’s SaaS backup business was built on Amazon S3, which allowed it to grow rapidly and scale compute and storage resources affordably. However, when Datto acquired Backupify in 2014, the decision was made to bring the data in-house. While Amazon S3 was an essential resource for Backupify and many other startups that require elastic IT, the scale of the Datto Cloud has made it more cost-efficient to migrate out of Amazon. Gupta’s post indicates that Dropbox came to the same conclusion. The fact is, without Amazon, Dropbox and Backupify might never have existed. But when storage capacity needs grow large enough, it simply becomes more cost-effective to build your own infrastructure.
Finally, it’s important to note that both Dropbox and Datto maintain excellent working relationships with Amazon. David Block, Vice President of SaaS Backup Engineering at Datto, said that Amazon has been very helpful in the migration process, and it is likely that we will work with Amazon on future projects, when it makes sense. Gupta’s blog post indicated the same. “Later this year, we’ll expand our relationship with AWS to store data in Germany for European business customers that request it,” he wrote.