Save Big: Move Archives to Azure Table Storage
If you have an in-house application with a lot of archival data sitting in SQL Server tables, you may be hesitating to move it to Microsoft Windows Azure, and for good reason: Azure SQL Database costs a pretty penny to run. But redeveloping the application to use cheaper Azure table storage also costs money. At some point, the Azure SQL Database costs will exceed the redevelopment costs. Let me introduce you to a real-world example of an extremely price-conscious customer for whom redevelopment is the right choice: me.
In 1998, I developed a demo called Weather Now, and I've run it on my own server rack since 2006. The application shows real-time aviation weather for almost every airport in the world.
Even though I've moved all of my other work to Azure, this one remains the last living application on the server rack. Simply put, without rewriting the data layer, moving it as-is to Azure would cost far more than leaving it alone.
You will have to do your own cost analysis to decide whether moving a large data footprint to Azure without refactoring makes sense. My analysis looks like this:
Weather Now stores its data on a 2U server running Microsoft SQL Server 2012 Enterprise. In the past 14 years, the application has quietly gathered 50 GB of data. This means that porting the application directly to an Azure Virtual Machine (VM) with a 50 GB SQL Database would cost me about $250 per month to run, increasing by about $50 per month each year.
Fortunately, I don't have to do that. Most of the data, you see, hardly ever gets used. (Also, my development costs are free, because I'm doing it myself. This is probably not the case in your organization.)
How much data is there? Well, Weather Now usually has around 4,500 current weather observations and 165,000 observations from the last 24 hours. Since each row is small, and the index is positively tiny (only the station ID and observation times are indexed), the current table uses about 1.5 MB of space and the last-24-hours table uses 43 MB. That doesn't even break a sweat on Azure SQL Database.
No, it's the Archive table that grows like the Beast from Below. That one has all of the past observations since the site started. Though I've pruned the table from time to time, it still has one row per observation per station for 8 years or so. For an average station like Chicago's O'Hare Airport, that means about 10,500 rows per year. For a chatty, automated station that spits out a report every 20 minutes, it stores about 27,000 rows per year. Multiply that by 4,500 stations, and in 2012 the app stored about 47 million* rows—and it continues to grow at a rate of 4 million rows per month.
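The arithmetic above is easy to sanity-check. A quick back-of-the-envelope sketch, in Python for convenience (all figures are the estimates already given, not measurements):

```python
# Sanity-check the archive growth figures cited above.

REPORTS_PER_HOUR_CHATTY = 3   # a chatty station reports every 20 minutes
HOURS_PER_YEAR = 24 * 365

# A chatty automated station: roughly 27,000 rows per year
chatty_rows_per_year = REPORTS_PER_HOUR_CHATTY * HOURS_PER_YEAR
print(chatty_rows_per_year)   # 26280, roughly the 27,000 cited above

# 47,704,735 rows in 2012 (a 366-day leap year) works out to the
# footnote's daily average
rows_2012 = 47_704_735
print(round(rows_2012 / 366))  # 130341 new rows per day
```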
Clearly Azure SQL Database is a bad idea for this volume of data. It makes a lot more sense to store archival weather reports in Azure table storage. I may never move all of the existing archives, so the new app won't start with the full 50 GB of material, and possibly never will; but even if it did, storing it in table storage would cost only about $5 per month.
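To make the table-storage idea concrete, here's a minimal sketch of what one archived observation might look like as a table entity. This is illustrative only: the function name, the raw-METAR field, and especially the PartitionKey/RowKey scheme are my own assumptions for this example, not a settled design.

```python
from datetime import datetime, timezone

def observation_entity(station_id: str, observed: datetime,
                       raw_metar: str) -> dict:
    """Shape one weather observation as an Azure table entity.

    Every entity needs a PartitionKey and a RowKey; here the station ID
    partitions the data and a sortable timestamp orders rows within it.
    This key scheme is a placeholder, not the final partition design.
    """
    return {
        "PartitionKey": station_id,                  # e.g. "KORD"
        "RowKey": observed.strftime("%Y%m%d%H%M"),   # lexically sortable
        "RawMetar": raw_metar,
    }

entity = observation_entity(
    "KORD",
    datetime(2013, 1, 15, 18, 51, tzinfo=timezone.utc),
    "KORD 151851Z 27015KT 10SM FEW250 M04/M17 A3012")
print(entity["PartitionKey"], entity["RowKey"])  # KORD 201301151851
```

With a table SDK such as the `azure-data-tables` Python package, a dictionary like this can then be written with `table_client.create_entity(entity=entity)` (which requires real storage credentials, so it's omitted here).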
The conclusion should be obvious: If you start looking at things the Azure way, using Azure can save you tons of money. My current estimate of the monthly cost to run Weather Now, assuming current visitor levels and acceptable performance on "very small" Cloud Services instances, is $40 per month. If it eventually amasses 50 GB of archives, it will cost...$42 per month. And if I someday get enough visitors to require upgrading to a "small" instance, I'll start selling subscriptions, but I won't have to buy new equipment, because it's Azure.
In my next post I'll talk about how I developed the table partitions to balance usability and efficiency.
* Actually, 47,704,735 rows for 2012, an average of 130,341 new rows per day.