The discipline of data-intensive computing has been growing in both importance and popularity. It is now popular enough that the term “big data” is beginning to be used instead. The graph below is from Google Trends and shows the growth of the term “big data” over the past couple of years.

I used to think that data came in three sizes depending upon how you managed it: either small enough to fit into memory, small enough to fit into a database, or too big for a database.
During the last few years, I have changed my point of view about how to measure the size of big data. The most common approach is to measure data in bytes: megabytes, gigabytes, terabytes, petabytes, and exabytes. But I have noticed that people with very large amounts of data measure their data, and the computing power required to process it, in megawatts (MW).
Here are some examples:
- A good sweet spot for a data center is 15 MW.
- Facebook’s leased data centers are typically between 2.5 MW and 6.0 MW.
- Facebook’s new Prineville data center is 30 MW.
- Google’s computing infrastructure uses 260 MW.
Today, the Open Science Data Cloud (OSDC) requires about 0.5 MW. Our goal over the next 3 to 5 years is to develop and operate a facility of about 5 MW devoted to science.
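To put these figures side by side, here is a small sketch in Python that expresses each facility as a multiple of the OSDC's current 0.5 MW. The power figures are the ones quoted above; the dictionary labels and the helper name `ratio_to_osdc` are just illustrative choices, not part of any OSDC software.

```python
# Power figures (in MW) quoted in the text above.
# The labels are informal descriptions chosen for this sketch.
facilities_mw = {
    "Facebook leased (low end)": 2.5,
    "Facebook leased (high end)": 6.0,
    "Data center sweet spot": 15.0,
    "Facebook Prineville": 30.0,
    "Google infrastructure": 260.0,
}

OSDC_TODAY_MW = 0.5  # current OSDC power requirement
OSDC_GOAL_MW = 5.0   # target facility size in 3 to 5 years


def ratio_to_osdc(power_mw, baseline_mw=OSDC_TODAY_MW):
    """Return how many times larger a facility is than the baseline."""
    return power_mw / baseline_mw


# Print facilities from smallest to largest, with their multiple of OSDC today.
for name, mw in sorted(facilities_mw.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {mw:6.1f} MW  {ratio_to_osdc(mw):6.1f}x OSDC today")

print(f"OSDC goal: {ratio_to_osdc(OSDC_GOAL_MW):.0f}x today's footprint")
```

Seen this way, the 5 MW goal is a tenfold increase over today's OSDC, and it would put the facility in the same range as one of Facebook's leased data centers.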
The perspective changes somewhat when you measure data in MW. You would like the facility to be uniform. You would like to be able to add new racks and retire old racks with little, if any, manual intervention. You would like to be able to optimize the amount of data you can manage and the amount of data you can process per MW.
Today, it takes too long for us to add and retire racks from the OSDC. If you would like to join a research project to develop open source software to simplify this, please write us at info at opencloudconsortium.org.