From Databases to Datapods

May 19, 2015

Scaling data and analytic services to cyberpods.

Data centers are sometimes divided into "pods," which can be built out and customized as needed. Pods vary in size but usually contain anywhere from 10 to more than 100 racks of computing infrastructure. Sometimes one of the pods is designated the "core" and provides basic shared functionality, such as networking. Here is a description of some of the ways that data centers can be organized into core and pods.

For some of the clouds and data commons that we have been developing and operating for the Open Cloud Consortium, such as the Open Science Data Cloud, we have been using the term cyberpod for the cyberinfrastructure that fills the pods, including the hardware, networking infrastructure, software stack, data, applications, etc.

For example, a reasonable scale for a data commons is a cyberpod. We have also started to use the term datapod for the data and analytic cyberinfrastructure needed to manage and analyze a cyberpod's worth of data.

Loosely speaking, you can think of a datapod as a data management system that scales out to the size of a cyberpod. Hadoop and NoSQL databases are examples of software stacks that scale out to a cyberpod. More recently, there has been renewed interest in shared-nothing parallel databases that also scale out in this way.

We are building several data commons that are designed to scale out to cyberpods. They are based on S3-compatible object storage and on long-running services that provide digital IDs, metadata, and high-performance data transport. Computation can run in virtual machines or containers.
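To make the division of labor among these services a little more concrete, here is a minimal in-memory sketch in Python. It is only an illustration under stated assumptions, not the software we actually run: the class names (DigitalIDService, MetadataService), the record fields, and the example bucket URL are all hypothetical, and a real deployment would keep the bytes in S3-compatible object storage rather than passing them to the ID service.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    """One data object known to the (hypothetical) digital ID service."""
    digital_id: str
    urls: list    # S3-style locations that hold a copy of the object
    sha256: str   # checksum so a client can verify what it downloads
    size: int     # object size in bytes


class DigitalIDService:
    """Toy stand-in for a long-running service that assigns and resolves IDs."""

    def __init__(self):
        self._records = {}

    def register(self, data, urls):
        # Assumption: IDs are random UUIDs; a real service may use another scheme.
        digital_id = str(uuid.uuid4())
        self._records[digital_id] = ObjectRecord(
            digital_id=digital_id,
            urls=list(urls),
            sha256=hashlib.sha256(data).hexdigest(),
            size=len(data),
        )
        return digital_id

    def resolve(self, digital_id):
        # Map a stable ID to the current storage locations of the object.
        return self._records[digital_id]


class MetadataService:
    """Toy stand-in for a metadata service keyed by digital ID."""

    def __init__(self):
        self._metadata = {}

    def annotate(self, digital_id, **attributes):
        self._metadata.setdefault(digital_id, {}).update(attributes)

    def search(self, **criteria):
        # Return the IDs whose metadata matches every key/value given.
        return sorted(
            digital_id
            for digital_id, attrs in self._metadata.items()
            if all(attrs.get(key) == value for key, value in criteria.items())
        )


# Usage: register an object, describe it, then find it and resolve its location.
ids = DigitalIDService()
meta = MetadataService()
payload = b"example bytes"
obj_id = ids.register(payload, ["s3://example-commons-bucket/sample-001.dat"])
meta.annotate(obj_id, project="demo", data_type="sequence")
matches = meta.search(project="demo")
record = ids.resolve(matches[0])
```

The point of the sketch is the separation of concerns: analysis code running in a virtual machine or container asks the metadata service which objects it wants, asks the ID service where they live, and only then talks to object storage, so objects can move between buckets without breaking anything that refers to them by ID.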

I gave a lightning talk today at XLDB 2015 at Stanford describing this approach.