What is Analytic Infrastructure and Why Should You Care?
Posted by Robert Grossman in Blog, analytic infrastructure, analytic strategy on February 16, 2010
I have been building analytic models for over 20 years. The names have changed a lot over the years: 20 years ago we built statistical models, 10 years ago we built data mining models, and today we build analytic models. The algorithms have changed some: classification and regression trees became common 20 years ago, support vector machines about 10 years ago, and today graph-based algorithms are popular.

Perhaps what has changed the most is my perspective.
Analytic algorithms and models. Twenty years ago, I was focused on algorithms and was concerned with the different types of models that you could build using different types of algorithms on different types of data. This worked fine as long as the data fit into the memory of the computer.
Analytic infrastructure. For better or worse, I ran into problems with so much data that it could not fit into memory. Some projects required a disk, some required many disks, and a few required tertiary storage. I spent over two decades working on what you might call analytic infrastructure. I first worked on teams that developed specialized data management infrastructures for the high energy physics community. These were optimized for efficient reads (instead of safe writes) and accessed the data by columns (instead of rows) in order to speed up numerical computations. They turned out to be some of the first examples of data warehouses (the name was not used at that time), increased the size of data that we could model by 1 to 3 orders of magnitude, and were heavily criticized by the database community. Of course, several years later the database community embraced data warehouses, at least for reports if not for data-intensive computing and modeling.
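To make the row-versus-column distinction concrete, here is a small in-memory Python sketch. It assumes a toy table with hypothetical field names (energy, momentum, charge) and is not the high energy physics systems described above. In a real column store the payoff comes from reading only the needed column from disk; here the two layouts only illustrate the access pattern.

```python
from array import array

# Toy, hypothetical events: (energy, momentum, charge).
# Real column stores work on disk; this only illustrates the access pattern.
rows = [
    (10.5, 3.2, 1),
    (7.25, 1.1, -1),
    (12.0, 4.4, 1),
    (8.25, 2.3, -1),
]

def mean_energy_rows(records):
    """Row layout: every whole record is visited to use one attribute."""
    total = 0.0
    for energy, _momentum, _charge in records:
        total += energy
    return total / len(records)

def to_columns(records):
    """Column layout: each attribute is stored contiguously in its own array."""
    return {
        "energy": array("d", (r[0] for r in records)),
        "momentum": array("d", (r[1] for r in records)),
        "charge": array("i", (r[2] for r in records)),
    }

def mean_column(columns, name):
    """A computation over one attribute reads only that attribute's values."""
    values = columns[name]
    return sum(values) / len(values)

columns = to_columns(rows)
print(mean_energy_rows(rows))           # 9.5
print(mean_column(columns, "energy"))   # 9.5
```

Both functions compute the same statistic; the difference is how much unrelated data each one has to touch along the way.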
About five years ago, I began working on what are today called cloud computing platforms. Again, these increase the size of data that we can model by 1 to 3 orders of magnitude, and again they have been heavily criticized by some in the database community as a big step backwards.
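Part of what makes such platforms useful for modeling is that a model can often be estimated from summaries computed independently on each partition of the data and then merged. The Python sketch below illustrates that map/reduce pattern with a deliberately simple "model" (a mean and a variance). The partitions and function names are hypothetical, and it does not use any particular cloud framework's API.

```python
from functools import reduce

# Hypothetical partitions; on a cloud platform each would sit on a different node.
partitions = [
    [2.0, 4.0, 4.0],
    [4.0, 5.0],
    [5.0, 7.0, 9.0],
]

def map_partition(values):
    """Map step: summarize one partition as (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(a, b):
    """Reduce step: partial summaries merge by component-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(summary):
    """Turn the merged summary into the model: mean and population variance."""
    n, s, ss = summary
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance

summary = reduce(combine, map(map_partition, partitions))
mean, variance = finalize(summary)
print(mean, variance)  # 5.0 4.0
```

No partition ever needs the others' raw data, only their small summaries. The sum-of-squares formula is kept for readability; on large real data a numerically stabler pairwise merge of means and variances would usually be preferred.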
I recently edited a special issue of ACM SIGKDD Explorations on analytic infrastructure. In an article there, I define analytic infrastructure as the applications, services, utilities and systems that are used for preparing data for modeling, estimating models, validating models, scoring, or related analytic activities. For example, analytic infrastructure includes databases and data warehouses, statistical and data mining systems, scoring engines, grids and clouds. Note that, with this definition, analytic infrastructure does not need to be used exclusively for modeling; it simply needs to be useful as part of the modeling process. The article is available as a PDF from the SIGKDD Explorations web site (it’s Issue 1 in Volume 11).
I don’t really like this definition and encourage you to provide a better one. What is important, though, is that using the appropriate analytic infrastructure is critical to building models for problems with so much data that simply putting it into memory and forgetting about it is not a viable option.
Analytic strategy. Returning to how my perspective has evolved: for the past several years, I have become increasingly concerned with what is usually called analytic strategy. Analytic strategy is concerned with making sure that you are asking the right analytic question, that you are building a model that can be deployed efficiently, that the output of the model is actionable, that the actions have a business impact, that the business impact is aligned with corporate strategy, that an appropriate governance process is in place, and with related questions.
My perspective these days is that analytics requires a firm foundation and that the foundation has three columns: 1) analytic strategy; 2) analytic infrastructure; and 3) analytic algorithms and models.
The picture is by Alyson Hurt.
In Analytics, It’s the Actions that Matter
Posted by admin in Blog, analytic strategy, analytics on April 28, 2009
In this note, let’s define analytics as the analysis of data in order to take actions. (This is a narrow definition of analytics, but one that is useful here.) If you don’t have day-to-day work experience with analytics, it is easy to have the mistaken impression that analytics is only about data and statistical models.
Although understanding data and developing statistical models are certainly important components of an analytic project, they are just one aspect of analytics. This aspect includes cleaning data, enriching data, exploring data, developing features, building models, validating models, and iterating the process. From a broad perspective, this is a process in which the input is data and the output is a statistical model. When most people think of modeling, this is what they think of. For many analytic projects, this is just a small part of what is required for a successful engagement.
The second aspect of analytics is the focus of this note. It is concerned with:
- developing an appropriate score for a statistical model;
- using the score to define useful actions;
- determining which measures are best for evaluating the effectiveness of these actions;
- tracking these measures (often with a dashboard) and making sure that they advance the strategic objectives of the company or organization.
One way to remember this is the mnemonic SAMS, for Scores, Actions, Measures and Strategies.
For example, a response model often uses a threshold. If the score from the response model is above the threshold, an offer is made (this is the action); if not, no offer is made.
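The score-to-action step in that example can be sketched in a few lines of Python. The threshold value, visitor names and action labels are hypothetical; in practice the threshold would be chosen from validation data and the business cost of making or withholding an offer.

```python
# Hypothetical threshold; a real one would be tuned on validation data.
OFFER_THRESHOLD = 0.3

def decide_action(score: float, threshold: float = OFFER_THRESHOLD) -> str:
    """Map a response-model score to an action: offer only above the threshold."""
    return "make_offer" if score > threshold else "no_offer"

# Hypothetical scores produced by a response model.
scores = {"visitor_a": 0.72, "visitor_b": 0.15, "visitor_c": 0.31}
actions = {visitor: decide_action(s) for visitor, s in scores.items()}
print(actions)
# {'visitor_a': 'make_offer', 'visitor_b': 'no_offer', 'visitor_c': 'make_offer'}
```

Keeping the decision rule separate from the model makes it possible to change the threshold, and therefore the action, without rebuilding the model.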
Here are some examples of SAMS:
| Model | Score | Action | Measure | Strategy |
|---|---|---|---|---|
| online response model | likelihood to respond to an offer | display to the visitor the offer that has the highest likelihood of response and available inventory | revenue per day generated by the website | increase revenue from a website by improving the targeting of offers |
| fraud model | likelihood that a transaction is fraudulent | approve, decline, or obtain more information | detection and false positive rates | reduce costs and improve customer experience by lowering fraud rates |
| data quality model | likelihood that a data source has data quality problems | if the score is above a threshold, manually investigate the data to check whether there is in fact a data quality problem | detection and false positive rates | improve operational efficiencies by detecting data quality problems more quickly |
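Both the fraud and data quality rows use detection and false positive rates as their measures. The Python sketch below computes them for a given threshold. It assumes the common definitions, detection rate = TP / (TP + FN) and false positive rate = FP / (FP + TN); some fraud teams instead report a false-positive ratio such as FP / TP. The scores and labels are made up for illustration.

```python
from typing import Sequence, Tuple

def detection_and_false_positive_rates(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) when flagging score > threshold.

    labels[i] is True when case i really was a problem (e.g. fraud or bad data).
    Assumed definitions:
      detection rate      = TP / (TP + FN)
      false positive rate = FP / (FP + TN)
    """
    tp = fp = fn = tn = 0
    for score, is_problem in zip(scores, labels):
        flagged = score > threshold
        if flagged and is_problem:
            tp += 1
        elif flagged:
            fp += 1
        elif is_problem:
            fn += 1
        else:
            tn += 1
    detection = tp / (tp + fn) if tp + fn else 0.0
    false_positive = fp / (fp + tn) if fp + tn else 0.0
    return detection, false_positive

# Hypothetical scored cases and their eventual outcomes.
scores = [0.95, 0.80, 0.40, 0.70, 0.10, 0.05, 0.60, 0.20]
labels = [True, True, True, False, False, False, False, False]

for t in (0.5, 0.75):
    print(t, detection_and_false_positive_rates(scores, labels, t))
```

Because raising the threshold can only flag a subset of the cases flagged before, it never increases either rate, so choosing the threshold is a trade-off between catching problems and generating manual work or declined customers.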
A successful analytics project requires a careful study of which actions are possible; of the possible actions, which can be deployed into operational systems; and how the systems can be instrumented so that the data needed to compute the measures is available.
The organizational challenge when developing and deploying analytics is that four groups must work together to complete a successful analytic project:
- The IT group must provide the required data to build the model.
- The analytics group must build the appropriate models and develop the appropriate scores.
- The operations group must decide which actions are possible and how these actions can be integrated with current systems and business processes.
- An executive sponsor must make sure that the measures have strategic relevance and that the three groups above collaborate effectively.