Jul 20, 2016

Some best practices for deploying analytics

The life cycle of an analytic model. A challenge for many organizations is moving an analytic model from the modeling environment to a product, service, or operational environment.

I have been giving a few talks over the last year or so on what I've been calling AnalyticOps, an abbreviation of Analytic Operations. AnalyticOps is motivated broadly by the discipline of DevOps and can be defined by analogy with it.

By the way, I still find this 2012 O'Reilly Radar article by Mike Loukides to be one of the best introductions to DevOps.

If you think of the goal of DevOps as "establish[ing] a culture and an environment where building, testing, releasing, and operating software can happen rapidly, frequently, and more reliably" (source: Wikipedia DevOps), then a starting point is to think of AnalyticOps as establishing a culture and an environment where building, validating, deploying, and running analytic models can happen rapidly, frequently, and reliably.

Some common mistakes when deploying analytic models include:

  1. After pushing code for a model into production once without causing a disaster, thinking that you can continue to push code for analytic models into production without creating a disaster.

  2. Thinking that the features are fixed and all that you will need to do is update the parameters.

  3. Thinking the model is done and not realizing how much work is required to keep all of the required pre- and post-processing up to date.

  4. Not checking in production whether the inputs to the models drift slowly over time.

  5. Not checking that the model will keep running despite missing values, garbage values, etc. (even values that should never be missing in the first place).
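To make the last two mistakes concrete, here is a minimal Python sketch of the kind of checks I have in mind. Everything in it is an assumption made for illustration: the function names (`parse_input`, `score_safely`, `drift_score`), the choice of a default value for bad inputs, and the drift measure itself, which is simply the shift of the recent mean measured in baseline standard deviations. A production system would likely use a more careful statistic and per-feature thresholds.

```python
import math
from statistics import mean, stdev

def parse_input(raw):
    """Return raw as a finite float, or None if it is missing or garbage."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None

def score_safely(model, raw, default):
    """Score one input, substituting a default so the model keeps running."""
    value = parse_input(raw)
    return model(default if value is None else value)

def drift_score(baseline, recent):
    """Shift of the recent mean from the baseline mean, in baseline standard deviations."""
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)

# Hypothetical data: values seen at training time vs. raw values seen in production.
baseline = [10.0, 11.0, 9.0, 10.5, 9.5]
recent_raw = ["12.1", None, "n/a", 12.4, float("nan"), "11.9"]

# Only valid values feed the drift check, so defaults do not mask a shift.
recent = [v for v in map(parse_input, recent_raw) if v is not None]
drift = drift_score(baseline, recent)
if drift > 2.0:  # the threshold is an arbitrary choice for this sketch
    print(f"input drift detected: {drift:.2f} baseline standard deviations")
```

Note that the sketch keeps the two concerns separate: bad values are replaced so that scoring never fails, but they are excluded from the drift calculation. Counting how often the default is used would be a natural third check.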

One approach to avoiding these mistakes is to use what is called an "analytic engine," which is a component that is integrated into products, services, or enterprise IT and that deploys analytic models and analytic workflows into their operational workflows.

A Model Interchange Format is a format that supports the exporting of a model by one application and the importing of a model by another application. Model Interchange Formats include the Predictive Model Markup Language (PMML), the new Portable Format for Analytics (PFA), and various in-house or custom formats.

Analytic engines are integrated once into a product, service, or operational environment, but allow applications to update models by importing new models (for PMML) or new models or workflows (for PFA) via the analytic engine.
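As a rough sketch of this idea, the toy engine below is wired into an application once and then updated only by importing new model documents. The JSON layout used here (`type`, `intercept`, `coefficients`) is a made-up in-house format for illustration; it is not PMML or PFA, and the `AnalyticEngine` class and its method names are hypothetical.

```python
import json

class AnalyticEngine:
    """Toy engine: integrated once, with models swapped in by importing documents."""

    def __init__(self):
        self._model = None

    def import_model(self, document):
        """Parse and validate a model document, then replace the current model."""
        spec = json.loads(document)
        if spec.get("type") != "linear":
            raise ValueError(f"unsupported model type: {spec.get('type')!r}")
        # Build the new model fully before swapping, so a bad document
        # never replaces a model that is already working.
        intercept = float(spec["intercept"])
        coefficients = {name: float(w) for name, w in spec["coefficients"].items()}
        self._model = (intercept, coefficients)

    def score(self, record):
        """Score one record (a dict of feature name to value) with the current model."""
        if self._model is None:
            raise RuntimeError("no model loaded")
        intercept, coefficients = self._model
        return intercept + sum(w * record[name] for name, w in coefficients.items())

engine = AnalyticEngine()

# First model: one feature.
engine.import_model('{"type": "linear", "intercept": 1.0, "coefficients": {"x": 2.0}}')
first = engine.score({"x": 3.0, "y": 1.0})

# Updated model: new parameters and a new feature, with no change to application code.
engine.import_model('{"type": "linear", "intercept": 0.5, "coefficients": {"x": 1.0, "y": 4.0}}')
second = engine.score({"x": 3.0, "y": 1.0})
```

The application only ever calls `import_model` and `score`, so updating a model means shipping a new document rather than pushing new code.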

Analytic engines were designed to solve the first problem above, and PFA was designed to address the second and third problems above by supporting analytic functions (or primitives), which can be used to define analytic models and analytic workflows, where the output of one analytic function or model is the input to another.
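The chaining idea can be sketched in a few lines of Python. This is not PFA syntax; it only illustrates a workflow in which pre-processing, the model, and post-processing are separate functions composed so that each one's output feeds the next. The helper `compose`, the step functions, and all of the numbers are hypothetical.

```python
import math
from functools import reduce

def compose(*steps):
    """Chain steps so that the output of each step is the input to the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def log_scale(record):
    """Pre-processing: replace the raw amount with log(1 + amount)."""
    return {**record, "amount": math.log1p(record["amount"])}

def linear_model(record):
    """Model: a linear score on the transformed feature (made-up parameters)."""
    return 0.5 + 2.0 * record["amount"]

def to_label(score):
    """Post-processing: turn the numeric score into an action."""
    return "review" if score > 10.0 else "ok"

workflow = compose(log_scale, linear_model, to_label)

small = workflow({"amount": 0.0})
large = workflow({"amount": 1000.0})
```

Because the pre- and post-processing live in the same workflow as the model, changing a feature or a threshold means swapping one step and redeploying the whole workflow as a unit, rather than keeping separate pieces of code in sync by hand.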

Disclaimer: Open Data Group sells an analytic engine.