A Vision for Biomedical Clouds

Kevin White and I wrote a paper about the impact of big data on biology, medicine and health care, and about some of the enabling technology, such as science clouds.

The paper is called “A Vision for Biomedical Clouds” and was published in the Journal of Internal Medicine (doi:10.1111/j.1365-2796.2011.02491.x). The paper is open access.

You can also find an online version of the paper here.

Posted in big data, genomics | Comments Off

PMML version 4.1 and Augustus version 0.5 Released

The Data Mining Group just released PMML Version 4.1. PMML is the leading standard for statistical and data mining models. Version 4.1 includes support for multiple models, such as segmented models and ensembles of models, and for new model types, such as baseline models, which are used in data quality, process control and change detection.

Open Data also just released a new version of Augustus (version 0.5), which includes support for PMML 4.1. Augustus is an open source, Python-based, PMML-compliant analytic application that can produce PMML models (a PMML Producer) and can read PMML models and score data against them (a PMML Consumer). This newest version of Augustus also adds support for streaming analytics.
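To make the consumer role concrete, here is a minimal sketch of what "reading a PMML model and scoring data against it" involves. It does not use Augustus or its API. It uses only the Python standard library, and the embedded PMML document, the field names `x` and `y`, and the helper functions `load_regression` and `score` are all hypothetical, written just for this illustration. It handles only a single linear `RegressionTable` and ignores optional attributes such as `exponent`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML 4.1 document for illustration only:
# a linear regression y = 1.5 + 2.0 * x.
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="illustrative model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML 4.1 elements live in this XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Read the intercept and coefficients from a single RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Apply the linear model to one record (a dict of field name to value)."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


model = load_regression(PMML_DOC)
prediction = score(model, {"x": 3.0})
print(prediction)
```

A real consumer such as Augustus also has to validate the document, apply transformations, and support many other model types. The sketch only shows the basic idea: the model travels as XML, and the consumer turns it into something that can score records.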

Posted in analytic infrastructure | Leave a comment

Tutorial on Data Intensive Computing at SC 11

Collin Bennett and I gave a three-hour tutorial on data intensive computing at SC 11 in Seattle. You can find the slides for the tutorial here.

The titles of the talks were: An Introduction to Big Data (Chapter 1), Managing Big Data (Chapter 2), and Processing Big Data (Chapter 3). You can also find the slides for the hands-on laboratory session that we led.

Posted in big data | Comments Off