A Predictive Analytics Workbench, or data mining workbench, is a set of software components that enables the analysis of one or more data sources to determine the mathematical relationships within that data and to produce a predictive analytic model that embodies those relationships.
A Predictive Analytics Workbench is most commonly targeted at an analyst—someone with a working knowledge of predictive analytics, a reasonable understanding of the data that is available within the organization to be analyzed, and a good understanding of the business needs. Analysts are usually found within a centralized analytics team or within a line of business, working as a marketing analyst or fraud analyst, for example.
Predictive Analytics Workbenches provide a set of capabilities that enable the user to perform the following tasks:
- Connect to data
- Prepare the data for modeling
- Visualize the data
- Build predictive and statistical models
- Test predictive analytic models against holdout data
- Assess business impact of models
- Deploy models into production
- Manage deployed models
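The "test against holdout data" step above can be sketched in a few lines of pure Python. The dataset, split ratio, and trivial majority-class model below are all invented for illustration; a real workbench wraps this workflow in visual tools and far richer algorithms.

```python
import random

# Toy labeled dataset of (feature, churned?) pairs -- purely illustrative.
random.seed(42)
data = [(random.random(), random.random() > 0.7) for _ in range(100)]

# Partition into training data and a holdout set the model never sees.
random.shuffle(data)
split = int(len(data) * 0.8)
train, holdout = data[:split], data[split:]

# "Train" a trivial majority-class model on the training partition only.
labels = [label for _, label in train]
majority = max(set(labels), key=labels.count)

# Assess predictive accuracy against the unseen holdout partition.
correct = sum(1 for _, label in holdout if label == majority)
accuracy = correct / len(holdout)
print(f"Holdout accuracy: {accuracy:.2f}")
```

Holding back a portion of the data is what lets the analyst detect overfitting: a model scored only on the data it was trained on will look better than it really is.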
A Predictive Analytics Workbench has many different modeling techniques, the most common of which are as follows:
- Rule induction
- Decision trees
- Linear regression
- Logistic regression
- Affinity analysis
- Nearest neighbor
- Neural networks
- Genetic algorithms
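To give a flavor of one of the simpler techniques listed above, a 1-nearest-neighbor classifier can be written in a few lines. The training examples and feature names here are invented for illustration only.

```python
import math

# Toy training examples: (tenure_months, monthly_spend) -> churned?
train = [((3, 20.0), True), ((30, 55.0), False),
         ((2, 15.0), True), ((40, 60.0), False)]

def predict(point):
    """Classify a new point by the label of its single nearest training example."""
    nearest = min(train, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]

print(predict((4, 18.0)))    # resembles the short-tenure churners -> True
print(predict((35, 58.0)))   # resembles the long-tenure customers -> False
```

In a workbench the analyst would never code this by hand; the algorithm is selected from a palette and configured through its parameters, but the underlying computation is the same.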
These techniques can be used to build four main classes of models:
- Predictive analytic models: These look for patterns or trends in the data and provide a predicted outcome. The output could be binary (Will Churn / Won’t Churn), numeric (Churn likelihood is 80%), or one of multiple results (Of the 20 campaigns we have running today, this customer is likely to respond to the Churn Campaign).
- Clustering Models: These group similar sets of data together and provide ways of looking at the profiles of each of the clusters. Cluster models are often used to gain a broad understanding of a customer base; for example, “I have five different groups of customer types and the most profitable is made up of women between the ages of 18 and 25 who have been customers for over six months.” Cluster models can also be used to segment the data before building predictive analytic models on each individual segment, for finer-grained targeting.
- Association Models: These look for situations in the data where one or more events have already occurred and there is a strong possibility of another event occurring. For example: “If a customer purchases a razor and after-shave, then that customer will purchase shaving cream with 80% confidence.” This is commonly used for analyzing customer basket data and for powering recommendation engines behind online shopping sites.
- Statistical Models: In the context of a workbench, statistical models are often used to validate hypotheses. For example, “I think that young men who have been customers for over 24 months are a high churn risk, so what is the probability that this is a reliable finding rather than just random variation?”
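The support/confidence arithmetic behind the association-rule example above can be computed directly. The basket data below is made up purely to reproduce the 80% figure from the example.

```python
# Toy transaction baskets -- invented for illustration.
baskets = [
    {"razor", "after-shave", "shaving cream"},
    {"razor", "after-shave", "shaving cream"},
    {"razor", "after-shave"},
    {"razor", "after-shave", "shaving cream", "soap"},
    {"razor", "after-shave", "shaving cream"},
    {"soap"},
]

antecedent = {"razor", "after-shave"}
consequent = {"shaving cream"}

# Confidence = P(consequent | antecedent): baskets containing both,
# divided by baskets containing the antecedent.
with_ante = [b for b in baskets if antecedent <= b]
with_both = [b for b in with_ante if consequent <= b]
confidence = len(with_both) / len(with_ante)
print(f"confidence = {confidence:.0%}")
```

Here four of the five baskets containing the razor and after-shave also contain shaving cream, giving the rule a confidence of 80%. Association-mining algorithms such as Apriori search for all rules whose support and confidence exceed analyst-chosen thresholds.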
A Predictive Analytics Workbench allows a user to create, validate, manage, and deploy predictive analytic models. It consists of the components shown in the figure below:
- A model repository: This is a place where models and the specification of the tasks required to produce them can be stored, revised, and managed. Not all Predictive Analytics Workbenches have such a repository; some still store models as script files.
- Data management tools: Building predictive analytic models requires access to multiple data sources of various formats. A Predictive Analytics Workbench must be able to connect to and use this data.
- Design tools for a modeler: Modelers need to be able to define how data will be integrated, cleaned, and enhanced, as well as the way in which it will be fed through modeling algorithms and the results analyzed and used.
- Modeling algorithms: Predictive Analytics Workbenches offer a wide array of modeling algorithms that can be applied to data to produce models.
- Data visualization and analysis tools: Modelers must be able to understand the data available, analyzing distributions and other characteristics. They must also be able to analyze the results of a set of models in terms of their predictive power and validity.
- Deployment tools: Models are not valuable unless they can be deployed in some way, and Predictive Analytics Workbenches need to be able to deploy models as code, as SQL, as business rules, or directly to a database using an in-database analytics engine.
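As one illustration of deploying a model as SQL, a trained linear scoring model can be translated mechanically into a SQL expression that the database runs in place. The coefficients, table, and column names below are invented; real deployment tools typically emit standardized formats such as PMML or vendor-specific SQL.

```python
# Invented coefficients from a hypothetical linear churn model.
intercept = 0.10
coefficients = {"tenure_months": -0.02, "support_calls": 0.15}

# Translate the model into a SQL scoring expression a database can execute,
# so scoring happens in-database instead of moving the data out.
terms = " + ".join(f"{w} * {col}" for col, w in coefficients.items())
sql = f"SELECT customer_id, {intercept} + {terms} AS churn_score FROM customers"
print(sql)
```

Generating SQL (or business rules, or code) from the model definition is what lets the same model built in the workbench score millions of rows inside the production systems where the data already lives.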