Data mining is all about finding patterns and relationships in large datasets. The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on a dataset, regardless of the amount of data. For example, we can analyze the effectiveness of a marketing campaign for different car models, or predict bicycle sales in the coming month. In contrast, data mining is used to explore hidden patterns in large volumes of data. For example, we can extract driving behavior from connected car data or find anomalies in online payments.
Data analysis and data mining can also complement each other. In this case, data mining is used to find general patterns such as groups of data (clusters), unusual records (anomalies) or dependencies (associations). These patterns can then serve as input for further analysis, in which machine learning or predictive analysis is used to predict a certain outcome. For example, we can use the data mining step to identify different customer groups in a CMS system, and then use those groups to obtain more accurate predictions of the profitability of new customers.
Clustering is one example of a data mining technique. Below are several commonly used data mining techniques, clustering among them, which we also use in some of our solutions.
Anomaly detection
Anomaly detection is the task of finding unusual data records, such as outliers and unexpected changes or deviations. We mainly use this technique in our Procure-to-Pay solution, which detects potentially fraudulent transactions in payment data.
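To make the idea concrete, here is a minimal sketch of one simple outlier rule, the modified z-score based on the median absolute deviation. It is an illustration only and not the method used in the Procure-to-Pay solution; the function name `find_anomalies`, the threshold of 3.5, and the payment amounts are all assumptions made up for this example.

```python
from statistics import median


def find_anomalies(amounts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely influence.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread at all, so this rule cannot flag anything
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical payment amounts; the fifth one is far outside the usual range.
payments = [120.0, 95.5, 130.0, 110.25, 9800.0, 105.0, 99.9]
anomalies = find_anomalies(payments)
print(anomalies)
```

The median is used instead of the mean because a single extreme payment would otherwise shift the reference point it is compared against.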
Clustering
Clustering is the process of identifying groups of records that are similar in some way, without relying on known structures or labels in the data. For example, we use this technique to identify customer groups in our Customer 360 solution and to find general topics in the media monitor.
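As an illustration, the sketch below implements a bare-bones version of k-means, one common clustering algorithm. It is not the implementation behind Customer 360; the function name `k_means`, the choice to seed the centroids with the first k points, and the customer data are assumptions for this example.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters; returns (labels, centroids)."""
    # Assumption: seed with the first k points. Real implementations
    # usually pick starting centroids more carefully.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers as (orders per year, average basket size) pairs.
customers = [(1, 2), (1.5, 1.8), (8, 8), (9, 9.5), (1.2, 2.2), (8.5, 9)]
labels, centroids = k_means(customers, k=2)
print(labels)
```

Note that no group labels are supplied up front: the two customer groups emerge purely from the distances between the records.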
Association rule learning
In association rule learning, we try to find relationships between variables. For example, we can use car configurator data about customer purchasing habits to determine which options are frequently bought together, and use this information for marketing purposes.
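The sketch below shows the two basic measures behind such rules, support (how often two options appear together) and confidence (how often the second option appears given the first). It only considers pairs of options and is a simplified illustration, not a full algorithm such as Apriori; the function name `pair_rules`, the minimum support of 0.5, and the configurator orders are assumptions for this example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(transactions)
    # Count how many transactions contain each item and each pair of items.
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Each frequent pair yields two candidate rules: a -> b and b -> a.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical car configurator orders, one set of options per order.
orders = [
    {"navigation", "parking sensors", "leather seats"},
    {"navigation", "parking sensors"},
    {"leather seats", "sunroof"},
    {"navigation", "parking sensors", "sunroof"},
]
rules = pair_rules(orders)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support}, confidence={confidence}")
```

In this made-up data, navigation and parking sensors appear together in three of the four orders, so that pair is the only one that passes the support threshold.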
Classification
In a classification problem, we try to predict a discrete output. In other words, we are trying to map input variables onto a fixed set of categories. For example, we may want to predict whether a house is in San Francisco or New York.
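To illustrate the house example, here is a minimal sketch of a k-nearest-neighbours classifier, one of many possible classification methods. The function name `knn_predict`, the two features (elevation in metres and price per square foot), and all data values are invented for this example. The features are deliberately left unscaled to keep the sketch short; in practice they would normally be scaled so that price does not dominate the distance.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort the labelled examples by Euclidean distance to the query point.
    neighbours = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: ((elevation in m, price per sq ft in USD), city).
houses = [
    ((73.0, 1100.0), "San Francisco"),
    ((45.0, 950.0), "San Francisco"),
    ((60.0, 1250.0), "San Francisco"),
    ((5.0, 1400.0), "New York"),
    ((10.0, 1800.0), "New York"),
    ((3.0, 1300.0), "New York"),
]

print(knn_predict(houses, (55.0, 1000.0)))
print(knn_predict(houses, (4.0, 1500.0)))
```

Unlike clustering, this approach needs labelled examples: the city of every training house is known in advance, and the output is always one of those known categories.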