Good data quality is essential if you want to become more data-driven. Before you can run any kind of Machine Learning model or algorithm, the data you are working with must be clean and free of duplicates. A frequently used phrase in Machine Learning is: "Garbage in, garbage out". In other words: no matter how good your Machine Learning model is, if the input data is inaccurate, the output will be inaccurate too.
Let's take a step back and look at the core datasets that most companies have. First of all, many companies hold data about their customers. Each customer must have a unique customer ID, so that all data concerning that customer can be identified without mixing up customers or counting the same customer more than once. Uniqueness is especially important when linking tables to each other: only with a unique key can a trustworthy link be established between the two tables. This applies not only to customer tables but also to item (master) tables, which store the products or services your company offers. If products are stored twice, inventory tracking becomes a tough job.
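The uniqueness requirement described above can be checked before two tables are linked. The sketch below is a minimal, illustrative example in plain Python, assuming records are dictionaries and that the key field is called `customer_id`; the function names and field names are hypothetical and not part of any particular tool. It refuses to link orders to customers when the customer IDs are not unique, and it reports orders that have no matching customer instead of silently dropping them.

```python
from collections import Counter


def find_duplicate_ids(records, key="customer_id"):
    """Return the IDs (sorted) that occur more than once in a list of records."""
    counts = Counter(record[key] for record in records)
    return sorted(id_ for id_, n in counts.items() if n > 1)


def link_orders_to_customers(customers, orders, key="customer_id"):
    """Link each order to exactly one customer.

    Hypothetical helper: raises ValueError if customer IDs are not unique,
    because a duplicated key would make the link ambiguous.
    """
    duplicates = find_duplicate_ids(customers, key)
    if duplicates:
        raise ValueError(f"Customer IDs are not unique: {duplicates}")

    # With unique IDs, a plain dict gives a one-to-one lookup.
    by_id = {customer[key]: customer for customer in customers}

    linked, unmatched = [], []
    for order in orders:
        customer = by_id.get(order[key])
        if customer is None:
            # Keep orders without a matching customer visible for follow-up.
            unmatched.append(order)
        else:
            linked.append({**order, "customer_name": customer["name"]})
    return linked, unmatched
```

Raising an error rather than picking one of the duplicates is a deliberate choice in this sketch: a silent "first match wins" would hide exactly the double counting the paragraph warns about.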
This is why the Datalab offers a range of data standardization solutions. These vary from cleaning customer data, to extracting unique identifiers from item descriptions, to harmonizing datasets from different sources. We can also advise on how to store and maintain your datasets so that you can operate more accurately.
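To give an idea of what extracting unique identifiers from item descriptions can look like, here is a minimal sketch using Python's standard `re` module. It assumes a hypothetical identifier format of two to four letters, an optional hyphen or space, and four to six digits (for example `KH-10234`); real item codes will differ, so the pattern is only an illustration, not the Datalab's actual method. Descriptions that contain the same identifier, written in different ways, end up in the same group, which makes items that are stored twice easy to spot.

```python
import re

# Hypothetical identifier format: 2-4 letters, optional "-" or space, 4-6 digits.
ITEM_ID_PATTERN = re.compile(r"\b([A-Z]{2,4})[-\s]?(\d{4,6})\b")


def extract_item_id(description):
    """Return a normalised item ID (e.g. "KH-10234") from free text, or None."""
    # Upper-case first so "kh 10234" and "KH-10234" are treated the same.
    match = ITEM_ID_PATTERN.search(description.upper())
    if match is None:
        return None
    prefix, number = match.groups()
    return f"{prefix}-{number}"


def group_by_item_id(descriptions):
    """Group descriptions by extracted ID; descriptions without an ID go under None."""
    groups = {}
    for description in descriptions:
        item_id = extract_item_id(description)
        groups.setdefault(item_id, []).append(description)
    return groups
```

For example, `group_by_item_id(["KH-10234 bolt", "bolt kh10234", "washer"])` puts the first two descriptions under `"KH-10234"` and the last one under `None`, flagging a likely duplicate item and an item without a recognisable code.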
Because every case is unique, we would love to get in touch and talk about your datasets and any concerns you might have about their quality. Do you have questions about data standardization or data quality in general? Contact Ralf de Haan or one of our team members!