Every system you have captures data for a different purpose. The CRM captures data on customers, prospects and the company’s interactions with them. The ERP captures data on sales, vendors, accounting transactions, employee records, and much more. When you prepare this data for analytics work, you will likely need to do some data cleaning.
Cleaning data is an important task. Here are five questions you need to address:
- Is there any duplication? If so, which source provides the most reliable data?
- Is there any missing data? If so, is it appropriate to assign default values, or to ignore those records altogether?
- Is it necessary to filter the data? If so, what filters or criteria do you use?
- Is it necessary to extrapolate data from what is available? If so, what is the best way to produce what you need? Alternatively, what might serve as good surrogate data?
- Is it necessary to aggregate data to make the interpretation meaningful? Or conversely, disaggregate the data to find actionable insight?
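To make these questions concrete, here is a minimal Python sketch of four of them (duplication, missing data, filtering, and aggregation) applied to a handful of made-up records. Everything in it is an assumption for illustration: the field names, the sample values, the rule that the ERP is the more reliable source, and the filter that keeps only non-negative amounts. Your own answers to the questions above would replace each of these choices.

```python
from collections import defaultdict

# Hypothetical records pulled from two systems; field names and values are illustrative.
records = [
    {"customer_id": 1, "source": "crm", "region": "East", "amount": 120.0},
    {"customer_id": 1, "source": "erp", "region": "East", "amount": 125.0},
    {"customer_id": 2, "source": "crm", "region": "West", "amount": None},
    {"customer_id": 3, "source": "erp", "region": "West", "amount": 80.0},
    {"customer_id": 4, "source": "crm", "region": "East", "amount": -5.0},
]

# Assumption: when sources disagree, the ERP is treated as more reliable (lower = preferred).
SOURCE_PRIORITY = {"erp": 0, "crm": 1}


def deduplicate(rows):
    """Keep one row per customer_id, preferring the higher-priority source."""
    best = {}
    for row in rows:
        key = row["customer_id"]
        current = best.get(key)
        if current is None or SOURCE_PRIORITY[row["source"]] < SOURCE_PRIORITY[current["source"]]:
            best[key] = row
    return list(best.values())


def handle_missing(rows, default=None):
    """Drop rows with a missing amount, or fill them when a default is given."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            if default is None:
                continue  # ignore the record altogether
            row = {**row, "amount": default}  # assign the default value
        cleaned.append(row)
    return cleaned


def filter_rows(rows):
    """Assumed criterion: keep only rows with a non-negative amount."""
    return [row for row in rows if row["amount"] >= 0]


def aggregate_by_region(rows):
    """Sum amounts per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


deduped = deduplicate(records)
complete = handle_missing(deduped)
filtered = filter_rows(complete)
totals = aggregate_by_region(filtered)
print(totals)  # {'East': 125.0, 'West': 80.0}
```

Calling `handle_missing(deduped, default=0.0)` instead would keep customer 2 with an amount of zero rather than dropping the record. Which of the two is appropriate depends on what a missing amount means in your data.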
Throughout the process, you might need to make generalizations and assumptions. Take note of them, as they might affect your interpretation of the results.