Data is like oxygen for a business. To make better decisions, businesses need data to eliminate unfavourable options and affirm their choices. Unfortunately, an abundance of data does not guarantee its quality.
In many businesses, data is captured in multiple systems. One system might assign a customer number as a unique identifier, whereas another might use the social insurance number instead. The inconsistency makes it challenging to correlate data between systems. In some cases, the use of a data field drifts over time, so that it ends up capturing something other than what was originally intended. Data integrity is an ongoing issue. Here are some steps you can take to enhance data quality.
1. Pick the most reliable source
Businesses implement systems for different purposes. The billing system has the basic customer information for issuing invoices. The CRM, on the other hand, contains comprehensive data on the customer, including all communications and interactions. When you want to shortlist customers to invite to an event, you would extract data from the CRM because the billing system wouldn’t have the data you want to use for shortlisting.
2. Determine a representative data subset
While it is nice to be able to pull data across the board, it might not be practical or feasible. Depending on the purpose, it might be adequate to pull a select sample. Say you want to understand the causes of a process bottleneck. You would start with a preliminary discussion of the key concerns before diving into data extraction. The 80/20 rule helps you focus on the specific data subsets that offer the most useful insights.
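One way to apply the 80/20 rule is to tally records by category and keep only the few categories that account for most of the volume. The sketch below is illustrative only: the delay causes, their counts, and the `pareto_subset` helper are made up for the example and do not come from any particular system.

```python
from collections import Counter


def pareto_subset(counts, threshold_pct=80):
    """Return the largest categories that together cover threshold_pct of all records."""
    total = sum(counts.values())
    selected, running = [], 0
    # Walk the categories from most to least frequent.
    for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if running * 100 >= threshold_pct * total:
            break  # the categories picked so far already cover the threshold
        selected.append(category)
        running += n
    return selected


# Hypothetical tally of bottleneck causes raised in the preliminary discussion.
delay_causes = Counter({
    "missing approval": 50,
    "incomplete form": 30,
    "system outage": 10,
    "staff absence": 6,
    "other": 4,
})

focus = pareto_subset(delay_causes)
print(focus)
```

With this made-up tally, two of the five causes cover 80% of the cases, so the data extraction could start with those two.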
3. Remove exceptional deviations
When determining the correlation between data points, it is necessary to remove data that display anomalies, because these anomalies skew the analysis. Suppose a wealth management company wants to study the correlation between the net worth of its clients and their preferred way of receiving communications. The company monitors the use of web access for portfolio statements. It is known that a handful of clients never use web access, whatever their net worth. In reviewing the correlation, it is important to remove this group to better understand the trend.
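The effect of removing a known exceptional group can be seen by computing the correlation with and without it. The following is a minimal sketch with invented client records; the field names (`net_worth`, `web_logins`, `never_uses_web`) and the `pearson` helper are assumptions for illustration, not fields from a real system.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical clients: net worth in thousands, web logins per quarter.
clients = [
    {"net_worth": 100, "web_logins": 2, "never_uses_web": False},
    {"net_worth": 200, "web_logins": 4, "never_uses_web": False},
    {"net_worth": 300, "web_logins": 6, "never_uses_web": False},
    {"net_worth": 400, "web_logins": 8, "never_uses_web": False},
    {"net_worth": 500, "web_logins": 0, "never_uses_web": True},
    {"net_worth": 600, "web_logins": 0, "never_uses_web": True},
]

# Correlation over everyone, including the known exceptional group.
r_all = pearson([c["net_worth"] for c in clients],
                [c["web_logins"] for c in clients])

# Correlation after removing the clients known never to use web access.
regular = [c for c in clients if not c["never_uses_web"]]
r_filtered = pearson([c["net_worth"] for c in regular],
                     [c["web_logins"] for c in regular])

print(r_all, r_filtered)
```

In this invented sample, the two clients who never log in are enough to turn a clearly positive relationship into a negative one, which is the kind of skew the step is meant to avoid.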
4. Make assumptions
When the data available isn’t exactly what is needed, it is fine to make assumptions that are in line with the operating model. Suppose the finance department intends to study the timeliness of vendor payments. An invoice is not entered into the system until it is signed off by a manager. As a result, it is difficult to track the true elapsed time between invoice receipt and payment. Without the actual tracking, an assumption could be made that an invoice is approved as soon as it is received. In this case, the time it takes to process payment would begin when the approved invoice is entered and end when a cheque is cut or the electronic fund transfer is complete. The result is not exact but pretty close.
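Under that assumption, the processing time is simply the number of days between the entry date and the payment date. The sketch below uses made-up invoice records; the field names `entered` and `paid` are hypothetical stand-ins for whatever the accounting system actually exports.

```python
from datetime import date

# Hypothetical extract from the accounting system.
invoices = [
    {"invoice": "A-1", "entered": date(2024, 3, 1), "paid": date(2024, 3, 15)},
    {"invoice": "A-2", "entered": date(2024, 3, 4), "paid": date(2024, 3, 29)},
]


def processing_days(invoice):
    """Days from system entry to payment.

    Assumption: the invoice was approved and entered on the day it was
    received, so the entry date stands in for the receipt date.
    """
    return (invoice["paid"] - invoice["entered"]).days


days = [processing_days(inv) for inv in invoices]
average_days = sum(days) / len(days)
print(days, average_days)
```

Keeping the assumption in the docstring is one way to make sure it travels with the analysis.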
5. Apply normalization
Following the above example, data could be normalized to account for known conditions. If the business has a guideline on how soon invoices should be reviewed and approved, it is reasonable to add that lead time to the data extracted from the accounting system. Normalizing data is a way to emulate the real case, producing results that reflect actual practice. Sometimes it is just as adequate to leave the data as is and account for the assumptions when interpreting the results, as in step 4 above.
The most important thing to remember is consistency in the sources you use, in how you extract the data and in how you prepare the analysis. As long as the assumptions and deficiencies are noted, you will be able to draw reasonable insights from the data.
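For the invoice example, normalization could mean shifting the start of the clock back by the approval lead time in the guideline. This is only a sketch: the five-day value for `APPROVAL_LEAD_DAYS` and the `normalized_days` helper are assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Hypothetical guideline: invoices should be reviewed and approved within 5 days.
APPROVAL_LEAD_DAYS = 5


def normalized_days(entered, paid, lead_days=APPROVAL_LEAD_DAYS):
    """Estimated days from invoice receipt to payment.

    The receipt date is not recorded, so it is estimated by moving the
    entry date back by the approval lead time from the guideline.
    """
    estimated_receipt = entered - timedelta(days=lead_days)
    return (paid - estimated_receipt).days


result = normalized_days(date(2024, 3, 1), date(2024, 3, 15))
print(result)
```

The same lead time should be applied to every invoice in the extract so that the results stay comparable.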