Hidden Costs of Duplicate Data

A client once asked me what rate of duplicate data was “normal” in customer master data.

My initial answer was that, among companies without any formal master data management, data governance or data quality initiatives in place, duplication rates of 10%-30% (or more) are not uncommon. When I was at D&B, we routinely saw that level of duplication in clients’ files.

In a study in the healthcare field, Children’s Medical Center Dallas engaged an outside firm to help clean up their duplicate data: 

“Solving both the current and future problems around duplicate records helped Children’s improve the quality of patient care and increase physician acceptance of the new EHR. The duplicate record rate was initially reduced from 22.0% to 0.2% and five years later it remains an exceptionally low 0.14%. The 5 FTEs initially tasked with resolving duplicate records have been reduced to less than 1 FTE.”

“For the Children’s Medical Center, the results were heartening, not only from a care delivery standpoint but also because of the significant cost-savings that can be realized. A study conducted on Children’s data showed that on average, a duplicate medical record costs the organization more than $96.”

So it is possible to get the duplication rate down to very low levels through careful analysis and the application of the right tools, as part of an ongoing data governance program. Even the hospital above (and hospitals are not usually cited as practitioners of best practices) was able to maintain a duplication rate of only 0.14% after five years.

And there are very real costs to not de-duplicating your customer data. Depending on the functional area (marketing, sales, finance, customer service, etc.) and the business activities you undertake, high levels of duplicate customer data can:

  • Annoy customers or undermine their confidence in your company.
  • Increase mailing costs.
  • Cause hundreds of hours of manual reconciliation of data.
  • Increase resistance to implementation of new systems.
  • Result in multiple salespeople, sales teams or collectors calling on the same customer, etc.

The best studies I’ve seen on the cost of duplicate data come from the healthcare industry. One of them notes:

“According to Just Associates, the direct cost of leaving duplicates in a Master Patient Index database is anywhere from $20 per duplicate to several hundred dollars. The lower cost reflects the organization’s labor and supply costs to identify and fix the record while the higher expense reflects the costs of repeated diagnostic tests done on a patient whose previous medical records could not be located.”

“The American Health Information Management Association (AHIMA) estimates that it costs between $10 and $20 per pair of duplicates to reconcile the records. If the records aren’t reconciled, however, the costs are even higher.”
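To make those per-duplicate figures concrete, here is a minimal back-of-envelope sketch in Python. The file size and the 20% duplication rate are hypothetical assumptions chosen for illustration; the $20 and $96 per-duplicate costs come from the figures quoted above.

```python
# Back-of-envelope estimate of the cost of carrying duplicates in a customer file.
# The record count and duplication rate below are illustrative assumptions;
# the per-duplicate costs are taken from the figures quoted above.

def duplicate_cost(total_records: int, duplication_rate: float, cost_per_duplicate: float) -> float:
    """Estimated cost of the duplicates in a master file."""
    duplicates = total_records * duplication_rate
    return duplicates * cost_per_duplicate

records = 500_000   # hypothetical customer master size
rate = 0.20         # 20% duplication, within the 10%-30% range discussed above

low = duplicate_cost(records, rate, 20)    # $20 per duplicate (low end)
high = duplicate_cost(records, rate, 96)   # $96 per duplicate (the Children's figure)

print(f"Estimated duplicates: {records * rate:,.0f}")
print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

With those assumed numbers, even the low end works out to millions of dollars, which is why a “hidden” cost like this is worth measuring rather than guessing at.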

Here are three more case studies backing up the 10%-30% range I quoted:

  • Once the analysis was complete, Sentara discovered they had a significant duplication rate: over 18%. They had attempted to address it in the past through a remediation process, but whether because of technology issues or because the cost of merging and cleaning up the duplicates across their many different systems was too high, they had not yet successfully reduced their duplication rate. Source: Initiate Systems success story
  • Emerson Process Management faced a tremendous challenge four years ago in getting its CRM data in order: There were potentially 400 different master records for each customer, based on different locations or different functions associated with the client. “You have to begin to think about a customer as an organization you do business with that has a set of addresses tied to it,” says Nancy Rybeck, the data warehouse architect at Emerson who took charge of the cleanup. Rybeck analyzed the customer records for similarities and connections using everything from postal standards to D&B data, and managed to eliminate the 75 percent site-duplication rate the company suffered in its data. “That’s going to ripple through everything,” she says. Source: DestinationCRM.com
  • Problem: 20.9% of Utah Statewide Immunization Information System records were duplicates. Impact: patients were difficult to find in the system (a key barrier to provider participation), reliable patient records could not be located (creating a risk of over-immunization), and the program bore the cost of unnecessary immunizations and the risk of adverse effects on patients. Source: health.utah.gov

And here’s a good quote from a white paper titled “Data Quality and the Bottom Line” by The Data Warehousing Institute:

“Peter Harvey, CEO of Intellidyn, a marketing analytics firm, says that when his firm audits recently ‘cleaned’ customer files from clients, it finds that 5 percent of the file contains duplicate records. The duplication rate for untouched customer files can be 20 percent or more.”

Every organization will need its own metrics, but left unchecked, duplication is a hidden cost that drags on your company, slowing down your processes and making your analyses less reliable.

If your sales analysis reports can’t guarantee that there’s one and only one record for each of your largest customers, then the sales figures for those customers are probably wrong, and the entire report becomes suspect.
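As a small illustration of that point (the customer names and invoice amounts below are made up), here is how duplicate master records split one customer’s revenue across several rows and quietly change who shows up as your top customer:

```python
# Illustration of how duplicate customer records distort a sales report.
# Customer names and invoice amounts are invented for the example.

from collections import defaultdict

invoices = [
    ("Acme Corp",         120_000),
    ("ACME Corporation",   95_000),  # same customer under a second master record
    ("Acme Corp.",         40_000),  # and a third variant
    ("Globex Inc",        180_000),
]

sales_by_customer = defaultdict(float)
for customer, amount in invoices:
    sales_by_customer[customer] += amount

# The report ranks Globex first at $180,000, even though Acme's true total
# across its three duplicate records is $255,000.
for customer, total in sorted(sales_by_customer.items(), key=lambda kv: -kv[1]):
    print(f"{customer:20s} ${total:>10,.0f}")
```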

I’d like to end with a great quote on data quality by Ken Orr from the Cutter Consortium in “The Good, The Bad, and The Data Quality”:

“Ultimately, poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point, you either have to stop and clear the windshield or risk everything.”

This article was originally posted December 13, 2009 on hubdesignsmagazine.com

Dan Power

Dan Power is the Founder & President of Hub Designs, a global consulting firm specializing in developing and delivering high impact master data management (MDM) and data governance strategies....
