Using customer data has long meant accepting a trade-off: better marketing results at the cost of weaker privacy. But new research suggests that trade-off may no longer be unavoidable.
A study by Longxiu Tian, a marketing professor at UNC Kenan-Flagler Business School, finds that companies can use customer data to improve marketing while protecting privacy.
In a test using data from a U.S. telecom carrier, targeting customers likely to leave improved the campaign’s results by 1.46% with privacy protection, compared with 1.66% without it — while cutting the share of customers who could be identified from about 7% to zero.
“One of the biggest mistakes is to treat anonymity as privacy,” says Tian. “Just because data is anonymous does not mean it’s private. Through common tasks like linking datasets, you can work out who those people are.”
He conducted the research with Dana Turjeman of Reichman University and Samuel Levy of the University of Virginia, and their findings were published as “Privacy Preserving Data Fusion” in Marketing Science.
Their research introduces a machine learning methodology, privacy-preserving data fusion (PPDF), that enables organizations to harness the full power of data fusion without violating user privacy. The method addresses growing demands for compliance and efficiency in data-driven industries, as well as among government agencies and academic researchers, all seeking to extract value from sensitive data while ensuring individual confidentiality.
Most firms today are constantly combining individual-level data, linking records such as purchases, usage and billing with survey responses on how customers rate the service. Each dataset is useful on its own, but together they give a far richer picture of customer behavior.
However, combining datasets has traditionally created a privacy risk, since even “anonymous” data can reveal identities when linked. Small details such as age or location can act like a digital fingerprint: matched across datasets, they can point back to a specific person, with legal, reputational and material consequences.
The study shows how real that risk is. In a test using data from a large telecom company, linking customer records with survey responses led to nearly 7% of users being re-identified when no privacy protections were applied. Roughly one in 14 “anonymous” respondents could be matched back to their identities.
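The mechanics of such a linkage attack are simple enough to sketch in a few lines. The example below is purely illustrative (the names, ages and ZIP codes are invented, and this is not the study's data): an “anonymous” survey that retains just two quasi-identifiers can be joined straight back to a named billing file.

```python
import pandas as pd

# Hypothetical billing records with identities (all values invented).
billing = pd.DataFrame({
    "name": ["A. Smith", "B. Jones", "C. Lee"],
    "age":  [34, 34, 61],
    "zip":  ["27514", "27510", "27514"],
})

# An "anonymous" survey: no names, but age and ZIP code were kept.
survey = pd.DataFrame({
    "age":    [34, 61],
    "zip":    ["27510", "27514"],
    "rating": [2, 5],
})

# Joining on the quasi-identifiers (age, zip) re-attaches identities:
# here, each survey row matches exactly one named billing record.
linked = survey.merge(billing, on=["age", "zip"])
print(linked)  # B. Jones rated 2; C. Lee rated 5 -- no longer anonymous.
```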
The researchers argue this is not a flaw in how companies handle data, but a feature of combining it.
“We cannot get full privacy and full accuracy at the same time,” says Tian. “The goal is to balance the two and that’s what our model tries to do.”
The researchers’ solution aims to break that trade-off by slightly distorting individual data points so they can’t be traced back to a person, while keeping the overall trends accurate enough for firms to make good decisions.
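The paper's PPDF machinery is more sophisticated, but the core idea can be illustrated with a standard noise-injection mechanism. In the sketch below (synthetic data; Laplace noise stands in for whatever mechanism the authors actually use), each customer's value is perturbed individually, yet the aggregate statistic a marketer would act on barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly usage values for 10,000 customers (synthetic data).
usage = rng.gamma(shape=2.0, scale=50.0, size=10_000)

# Distort each individual value with Laplace noise, a standard privacy
# mechanism; a smaller noise scale means less privacy but more accuracy.
noise_scale = 20.0
noisy_usage = usage + rng.laplace(loc=0.0, scale=noise_scale, size=usage.size)

# Individual records are perturbed, but the aggregate trend survives:
print(f"true mean:  {usage.mean():.1f}")
print(f"noisy mean: {noisy_usage.mean():.1f}")  # close to the true mean
```

The noise scale is the dial Tian describes: turning it up buys more privacy at the cost of accuracy.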
In the telecom test, a campaign to stop customers from leaving worked almost as well with privacy protection as without it, and, crucially, the protected approach prevented any customers from being identified.
For companies, this suggests accepting a marginal drop in performance in exchange for eliminating a major privacy risk.
The study also shows that too much privacy can backfire. In the telecom test, stricter settings made the campaign far less effective, with the improvement dropping to under 1%.
The implication is that there is a “sweet spot” for privacy: enough protection to prevent re-identification, but not so much that the data loses its value.
Another key finding is that privacy risk is uneven. A small group of customers with unusual behavior or characteristics is much easier to identify than the rest: even in a large dataset, no one else looks like them.
This suggests that simply adding more data to “hide” individuals will not solve the problem, and that protection might need to focus on those most at risk, rather than applying the same rules to everyone.
“Even without a data breach, if it becomes clear a company can trace ‘anonymous’ data back to people, it can lead to regulatory penalties and a loss of consumer trust,” says Tian.
The study also introduces a way to measure how likely each individual is to be identified, allowing companies to see exactly which customers are most at risk and adjust their privacy protections accordingly.
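The article does not detail the estimator, but the underlying idea, scoring each record by how many others share its quasi-identifiers, can be roughly sketched as follows (the columns and data are invented, not the study's variables).

```python
import pandas as pd

# Synthetic example: the quasi-identifier columns are hypothetical.
df = pd.DataFrame({
    "age_band": ["30-39", "30-39", "30-39", "60-69"],
    "zip":      ["27514", "27514", "27510", "27514"],
})

# Count how many records share each (age_band, zip) combination; a record
# in a group of size k is at worst a 1-in-k re-identification guess.
group_size = df.groupby(["age_band", "zip"])["zip"].transform("size")
df["reid_risk"] = 1.0 / group_size

print(df.sort_values("reid_risk", ascending=False))
# Records with a unique combination get reid_risk 1.0: these are the
# unusual customers the study flags as most exposed.
```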