The Real Impact of Synthetic Data

Studying data is critical to the health field; unfortunately, gathering data can be time-consuming and sometimes impossible due to privacy laws and concerns. Patients, first and foremost, have the right to keep their medical information private. But researchers need medical information in order to do their work.

A new project — one that creates synthetic data for use by data scientists — maintains patient privacy and delivers the information researchers need. It has also put Edmonton on the map.

Health Cities — based in the provincial capital of Edmonton, located near the centre of Alberta — is a not-for-profit that works in the medical data field. It’s developing a way to synthesize large groupings of health data, easing concerns about patient confidentiality.

Reg Joseph, the CEO of Health Cities, says the processes have been shown to be accurate and to prevent identification of individuals within the data sets they have examined. “Now that we know that it works, that it has utility and that it is safe, does it have any relevance?” says Joseph. That relevance will be determined by how researchers can use the data, for example to better track the number of drug overdoses that lead to emergency-room interventions, or to study how pandemics spread through the population.

With deidentified or anonymized data, individuals could still be recognized through patterns in their information. The process Health Cities uses to create synthetic data is more complicated than simply removing names and personal identifiers from large data sets, and in turn makes the data safer.
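
The re-identification risk described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy records, the field names (`birth_year`, `postal_prefix`, `visit_date`) and the helper `unique_combinations` are hypothetical and do not come from Health Cities. The sketch only shows that records with the names removed can still be singled out when a combination of ordinary fields is unique.

```python
from collections import Counter

# Hypothetical toy records with names already removed (all values are made up).
records = [
    {"birth_year": 1984, "postal_prefix": "T5K", "visit_date": "2020-03-02"},
    {"birth_year": 1984, "postal_prefix": "T5K", "visit_date": "2020-03-02"},
    {"birth_year": 1991, "postal_prefix": "T6G", "visit_date": "2020-03-05"},
    {"birth_year": 1957, "postal_prefix": "T5J", "visit_date": "2020-04-11"},
]


def unique_combinations(rows, fields):
    """Return the rows whose combination of the given fields appears exactly once."""
    keys = [tuple(row[field] for field in fields) for row in rows]
    counts = Counter(keys)
    return [row for row, key in zip(rows, keys) if counts[key] == 1]


# Rows that are unique on these fields could be matched to a person by anyone
# who already knows those details about them.
at_risk = unique_combinations(records, ["birth_year", "postal_prefix", "visit_date"])
print(f"{len(at_risk)} of {len(records)} records are unique on these fields")
```

A record that is unique on a handful of everyday details is the kind of pattern that simple name removal leaves behind.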

In an initial trial, researchers “trained” a computer on the health records of 300,000 opioid patients from the previous seven years, with multiple events recorded for each patient. They then synthesized a new data set based on the patterns in the original. This helps researchers better understand the “who, where and when” of opioid use, which in turn helps them plan treatment and intervention. And, in an area as sensitive as drug use, protecting the privacy of patients is paramount.
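
As a rough illustration of what learning patterns and then generating new records can mean, here is a minimal Python sketch. It is not the method used in the trial, which the article does not describe. It swaps in a deliberately simple technique, a first-order Markov chain over event sequences, and every name and event label in it (`fit_transitions`, `sample_sequence`, `er_visit` and so on) is hypothetical. A model this naive also offers no privacy guarantee on its own; real synthetic-data systems add safeguards and checks that are not shown here.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"


def fit_transitions(sequences):
    """Record every observed step from one event to the next."""
    transitions = defaultdict(list)
    for sequence in sequences:
        previous = START
        for event in sequence:
            transitions[previous].append(event)
            previous = event
        transitions[previous].append(END)
    return transitions


def sample_sequence(transitions, rng, max_len=20):
    """Generate one new event sequence by walking the learned transitions."""
    sequence, state = [], START
    while len(sequence) < max_len:
        state = rng.choice(transitions[state])
        if state == END:
            break
        sequence.append(state)
    return sequence


# Hypothetical original data: one made-up event sequence per patient.
original = [
    ["prescription", "er_visit", "treatment"],
    ["prescription", "refill", "refill"],
    ["er_visit", "treatment"],
]

model = fit_transitions(original)
rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = [sample_sequence(model, rng) for _ in range(5)]
for sequence in synthetic:
    print(sequence)
```

The generated sequences follow the step-by-step patterns seen in the original records without being copies of any particular patient's row, which is the basic idea behind a synthetic data set.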

The work of Health Cities on synthetic data has drawn international attention. It has been working with the Institute of Health Economics (IHE), Replica Analytics, the University of Alberta and Alberta Innovates on a pilot project. That led to a partnership with pharmaceutical giant Merck Canada to examine the practical applications of the process.

“This research collaboration on synthetic data will help facilitate access to health information critical to scientific advancement and, in absence of readily accessible patient-level data, unlock access to relevant information needed to generate meaningful analyses for decision makers while protecting privacy, ultimately helping Canadians have access to innovative medicines,” Heidi Waser, vice president, patient access at Merck Canada, wrote in a statement.

Joseph says the work Health Cities is doing with synthetic data is likely not being done anywhere else in Canada. “Alberta has been touting its data access for a number of years. We have looked at it as a potential opportunity for economic growth and diversification in the province,” says Joseph.

“The real value proposition here is for us to partner our health system with other talented organizations like academic institutions, other institutes and industry to help solve problems with advanced data techniques.”

The health field isn’t the only area where synthetic data is being put to use. The self-driving car industry and the financial sector also use synthetic data as a tool. But the potential of data sets clear of privacy concerns is drawing major attention.

“We believe the ability to quickly generate high-quality synthetic data will be a game changer for clinical trials and will provide for multiple uses in artificial intelligence and machine learning,” says Tim Murphy, vice president of Health at Alberta Innovates. “It also has the potential to grow our economy and knowledge industries.”