Data Quality & AI in Healthcare

IT_Nurse
Apr 11
Updated: Aug 29

Some thoughts I've had recently around the intersection of data quality and artificial intelligence, specifically in the domain of healthcare:

Artificial Intelligence depends on data. AI is built on data, learns from data, and makes decisions or predictions based on data.

Healthcare is DROWNING in data. As a bedside RN I was constantly collecting and recording data about my patients and my activities. Some of it was documented on paper, some was entered into the patient's electronic health record (EHR). It also varied in format:

Structured Data: e.g. Administered Hydromorphone 0.5 mg IV at 14:00
Free-Text Data: e.g. Patient ambulated 10 feet with assistance, unsteady gait noted.
Sensor-Generated Data: e.g. Heart rate of 140 bpm captured by bedside monitor.
Gut Feeling: e.g. Patient quieter than usual, taking longer to answer questions, early call to rapid response.

With all that data, one would think that healthcare is a great place to implement AI, right?

Well, sometimes. If the right conditions are met. Remember, we're not making widgets in healthcare. Our actions can make patients better, or make them worse. The last thing we want to do is make them worse.

So if we are going to implement an AI solution in healthcare, we want to know that it works properly. And that's where things get very difficult. Not only do we need to be sure that the thing we built works the way it was designed (by testing the heck out of it, while acknowledging that it's not possible to test 100% of any software solution), but we also need to know that the data used in the model is as close to perfect as possible. This is difficult because it is widely understood among data professionals that all data sets have data quality issues.

The Data Administration Management Association (DAMA) has helpfully identified six dimensions of data quality: completeness, uniqueness, timeliness, validity, accuracy, and consistency.

Let's look at these in terms of a heart rate of 140 bpm captured by a bedside monitor:

Completeness: Did we capture the date/time and the patient ID? Did we also document the context? If their heart rate was high, but it was taken while the patient was running on a treadmill, it could still be within normal limits.
Uniqueness: If the heart rate flowed automatically from the monitor into the EHR, and the nurse also documented a slightly different heart rate after manually palpating the patient's wrist, we now have two different heart rates for the same patient and timestamp, and we don't know which one is correct.
Timeliness: The nurse notes the heart rate, but on the way to document it in the patient's chart they get pulled into an urgent situation, so the information is documented late.
Validity: The nurse is in a rush when entering the data into the EHR and enters it as 1400 bpm instead of 140. This is outside the range of possible heart rates, and we now can't trust that data.
Accuracy: The patient was brushing their teeth, causing a false high reading. Or the nurse was in a hurry and scribbled the value on paper so illegibly that it was transcribed incorrectly into the patient's electronic chart.
Consistency: The nurse uses a different bedside monitor to double check and gets a different reading. When were the bedside monitors last calibrated?
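A few of these dimensions can be checked automatically. Here is a minimal sketch in Python, assuming heart rate readings arrive as simple dictionaries. The field names, the required-field list, and the 20 to 300 bpm plausibility range are all assumptions made up for illustration, not values taken from any EHR or clinical guideline.

```python
from datetime import datetime

# Hypothetical record layout and plausibility limits, chosen for illustration only.
REQUIRED_FIELDS = ("patient_id", "recorded_at", "heart_rate_bpm", "source")
PLAUSIBLE_BPM = (20, 300)


def check_readings(readings):
    """Return a list of (row_index, dimension, message) findings."""
    findings = []
    seen = {}  # (patient_id, recorded_at) -> index of the first row with that key
    for i, r in enumerate(readings):
        # Completeness: every required field is present and not empty.
        missing = [f for f in REQUIRED_FIELDS if r.get(f) in (None, "")]
        if missing:
            findings.append((i, "completeness", "missing " + ", ".join(missing)))

        # Validity: the value falls inside the assumed plausible range.
        bpm = r.get("heart_rate_bpm")
        if bpm is not None and not PLAUSIBLE_BPM[0] <= bpm <= PLAUSIBLE_BPM[1]:
            findings.append((i, "validity", f"{bpm} bpm is out of range"))

        # Uniqueness: at most one reading per patient per timestamp.
        key = (r.get("patient_id"), r.get("recorded_at"))
        if None not in key:
            if key in seen:
                findings.append((i, "uniqueness", f"same patient and time as row {seen[key]}"))
            else:
                seen[key] = i
    return findings


t = datetime(2024, 1, 1, 14, 0)
readings = [
    {"patient_id": "P1", "recorded_at": t, "heart_rate_bpm": 140, "source": "monitor"},
    {"patient_id": "P1", "recorded_at": t, "heart_rate_bpm": 136, "source": "manual"},
    {"patient_id": "P2", "recorded_at": t, "heart_rate_bpm": 1400, "source": "manual"},
    {"patient_id": "P3", "recorded_at": None, "heart_rate_bpm": 88, "source": "monitor"},
]
for finding in check_readings(readings):
    print(finding)
```

Notice what the sketch cannot do: it flags the duplicate and the 1400 bpm typo, but it has no way of knowing that a perfectly plausible 140 was caused by the patient brushing their teeth. Accuracy and context still need a human.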

Then there are the common data quality issues:

Missing values: A patient's weight is missing for a chemotherapy order, preventing accurate dose calculation.
Duplicates: A patient moves from one province to another. They wind up with two charts because the first time they registered they used their old health card and the second time they used their new one. Now their health data is fragmented.
Inconsistent formats: If the nurse documented a date as 04/05/12, how do we know which value is the day, which is the month, and which is the year?
Stale information: The patient's medication list includes drugs the patient stopped taking months ago.
Biases or Lack of Representation: The intake form only offers binary gender options, failing to capture care nuances for non-binary or transgender individuals.
Noise or Irrelevant Information: Flooding a vital signs chart with minute-by-minute data from a cardiac monitor, obscuring the actual event of interest.
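To make the inconsistent format problem concrete, here is a small Python sketch that parses the same string, 04/05/12, under three common date conventions. The three conventions are my own choice for illustration; the point is only that the string itself doesn't say which one was intended.

```python
from datetime import datetime

raw = "04/05/12"

# Three common conventions; nothing in the string says which one applies.
formats = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}

# Parse the same text under each convention and keep the ISO 8601 result.
interpretations = {
    name: datetime.strptime(raw, fmt).date().isoformat()
    for name, fmt in formats.items()
}

for name, iso in interpretations.items():
    print(f"{name}: {iso}")
```

All three parses succeed and all three give a different date, which is why storing dates in an unambiguous format such as ISO 8601 (YYYY-MM-DD) at the point of capture matters.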

So let's say you have perfect processes and tools in place to collect your data, so your data quality is just about perfect. How do you make sure it stays that way? For instance, what if another team in your hospital implements a new IT solution that has an unintended negative impact on your bedside monitors? How long will it take for the problem to be caught? Do you have data stewards that regularly monitor data quality? Do you have a data governance program that treats your data as the valuable asset that it is? If this is something you would like to learn more about, I highly recommend the Data Management Masterclass course on Udemy.

There's one last thing I would like to touch on before I wrap up.

If we want to share the data with another system (e.g. we are going to pull data from the electronic health record into our AI model) then we need to worry about semantic interoperability.

Basically, we need to make sure not only that the data from the EHR lands properly in our AI model (e.g. the heart rate from the EHR gets mapped to the heart rate field in our model), but also that nothing is lost in translation. For example:

RT means Respiratory Therapy and not Radiation Therapy.
Myocardial Infarction and Cardiac Infarction and Heart Attack are actually all synonyms of the same concept. So is Infarctus Cardiaque documented in an EHR using the French language.
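Here is a rough Python sketch of what "nothing lost in translation" means for the two examples above. Everything in it is hypothetical: the lookup tables are hand-built, the concept identifiers (CONCEPT_MI and so on) are placeholders rather than real codes, and the department names used for context are made up. A real system would rely on a terminology service and a standard code system instead of a dictionary.

```python
import unicodedata

# Hypothetical synonym table: several labels, one underlying concept.
SYNONYMS = {
    "myocardial infarction": "CONCEPT_MI",
    "cardiac infarction": "CONCEPT_MI",
    "heart attack": "CONCEPT_MI",
    "infarctus cardiaque": "CONCEPT_MI",
}

# Hypothetical abbreviation table: the same letters mean different things
# depending on context (here, the department that documented them).
ABBREVIATIONS = {
    ("rt", "respiratory"): "CONCEPT_RESPIRATORY_THERAPY",
    ("rt", "oncology"): "CONCEPT_RADIATION_THERAPY",
}


def normalize(term):
    """Lowercase, collapse whitespace, and strip accents from a label."""
    decomposed = unicodedata.normalize("NFKD", term)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def to_concept(term, department=None):
    """Map a label to a concept, or return None if it can't be resolved."""
    key = normalize(term)
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Abbreviations resolve only when the context is known.
    return ABBREVIATIONS.get((key, department))


print(to_concept("Heart Attack"))
print(to_concept("Infarctus  Cardiaque"))
print(to_concept("RT", department="oncology"))
print(to_concept("RT"))
```

The last call returns None on purpose: without context, guessing what RT means is worse than admitting we don't know.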

This is where health data standards come in.

Health Information Exchange (HIE) Standards define how systems talk to each other, like the grammar and structure of healthcare messaging. These include:

HL7 V2 / V3 – widely used for lab results, admissions, discharge notifications, etc.
HL7 CDA – Clinical Document Architecture for structured documents like discharge summaries.
HL7 FHIR (Fast Healthcare Interoperability Resources) – the modern, API-friendly standard designed for flexible data sharing.
DICOM – used for exchanging medical images (e.g., X-rays, CT scans)

Terminology Standards define the vocabulary and coding systems used with healthcare data, and give meaning to the data elements. These include:

SNOMED CT – for clinical concepts (e.g., diagnoses, symptoms, procedures).
ICD-10-CA – for diagnosis classification, primarily for reporting and billing in Canada.
pCLOCD (or LOINC in the US) – for lab and clinical observations (e.g., “Glucose [Moles/volume] in Blood”).
DPD (or RxNorm in the US) – for medications and drug information.
CCI (or CPT in the US) – for procedures.

For example, if a nurse enters "shortness of breath" in the EHR:

The system stores it as SNOMED CT code 267036007 (Dyspnea).
When shared via FHIR with another hospital, their system receives both the label and the code, ensuring the correct interpretation, even in another language or system.
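As a rough illustration, here is what that exchange might look like in Python. This is a sketch under several assumptions: I've modelled the entry as a FHIR Condition resource (an Observation could be equally reasonable depending on how the EHR records it), the patient reference "Patient/example" is a placeholder, and the resource is built as a plain dictionary without being validated against any FHIR profile or server.

```python
import json


def build_condition(patient_ref, entered_text):
    """Build a minimal FHIR-style Condition resource as a plain dict."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": patient_ref},  # placeholder patient reference
        "code": {
            "coding": [
                {
                    # System URI that FHIR uses to identify SNOMED CT.
                    "system": "http://snomed.info/sct",
                    "code": "267036007",
                    "display": "Dyspnea",
                }
            ],
            # What the nurse actually typed is kept alongside the code.
            "text": entered_text,
        },
    }


# Sending side: serialize the resource to JSON.
condition = build_condition("Patient/example", "shortness of breath")
payload = json.dumps(condition, indent=2)
print(payload)

# Receiving side: the code system and code travel with the label.
received = json.loads(payload)
coding = received["code"]["coding"][0]
print(coding["system"], coding["code"], coding["display"])
```

The receiving system doesn't have to interpret the words "shortness of breath" at all. It can look up the code in its own copy of SNOMED CT and display the concept in whatever language or wording its users expect.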

To wrap up:

I think AI has the potential to help bring about some big positive changes in healthcare, if implemented correctly. To me, this ideally means that any healthcare organization capturing data to be fed into an AI model has a well thought out Data Governance program, including Data Stewards who regularly monitor and manage data quality, and uses HIE and Terminology Standards so that the data is captured, stored, and shared in a way that ensures semantic interoperability. I've been really fortunate to have the chance to learn about all of these things through my work experiences and my certificate program at UVic. If you would like to learn more about these concepts, I would suggest exploring the links I've embedded in this post, and I'd also recommend joining or following any of these organizations:

Canada Health Infoway (come join me in the Health Analytics Community of Practice!)
Digital Health Canada
Canadian Nursing Informatics Association
Canadian Health Information Management Association
Canadian Institute for Health Information
Canadian Institute of Health Services and Policy Research
Pan-Canadian Health Data Strategy Expert Advisory Group

If you've made it this far, thank you very much for taking the time 😊 So how about you? What do you think about AI in healthcare? Do you have any thoughts or experiences you would like to share?

LISATOTTON
