🐍 Python meets HL7 Version 2

IT_Nurse
Apr 28
7 min read

Updated: May 15

Introduction

This fall I took a course called HINF 535 - Health Information Standards through the University of Victoria. The last assignment of the class required writing a report on a Health Information Exchange standard, and I chose to write my paper on HL7 Version 2 (HL7 v2). What is HL7 v2, you ask? Here's the introduction from my paper:

In an ideal world, every patient would have a single source of truth for their healthcare data. Whether their care was received at a hospital, an immunization clinic, or their primary care provider’s office, all their data would funnel into a single system. This system would then be accessible to everyone in the patient’s circle of care, including the patient. In our current healthcare landscape, this remains a significant challenge. Interoperability, as defined by the Institute of Electrical and Electronics Engineers, is “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” (1991, p. 14). Health Level 7 Version 2 (HL7 V2) is a health data exchange standard created to allow interoperability between healthcare systems, and it is argued to be the most widely used standard in the world (HL7 International, n.d.-b). This report explores the positive impact the HL7 V2 data exchange standard has had on healthcare interoperability including the history, key features, benefits, and challenges. An implementation of HL7 V2 in Canada is also reviewed.

I chose to write my paper on HL7 v2 because it was the one standard I had real-life experience with prior to taking the course. I had previously worked on a project where I helped test an interface between an Electronic Health Record (EHR) system and a medical device, where the messages sent back and forth between the two used the HL7 v2 standard. For example, the following message could have been sent from the EHR to the medical device with the patient's information:

MSH|^~\&|ER_SYSTEM|DUCKBURG_HOSPITAL|EHR_SYSTEM|DUCKBURG_HOSPITAL|20250427123000||ADT^A01|10001|P|2.3
EVN|A01|20250427123000
PID|||123456789^^^DUCKBURG_HOSPITAL^MR||Duck^Daisy||19800415|F|||123 Main St^^Duckburg^CA^90210^USA||(555)555-1234|||S||123456789|987-65-4321
PV1||E|ER^1^1^DUCKBURG_HOSPITAL||||1505^House^Gregory|||||||||||ER|||20250427123000

You might pick out that Daisy Duck had a visit to Duckburg Hospital on 27-Apr-2025 and was seen by Dr. Gregory House. But otherwise it's probably a little hard to understand exactly what the message means. This is one of the known limitations of HL7 v2: it's only possible for humans to read the messages if they have access to the documentation.
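To make the structure a little less mysterious, here is a minimal sketch in plain Python (no HL7 library) that pulls apart a shortened copy of the sample message using its delimiters: each segment ends with a carriage return, "|" separates fields, and "^" separates components within a field. The function name `split_segments` is made up for illustration, and the sketch ignores repetitions ("~"), subcomponents ("&"), and escape sequences.

```python
# A shortened copy of the sample message; segments are joined with "\r",
# which is the HL7 v2 segment terminator.
SAMPLE = "\r".join([
    r"MSH|^~\&|ER_SYSTEM|DUCKBURG_HOSPITAL|EHR_SYSTEM|DUCKBURG_HOSPITAL|20250427123000||ADT^A01|10001|P|2.3",
    "EVN|A01|20250427123000",
    "PID|||123456789^^^DUCKBURG_HOSPITAL^MR||Duck^Daisy||19800415|F",
    "PV1||E|ER^1^1^DUCKBURG_HOSPITAL||||1505^House^Gregory",
])


def split_segments(message):
    """Return {segment_id: list_of_fields} for one HL7 v2 message.

    Hypothetical helper: assumes each segment ID appears only once.
    """
    segments = {}
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments


segments = split_segments(SAMPLE)

# For most segments, list index N lines up with field N (PID-5 is index 5).
last_name, first_name = segments["PID"][5].split("^")[:2]
attending = segments["PV1"][7].split("^")  # ID, last name, first name

# MSH is the exception: the "|" itself counts as MSH-1, so MSH-9 is index 8.
message_type = segments["MSH"][8]

print(f"{first_name} {last_name} was seen by Dr. {attending[2]} {attending[1]}")
```

Even with this, you still need the documentation to know that PID-5 is the patient name and PV1-7 is the attending doctor; the code only handles the punctuation.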

During my course I had the opportunity to learn how to use the documentation to create (aka encode) HL7 v2 messages, and how to go backwards and decode them. This was super interesting, but I was left with a lingering question: if the point of these standards is for computers to encode and decode these messages, what does that actually look like? Which brings me to today's topic:

The Hl7apy Python Library

Hl7apy is a Python library that handles HL7 v2 messages. Here is an excerpt from their website:

In my last blog post I asked ChatGPT to help me create a Python program/script to encode and decode a different kind of HL7 message: HL7 FHIR. Since I have a programming background, being able to see the code really helped solidify in my mind what HL7 FHIR is, how it works, and how computers encode and decode the messages. This was so helpful that I've decided to see what other types of HL7 messages I can create Python scripts for, starting with HL7 v2 messages.

Using Hl7apy Python Library to encode messages

I started by trying to create messages about patients being registered at a facility (this is known in HL7 v2 as an ADT_A01 message). In my last blog post I had used an Excel spreadsheet of fake patients that ChatGPT generated for me, which was a great starting point. However, this only had information about patients, while I needed registration information. So I asked ChatGPT if it could add another tab to the spreadsheet with the visit information, and it came through beautifully. Here are the two tabs in my 'fake_patients_with_visits.xlsx' file:

Now I just needed the code to read in the data from the two tabs of the spreadsheet and then build (and save!) HL7 v2 ADT_A01 messages for each. Here's what ChatGPT came up with. It involves saving the patient data into a Python data structure called a Dictionary, and using the hl7apy library to create an ADT_A01 message using the HL7 v2.3 specification. It then goes through each of the patient data elements (i.e. Health Card Number, Last Name, First Name, etc.) and maps them into the correct location/field of the HL7 message. In some cases, more than one data element winds up in the same field, separated by "^" characters. You can see this happens with the first and last names in the msg.pid.pid_5 section below.
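One thing to watch when joining values with "^" by hand: if a value itself contains one of the delimiter characters (for example a facility name with an "&" in it), the receiving system will read it as a separator. HL7 v2 defines escape sequences for this. The following is only a minimal sketch and is not part of the script in this post; `escape_hl7` and `join_components` are hypothetical helper names, and it assumes the default delimiters |^~\&.

```python
# Escape sequences for the default HL7 v2 delimiters.
# The escape character (backslash) is handled first so that the
# backslashes added by later replacements are not escaped a second time.
_ESCAPES = [
    ("\\", "\\E\\"),  # escape character
    ("|", "\\F\\"),   # field separator
    ("^", "\\S\\"),   # component separator
    ("&", "\\T\\"),   # subcomponent separator
    ("~", "\\R\\"),   # repetition separator
]


def escape_hl7(value):
    """Return value as text with HL7 v2 delimiter characters escaped."""
    text = str(value)
    for char, sequence in _ESCAPES:
        text = text.replace(char, sequence)
    return text


def join_components(*parts):
    """Escape each part, then join the parts with the component separator."""
    return "^".join(escape_hl7(part) for part in parts)


print(join_components("Mouse", "Mickey"))
print(join_components("ER", 4, "A", "Duck & Goose Clinic"))
```

With a helper like this, the name and location fields could be built from the spreadsheet values without worrying about what characters they contain.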

import pandas as pd
from hl7apy.core import Message
from datetime import datetime
import os

# Load the Excel file of patient information and visit information
file_path = r"C:\fake_patients_with_visits.xlsx"
patients_df = pd.read_excel(file_path, sheet_name="Patients")
visits_df = pd.read_excel(file_path, sheet_name="Visits")

# Set the output folder (create it if it doesn't exist yet)
output_folder = r"C:\hl7_output"
os.makedirs(output_folder, exist_ok=True)

# Convert Patients to dictionary for easy lookup
patients_dict = patients_df.set_index('HealthCardNumber').to_dict(orient='index')

# Loop through each Visit and build A01 message
for index, visit in visits_df.iterrows():
    patient = patients_dict.get(visit['HealthCardNumber'])

    if not patient:
        print(f"⚠️ No matching patient for Health Card Number: {visit['HealthCardNumber']}")
        continue

    # Create a new ADT^A01 Message
    msg = Message("ADT_A01", version="2.3")
    
    # MSH Segment
    msg.msh.msh_1 = '|'
    msg.msh.msh_2 = r'^~\&'
    msg.msh.msh_7 = datetime.now().strftime("%Y%m%d%H%M%S")
    msg.msh.msh_9 = "ADT^A01"
    msg.msh.msh_10 = str(index + 1)  # Message Control ID
    msg.msh.msh_11 = "P"
    msg.msh.msh_12 = "2.3"

    # EVN Segment
    msg.evn.evn_1 = "A01"
    msg.evn.evn_2 = datetime.now().strftime("%Y%m%d%H%M%S")

    # PID Segment
    msg.pid.pid_3 = str(visit['HealthCardNumber'])
    msg.pid.pid_5 = f"{patient['LastName']}^{patient['FirstName']}"
    # Format birthdate safely
    birthdate_value = patient['BirthDate']
    if isinstance(birthdate_value, (datetime, pd.Timestamp)):
        birthdate_str = birthdate_value.strftime("%Y%m%d")
    else:
        birthdate_str = str(birthdate_value)
    msg.pid.pid_7 = birthdate_str
    msg.pid.pid_8 = patient['Gender'][0]  # M or F

    # PV1 Segment
    msg.pv1.pv1_2 = visit['PatientClass'] 
    # Assigned Location (Location^Room^Bed^Facility)
    msg.pv1.pv1_3 = f"{visit['Location']}^{visit['Room']}^{visit['Bed']}^{visit['Facility']}"
    # Attending Doctor (AttendingID^AttendingLast^AttendingFirst)
    msg.pv1.pv1_7 = f"{visit['AttendingID']}^{visit['AttendingLast']}^{visit['AttendingFirst']}"
    # Admission Date/Time
    msg.pv1.pv1_44 = visit['AdmitDateTime'].strftime("%Y%m%d%H%M")

    # Put all the segments together to create the message
    hl7_text = msg.to_er7()

    # Create the filename for the message and save the file
    filename = f"A01_{patient['LastName']}_{visit['VisitNumber']}.hl7"
    output_path = os.path.join(output_folder, filename)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(hl7_text)
    print(f"✅ Saved: {filename}")

# Print to the screen that the program completed successfully
print("\n🎉 All A01 Admit messages generated successfully!")

This code worked great. Here's the message that was created for Mickey Mouse:

MSH|^~\&|||||20250427194727||ADT^A01|1|P|2.3
EVN|A01|20250427194727
PID|||357313407||Mouse^Mickey||1928-11-18|m
PV1||E|ER^4^A^Walt Disney Regional Hospital||||1725^House^Gregory|||||||||||||||||||||||||||||||||||||202504241633
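Two details in this output are worth a second look: PID-7 came through as 1928-11-18 and PID-8 as a lowercase "m". HL7 v2 dates are written as YYYYMMDD, and the administrative sex codes are uppercase (for example M and F), so a stricter receiving system might reject these values. The birth date most likely slipped through because the spreadsheet stored it as text rather than as a real date, which sends it down the `str(birthdate_value)` branch. Here is a minimal sketch of normalizing both values before they go into the message; `to_hl7_date` and `to_hl7_sex` are hypothetical helper names, and both the list of accepted input formats and the fallback to "U" (unknown) are assumptions.

```python
from datetime import date, datetime


def to_hl7_date(value):
    """Return value as YYYYMMDD, or "" if it cannot be interpreted.

    pandas Timestamps are datetime subclasses, so they take the first branch.
    """
    if isinstance(value, (date, datetime)):
        return value.strftime("%Y%m%d")
    text = str(value).strip()
    # Assumed input formats; extend this tuple to match your spreadsheet.
    for fmt in ("%Y-%m-%d", "%Y%m%d", "%Y/%m/%d"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    return ""


def to_hl7_sex(value):
    """Map a free-text gender value to a one-letter uppercase code."""
    first_letter = str(value).strip().upper()[:1]
    # Assumption: anything other than M, F, O or U is reported as U (unknown).
    return first_letter if first_letter in ("M", "F", "O", "U") else "U"


print(to_hl7_date("1928-11-18"), to_hl7_sex("male"))
```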

Using Hl7apy Python Library to decode messages

Once I had that working, I thought it would be interesting to see if I could go in the opposite direction. I wanted to have a Python script that would take all the HL7 messages I just created, decode them, and save the information in a spreadsheet. I wanted to do this because this is how it would work in the real world:

My first Python script showed me how patient data gets encoded into HL7 v2.3 messages (i.e. step 2 above), but I wanted to also see what the code would look like to decode the messages (i.e. step 4 above).

Here's the code ChatGPT gave me to read in my HL7 messages, decode each one, and save the information to an Excel spreadsheet.

import os
import pandas as pd
from hl7apy.parser import parse_message
from datetime import datetime

# Tell the script where to find the HL7 messages, and set the location to save the output
hl7_folder = r"C:\hl7_output"
output_excel = r"C:\decoded_hl7.xlsx"

# Helper functions to clean and format dates
def clean_and_format_date(val):
    if not val:
        return ""
    cleaned = val.replace("-", "").strip()
    if len(cleaned) >= 8 and cleaned.isdigit():
        return f"{cleaned[0:4]}-{cleaned[4:6]}-{cleaned[6:8]}"
    return ""

def clean_and_format_datetime(val):
    if not val:
        return ""
    cleaned = val.replace("-", "").strip()
    if len(cleaned) >= 12 and cleaned.isdigit():
        return f"{cleaned[0:4]}-{cleaned[4:6]}-{cleaned[6:8]} {cleaned[8:10]}:{cleaned[10:12]}"
    return ""

# Prepare a list to hold patient visit details
records = []

# Loop through each .hl7 file
for filename in os.listdir(hl7_folder):
    if filename.endswith(".hl7"):
        file_path = os.path.join(hl7_folder, filename)

        with open(file_path, 'r', encoding='utf-8') as f:
            hl7_text = f.read()

        try:
            # Parse the HL7 message safely
            hl7_text = hl7_text.replace('\n', '\r').strip()
            msg = parse_message(hl7_text, find_groups=False, validation_level=2)

            # Extract top-level fields
            patient_id = msg.pid.pid_3.value or ""
            patient_name = msg.pid.pid_5.value or ""
            birthdate = msg.pid.pid_7.value or ""
            gender = msg.pid.pid_8.value or ""
            location = msg.pv1.pv1_3.value or ""
            attending_doctor = msg.pv1.pv1_7.value or ""
            admit_datetime = msg.pv1.pv1_44.value or ""
            message_control_id = msg.msh.msh_10.value or ""

            # Split patient name
            if '^' in patient_name:
                last_name, first_name = patient_name.split('^')[:2]
            else:
                last_name, first_name = patient_name, ""

            # Split location (Location^Room^Bed^Facility)
            loc_parts = location.split('^')
            location_name = loc_parts[0] if len(loc_parts) > 0 else ""
            room = loc_parts[1] if len(loc_parts) > 1 else ""
            bed = loc_parts[2] if len(loc_parts) > 2 else ""
            facility = loc_parts[3] if len(loc_parts) > 3 else ""

            # Split attending doctor (ID^Last^First)
            doc_parts = attending_doctor.split('^')
            doctor_id = doc_parts[0] if len(doc_parts) > 0 else ""
            doctor_last = doc_parts[1] if len(doc_parts) > 1 else ""
            doctor_first = doc_parts[2] if len(doc_parts) > 2 else ""

            # Clean and format birthdate and admit datetime
            birthdate_formatted = clean_and_format_date(birthdate)
            admit_datetime_formatted = clean_and_format_datetime(admit_datetime)

            # Append the record to our list
            records.append({
                "MessageControlID": message_control_id,
                "PatientID": patient_id,
                "LastName": last_name,
                "FirstName": first_name,
                "BirthDate": birthdate_formatted,
                "Gender": gender,
                "Location": location_name,
                "Room": room,
                "Bed": bed,
                "Facility": facility,
                "DoctorID": doctor_id,
                "DoctorLast": doctor_last,
                "DoctorFirst": doctor_first,
                "AdmitDateTime": admit_datetime_formatted,
                "SourceFile": filename
            })

            print(f"✅ Decoded {filename}")

        except Exception as e:
            print(f"⚠️ Failed to parse {filename}: {e}")

# Create a DataFrame from the records and save it as an excel file
df = pd.DataFrame(records)
df.to_excel(output_excel, index=False)

# Print to the screen that the program completed successfully
print("\n🎉 All messages decoded, split, cleaned, and saved to:", output_excel)

Interestingly, while the script ChatGPT provided above to encode the messages had very few issues, the decoding script gave me a little bit of trouble. The solution turned out to be a single line of code: the hl7_text.replace line just above the call to parse_message. The error message I ran into said:

⚠️ Failed to parse A01_Bunny_1013.hl7: The version 2.3 EVN is not supported.

I went back and forth with ChatGPT several times, giving it the error message and trying the updated code it provided, but after a few tries it felt like we were going in circles. So I reached into my trusty troubleshooting toolbox and checked Stack Overflow. The first post I found had the answer:

I gave this information to ChatGPT, it added the line of code 'hl7_text = hl7_text.replace('\n', '\r').strip()', and the next time I ran the script it worked!
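The underlying issue is that HL7 v2 uses a carriage return (\r) as the segment terminator, while text files written from Python usually end lines with \n (or \r\n on Windows). Here is a minimal sketch of a slightly more defensive version of that one-line fix, using plain Python so the effect is easy to see; `normalize_segment_terminators` is a hypothetical helper name, and it converts \r\n first so that Windows-style line endings don't leave empty segments behind.

```python
def normalize_segment_terminators(text):
    """Convert \\r\\n and \\n line endings to the HL7 v2 terminator \\r."""
    return text.replace("\r\n", "\r").replace("\n", "\r").strip()


# A message with mixed line endings, as it might look after being
# saved and re-opened by different tools.
raw = (
    "MSH|^~\\&|||||20250427194727||ADT^A01|1|P|2.3\n"
    "EVN|A01|20250427194727\r\n"
    "PID|||357313407||Mouse^Mickey\n"
)

normalized = normalize_segment_terminators(raw)
segment_ids = [segment.split("|")[0] for segment in normalized.split("\r")]
print(segment_ids)
```

The normalized text is what would then be handed to parse_message.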

Here's ChatGPT's reasoning why it was necessary:

And here's what my decoded_hl7.xlsx output file wound up looking like:

Conclusion

In this project I used the hl7apy Python library to encode a spreadsheet of patient visit information into HL7 v2.3 messages, and then take those same messages and decode them back into a spreadsheet. Next up, I'm hoping to look at doing something similar for HL7 v3-CDA messages.

If you have any thoughts or feedback about what I came up with, or how I went about it, I would love to hear from you in the comments section below. I'd also love to know if you have any related projects you would like to share. In any case, I hope you have a fantastic day!

Lisa

LISATOTTON
