CareerBeacon Meets Python: My Enhanced Job Search Solution
- IT_Nurse
- Nov 22, 2024
- 5 min read
Updated: Dec 19, 2024

In my last blog post, "Daily Job Board Updates? There’s a Script for That!", I shared how I automated the process of monitoring job postings from the Government of New Brunswick’s job board. It was a great first step toward creating an efficient system for staying up to date on job opportunities. But like most projects, one improvement led to another, and I realized there was more I could do—especially as I looked to expand my search to new platforms.
Why Expand?
My career interests are focused on the healthcare domain, particularly roles related to data (analysis, visualization, governance, or standards), nursing informatics, or process improvement. Geographically, I’m looking for remote opportunities or roles based in New Brunswick, Canada. While the GNB job board is a valuable resource, it only covers a portion of the job market. That’s when I turned my attention to CareerBeacon, a popular platform used by many employers in my area.
The Challenge: Adapting the Script
One of the most significant hurdles in expanding my job monitoring system to include CareerBeacon was adapting the Python script I initially developed for the Government of New Brunswick (GNB) job board. While the GNB site organizes job postings in a straightforward table format, CareerBeacon introduced new complexities due to its underlying structure and pagination system.
For the GNB site, extracting job postings was relatively straightforward. The jobs were presented in a single table, which allowed me to use the Beautiful Soup library to locate the <table> element and extract its rows directly. With this uniform structure, the process involved minimal iteration and was easy to replicate.
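For context, the GNB version of that extraction boiled down to something like the sketch below; the URL and the column layout are placeholders rather than the exact values from my original script:
import requests
from bs4 import BeautifulSoup

# Simplified sketch of the GNB-style approach: one request, one <table>.
# The URL and column layout are placeholders for illustration only.
url = 'https://www.example.gnb.ca/jobs'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')[1:] if table else []  # skip the header row

jobs = []
for row in rows:
    cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
    if cells:
        jobs.append(cells)  # e.g. title, department, location, closing date

print(f"Found {len(jobs)} postings")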
CareerBeacon, however, presented an entirely different challenge. Instead of a single-page table, CareerBeacon displays job postings across multiple pages, which meant the script had to handle pagination by looping through each page of job results. Additionally, the site’s layout lacked the centralized table format of GNB, requiring me to identify specific HTML elements—like <div> tags and their associated class attributes—where job information was stored.
To handle these differences, I expanded the script to include:
Pagination Handling: I added a loop that steps through the results one page at a time by incrementing the page number in the URL and stops once a page returns no postings, so every job gets captured.
Dynamic HTML Parsing: Since CareerBeacon’s job details were spread across various <div> elements, I customized my Beautiful Soup selectors to pinpoint key data such as job titles, employers, and job URLs.
Error Management: With a more complex structure, I included error-handling mechanisms to ensure the script could gracefully skip over unexpected changes or missing fields in the HTML.
This adaptation process was a valuable learning experience, reinforcing that no two websites are the same when it comes to web scraping. Each new platform brings unique challenges, but with Python’s flexibility, even these obstacles can be overcome with some creative problem-solving.
Here’s the Python script:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
import os

# Function to scrape job details from a single page
def scrape_jobs_from_page(url):
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find all job entries on the page
        jobs = []
        job_containers = soup.find_all('div', class_='non_featured_job_inner_container')

        for job in job_containers:
            # Extract job title
            title = job.find('div', class_='job_title').get_text(strip=True) if job.find('div', class_='job_title') else "N/A"

            # Extract location
            location = job.find('span', class_='location mid-grey').get_text(strip=True) if job.find('span', class_='location mid-grey') else "N/A"

            # Extract salary
            salary = job.find('div', class_='job_salary').get_text(strip=True) if job.find('div', class_='job_salary') else "N/A"

            # Extract posted date
            posted_ago = job.find('div', class_='job_pub_date').get_text(strip=True) if job.find('div', class_='job_pub_date') else "N/A"

            # Extract job URL
            job_url = job['data-posting_url'] if job.has_attr('data-posting_url') else "N/A"

            # Add job details to list
            jobs.append({
                "Job Title": title,
                "Location": location,
                "Salary": salary,
                "Posted Ago": posted_ago,
                "Job URL": job_url
            })
        return jobs
    else:
        print(f"Failed to retrieve page {url}")
        return []

# Load employer URLs from Excel
employers_df = pd.read_excel('C:/Python/CareerBeacon/CareerBeaconEmployers.xlsx')

# Create an empty DataFrame to store all job details
all_jobs_df = pd.DataFrame(columns=["Employer", "Job Title", "Location", "Salary", "Posted Ago", "Job URL"])

# Loop through each employer and scrape their jobs
for index, row in employers_df.iterrows():
    employer_name = row['Employer']
    employer_url = row['URL']
    print(f"Processing jobs for {employer_name}...")

    page = 1
    while True:
        page_url = f"{employer_url}?page={page}"
        jobs = scrape_jobs_from_page(page_url)

        if not jobs:  # If no jobs are found, stop the loop for this employer
            print(f"No more jobs found on page {page} for {employer_name}. Moving to the next employer.")
            break

        # Convert job list to DataFrame and add employer information
        jobs_df = pd.DataFrame(jobs)
        jobs_df['Employer'] = employer_name

        # Append to the main DataFrame
        all_jobs_df = pd.concat([all_jobs_df, jobs_df], ignore_index=True)

        print(f"Page {page} Completed for {employer_name}")
        page += 1

# Save the combined DataFrame to CSV
script_date = datetime.now().strftime('%Y-%m-%d')
output_path = f'C:/Python/CareerBeacon/CareerBeacon_{script_date}.csv'
all_jobs_df.to_csv(output_path, index=False, encoding='utf-8')
print(f"Job information saved to {output_path}")

# Print the DataFrame to verify
print(all_jobs_df)
On another note, a feature of CareerBeacon that delighted me was its support for capturing the URLs to individual job postings—a capability that was absent on the GNB site. This seemingly small detail made a big difference, as it allowed me to create a direct link to the full job description for each posting. Not only did this streamline my workflow, but it also ensured that I could revisit and analyze specific postings in greater detail, adding a layer of functionality that wasn’t possible with the GNB data. This discovery felt like a win and reinforced the value of adapting my script to new platforms.
Automation, Meet Integration
Like my GNB script, this new script is set up with a .bat file and scheduled in Windows Task Scheduler to run automatically every day. But I didn’t stop there. I also revisited the Excel file I’d created to aggregate my daily job data. Previously, it only combined the GNB files; now, it also pulls in the CareerBeacon files. Using Power Query, I’ve set up filters to display only the jobs that meet my criteria (a rough Python equivalent of these filters is sketched after the list):
Healthcare domain
Roles related to data, nursing informatics, or process improvement
Remote or based in New Brunswick
The result? A consolidated view of all the jobs I’m interested in, updated daily, with clickable links to explore each posting in detail.
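For anyone who would rather keep that filtering step in Python instead of Power Query, a rough pandas equivalent looks something like the sketch below. The column names match the CSV the script produces, but the keyword lists and the example file name are purely illustrative:
import pandas as pd

# Rough pandas equivalent of the Power Query filters described above.
# The keyword lists and the example file name are illustrative only.
df = pd.read_csv('C:/Python/CareerBeacon/CareerBeacon_2024-12-19.csv')

role_keywords = ['data', 'informatics', 'process improvement', 'analyst']
location_keywords = ['remote', 'new brunswick']

title_match = df['Job Title'].str.contains('|'.join(role_keywords), case=False, na=False)
location_match = df['Location'].str.contains('|'.join(location_keywords), case=False, na=False)

shortlist = df[title_match & location_match]
print(shortlist[['Employer', 'Job Title', 'Location', 'Job URL']])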
Lessons Learned: Adapting to a New Platform
One of the most valuable aspects of this project was learning how to adapt my Python script to work with a different platform. CareerBeacon’s structure posed unique challenges compared to the GNB job board, requiring me to rethink how data is organized and extracted. For example:
GNB: Uses a straightforward table structure, which made it relatively easy to extract high-level job details like titles and locations. However, it doesn’t expose URLs for individual postings, limiting the depth of data available.
CareerBeacon: Lacks a centralized table and spreads job details across multiple <div> elements, with pagination adding another layer of complexity. On the plus side, CareerBeacon includes URLs for each job posting, enabling me to link directly to detailed descriptions and capture richer data.
These differences reinforced the importance of flexibility and creative problem-solving when working with web scraping. Each platform has its quirks, and learning to navigate them was a rewarding challenge that broadened my technical skills.
Challenges and Future Plans
Of course, no system is perfect. As with the GNB site, one ongoing challenge with CareerBeacon is ensuring the script can handle changes to the website’s structure. If the HTML layout changes, the script may need updates to keep working. Adding more robust error handling and monitoring tools could help mitigate this.
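One minimal version of that safeguard is a structure check that runs before the daily scrape and flags when the expected layout is missing. This is just a sketch: the container class is the one the current script relies on, and the "alert" is a simple print statement that could be swapped for an email or a log entry:
import requests
from bs4 import BeautifulSoup

# Basic structure check: warn if the container class the scraper depends on
# is no longer present, which usually means the site layout has changed.
def check_structure(url, expected_class='non_featured_job_inner_container'):
    response = requests.get(url)
    if response.status_code != 200:
        print(f"WARNING: received status {response.status_code} for {url}")
        return False
    soup = BeautifulSoup(response.text, 'html.parser')
    if not soup.find('div', class_=expected_class):
        print(f"WARNING: no '{expected_class}' elements found on {url}; the layout may have changed")
        return False
    return True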
Looking ahead, I’d like to:
Explore additional job boards to further expand my search.
Integrate the data into Power BI for a more user-friendly and visually appealing interface.
Experiment with capturing more detailed information from CareerBeacon postings, such as job descriptions or qualifications, where possible.
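That last idea hasn’t been built yet, but the rough shape would be one extra request per posting, reusing the job URLs the script already captures. Note that the 'job_description' class name below is a guess rather than a confirmed CareerBeacon selector:
import requests
from bs4 import BeautifulSoup

# Sketch of the follow-up idea: fetch each captured job URL and pull out the
# full posting text. The 'job_description' class name is an assumption and
# would need to be verified against the actual page markup.
def scrape_job_description(job_url):
    response = requests.get(job_url)
    if response.status_code != 200:
        return "N/A"
    soup = BeautifulSoup(response.text, 'html.parser')
    description_div = soup.find('div', class_='job_description')
    return description_div.get_text(strip=True) if description_div else "N/A"

# Example usage with the DataFrame produced by the main script:
# all_jobs_df['Description'] = all_jobs_df['Job URL'].apply(scrape_job_description)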
Why This Matters
This project isn’t just about job searching—it’s about using technology to solve real-world problems. By automating repetitive tasks, I’ve created a system that saves time and ensures I don’t miss opportunities.
If you’re navigating the job market or exploring automation, I hope this inspires you to try something similar. Whether it’s scraping job postings, tracking industry trends, or automating another process, the possibilities are endless—and incredibly rewarding!