Web Scraping LinkedIn Jobs Using Python and Selenium
LinkedIn hosts a large database of job listings across many industries, which makes it a useful resource for job seekers and recruiters alike. In this article, we’ll scrape job listings from LinkedIn using Python and the Selenium library. Automating the extraction lets us collect each posting’s title, company, date, link, and description in one place, so listings can be filtered and compared without browsing them one by one.
Prerequisites
Before we begin, ensure that you have the following prerequisites:
- Python installed on your system
- Selenium library installed (pip install selenium)
- ChromeDriver executable compatible with your Chrome browser version
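Before moving on, it can help to confirm that these prerequisites are actually in place. The following is a minimal sketch using only the standard library. The function name check_prerequisites and the minimum Python version of 3.7 are assumptions made for illustration, not requirements stated by Selenium or by this article.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 7)):
    """Return a dict mapping each prerequisite name to True or False.

    min_python is an assumed lower bound chosen for this sketch.
    """
    return {
        # The running interpreter is at least the assumed minimum version
        "python": sys.version_info >= min_python,
        # The selenium package can be found in this environment
        "selenium": importlib.util.find_spec("selenium") is not None,
        # A chromedriver executable is discoverable on PATH
        "chromedriver": shutil.which("chromedriver") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A missing chromedriver entry only means the executable is not on PATH; a driver stored elsewhere can still be passed to the scraper explicitly through chrome_executable_path in Step 4.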
Step 1: Install the LinkedIn Jobs Scraper Library
To simplify the scraping process, we’ll use the linkedin-jobs-scraper library. Install it by running the following command (the leading ! is Jupyter notebook syntax; omit it when running the command in a regular terminal):
!pip install linkedin-jobs-scraper
Step 2: Import Required Libraries
Import the necessary libraries for the scraping task:
import logging
from linkedin_jobs_scraper import LinkedinScraper
from linkedin_jobs_scraper.events import Events, EventData, EventMetrics
from linkedin_jobs_scraper.query import Query, QueryOptions, QueryFilters
from linkedin_jobs_scraper.filters import RelevanceFilters, TimeFilters, TypeFilters, ExperienceLevelFilters, OnSiteOrRemoteFilters
Step 3: Configure Logging and Event Listeners
Set up logging and define event listener functions to handle data extraction and processing:
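# Rows collected by on_data, one list per job posting
job_data = []

# Set the root logger level to INFO
logging.basicConfig(level=logging.INFO)

# Called once for each scraped job
def on_data(data: EventData):
    print('[ON_DATA]', data.title, data.company, data.company_link, data.date, data.link, data.insights, len(data.description))
    job_data.append([data.title, data.company, data.company_link, data.date, data.link, data.insights, data.description])

# Called when the scraper reports an error
def on_error(error):
    print('[ON_ERROR]', error)

# Called when scraping has finished
def on_end():
    print('[ON_END]')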
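Appending bare lists works, but the position of each field then has to be remembered later. As one possible alternative, the sketch below stores each job in a small dataclass. JobRecord and make_on_data are hypothetical names introduced here, not part of linkedin-jobs-scraper, and the handler assumes only the attributes already used in on_data above. The event object in the example is a made-up stand-in so the code can run without a browser.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Callable, List


@dataclass
class JobRecord:
    """One scraped job posting (hypothetical container for this sketch)."""
    title: str
    company: str
    company_link: str
    date: str
    link: str
    insights: List[str] = field(default_factory=list)
    description: str = ""


def make_on_data(store: List[JobRecord]) -> Callable:
    """Build a DATA handler that appends one JobRecord per event to store."""
    def on_data(data) -> None:
        store.append(JobRecord(
            title=data.title,
            company=data.company,
            company_link=data.company_link,
            date=data.date,
            link=data.link,
            # Copy the insights so later changes to the event do not leak in
            insights=list(data.insights or []),
            description=data.description,
        ))
    return on_data


# Made-up stand-in for an EventData object, used only to try the handler offline
fake_event = SimpleNamespace(
    title="Data Analyst", company="Acme", company_link="https://example.com/acme",
    date="2024-01-15", link="https://example.com/jobs/1",
    insights=["Remote"], description="Analyze data.",
)

records: List[JobRecord] = []
handler = make_on_data(records)
handler(fake_event)
print(records[0].title, "at", records[0].company)
```

With the real scraper, such a handler would be registered the same way as on_data, for example scraper.on(Events.DATA, make_on_data(records)).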
Step 4: Create a LinkedinScraper Instance
Create an instance of the LinkedinScraper class with the desired configuration:
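scraper = LinkedinScraper(
    chrome_executable_path=None,  # Custom path to the ChromeDriver executable, if needed
    chrome_options=None,  # Custom Chrome options, if needed
    headless=True,  # Run Chrome without a visible window
    max_workers=1,  # Number of queries processed concurrently
    slow_mo=0.5,  # Delay in seconds between actions, to reduce the risk of rate limiting
    page_load_timeout=40  # Page load timeout in seconds
)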
Step 5: Define Scraping Queries
Define the queries and filters for the job listings you want to scrape:
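queries = [
    # First query: no search term, limited to a single job
    Query(
        options=QueryOptions(
            limit=1  # Maximum number of jobs to scrape
        )
    ),
    # Second query: remote, entry-level "Data analyst" jobs in Israel from the past month
    Query(
        query='Data analyst',
        options=QueryOptions(
            locations=["Israel"],
            apply_link=True,  # Also try to extract the apply link (slower)
            skip_promoted_jobs=True,  # Ignore promoted listings
            page_offset=0,  # Number of result pages to skip
            limit=10,
            filters=QueryFilters(
                relevance=RelevanceFilters.RECENT,
                time=TimeFilters.MONTH,
                type=[TypeFilters.FULL_TIME, TypeFilters.INTERNSHIP],
                on_site_or_remote=[OnSiteOrRemoteFilters.REMOTE],
                experience=[ExperienceLevelFilters.ENTRY_LEVEL, ExperienceLevelFilters.INTERNSHIP]
            )
        )
    ),
]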
Step 6: Run the Scraper
Run the scraper with the defined queries and event listeners:
scraper.on(Events.DATA, on_data)
scraper.on(Events.ERROR, on_error)
scraper.on(Events.END, on_end)
scraper.run(queries)
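Because the two queries are independent, the same posting could in principle be collected more than once. The sketch below shows one simple way to drop repeats after the run, keyed on the job link. The helper name dedupe_by_link is hypothetical, the default index of 4 assumes the row layout built in on_data, and the sample rows are made up. It only catches exact link matches, so links that differ in tracking parameters would still count as distinct.

```python
def dedupe_by_link(rows, link_index=4):
    """Return rows with repeated job links removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        link = row[link_index]
        if link in seen:
            continue  # Same link already kept, skip this row
        seen.add(link)
        unique.append(row)
    return unique


# Made-up rows in the same order as job_data: title, company, company_link,
# date, link, insights, description
sample_rows = [
    ["Data Analyst", "Acme", "https://example.com/acme", "2024-01-15",
     "https://example.com/jobs/1", ["Remote"], "Analyze data."],
    ["Data Analyst", "Acme", "https://example.com/acme", "2024-01-15",
     "https://example.com/jobs/1", ["Remote"], "Analyze data."],
    ["BI Intern", "Globex", "https://example.com/globex", "2024-01-20",
     "https://example.com/jobs/2", [], "Build dashboards."],
]

unique_rows = dedupe_by_link(sample_rows)
print(len(sample_rows), "rows before,", len(unique_rows), "after")
```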
Step 7: Process and Analyze the Scraped Data
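Once the scraping process is complete, the scraped job data is available in the job_data list. Each entry holds the title, company, company link, date, job link, insights, and description, in that order. You can perform further analysis, store the data in a database, or export it to a file for later use.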
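As an illustration of the export option, here is a minimal sketch that writes the rows to a CSV file with the standard csv module. The function name save_jobs_to_csv, the column names, and the output file name are assumptions chosen for this example, and the sample rows are made up so the code runs without scraping anything. The insights field is assumed to be a list and is flattened into a single cell.

```python
import csv
import os
import tempfile

# Column order matches the list built in on_data
COLUMNS = ["title", "company", "company_link", "date", "link", "insights", "description"]


def save_jobs_to_csv(rows, path):
    """Write job rows to path as CSV and return the number of rows written."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for row in rows:
            row = list(row)  # Copy so the caller's data is not modified
            # Flatten a list of insights into one semicolon-separated cell
            if isinstance(row[5], (list, tuple)):
                row[5] = "; ".join(str(item) for item in row[5])
            writer.writerow(row)
    return len(rows)


# Made-up sample rows standing in for job_data
sample_job_data = [
    ["Data Analyst", "Acme", "https://example.com/acme", "2024-01-15",
     "https://example.com/jobs/1", ["Remote", "Entry level"], "Analyze data."],
    ["BI Intern", "Globex", "https://example.com/globex", "2024-01-20",
     "https://example.com/jobs/2", [], "Build dashboards."],
]

csv_path = os.path.join(tempfile.gettempdir(), "linkedin_jobs.csv")
written = save_jobs_to_csv(sample_job_data, csv_path)

# Read the file back to confirm the rows survived the round trip
with open(csv_path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))
print(written, "rows written; first title:", loaded[0]["title"])
```

In the article’s flow, the call would be save_jobs_to_csv(job_data, "linkedin_jobs.csv") after scraper.run(queries) has returned.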
Conclusion
Web scraping LinkedIn jobs using Python and Selenium provides a powerful way to gather job listing data efficiently. By automating the scraping process, you can save time and effort in your job search or recruitment efforts. Remember to use the scraped data responsibly and in compliance with LinkedIn’s terms of service.
Happy scraping!