M.I. Journal


Enhanced Web Scraping Scripts for Student Data


Aug 01, 2024 · 2 min read



Day: 2024-08-01
Time: 22:30 to 23:55
Project: Dev
Workspace: WP 2: Operational
Status: Completed
Priority: MEDIUM
Assignee: Matías Nehuen Iglesias
Tags: Python, Selenium, Web Scraping, Data Extraction, Beautifulsoup

Description

Session Goal

The goal of this session was to develop and refine Python scripts for web scraping student data using Selenium and BeautifulSoup.

Key Activities

Developed a Python script that uses Selenium and BeautifulSoup to extract student information from web pages and stores it in pandas DataFrames, skipping records whose URL ID has already been collected.
Modified the Selenium scripts to manage browser sessions and tabs more reliably, and added error handling to make the scripts more robust.
Handled empty tables and deprecation warnings, replacing the deprecated DataFrame.append with pd.concat for DataFrame concatenation.
Updated the scripts to print HTML structures with BeautifulSoup’s prettify method, and added error handling to confirm that pages load properly.
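
The extraction and de-duplication steps above can be sketched roughly as follows. This is a minimal illustration, not the session's actual script: the column names, the `?id=` URL pattern, the table layout, and the helper names (`extract_id`, `parse_student_rows`, `collect_students`) are all assumptions made for the example.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical column layout; the real pages may expose different fields.
COLUMNS = ["student_id", "name", "program"]


def extract_id(url):
    """Return the ID from a URL such as .../student?id=123, or None if absent."""
    match = re.search(r"[?&]id=(\d+)", url)
    return match.group(1) if match else None


def parse_student_rows(html, url):
    """Parse the first table on the page into a DataFrame (empty if no rows)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No table at all: return an empty frame with the expected columns.
        return pd.DataFrame(columns=COLUMNS)
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:  # header rows (only <th>) and blank rows are skipped
            rows.append(
                {"student_id": extract_id(url), "name": cells[0], "program": cells[1]}
            )
    return pd.DataFrame(rows, columns=COLUMNS)


def collect_students(pages):
    """Combine (url, html) pairs into one DataFrame, skipping repeated URL IDs."""
    seen_ids = set()
    frames = []
    for url, html in pages:
        student_id = extract_id(url)
        if student_id is None or student_id in seen_ids:
            continue
        page = parse_student_rows(html, url)
        if page.empty:
            continue  # empty tables contribute nothing
        seen_ids.add(student_id)
        frames.append(page)
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    # One pd.concat call replaces repeated DataFrame.append calls.
    return pd.concat(frames, ignore_index=True)
```

Collecting the per-page frames in a list and concatenating once avoids both the removed `DataFrame.append` and the cost of growing a DataFrame row by row. While debugging, `print(BeautifulSoup(html, "html.parser").prettify())` shows the indented HTML structure of a page.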

Achievements

Successfully created and refined multiple scripts for extracting and processing student data from web pages.
Improved error handling and session management in Selenium scripts, increasing the stability and reliability of the scraping process.
Streamlined data handling in pandas by concatenating DataFrames with pd.concat and skipping duplicate records.
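
One possible shape for the session and tab handling described above is sketched below. It assumes Selenium 4 (for `switch_to.new_window`); the helper names `browser_session` and `fetch_in_new_tab` and the example URLs are hypothetical and do not come from the session's scripts.

```python
from contextlib import contextmanager

try:
    from selenium.common.exceptions import WebDriverException
except ImportError:
    # Fallback so the sketch can still be imported where Selenium is not installed.
    WebDriverException = Exception


@contextmanager
def browser_session(make_driver):
    """Create a driver and guarantee quit() runs, even if scraping fails."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()


def fetch_in_new_tab(driver, url):
    """Load url in a new tab and return its HTML, or None if loading fails.

    The new tab is always closed and focus returns to the original tab.
    """
    original = driver.current_window_handle
    driver.switch_to.new_window("tab")
    try:
        driver.get(url)
        return driver.page_source
    except WebDriverException as exc:
        print(f"Failed to load {url}: {exc}")
        return None
    finally:
        driver.close()
        driver.switch_to.window(original)


# Example usage (requires a local browser and driver):
# from selenium import webdriver
# with browser_session(webdriver.Chrome) as driver:
#     driver.get("https://example.edu/students")
#     html = fetch_in_new_tab(driver, "https://example.edu/student?id=1")
```

Wrapping the driver in a context manager means a failure on one page does not leave an orphaned browser process, and the `finally` block in `fetch_in_new_tab` keeps the tab count stable across errors.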

Pending Tasks

Further testing of scripts in diverse web environments to ensure robustness across different scenarios.
Continuous monitoring and adjustment of scripts to accommodate any changes in web page structures or technologies.

Evidence

source_file=2024-08-01.sessions.jsonl, line_number=2, event_count=0, session_id=995a8361e14cc97fa3e3fa67e103518dc2f5414272b73f860e46f793ea471eca
event_ids: []


