M.I. Journal

Enhanced web scraping with BeautifulSoup

Mar 08, 2023

Day: 2023-03-08
Time: 19:10 to 19:30
Project: Dev
Workspace: WP 2: Operational
Status: Completed
Priority: MEDIUM
Assignee: Matías Nehuen Iglesias
Tags: Web Scraping, BeautifulSoup, Python, HTML Parsing, Data Extraction

Description

Session Goal

The session aimed to update and enhance a web scraping script using BeautifulSoup to improve data extraction from HTML pages.

Key Activities

Updated CSS selectors in the BeautifulSoup code, replacing the deprecated :contains pseudo-class with :-soup-contains to avoid the deprecation warning.
Developed a Python function to scrape researcher and graduate student names into a pandas DataFrame.
Corrected a misspelled header tag in the web scraping function to ensure accurate data extraction.
Addressed encoding issues by specifying character encoding manually in BeautifulSoup.
Provided guidance on the correct URL for the Image Processing and Computer Vision Group’s webpage.
Suggested HTML code corrections to resolve search failures in BeautifulSoup.
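
The first two activities can be illustrated with a minimal sketch. The session's actual script and the page markup are not recorded in this entry, so the HTML fragment, the section headings, the placeholder names, and the function name scrape_names below are all assumptions made for illustration. The sketch selects each section header with :-soup-contains and collects the names listed under it into a pandas DataFrame.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page fragment: the real page's markup is not shown in this entry.
SAMPLE_HTML = """
<h3>Researchers</h3>
<ul><li>Ada Example</li><li>Bo Sample</li></ul>
<h3>Graduate Students</h3>
<ul><li>Cy Placeholder</li></ul>
"""


def scrape_names(html: str) -> pd.DataFrame:
    """Collect names listed under each role header into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for role in ("Researchers", "Graduate Students"):
        # :-soup-contains is the replacement for the deprecated :contains
        header = soup.select_one(f'h3:-soup-contains("{role}")')
        if header is None:
            # Header missing or misspelled in the page: skip this role
            continue
        listing = header.find_next_sibling("ul")
        if listing is None:
            continue
        for item in listing.find_all("li"):
            rows.append({"role": role, "name": item.get_text(strip=True)})
    return pd.DataFrame(rows, columns=["role", "name"])


names = scrape_names(SAMPLE_HTML)
print(names)
```

A selector that names the wrong tag or heading text matches nothing, so select_one returns None. That is why a misspelled header tag, like the one corrected in this session, shows up as missing rows rather than as an error in a function written this way.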

Achievements

Successfully updated and corrected the web scraping script, enhancing its functionality and accuracy.
Resolved encoding issues and improved data extraction reliability.
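
As a sketch of the encoding fix, BeautifulSoup accepts a from_encoding argument when it is given raw bytes. The entry does not say which encoding the page used, so the Latin-1 choice, the sample markup, and the placeholder name below are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical bytes: a name with non-ASCII characters, encoded as Latin-1.
raw = "<p class='name'>José Ñandú</p>".encode("latin-1")

# Passing from_encoding tells BeautifulSoup how to decode the bytes
# instead of letting it guess; it only applies to bytes input.
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")
name = soup.select_one("p.name").get_text()
print(name)
```

When the page is fetched with requests, this means passing the response's raw bytes (response.content) rather than the already decoded response.text, since from_encoding is ignored for str input.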

Pending Tasks

Further testing of the updated web scraping function with different HTML pages to ensure robustness.
Exploration of additional BeautifulSoup features for more complex data extraction scenarios.

Evidence

source_file=2023-03-08.sessions.jsonl, line_number=0, event_count=0, session_id=aa8ff1298d996138f231a074baaf3a2ef2ba0bf72c5d22406b637375bf1e6b37
event_ids: []
