2023-03-08 - Session: Enhanced web scraping with BeautifulSoup
19:10–19:30
Labels: Web Scraping, BeautifulSoup, Python, HTML Parsing, Data Extraction
Project: Dev
Priority: MEDIUM
Session Goal
The session aimed to update and enhance a web scraping script using BeautifulSoup to improve data extraction from HTML pages.
Key Activities
- Updated CSS selectors in the BeautifulSoup code to replace the deprecated :contains pseudo-class with :-soup-contains, avoiding the deprecation warning.
- Developed a Python function to scrape researcher and graduate student names into a pandas DataFrame.
- Corrected a misspelled header tag in the web scraping function to ensure accurate data extraction.
- Addressed encoding issues by specifying character encoding manually in BeautifulSoup.
- Provided guidance on the correct URL for the Image Processing and Computer Vision Group's webpage.
- Suggested HTML code corrections to resolve search failures in BeautifulSoup.
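A minimal sketch of how the selector update and the names-to-DataFrame function could fit together. The session's actual code was not recorded, so the sample HTML, the h3/ul layout, the function name scrape_names, and the column names are all assumptions made for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page fragment; the real page's headers and markup may differ.
SAMPLE_HTML = """
<h3>Researchers</h3>
<ul><li>Ada Example</li><li>Ben Sample</li></ul>
<h3>Graduate Students</h3>
<ul><li>Cy Placeholder</li></ul>
"""


def scrape_names(html):
    """Collect names listed under each role header into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for role in ("Researchers", "Graduate Students"):
        # :-soup-contains replaces the deprecated :contains pseudo-class.
        header = soup.select_one(f'h3:-soup-contains("{role}")')
        if header is None:
            # A misspelled or missing header simply yields no rows for that role.
            continue
        # Assumed layout: the names sit in the first <ul> after the header.
        name_list = header.find_next_sibling("ul")
        if name_list is None:
            continue
        for item in name_list.find_all("li"):
            rows.append({"role": role, "name": item.get_text(strip=True)})
    return pd.DataFrame(rows, columns=["role", "name"])


df = scrape_names(SAMPLE_HTML)
```

Because a header that does not match is skipped silently, a misspelled header string shows up as missing rows rather than an error, so checking the row count per role is a quick sanity test.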
Achievements
- Successfully updated and corrected the web scraping script, enhancing its functionality and accuracy.
- Resolved encoding issues and improved data extraction reliability.
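One way the encoding fix might look, as a sketch: BeautifulSoup accepts a from_encoding argument when it is given raw bytes. The log does not say which encoding the page used, so ISO-8859-1, the sample bytes, and the helper name parse_with_encoding are assumptions.

```python
from bs4 import BeautifulSoup


def parse_with_encoding(raw_bytes, encoding):
    """Parse raw HTML bytes, telling BeautifulSoup which encoding to try first."""
    # from_encoding only applies to bytes input; it is ignored for str input.
    return BeautifulSoup(raw_bytes, "html.parser", from_encoding=encoding)


# Hypothetical page content, encoded as ISO-8859-1 with no charset declared.
raw = "<p>José Müller</p>".encode("iso-8859-1")
soup = parse_with_encoding(raw, "iso-8859-1")
name = soup.p.get_text()
```

When fetching with an HTTP client, passing the undecoded response bytes (rather than already-decoded text) keeps this argument effective.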
Pending Tasks
- Further testing of the updated web scraping function with different HTML pages to ensure robustness.
- Exploration of additional BeautifulSoup features for more complex data extraction scenarios.
