M.I. Journal

Web Scraping Techniques and Error Handling

Mar 19, 2024 · 2 min read

📅 2024-03-19 — Session: Web Scraping Techniques and Error Handling

🕒 22:25–23:20
🏷️ Labels: Web Scraping, Python, Beautifulsoup, Selenium, CSV, Debugging
📂 Project: Dev
⭐ Priority: MEDIUM

Session Goal

Explore web scraping techniques with Python libraries, focusing on pages that render their content dynamically and on the errors that commonly come up along the way.

Key Activities

HTML Structure for Data Extraction: Discussed organizing HTML elements for effective data extraction using BeautifulSoup.
Basic Web Scraping: Demonstrated using requests and BeautifulSoup to fetch and parse HTML content, highlighting limitations with dynamic pages.
Error Handling: Provided solutions for handling KeyError in BeautifulSoup and corrected f-string usage.
CSV Encoding Issues: Addressed common encoding problems when saving CSV files and suggested solutions.
Dynamic Page Scraping: Explained challenges with AngularJS-rendered pages and recommended tools like Selenium and Puppeteer.
Precios Claros Repositories: Compared GitHub repositories for scraping Precios Claros, detailing their approaches and technical complexity.
OpenDataCordoba Guide: Offered a step-by-step guide for using the OpenDataCordoba repository to scrape Precios Claros.
Debugging with ipdb: Provided insights into using ipdb for debugging Python code.
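
The static-page and KeyError items above can be sketched roughly as follows. This is a minimal illustration, not the code from the session: the HTML snippet, the `product` and `price` class names, and the `extract_products` helper are all made up for the example. It assumes the KeyError came from indexing a tag attribute that is missing (`tag["href"]`), which is the usual cause with BeautifulSoup; `tag.get()` avoids it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched page; the second link has no href.
SAMPLE_HTML = """
<ul id="products">
  <li class="product"><a href="/p/1">Yerba mate</a><span class="price">1200.50</span></li>
  <li class="product"><a>Dulce de leche</a><span class="price">980.00</span></li>
</ul>
"""


def extract_products(html):
    """Return one dict per product, tolerating missing attributes."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="product"):
        link = item.find("a")
        # link["href"] would raise KeyError on the second item;
        # .get() returns the given default instead.
        href = link.get("href", "")
        price = float(item.find("span", class_="price").get_text(strip=True))
        rows.append({"name": link.get_text(strip=True), "url": href, "price": price})
    return rows


products = extract_products(SAMPLE_HTML)
for p in products:
    # f-string: expressions go inside the braces, the format spec after the colon.
    print(f"{p['name']}: ${p['price']:.2f} ({p['url'] or 'no link'})")
```

For a live static page, the `html` argument would typically come from `requests.get(url, timeout=10).text`. On an AngularJS-rendered page that response mostly holds unrendered templates, which is the limitation noted above.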

Achievements

Clarified methods for scraping static and dynamic web pages.
Solved common errors in web scraping and CSV file handling.
Provided resources for further exploration of web scraping projects.
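
The notes do not record which CSV encoding problem came up. A common one is accented characters appearing garbled when the file is opened in Excel. One possible fix, sketched below under that assumption, is to write with the `utf-8-sig` encoding (UTF-8 plus a byte-order mark) and pass `newline=""` so the `csv` module controls line endings. The sample rows, file name, and helper names are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical scraped rows containing non-ASCII characters.
rows = [
    {"producto": "Yerba mate", "precio": "1200.50"},
    {"producto": "Azúcar común", "precio": "850.00"},
]


def write_csv(path, rows, encoding="utf-8-sig"):
    """Write dict rows to a CSV file with a header row."""
    # newline="" lets the csv module control line endings
    # (otherwise blank rows can appear on Windows).
    with open(path, "w", newline="", encoding=encoding) as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path, encoding="utf-8-sig"):
    """Read the file back; utf-8-sig strips the byte-order mark if present."""
    with open(path, newline="", encoding=encoding) as fh:
        return list(csv.DictReader(fh))


path = os.path.join(tempfile.mkdtemp(), "precios.csv")
write_csv(path, rows)
round_trip = read_csv(path)
```

Reading with the same encoding keeps the byte-order mark from leaking into the first column name.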

Pending Tasks

Further exploration of API alternatives for dynamic content extraction.
Implementation of advanced scraping techniques using Selenium or Puppeteer.
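
As a possible starting point for the Selenium task, the sketch below loads a page in headless Chrome and waits for an element to appear before reading the rendered HTML. It assumes Selenium 4 with a local Chrome install. The `fetch_rendered_html` helper and its `wait_selector` parameter are hypothetical, and the selector to wait for would depend on the target page.

```python
def fetch_rendered_html(url, wait_selector, timeout=10):
    """Load url in headless Chrome and return the HTML once wait_selector appears."""
    # Imported lazily so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-rendered element exists;
        # raises TimeoutException if it never appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        # Always release the browser, even when the wait times out.
        driver.quit()
```

The returned HTML can then be handed to BeautifulSoup exactly as with a static page.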


