📅 2024-08-16 — Session: Developed web scraper for Buenos Aires norms
🕒 02:05–02:35
🏷️ Labels: Web Scraping, Python, Automation, Buenos Aires, Data Extraction
📂 Project: Dev
⭐ Priority: MEDIUM
Session Goal: Build a tool that automatically checks for and downloads the norms published daily by Buenos Aires Province, focusing on resolutions from the current year.
Key Activities:
- Developed a Python script that uses Requests, BeautifulSoup, and Pandas to fetch and parse the target URLs and extract the relevant data into a structured format.
- Analyzed HTML structure to design a software architecture for data extraction and organization.
- Implemented web scraping steps, including pagination handling and data storage in a Pandas DataFrame.
- Enhanced error handling in the Python script so that list elements are accessed safely and a missing element no longer raises an error.
- Designed a function to generate URLs for searching Buenos Aires norms, using wildcard parameters and specific filters.
- Built a Python function to construct search URLs, filtering out empty values for clean query strings.
- Created a Python script for daily data scraping, appending results to a CSV file with error handling and logging.
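The URL construction step above (filtering out empty values to keep the query string clean) can be sketched as follows. This is a minimal illustration, not the session's actual function: the name `build_search_url`, the `example.invalid` base URL, and the parameter names `q`, `year`, `tipo`, and `numero` are all hypothetical placeholders, since the log does not record the real endpoint or its filters.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, **filters) -> str:
    """Build a search URL, skipping filters that are None or empty strings.

    The filter names passed in are hypothetical; the real site's parameter
    names are not recorded in this log.
    """
    # Keep only filters that carry a value, so the query string stays clean.
    params = {key: value for key, value in filters.items() if value not in (None, "")}
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


# Placeholder base URL (assumption): replace with the real search endpoint.
url = build_search_url(
    "https://example.invalid/search",
    q="*",        # wildcard search term, percent-encoded by urlencode
    year=2024,
    tipo="",      # empty value: dropped from the query string
    numero=None,  # missing value: dropped from the query string
)
```

Empty and missing filters never reach the query string, so callers can pass every filter unconditionally and let the function decide what to include.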
Achievements:
- Developed a working web scraper for Buenos Aires government norms that handles pagination and stores the results in a Pandas DataFrame.
- Improved error handling in the scripts, covering safe access to list elements and logged failures in the daily run.
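The pagination handling and safe list access noted above could look roughly like the sketch below. It is deliberately library-agnostic so it runs without network access: `fetch_page` is a hypothetical callable standing in for the Requests + BeautifulSoup step, and the column names `title`, `date`, and `link` are assumptions, not the fields the session actually extracted.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


def safe_get(items: Sequence[str], index: int, default: str = "") -> str:
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default


def row_from_cells(cells: Sequence[str]) -> Row:
    """Map one result's cells to named fields (field names are hypothetical)."""
    return {
        "title": safe_get(cells, 0),
        "date": safe_get(cells, 1),
        "link": safe_get(cells, 2),
    }


def scrape_all_pages(
    fetch_page: Callable[[int], List[List[str]]],
    max_pages: int = 50,
) -> List[Row]:
    """Collect rows page by page until a page comes back empty.

    fetch_page(n) is assumed to return the cell texts for each result on
    page n; max_pages is a safety cap against an endless loop.
    """
    rows: List[Row] = []
    for page in range(1, max_pages + 1):
        cell_lists = fetch_page(page)
        if not cell_lists:
            break  # no results on this page: assume we are past the last one
        rows.extend(row_from_cells(cells) for cells in cell_lists)
    return rows
```

In the real script, `fetch_page` would wrap the HTTP request and HTML parsing, and the returned list of dictionaries can be passed directly to `pandas.DataFrame(rows)`.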
Pending Tasks:
- Further testing and optimization of the web scraper for different types of norms and date ranges.
- Integration of the URL generation function with the main scraping workflow.
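One possible shape for the pending integration is a daily entry point that receives the URL builder and the scraper as parameters, then appends whatever comes back to the CSV file. The sketch below is an assumption about how the pieces might fit, not the session's code: `run_daily`, `append_rows`, the `FIELDS` list, and the logger name are hypothetical, and it swaps in the standard-library `csv` module in place of Pandas to stay dependency-free.

```python
import csv
import logging
from pathlib import Path
from typing import Callable, Dict, List

logger = logging.getLogger("norms_scraper")  # hypothetical logger name

# Hypothetical column set; the real fields are not recorded in this log.
FIELDS = ["date", "title", "link"]


def append_rows(csv_path: Path, rows: List[Dict[str, str]]) -> int:
    """Append rows to csv_path, writing the header only for a new or empty file."""
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def run_daily(
    build_url: Callable[[int], str],
    scrape: Callable[[str], List[Dict[str, str]]],
    csv_path: Path,
    year: int,
) -> int:
    """Build the search URL, scrape it, and append the results to the CSV.

    Returns the number of rows written; a failed scrape is logged and
    reported as 0 rather than crashing the scheduled job.
    """
    url = build_url(year)
    try:
        rows = scrape(url)
    except Exception:
        logger.exception("Scrape failed for %s", url)
        return 0
    written = append_rows(csv_path, rows)
    logger.info("Appended %d rows from %s", written, url)
    return written
```

Passing the builder and scraper in as callables keeps the daily job testable without network access, and it lets the URL generation function be swapped in once it is finalized.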