📅 2024-09-11 — Session: Data Gathering and Processing Workflow Execution

🕒 17:00–19:00
🏷️ Labels: Data Gathering, Web Crawling, Data Processing, Investment, Optimization
📂 Project: Dev
⭐ Priority: MEDIUM

Session Goal

The goal of this session was to outline and execute workflows for data gathering and processing, focusing on value-based investment strategies.

Key Activities

  • Data Gathering and Processing Workflow: Developed a comprehensive workflow for creating subsets of stores, crawling data, and processing it for value-based investment baskets.
  • Precios Claros Scraping: Outlined a workflow for scraping and analyzing store data using the Precios Claros crawler.
  • Store Selection: Implemented a Python routine that selects the closest store in each group using pandas DataFrame operations.
  • Geospatial Data Handling: Fetched GeoJSON files for Buenos Aires and CABA, and optimized geospatial data visualization using GeoPandas and Matplotlib.
  • Scrapy Crawler Optimization: Optimized a Scrapy crawler for specific store IDs and analyzed execution logs to improve efficiency.
  • Savings Opportunity Calculation: Calculated potential savings by comparing current prices to median prices, aiming to optimize a basket of goods.
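The closest-store-per-group selection can be sketched with pandas. This is a minimal illustration, not the session's actual code: the column names, sample coordinates, grouping key (`chain`), and reference point are all hypothetical.

```python
import pandas as pd
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical store table: id, group (here, a chain), coordinates.
stores = pd.DataFrame({
    "store_id": ["s1", "s2", "s3", "s4"],
    "chain":    ["A", "A", "B", "B"],
    "lat":      [-34.60, -34.70, -34.61, -34.90],
    "lon":      [-58.38, -58.50, -58.40, -58.20],
})

# Hypothetical reference point in CABA.
ref_lat, ref_lon = -34.6037, -58.3816

stores["dist_km"] = stores.apply(
    lambda r: haversine_km(ref_lat, ref_lon, r["lat"], r["lon"]), axis=1
)

# Closest store per group: take the row with the minimum distance in each chain.
closest = stores.loc[stores.groupby("chain")["dist_km"].idxmin()]
print(closest[["chain", "store_id", "dist_km"]].to_string(index=False))
```

The `groupby(...).idxmin()` pattern returns one index label per group, so a single `.loc` lookup recovers the full row for each winning store.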
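Limiting a crawl to specific store IDs amounts to seeding the crawler only with pre-selected stores rather than enumerating all of them. The sketch below is framework-agnostic (it does not use the real Scrapy API or the real Precios Claros endpoint); the URL template and IDs are placeholders.

```python
# Placeholder URL template — not the real Precios Claros endpoint.
URL_TEMPLATE = "https://api.example.com/stores/{store_id}/products"

def build_seed_urls(all_store_ids, selected_ids):
    """Keep only the selected stores, preserving input order and dropping unknowns."""
    selected = set(selected_ids)
    return [URL_TEMPLATE.format(store_id=s) for s in all_store_ids if s in selected]

# Hypothetical store IDs; only two of the three are in the selected subset.
seeds = build_seed_urls(["9-1-1", "9-2-4", "15-1-5"], ["15-1-5", "9-1-1"])
print(seeds)
```

In a Scrapy spider, a list like `seeds` would typically become the spider's start URLs, which is what keeps the crawl from touching stores outside the chosen subset.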
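The savings calculation above — current price minus the median price across stores — can be illustrated with the standard library alone. The product names, store names, and prices below are made up.

```python
from statistics import median

# Hypothetical observed prices per product across stores.
prices = {
    "rice_1kg":  {"store_a": 950.0, "store_b": 1100.0, "store_c": 1020.0},
    "oil_900ml": {"store_a": 1800.0, "store_b": 1650.0, "store_c": 1750.0},
}

def savings_vs_median(prices_by_store, current_store):
    """Per-product saving: current store's price minus the cross-store median.

    Positive values mean the current store is more expensive than the median
    (a savings opportunity); negative values mean it is already cheaper.
    """
    out = {}
    for product, by_store in prices_by_store.items():
        med = median(by_store.values())
        out[product] = by_store[current_store] - med
    return out

result = savings_vs_median(prices, "store_b")
print(result)
```

Summing the positive entries over a whole basket gives the total potential saving from switching those products to median-priced stores.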

Achievements

  • Established a structured approach for data gathering and processing.
  • Improved scraping efficiency (crawler restricted to targeted store IDs) and geospatial data visualization techniques.

Pending Tasks

  • Further optimization of data processing workflows.
  • Implementation of savings opportunity strategies in investment decisions.