Developed Headless Scraping Microservice with FastAPI

  • Day: 2025-07-14
  • Time: 01:15 to 02:25
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: Completed
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: Web Scraping, FastAPI, Playwright, Automation, Docker

Description

Session Goal

The session aimed to develop a robust headless scraping microservice using FastAPI and Playwright, focusing on automation and scalability.

Key Activities

  • Addressed clipboard issues that occur when driving headless Chrome with Selenium.
  • Developed strategies for job data extraction and handling JavaScript-heavy pages.
  • Explored alternatives for content copying in Streamlit apps and production-level DOM content extraction.
  • Planned and implemented a cloud-based headless browser solution for scalable web scraping.
  • Analyzed resource usage and scaling strategies for headless browsing systems.
  • Set up a FastAPI headless browser scraper API and tested it against JavaScript-heavy pages.
  • Scaffolded and built a headless scraping microservice, including Dockerization steps.
  • Resolved DNS errors in Playwright and confirmed API functionality.
  • Developed curl commands for job listing scraping and handled cookie consent modals.
  • Investigated Spider API capabilities for dynamic content extraction.
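
The FastAPI and Playwright setup described in the activities above could look roughly like the following. This is a minimal sketch, not the service built in the session: the `/scrape` route, the `ScrapeRequest` model, the helper names, and the `networkidle` wait strategy are all assumptions chosen for illustration. Third-party imports are deferred so the module can be loaded without FastAPI or Playwright installed.

```python
from urllib.parse import urlparse


def is_scrapable_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


async def render_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return the rendered DOM as HTML."""
    # Imported lazily; requires `pip install playwright` and `playwright install chromium`.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            # "networkidle" waits until network activity settles (assumed strategy).
            await page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return await page.content()
        finally:
            await browser.close()


def create_app():
    """Application factory; the route and model names are hypothetical."""
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    class ScrapeRequest(BaseModel):
        url: str
        timeout_ms: int = 30_000

    app = FastAPI(title="headless-scraper (sketch)")

    @app.post("/scrape")
    async def scrape(req: ScrapeRequest):
        if not is_scrapable_url(req.url):
            raise HTTPException(status_code=422, detail="url must be absolute http(s)")
        html = await render_html(req.url, req.timeout_ms)
        return {"url": req.url, "html": html}

    return app
```

Assuming the file is saved as `scraper.py` (a hypothetical name), it can be served with `uvicorn scraper:create_app --factory`. Waiting for `networkidle` is one way to let JavaScript-heavy pages settle; waiting for a specific selector is often more reliable when the target element is known.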

Achievements

  • Successfully developed and tested a headless scraping microservice using FastAPI and Playwright.
  • Implemented solutions for common issues like DNS errors and cookie consent handling.
  • Compared Spider API capabilities against Playwright for dynamic content scraping.
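
Cookie consent handling of the kind mentioned above is often done by trying a short list of candidate selectors before extracting content. The sketch below assumes a Playwright sync-API `page` object; the selector list and the function name are illustrative assumptions, not the selectors used in the session, and real sites will need their own entries.

```python
# Candidate selectors are assumptions for illustration; adjust per target site.
CONSENT_SELECTORS = (
    "#onetrust-accept-btn-handler",
    "button:has-text('Accept all')",
    "button:has-text('Accept')",
)


def dismiss_cookie_banner(page, selectors=CONSENT_SELECTORS, timeout_ms=2000):
    """Click the first consent button that responds; return its selector or None.

    `page` is expected to behave like a Playwright sync-API Page.
    """
    for selector in selectors:
        try:
            # click() waits up to timeout_ms for the element, then raises.
            page.locator(selector).first.click(timeout=timeout_ms)
            return selector
        except Exception:
            # Broad catch keeps the sketch import-free; Playwright's
            # TimeoutError would be the narrower choice in real code.
            continue
    return None
```

Each selector that fails to match costs up to `timeout_ms`, so the list is best kept short and ordered from most to least likely.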

Pending Tasks

  • Further optimization of resource usage and cost analysis for scaling headless browsing systems.
  • Continued investigation into Spider API’s advanced features for handling complex web interactions.
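
One common way to keep resource usage bounded while scaling headless browsing is to cap how many pages render at once. The sketch below uses only the standard library; `bounded_gather`, its `fetch` callback, and the default limit of 4 are hypothetical names and values, and the right limit would have to come from the pending resource analysis.

```python
import asyncio


async def bounded_gather(urls, fetch, max_concurrency=4):
    """Run `fetch(url)` for every URL with at most `max_concurrency` in flight.

    `fetch` is any coroutine function, for example a page-rendering helper.
    Results are returned in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url):
        # The semaphore blocks here once the concurrency cap is reached.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(run_one(url) for url in urls))
```

Because each headless browser page holds memory for its whole lifetime, the cap acts as a rough ceiling on peak memory per worker.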

Evidence

  • source_file=2025-07-14.sessions.jsonl, line_number=5, event_count=0, session_id=2f52a08a016cf29a2525e0e0e40f9f034f2ccc2f3c94b727ab736e7b2c3a0e77
  • event_ids: []