Developed Web Scraping Strategy for Exactas UBA

  • Day: 2026-03-12
  • Time: 22:10 to 22:40
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: In Progress
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: WordPress, Web Scraping, Exactas UBA, Data Extraction, API

Description

Session Goal

The session aimed to develop a strategy for scraping and extracting data from the Exactas UBA domains by identifying their WordPress infrastructure and building on it.

Key Activities

  • Ran search queries to retrieve sitemap and header information for exactas.uba.ar and lcd.exactas.uba.ar, guided by BuiltWith data.
  • Reviewed the robots.txt and sitemap.xml files to assess crawl rules, site optimization, and scraping potential.
  • Analyzed the technology stack of both domains, confirmed that they run WordPress, and proposed extracting data through the WordPress REST API.
  • Developed a fingerprinting strategy to verify WordPress installations using REST endpoints, feeds, sitemaps, and curl commands.
  • Confirmed the WordPress structure of the sites and proposed mapping strategies to optimize data extraction.
  • Outlined a structured plan for ingesting LCD content into a knowledge base, detailing objectives and operational constraints.
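
The fingerprinting step above can be sketched as follows. This is a minimal illustration, not the exact procedure used in the session: it assumes the standard WordPress signals (the `api.w.org` Link header, the `/wp-json/` index, the generator tags, and the core `/wp-sitemap.xml`), any of which a site may have disabled. The `Probe` type, the `fetch` helper, and the `kb-ingest-probe/0.1` user agent are hypothetical names introduced here.

```python
import re
import urllib.error
import urllib.request
from dataclasses import dataclass, field

# Patterns for signals a stock WordPress install usually exposes.
REST_LINK_RE = re.compile(r'rel="https://api\.w\.org/"')
META_GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress', re.I
)
FEED_GENERATOR_RE = re.compile(r"<generator>https?://wordpress\.org/\?v=", re.I)


@dataclass
class Probe:
    """One fetched response (hypothetical helper type)."""
    path: str
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def fetch(base_url, path, timeout=10):
    """Fetch base_url + path; HTTP errors become a Probe with that status."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"User-Agent": "kb-ingest-probe/0.1"},  # assumed UA string
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return Probe(path, resp.status, dict(resp.headers.items()), body)
    except urllib.error.HTTPError as exc:
        return Probe(path, exc.code)
    except urllib.error.URLError:
        return Probe(path, 0)  # network failure, no status available


def wordpress_signals(probes):
    """Return the sorted names of WordPress signals found in the probes."""
    found = set()
    for p in probes:
        if p.status != 200:
            continue
        headers = {k.lower(): v for k, v in p.headers.items()}
        if REST_LINK_RE.search(headers.get("link", "")):
            found.add("rest-link-header")
        if p.path == "/wp-json/" and '"namespaces"' in p.body:
            found.add("rest-index")
        if META_GENERATOR_RE.search(p.body):
            found.add("meta-generator")
        if p.path == "/feed/" and FEED_GENERATOR_RE.search(p.body):
            found.add("feed-generator")
        if p.path == "/wp-sitemap.xml" and "<sitemapindex" in p.body:
            found.add("core-sitemap")
    return sorted(found)
```

A run against one domain would look like `wordpress_signals([fetch(base, p) for p in ("/", "/wp-json/", "/feed/", "/wp-sitemap.xml")])`. Two or more independent signals are a reasonable bar before treating a site as WordPress, since any single one can be removed or spoofed.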

Achievements

  • Identified the WordPress infrastructure of the Exactas UBA domains and developed a data extraction strategy tailored to it.
  • Established a systematic approach for verifying WordPress sites and documenting results.

Pending Tasks

  • Implement the proposed data extraction and ingestion strategies.
  • Monitor the strategies once they run and adjust them based on observed results and data quality.
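
As a starting point for the extraction task, paging through the REST API might look like the sketch below. It assumes the standard `wp/v2` routes are publicly readable on the target site, which has to be verified per domain. It relies on two documented WordPress behaviors: `per_page` is capped at 100, and the `X-WP-TotalPages` response header reports the page count. The function names and the user agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def _http_fetch_page(url):
    """Fetch one page of results; return (items, total_pages)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "kb-ingest-probe/0.1"}  # assumed UA string
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total_pages


def iter_wp_items(base_url, resource="posts", per_page=100, fetch_page=None):
    """Yield every item of a wp/v2 collection, one page at a time.

    fetch_page is injectable so the paging logic can be tested offline.
    """
    fetch_page = fetch_page or _http_fetch_page
    page = 1
    while True:
        query = urllib.parse.urlencode({"per_page": per_page, "page": page})
        url = f"{base_url.rstrip('/')}/wp-json/wp/v2/{resource}?{query}"
        items, total_pages = fetch_page(url)
        yield from items
        # Stop on the last reported page, or early if a page comes back empty.
        if page >= total_pages or not items:
            break
        page += 1
```

For example, `for post in iter_wp_items("https://lcd.exactas.uba.ar"): ...` would walk the posts collection, and passing `resource="pages"` would walk pages instead. A polite delay between requests and a check against robots.txt belong around this loop before it is pointed at a live site.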

Evidence

  • source_file=2026-03-12.sessions.jsonl, line_number=0, event_count=0, session_id=04ae7ffd5eab2aaab2d675ceb0ff234b4ebb87ce8882764550245467b2ec31cd
  • event_ids: []