Installed and troubleshot JabRef and web scraping tools

  • Day: 2023-11-07
  • Time: 22:05 to 23:40
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: Completed
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: JabRef, Web Scraping, Python, Selenium, Linux

Description

Session Goal

The goal was to install JabRef on a Debian-based system, fix problems launching it, and explore web scraping techniques for academic papers, with a focus on Google Scholar.

Key Activities

  • JabRef Installation: Installed JabRef from a .deb package. The shell did not recognize the command at first; this was fixed by modifying the system PATH and creating symbolic links.
  • Web Scraping Exploration: Developed Python scripts that extract academic paper details from HTML using BeautifulSoup and regular expressions. Addressed pagination, legal considerations, and dynamically loaded content.
  • Troubleshooting: Resolved ChromeDriver version mismatches and errors in how Selenium WebDriver options were passed.
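The pagination and regular-expression steps mentioned above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the script written in the session: the endpoint `https://example.org/search`, the `start` offset parameter, the page size of 10, and the "authors - venue, year - publisher" metadata layout are all assumptions made for the example. HTML parsing with BeautifulSoup is left out so the sketch runs without third-party packages.

```python
import re
from urllib.parse import urlencode

# Hypothetical search endpoint and page size (assumptions, not from the session).
BASE_URL = "https://example.org/search"
PAGE_SIZE = 10


def page_urls(query, pages):
    """Build one URL per results page using an offset-style `start` parameter."""
    return [
        f"{BASE_URL}?{urlencode({'q': query, 'start': page * PAGE_SIZE})}"
        for page in range(pages)
    ]


# Matches a standalone four-digit year between 1900 and 2099.
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")


def parse_meta(meta):
    """Split an assumed 'authors - venue, year - publisher' line into fields."""
    parts = [p.strip() for p in meta.split(" - ")]
    # The first segment is treated as a comma-separated author list.
    authors = [a.strip() for a in parts[0].split(",") if a.strip()]
    match = YEAR_RE.search(meta)
    year = int(match.group(0)) if match else None
    return {"authors": authors, "year": year}
```

In a fuller script, each URL from `page_urls` would be fetched and parsed, and `parse_meta` would be applied to the text of each result's metadata element.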

Achievements

  • Installed JabRef and confirmed that its command is recognized from the shell.
  • Developed a Python script that scrapes academic paper details, handling pagination and HTML parsing.
  • Resolved several Selenium-related issues, ensuring compatibility and proper configuration.
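One way to catch a ChromeDriver mismatch early is to compare the major version numbers reported by the browser and the driver. The sketch below assumes the mismatch was of this kind (the log does not say) and that both tools print a dotted version in their `--version` output. The helper names and the sample version strings are illustrative only.

```python
import re


def major_version(version_output):
    """Return the leading number of the first dotted version found, or None."""
    match = re.search(r"(\d+)\.\d+(?:\.\d+)*", version_output)
    return int(match.group(1)) if match else None


def versions_compatible(chrome_output, driver_output):
    """True when both strings carry a version and the major numbers are equal."""
    chrome = major_version(chrome_output)
    driver = major_version(driver_output)
    return chrome is not None and chrome == driver


# Illustrative strings in the style of `--version` output (not from the session).
chrome_str = "Google Chrome 119.0.6045.105"
driver_str = "ChromeDriver 118.0.5993.70 (abc123)"
mismatch_detected = not versions_compatible(chrome_str, driver_str)
```

Running a check like this before starting a WebDriver session turns an opaque startup failure into a clear message about which component to update.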

Pending Tasks

  • Further refine web scraping scripts to comply with legal guidelines, especially concerning Google Scholar’s terms of service.
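As one small, partial ingredient of that refinement, a script can consult a site's robots.txt rules before requesting a URL. The sketch below uses the standard library's `urllib.robotparser` on an in-memory sample. The rules, the user agent name `paper-bot`, and the URLs are hypothetical. Passing such a check does not by itself establish compliance with a site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_ok = is_allowed(SAMPLE_ROBOTS, "paper-bot", "https://example.org/search")
about_ok = is_allowed(SAMPLE_ROBOTS, "paper-bot", "https://example.org/about")
```

In a real script the robots.txt text would be downloaded once per host and cached, and a delay between requests would normally be added as well.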

Evidence

  • source_file=2023-11-07.sessions.jsonl, line_number=1, event_count=0, session_id=491341c1b64963186cbe2f97cbfd0db1aa43b73c53ef1a92e6f1dd685a721b20
  • event_ids: []