Installed and troubleshot JabRef and web scraping tools
- Day: 2023-11-07
- Time: 22:05 to 23:40
- Project: Dev
- Workspace: WP 2: Operational
- Status: Completed
- Priority: MEDIUM
- Assignee: Matías Nehuen Iglesias
- Tags: JabRef, Web Scraping, Python, Selenium, Linux
Description
Session Goal
The session aimed to install and troubleshoot the JabRef application on a Debian-based system and to explore web scraping techniques for academic papers, particularly focusing on Google Scholar.
Key Activities
- JabRef Installation: Installed JabRef using a .deb package and resolved command recognition issues by modifying the system PATH and creating symbolic links for easier access.
- Web Scraping Exploration: Developed Python scripts for extracting academic paper details from HTML using BeautifulSoup and regular expressions. Addressed challenges like pagination, legal considerations, and dynamic content loading.
- Troubleshooting: Resolved issues with ChromeDriver version mismatches and Selenium WebDriver options errors.
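The ChromeDriver mismatch noted above usually comes down to the browser and the driver reporting different major versions. A minimal sketch of that comparison follows, assuming the two version strings were already captured from each tool's --version output. The function names and the sample strings are hypothetical and are not taken from the session.

```python
import re


def major_version(version_output: str) -> int:
    """Return the major version from a version string such as 'ChromeDriver 119.0.1.2'."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when the browser and the driver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Illustrative strings only (assumed format, not output recorded in this session).
chrome = "Google Chrome 119.0.6045.105"
driver = "ChromeDriver 118.0.5993.70"
print(versions_match(chrome, driver))  # False
```

Running a check like this before starting Selenium can turn a long stack trace into a one-line diagnosis.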
Achievements
- Successfully installed JabRef and ensured command recognition.
- Developed a robust Python script for web scraping academic papers, handling pagination and HTML parsing.
- Resolved several Selenium-related issues, ensuring compatibility and proper configuration.
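The regular-expression and pagination parts of such a script could look roughly like the sketch below. It is not the script from this session. It assumes a result byline of the form "authors - venue, year - source" and a search endpoint that pages through results with a numeric start offset; parse_byline, page_urls, the q and start parameter names, and the example.org URL are all hypothetical. The BeautifulSoup step that would pull the byline text out of the HTML is left out so the sketch runs with the standard library alone.

```python
import re
from urllib.parse import urlencode

# A four-digit year between 1900 and 2099, bounded so longer numbers do not match.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def parse_byline(byline: str) -> dict:
    """Split an 'authors - venue, year - source' byline into authors and year."""
    authors_part = byline.split(" - ", 1)[0]
    authors = [name.strip() for name in authors_part.split(",") if name.strip()]
    year_match = YEAR_RE.search(byline)
    year = int(year_match.group(0)) if year_match else None
    return {"authors": authors, "year": year}


def page_urls(base_url: str, query: str, pages: int, page_size: int = 10) -> list:
    """Build one URL per result page, using a 'start' offset (assumed parameter)."""
    return [
        f"{base_url}?{urlencode({'q': query, 'start': page * page_size})}"
        for page in range(pages)
    ]


# Illustrative input only.
print(parse_byline("A Author, B Writer - Journal of Examples, 2020 - example.org"))
print(page_urls("https://example.org/search", "reference managers", 2))
```

Keeping URL construction separate from fetching makes it easy to add delays or a page limit later without touching the parsing code.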
Pending Tasks
- Further refine web scraping scripts to comply with legal guidelines, especially concerning Google Scholar’s terms of service.
Evidence
- source_file=2023-11-07.sessions.jsonl, line_number=1, event_count=0, session_id=491341c1b64963186cbe2f97cbfd0db1aa43b73c53ef1a92e6f1dd685a721b20
- event_ids: []