Automated Google Search Script for Profile Links
- Day: 2024-07-12
- Time: 19:50 to 20:10
- Project: Dev
- Workspace: WP 2: Operational
- Status: Completed
- Priority: MEDIUM
- Assignee: Matías Nehuen Iglesias
- Tags: Python, Web Scraping, Automation, Google Search, CSV
Description
Session Goal
The primary goal of this session was to develop and refine a Python script that automates Google searches to retrieve profile links from a CSV file.
Key Activities
- Developed a Python script utilizing libraries such as `requests`, `BeautifulSoup`, and `googlesearch-python` to automate Google searches for profile links.
- Corrected a keyword argument in the `google_search` function, changing `num` to `num_results` to comply with the `googlesearch-python` library.
- Updated the `google_search` function by removing the `pause` argument and implementing `sleep_interval` for request delays.
- Reviewed Google search usage limits, including the risk of IP bans and legal issues, and explored mitigations such as official APIs, rate limiting, and proxies.
- Created a script to structure profile links in a DataFrame and save them to a CSV file.
- Set up a rotating proxy pool for web scraping to ensure ethical compliance and avoid IP bans.
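
The search and CSV steps above can be sketched roughly as follows. This is a minimal illustration and not the session's actual script: the query template (`"<name> profile"`), the column names `name` and `profile_link`, and the helpers `build_rows` and `save_rows` are assumptions made for the example. Only the `num_results` and `sleep_interval` keyword arguments come from the notes above.

```python
from typing import Callable, Dict, Iterable, List


def google_search(query: str, num_results: int = 3, sleep_interval: float = 5.0) -> List[str]:
    """Return result URLs for one query using googlesearch-python."""
    # Imported lazily so the helpers below can be used without the package installed.
    from googlesearch import search

    # `num_results` replaces the old `num` keyword; `sleep_interval` replaces `pause`
    # and sets the delay between successive requests made by the library.
    return list(search(query, num_results=num_results, sleep_interval=sleep_interval))


def build_rows(
    names: Iterable[str],
    search_fn: Callable[[str], List[str]] = google_search,
) -> List[Dict[str, str]]:
    """Run one search per name and return one row per (name, link) pair."""
    rows = []
    for name in names:
        # Hypothetical query template; adjust to the target site or profile type.
        for url in search_fn(f"{name} profile"):
            rows.append({"name": name, "profile_link": url})
    return rows


def save_rows(rows: List[Dict[str, str]], path: str) -> None:
    """Structure the rows in a DataFrame and write them to a CSV file."""
    import pandas as pd  # assumed to be installed; imported lazily like above

    pd.DataFrame(rows, columns=["name", "profile_link"]).to_csv(path, index=False)
```

Passing the search function as a parameter keeps the row-building logic testable offline: a stub such as `lambda q: ["https://example.com/x"]` can stand in for `google_search`, so no real requests are sent while the rest of the pipeline is checked.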
Achievements
- Successfully automated the process of retrieving and structuring profile links from a CSV file using Python.
- Fixed the `num` and `pause` keyword-argument errors in `google_search` so that its calls match the `googlesearch-python` API, and added request delays via `sleep_interval`.
Pending Tasks
- Further testing of the proxy pool setup to ensure reliability and compliance with Google’s terms of service.
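
One possible way to approach the reliability part of this testing is sketched below. It is an assumption-laden example, not the session's proxy setup: the helper names (`healthy_proxies`, `rotate`, `check_with_requests`) and the test URL `https://httpbin.org/ip` are hypothetical choices. It covers only whether a proxy responds; compliance with terms of service cannot be verified by code and still needs a separate review.

```python
import itertools
from typing import Callable, Iterable, Iterator, List


def healthy_proxies(proxies: Iterable[str], check: Callable[[str], bool]) -> List[str]:
    """Keep only the proxies for which check(proxy) returns True, preserving order."""
    return [proxy for proxy in proxies if check(proxy)]


def rotate(proxies: List[str]) -> Iterator[str]:
    """Return an endless round-robin iterator over the given proxies."""
    if not proxies:
        raise ValueError("proxy pool is empty")
    return itertools.cycle(proxies)


def check_with_requests(proxy: str, url: str = "https://httpbin.org/ip", timeout: float = 5.0) -> bool:
    """Return True if a GET through the proxy succeeds within the timeout."""
    import requests  # imported lazily so the pure helpers work without it

    try:
        response = requests.get(
            url, proxies={"http": proxy, "https": proxy}, timeout=timeout
        )
        return response.ok
    except requests.RequestException:
        # Connection errors, timeouts, and proxy errors all count as unhealthy.
        return False
```

A typical use would be `pool = rotate(healthy_proxies(candidates, check_with_requests))`, then `next(pool)` before each request. Because the check is passed in as a function, the filtering and rotation logic can be exercised with a stub check before any real proxy is contacted.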
Evidence
- source_file=2024-07-12.sessions.jsonl, line_number=3, event_count=0, session_id=c2e77da68b8090f4da0ac278c11114f2e9ba3c83ef9c8c35354d1149d85a43b4
- event_ids: []