Developed Python scripts for web scraping and NLP

  • Day: 2023-11-19
  • Time: 03:00 to 07:00
  • Project: Business
  • Workspace: WP 2: Operational
  • Status: Completed
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: Python, Web Scraping, NLP, Financial Services, Political Analysis

Description

Session Goal: Develop and refine Python scripts for web scraping and natural language processing (NLP) to support political speech analysis, and work through a financial account management question involving Wise.

Key Activities:

  • Explored scenarios where financial services like Wise may change account details, focusing on customer notifications and transaction impacts.
  • Drafted an inquiry template for contacting Wise customer support about account detail changes.
  • Proposed project ideas for analyzing political speeches with web scraping and AI, with emphasis on organizing speeches by theme and producing educational resources.
  • Developed a Python script for scraping links from the CFK Argentina website, covering data from 2007 to 2023, with error handling and compliance considerations.
  • Added a dictionary to the web scraping script to customize URL handling for specific years.
  • Created a Python script to extract and concatenate text from URLs using requests and BeautifulSoup, with network error handling.
  • Implemented a script to concatenate speeches, preprocess text, and count word frequencies in Spanish using the NLTK library.
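
The scraping and word-count steps above can be sketched roughly as follows. This is a minimal offline illustration, not the scripts written in the session: the URLs, the names url_for_year, extract_links and word_frequencies, and the short stopword list are all hypothetical. To keep it dependency-free, the standard library's html.parser stands in for BeautifulSoup and a hand-written stopword set stands in for NLTK's Spanish list; fetching pages with requests is left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical URL pattern and per-year overrides; the real ones are not in this log.
BASE_URL = "https://example.org/discursos/{year}/"
YEAR_URL_OVERRIDES = {
    2007: "https://example.org/archivo/2007/",
}


def url_for_year(year):
    """Return the override for a year if one exists, else the default pattern."""
    return YEAR_URL_OVERRIDES.get(year, BASE_URL.format(year=year))


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every non-empty href found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Tiny illustrative stopword set; NLTK's Spanish list is much longer.
SPANISH_STOPWORDS = {"de", "la", "el", "y", "que", "en", "los", "las", "un", "una", "a", "es"}


def word_frequencies(text, stopwords=SPANISH_STOPWORDS):
    """Lowercase the text, keep letter-only tokens, drop stopwords, and count."""
    # [^\W\d_]+ matches runs of letters, including accented ones such as "ó".
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(w for w in words if w not in stopwords)
```

For the period covered in the log, `[url_for_year(y) for y in range(2007, 2024)]` would build one URL per year, and `word_frequencies(text).most_common(20)` would list the most frequent remaining words in the concatenated speeches.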

Achievements:

  • Developed and tested multiple Python scripts for web scraping and text processing.
  • Produced an inquiry template for Wise account detail changes and a set of project ideas for political speech analysis.

Pending Tasks:

  • Refine the web scraping scripts to improve efficiency and accuracy.
  • Explore additional NLP techniques for deeper analysis of political speeches.

Evidence

  • source_file=2023-11-19.sessions.jsonl, line_number=0, event_count=0, session_id=9ba6362a6f2cd56b93f21a7b49360d98badf68ec1b5f2a8e9e70523b720699cc
  • event_ids: []