Advanced Text Analysis with NLP Techniques

  • Day: 2023-11-19
  • Time: 21:15 to 22:25
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: Completed
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: NLP, Python, Text Analysis, spaCy, NLTK, Docentes

Description

Session Goal:

The session aimed to strengthen text analysis of Cristina Fernández de Kirchner’s speeches using several Natural Language Processing (NLP) techniques, focusing on the word ‘docentes’ (Spanish for ‘teachers’).

Key Activities:

  • Implemented Named Entity Recognition (NER) using the spaCy library to identify named entities related to ‘docentes’.
  • Conducted co-occurrence analysis of the word ‘docentes’ in speeches, identifying associated themes and sentiments.
  • Developed a Python function for text processing and word frequency counting, including stopword filtering and special character handling.
  • Corrected a Python script for co-occurrence analysis, addressing tokenization and stopword removal issues.
  • Added stemming to the co-occurrence analysis using NLTK’s Snowball stemmer for Spanish.
  • Enhanced co-occurrence analysis by mapping stemmed tokens back to original words.
  • Explored dependency parsing using spaCy to analyze the grammatical structure of sentences involving ‘docentes’.
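
The frequency-counting, co-occurrence, and stem-mapping steps listed above can be sketched roughly as follows. This is a minimal illustration and not the session's actual script: the stopword list, the sample sentence, the window size, and all function names are assumptions made for the example. To keep it dependency-free, `crude_stem` is a hypothetical stand-in that only strips a trailing "s"; in practice NLTK's `SnowballStemmer("spanish").stem` could be passed as the `stem` argument instead.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption); a real run would use a fuller
# Spanish list, for example the one shipped with NLTK.
STOPWORDS = {"a", "con", "de", "del", "el", "en", "es", "la", "las", "los",
             "para", "por", "que", "un", "una", "y"}


def tokenize(text):
    """Lowercase, keep only runs of letters (drops punctuation, digits and
    other special characters), then remove stopwords."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def word_frequencies(text):
    """Count how often each non-stopword token appears."""
    return Counter(tokenize(text))


def crude_stem(token):
    """Hypothetical stand-in stemmer: strips one trailing 's' from longer words."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def cooccurrences(text, target, window=2, stem=crude_stem):
    """Count stems found within `window` tokens of the target's stem.

    The window is measured after stopword removal. Returns (counts, forms):
    counts maps stem -> number of co-occurrences, and forms maps
    stem -> sorted list of the original words that produced that stem.
    """
    tokens = tokenize(text)
    stems = [stem(t) for t in tokens]
    target_stem = stem(target.lower())
    counts = Counter()
    forms = defaultdict(set)
    for i, s in enumerate(stems):
        if s != target_stem:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if stems[j] == target_stem:
                continue  # skip the target itself and its other inflections
            counts[stems[j]] += 1
            forms[stems[j]].add(tokens[j])  # remember the original word
    return counts, {s: sorted(words) for s, words in forms.items()}


# Invented sample sentence for illustration, not taken from the speeches.
sample = ("Los docentes reclaman mejores salarios. El salario de un docente "
          "es bajo; los docentes y las escuelas necesitan inversión.")

counts, forms = cooccurrences(sample, "docentes")
for stem_, n in counts.most_common():
    print(stem_, n, forms[stem_])
```

Keeping the stemmer as a parameter means the same function works with or without NLTK installed, and returning the original word forms alongside each stem keeps the output readable, since stems on their own are often not real words.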

Achievements:

  • Implemented and corrected scripts for NER, co-occurrence analysis, stemming, and dependency parsing, improving the analysis of ‘docentes’ in the speech texts.
  • Established a structured framework for analyzing text data using advanced NLP methods.

Pending Tasks:

  • Further refinement of scripts to improve accuracy and efficiency in text analysis.
  • Exploration of additional NLP techniques to enhance analysis capabilities.

Evidence

  • source_file=2023-11-19.sessions.jsonl, line_number=2, event_count=0, session_id=3c73d5be8f92403b092a0610b7e2a2ed2546e892f9be8962092ad8442f423c92
  • event_ids: []