📅 2023-11-19 — Session: Developed NLP Scripts for ‘Docentes’ Analysis

🕒 21:20–22:25
🏷️ Labels: NLP, Python, Text Analysis, Docentes, Spacy, NLTK
📂 Project: Dev
⭐ Priority: MEDIUM

Session Goal: The main objective was to develop and refine scripts for analyzing the term ‘docentes’ (Spanish for ‘teachers’) in various textual contexts, particularly speeches by Cristina Fernández de Kirchner.

Key Activities:

  • Implemented Named Entity Recognition (NER) using spaCy to identify and categorize entities related to ‘docentes’.
  • Conducted co-occurrence analysis to explore themes and sentiments associated with ‘docentes’ in the speeches.
  • Developed a text processing function to count word frequencies, incorporating stopword filtering and special character handling.
  • Corrected and enhanced a Python script for co-occurrence analysis, addressing tokenization and stopword removal issues.
  • Applied NLTK stemming tailored to Spanish to improve the co-occurrence analysis.
  • Explored dependency parsing with spaCy to analyze grammatical structures around ‘docentes’.
  • Designed an overall script framework for analyzing ‘docentes’ that integrates POS tagging, NER, and dependency parsing.
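The word-frequency step listed above (stopword filtering plus special-character handling) might look roughly like this minimal sketch. The stopword set is a hypothetical, deliberately tiny stand-in; the session's actual list is not recorded, and a real run might use NLTK's Spanish stopword corpus instead.

```python
import re
from collections import Counter

# Hypothetical, deliberately small Spanish stopword set for illustration only.
# A real run might use nltk.corpus.stopwords.words("spanish")
# (requires nltk.download("stopwords")).
STOPWORDS_ES = {
    "de", "la", "el", "los", "las", "y", "en", "que", "a", "por",
    "con", "para", "un", "una", "del", "al", "se", "es",
}


def word_frequencies(text, stopwords=STOPWORDS_ES):
    """Count word frequencies after lowercasing, stripping special
    characters, and dropping stopwords."""
    # [^\W\d_]+ matches runs of Unicode letters only, so accented vowels
    # and ñ are kept while digits, punctuation, ¡ and ¿ are discarded.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tok for tok in tokens if tok not in stopwords)


texto = "Los docentes, los docentes y la educación: ¡los docentes!"
print(word_frequencies(texto).most_common(2))
```

Using a letter-only regex rather than an explicit list of characters to delete keeps accented Spanish words intact without having to enumerate every symbol.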
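The co-occurrence analysis mentioned above can be sketched as a sliding-window count around each occurrence of ‘docentes’. This assumes the tokens are already lowercased and stopword-filtered; the window size of 5 is an illustrative default, not a value taken from the session.

```python
from collections import Counter


def cooccurrences(tokens, target="docentes", window=5):
    """Count tokens that appear within `window` positions of each
    occurrence of `target`. The target itself is not counted."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Clamp the window to the bounds of the token list.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] != target:
                counts[tokens[j]] += 1
    return counts


tokens = ["salario", "docentes", "paritaria", "escuela", "docentes", "salario"]
print(cooccurrences(tokens, window=1))
```

When two occurrences of the target fall within each other's windows, a shared neighbor is counted once per occurrence; whether that double counting is desirable depends on the question being asked of the speeches.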
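For the Spanish stemming step, one plausible option is NLTK's Snowball stemmer for Spanish. The session log does not say which NLTK stemmer was used, so this is an assumption. Stemming merges inflected forms such as ‘docente’ and ‘docentes’ before counting.

```python
from collections import Counter

from nltk.stem.snowball import SnowballStemmer

# Snowball's Spanish stemmer ships with NLTK and needs no corpus download.
stemmer = SnowballStemmer("spanish")


def stemmed_counts(tokens):
    """Count tokens after reducing each one to its Spanish Snowball stem."""
    return Counter(stemmer.stem(tok) for tok in tokens)


tokens = ["docente", "docentes", "salario", "salarios"]
print(stemmed_counts(tokens))

# When co-occurrence is computed on stemmed tokens, the target must be
# stemmed the same way, e.g. target=stemmer.stem("docentes").
print(stemmer.stem("docentes"))
```

Stems are not dictionary words, so they are best used as grouping keys and mapped back to a representative surface form when reporting results.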
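The combined POS/NER/dependency framework described in the list above could be organized around a function like the following minimal sketch. The function name `analyze_docentes` and the output fields are hypothetical. So that the example runs without a trained model, it builds a hand-annotated toy `Doc`. On real speeches one would instead load a Spanish pipeline, for example `spacy.load("es_core_news_sm")`, assuming that model is installed.

```python
import spacy
from spacy.tokens import Doc


def analyze_docentes(doc, target="docentes"):
    """For each sentence mentioning `target`, collect its named entities,
    the POS tag and dependency role of each target token, and the target
    token's syntactic children."""
    results = []
    for sent in doc.sents:
        hits = [tok for tok in sent if tok.lower_ == target]
        if not hits:
            continue
        results.append({
            "sentence": sent.text,
            "entities": [(ent.text, ent.label_) for ent in sent.ents],
            "roles": [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in hits],
            "children": [(child.text, child.dep_) for tok in hits for child in tok.children],
        })
    return results


# Hand-annotated toy Doc (absolute head indices, IOB entity tags) so the
# sketch runs without downloading a model.
nlp = spacy.blank("es")
doc = Doc(
    nlp.vocab,
    words=["Los", "docentes", "de", "Argentina", "reclaman", "salarios"],
    pos=["DET", "NOUN", "ADP", "PROPN", "VERB", "NOUN"],
    heads=[1, 4, 3, 1, 4, 4],
    deps=["det", "nsubj", "case", "nmod", "ROOT", "obj"],
    ents=["O", "O", "O", "B-LOC", "O", "O"],
)
for row in analyze_docentes(doc):
    print(row)
```

Grouping the output by sentence keeps each entity and grammatical relation tied to the speech context in which ‘docentes’ appears. This makes later co-occurrence or thematic comparisons easier to trace back to the source.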

Achievements:

  • Implemented and refined several NLP techniques (NER, co-occurrence analysis, word-frequency counting, stemming, and dependency parsing) for a detailed analysis of ‘docentes’.
  • Created a reusable framework for future text analysis tasks involving similar linguistic elements.

Pending Tasks:

  • Further testing and validation of the scripts on diverse text corpora to assess robustness and accuracy.
  • Integration of additional NLP techniques as needed, based on the initial analysis results.