📅 2023-11-19 — Session: Developed NLP Scripts for ‘Docentes’ Analysis
🕒 21:20–22:25
🏷️ Labels: NLP, Python, Text Analysis, Docentes, Spacy, NLTK
📂 Project: Dev
⭐ Priority: MEDIUM
Session Goal: Develop and refine scripts that analyze how the term ‘docentes’ is used in context, focusing on speeches by Cristina Fernández de Kirchner.
Key Activities:
- Implemented Named Entity Recognition (NER) using spaCy to identify and categorize entities related to ‘docentes’.
- Conducted co-occurrence analysis to explore themes and sentiments associated with ‘docentes’ in the speeches.
- Developed a text processing function to count word frequencies, incorporating stopword filtering and special character handling.
- Corrected and enhanced a Python script for co-occurrence analysis, addressing tokenization and stopword removal issues.
- Applied NLTK stemming configured for Spanish to improve the co-occurrence analysis.
- Explored dependency parsing with spaCy to analyze grammatical structures around ‘docentes’.
- Structured a comprehensive script framework for analyzing ‘docentes’, integrating POS tagging, NER, and dependency parsing.
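
The word-frequency and co-occurrence steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the session's actual script: the function names, the tiny `STOPWORDS` set, the `window` parameter, and the sample sentence are all assumptions made up for the example (the sample is not a quote from the speeches). A real run would more likely use a full Spanish stopword list and the spaCy or NLTK tokenizers mentioned in the activities.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Spanish stopword list (illustration only).
STOPWORDS = {
    "a", "al", "con", "de", "del", "el", "en", "la", "las", "los",
    "para", "por", "que", "son", "su", "un", "una", "y",
}


def tokenize(text):
    """Lowercase the text, keep runs of letters only (accented letters
    included), and drop stopwords. Punctuation and digits are discarded."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def word_frequencies(text):
    """Count how often each non-stopword token appears."""
    return Counter(tokenize(text))


def cooccurrences(text, target="docentes", window=3):
    """Count tokens found within `window` positions of each occurrence of
    `target`. Distances are measured after stopword removal, and the target
    itself is not counted as its own neighbour."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(n for n in neighbours if n != target)
    return counts


# Made-up sample sentence, used only to exercise the functions.
sample = "Los docentes son el futuro. Los docentes y la educación pública, ¡siempre!"
frequencies = word_frequencies(sample)
neighbours = cooccurrences(sample, window=2)
```

Because the window is applied after stopwords are removed, two words separated only by stopwords count as adjacent. That suits thematic exploration, but a window over the raw token sequence would be the better choice if exact distances matter.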
Achievements:
- Implemented and refined NER, co-occurrence analysis, word-frequency counting, stemming, and dependency parsing for analyzing ‘docentes’.
- Created a reusable script framework for future text analysis tasks targeting similar terms.
Pending Tasks:
- Test and validate the scripts on diverse text corpora to confirm robustness and accuracy.
- Integrate additional NLP techniques as the initial analysis results warrant.