Advanced Text Analysis with NLP Techniques
- Day: 2023-11-19
- Time: 21:15 to 22:25
- Project: Dev
- Workspace: WP 2: Operational
- Status: Completed
- Priority: MEDIUM
- Assignee: Matías Nehuen Iglesias
- Tags: NLP, Python, Text Analysis, spaCy, NLTK, Docentes
Description
Session Goal:
The session aimed to extend the analysis of Cristina Fernández de Kirchner’s speeches with several Natural Language Processing (NLP) techniques, focusing on the Spanish word ‘docentes’ (‘teachers’).
Key Activities:
- Implemented Named Entity Recognition (NER) using the spaCy library to identify named entities related to ‘docentes’.
- Conducted co-occurrence analysis of the word ‘docentes’ in speeches, identifying associated themes and sentiments.
- Developed a Python function for text processing and word-frequency counting, including stopword filtering and special-character handling.
- Corrected a Python script for co-occurrence analysis, addressing tokenization and stopword removal issues.
- Added stemming to the co-occurrence analysis using the Spanish Snowball stemmer from Python’s NLTK library.
- Mapped stemmed tokens back to their original word forms so that co-occurrence results are reported as readable words rather than stems.
- Explored dependency parsing using spaCy to analyze the grammatical structure of sentences involving ‘docentes’.
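The word-frequency function described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the session's code: the names tokenize, word_frequencies and SPANISH_STOPWORDS, the min_length cutoff, and the tiny stopword set are all hypothetical, and a fuller list such as NLTK's stopwords.words("spanish") would normally replace that set. "Special-character handling" is interpreted here as Unicode normalization plus keeping only runs of letters, so accented Spanish letters survive while punctuation such as ¡ and ¿, and digits, are dropped.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny Spanish stopword set, for illustration only.
# A real analysis would use a fuller list, e.g. NLTK's stopwords.words("spanish").
SPANISH_STOPWORDS = frozenset({
    "a", "al", "con", "de", "del", "el", "en", "es", "la", "las", "lo", "los",
    "para", "por", "que", "se", "su", "un", "una", "y",
})

# Runs of letters only. In Python 3, \w is Unicode-aware, so [^\W\d_] matches
# letters such as á, é, ñ and ü but excludes digits, underscores and punctuation.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Return lowercase letter-only tokens, dropping punctuation and digits."""
    # NFC normalization composes "e" + combining accent into a single "é",
    # so decomposed input is not split in the middle of a word.
    text = unicodedata.normalize("NFC", text.lower())
    return WORD_RE.findall(text)


def word_frequencies(text, stopwords=SPANISH_STOPWORDS, min_length=2):
    """Count tokens that are not stopwords and have at least min_length characters."""
    return Counter(
        token
        for token in tokenize(text)
        if token not in stopwords and len(token) >= min_length
    )


if __name__ == "__main__":
    sample = "¡Los docentes, los docentes y la educación pública! En 2023, más docentes."
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```

Returning a collections.Counter keeps the result directly usable for ranking with most_common(), and passing the stopword set as a parameter lets the same function run with a different list without code changes.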
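The co-occurrence steps listed above (windowed counting around ‘docentes’, stopword removal, stemming, and mapping stems back to original words) can be sketched as follows. This is a minimal sketch, not the session's script: the function name cooccurrences, the default window of 5 tokens, and the rule of labeling each stem with its most frequent surface form are assumptions. The stemmer is passed in as a callable; in the Spanish setting described above it would be SnowballStemmer("spanish").stem from nltk.stem.snowball, and the demo falls back to no stemming if NLTK is not installed.

```python
import re
import unicodedata
from collections import Counter, defaultdict

# Letter-only tokens (Unicode-aware), so Spanish accents survive and punctuation is dropped.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase, NFC-normalize, and split text into letter-only tokens."""
    return WORD_RE.findall(unicodedata.normalize("NFC", text.lower()))


def cooccurrences(text, target, stem, stopwords=frozenset(), window=5):
    """Count words within `window` tokens of each occurrence of `target`, matched by stem.

    Stopwords are removed before windowing, so `window` counts content words only.
    Stems are mapped back to their most frequent original form in the returned Counter.
    """
    tokens = [tok for tok in tokenize(text) if tok not in stopwords]
    stems = [stem(tok) for tok in tokens]

    # Remember which surface forms produced each stem (e.g. "docente", "docentes").
    forms_by_stem = defaultdict(Counter)
    for tok, st in zip(tokens, stems):
        forms_by_stem[st][tok] += 1

    target_stem = stem(target.lower())
    counts = Counter()
    for i, st in enumerate(stems):
        if st != target_stem:
            continue
        # Neighbours on both sides of this occurrence; the target itself and any
        # other occurrence of it are not counted as co-occurring words.
        for j in range(max(0, i - window), min(len(stems), i + window + 1)):
            if stems[j] != target_stem:
                counts[stems[j]] += 1

    # Label each stem with its most common original word (first seen wins ties).
    return Counter(
        {forms_by_stem[st].most_common(1)[0][0]: n for st, n in counts.items()}
    )


if __name__ == "__main__":
    try:
        from nltk.stem.snowball import SnowballStemmer
        stem = SnowballStemmer("spanish").stem
    except ImportError:  # NLTK not installed: fall back to no stemming
        stem = lambda word: word
    speech = "Los docentes reclaman salarios. Cada docente merece un salario digno."
    stopwords = {"los", "cada", "un"}
    print(cooccurrences(speech, "docentes", stem, stopwords, window=3).most_common())
```

Keeping the stemmer as a parameter is a design choice: the same function runs with the Snowball stemmer, with another stemmer, or with no stemming at all, and the windowing logic can be checked with a trivial stand-in. Because each surface form always yields the same stem, distinct stems never collapse onto the same readable label.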
Achievements:
- Implemented and corrected scripts for NER, co-occurrence analysis, stemming, and dependency parsing, improving the analysis of ‘docentes’ in the speeches.
- Established a structured framework for analyzing text data using advanced NLP methods.
Pending Tasks:
- Further refinement of scripts to improve accuracy and efficiency in text analysis.
- Exploration of additional NLP techniques to enhance analysis capabilities.
Evidence
- source_file=2023-11-19.sessions.jsonl, line_number=2, event_count=0, session_id=3c73d5be8f92403b092a0610b7e2a2ed2546e892f9be8962092ad8442f423c92
- event_ids: []