🕒 21:00–21:50
🏷️ Labels: Python, NLP, spaCy, Legal Documents, Data Extraction
📂 Project: Dev
⭐ Priority: MEDIUM

Session Goal

The session aimed to improve the natural language processing (NLP) pipeline for analyzing legal documents, focusing on extracting resolutions, dispositions, and linguistic elements such as verbs and their objects.

Key Activities

  • Implemented Python code that extracts resolutions and dispositions from summaries with regular expressions and stores the results in a structured form.
  • Addressed two obstacles to word frequency analysis: no example summaries were available, and the Spanish stopword list could not be downloaded because of connectivity issues.
  • Developed Python scripts that remove stopwords and convert text to uppercase so that word counts are case-insensitive.
  • Used spaCy to extract verbs and objects, and integrated these functions into the data extraction workflow.
  • Explored ways to improve how well spaCy’s language model identifies verbs and objects in bureaucratic texts.
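
The regex extraction and stopword steps above can be sketched roughly as follows. This is a minimal illustration, not the code written in the session: the patterns `RESOLUTION_RE` and `DISPOSITION_RE`, the function names, and the short `FALLBACK_STOPWORDS` list are all assumptions. The hard-coded list stands in for the Spanish stopword list that could not be downloaded, and real summaries may number resolutions differently.

```python
import re

# Hypothetical patterns: they assume references such as
# "Resolución N° 123/2020" or "Disposición 45-2019".
_NUMBER = r"\s+(?:N[°ºo]\.?\s*)?(\d+(?:[-/]\d+)*)"
RESOLUTION_RE = re.compile(r"RESOLUCI[OÓ]N" + _NUMBER, re.IGNORECASE)
DISPOSITION_RE = re.compile(r"DISPOSICI[OÓ]N" + _NUMBER, re.IGNORECASE)

# Small offline fallback (assumed, far from complete) for when a full
# Spanish stopword list cannot be downloaded.
FALLBACK_STOPWORDS = frozenset({
    "a", "al", "con", "de", "del", "el", "en", "la", "las", "lo", "los",
    "para", "por", "que", "se", "su", "un", "una", "y",
})


def extract_references(summary):
    """Return the resolution and disposition numbers found in a summary."""
    return {
        "resolutions": RESOLUTION_RE.findall(summary),
        "dispositions": DISPOSITION_RE.findall(summary),
    }


def remove_stopwords_upper(text, stopwords=FALLBACK_STOPWORDS):
    """Drop stopwords and return the remaining words in uppercase."""
    words = re.findall(r"\w+", text.lower())
    return " ".join(word.upper() for word in words if word not in stopwords)
```

For a summary such as "Se aprueba la Resolución N° 123/2020 y se deroga la Disposición 45-2019.", `extract_references` returns one number under each key. Passing a fuller stopword set as the `stopwords` argument replaces the fallback once a download is possible.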

Achievements

  • Created working functions that extract and analyze linguistic elements from legal texts using Python and spaCy.
  • Proposed solutions to improve NLP model accuracy in processing legal documents.
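
A function of the kind described in the achievements might look like the sketch below. It is an assumption about the approach, not the session's actual code: the names `extract_verbs_and_objects`, `analyze_text`, and `OBJECT_DEPS` are hypothetical, the model name `es_core_news_sm` is assumed because the texts are Spanish, and the dependency labels depend on the model in use.

```python
# Dependency labels treated as objects. "obj" and "iobj" are Universal
# Dependencies labels; "dobj" is used by some English models. Which labels
# a given model emits is an assumption to verify against that model.
OBJECT_DEPS = frozenset({"obj", "iobj", "dobj"})


def extract_verbs_and_objects(doc):
    """Return (verb lemma, [object texts]) pairs from a parsed spaCy Doc."""
    pairs = []
    for token in doc:
        if token.pos_ == "VERB":
            objects = [
                child.text
                for child in token.children
                if child.dep_ in OBJECT_DEPS
            ]
            pairs.append((token.lemma_, objects))
    return pairs


def analyze_text(text, model_name="es_core_news_sm"):
    """Parse text with a spaCy model and extract verbs with their objects."""
    import spacy  # imported here so the helper above works without spaCy

    nlp = spacy.load(model_name)
    return extract_verbs_and_objects(nlp(text))
```

`analyze_text` requires the model to be installed first, for example with `python -m spacy download es_core_news_sm`. Keeping the extraction logic separate from model loading makes it easier to swap in a larger or retrained model when checking tagging errors on bureaucratic texts.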

Pending Tasks

  • Further refine spaCy models to reduce errors in verb tagging and improve language processing accuracy in legal contexts.