Developed Email Categorization System with NLP

  • Day: 2024-10-05
  • Time: 23:40 to 00:00
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: In Progress
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: Email Categorization, Machine Learning, NLP, Data Extraction, Feature Engineering

Description

Session Goal

The session aimed to develop a machine learning-based email categorization system using NLP techniques and classification algorithms.

Key Activities

  • Planning: Outlined a framework for creating an email categorization system using manually labeled examples as training data.
  • Classifier Optimization: Discussed strategies to improve classifier performance despite dataset limitations, such as adding training examples and improving feature extraction methods.
  • Data Extraction: Developed a workflow to extract subjects and content from HTML files, preparing the data for machine learning.
  • Feature Engineering: Proposed combining email subject lines and content as features for a classification model.
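
The data extraction step above could be sketched as follows. This is a minimal illustration and not the workflow built in the session: it assumes the subject is stored in the HTML <title> tag and that all other visible text is the email content. The names EmailHTMLParser and extract_email are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class EmailHTMLParser(HTMLParser):
    """Collects <title> text as the subject and other visible text as content.

    Assumption: the subject lives in <title>; adjust if the files differ.
    """

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.subject_parts = []
        self.content_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace-only chunks and anything inside script/style.
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.subject_parts.append(text)
        else:
            self.content_parts.append(text)


def extract_email(html_text):
    """Return a dict with the 'subject' and 'content' of one HTML email."""
    parser = EmailHTMLParser()
    parser.feed(html_text)
    parser.close()
    return {
        "subject": " ".join(parser.subject_parts),
        "content": " ".join(parser.content_parts),
    }


# Hypothetical sample document, used only to show the call.
sample = (
    "<html><head><title>Invoice 42</title><style>p{}</style></head>"
    "<body><p>Please pay</p><p>by Friday.</p></body></html>"
)
record = extract_email(sample)
```

Once the directory path is confirmed, the same function could be applied to each file's text in turn to build the training records.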

Achievements

  • Established a plan for the email categorization system leveraging NLP and machine learning.
  • Developed a structured approach for data extraction from HTML files.
  • Proposed combining email subject lines and content as a feature engineering step to improve model input.
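
To make the plan concrete, the sketch below joins subject and content into one text feature and trains a small classifier on manually labeled examples. The session notes do not name a classification algorithm, so a multinomial naive Bayes written with the standard library stands in here as an assumption. The labels, example emails, and the names combine, tokenize, and NaiveBayesClassifier are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def combine(subject, content):
    """Join subject and content into a single text feature."""
    return f"{subject} {content}"


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.label_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical manually labeled examples: (subject, content, label).
labeled = [
    ("Invoice for October", "Please find the attached invoice and payment details.", "billing"),
    ("Payment reminder", "Your invoice is overdue, please pay soon.", "billing"),
    ("Team meeting", "Let us meet on Monday to discuss the project schedule.", "meetings"),
    ("Meeting rescheduled", "The project meeting moves to Tuesday.", "meetings"),
]
texts = [combine(subject, content) for subject, content, _ in labeled]
labels = [label for _, _, label in labeled]
classifier = NaiveBayesClassifier().fit(texts, labels)

billing_guess = classifier.predict(combine("Overdue invoice", "Please send the payment."))
meeting_guess = classifier.predict(combine("Project meeting", "Can we meet Tuesday?"))
```

With only a handful of examples the predictions are fragile, which is why the notes list adding training examples as an optimization strategy. In practice, an established library would likely replace this hand-written model.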

Pending Tasks

  • Confirm the directory path or upload the necessary files for data extraction.
  • Implement the proposed strategies for classifier optimization and feature engineering.

Evidence

  • source_file=2024-10-05.sessions.jsonl, line_number=1, event_count=0, session_id=506412965551a043c30fe35676ae9a03ec7bdcaf6bb8cd3713e49e0723a296df
  • event_ids: []