Integrated Graph-Based Search with NLP Enhancements

  • Day: 2025-02-17
  • Time: 17:10 to 17:40
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: In Progress
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: Graph Search, Cassandra, NLP, Text Classification, Knowledge Graph

Description

Session Goal

Explore how a graph-based search system can be integrated with Cassandra storage, and how NLP techniques can improve data processing.

Key Activities

  • Graph-Based Search and Cassandra Storage: Discussed the structure and functionality of a graph-based search system integrated with Cassandra for efficient storage and retrieval of text chunks, focusing on metadata, embeddings, and document relationships.
  • Data Structure Enhancements: Reviewed the current data structure, proposing new analysis tables to improve data processing while maintaining ID compatibility.
  • Knowledge Aggregation: Explored methods for aggregating fine-grained pieces of knowledge into higher-level units, and discussed strategies for storing and integrating them.
  • NLP Annotations to Knowledge Web: Outlined steps to transform NLP annotations into a knowledge web, including preprocessing, classification, and graph construction.
  • SOTA Models for Text Classification: Summarized state-of-the-art models from Hugging Face for text classification, providing recommendations based on language and task requirements.
  • Script Plan for Text Classification: Developed a high-level plan for a script that uses the all-MiniLM-L6-v2 sentence-embedding model (hosted on Hugging Face) to generate embeddings for 35,000 text chunks, cluster them, and integrate the resulting clusters into a knowledge graph.
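
The script plan in the last item could be sketched roughly as follows. This is a minimal illustration, not the script from the session: the helper names (minilm_embed, kmeans, cluster_edges), the BELONGS_TO relation, the "cluster:N" node IDs, and the choice of plain k-means are all assumptions made for the example. The embedding step is passed in as a function so the clustering and edge-building logic can be tried without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[str, str, str]  # (source node, relation, target node)


def minilm_embed(texts: List[str]) -> List[List[float]]:
    """Embed texts with all-MiniLM-L6-v2 (needs the sentence-transformers package)."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode(texts).tolist()


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means with farthest-point initialisation.

    Returns one cluster label per input vector. Assumption: k-means is a
    stand-in here; the session notes do not name a clustering algorithm.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from the current ones.
        far = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda j: _dist2(v, centroids[j])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_edges(
    chunk_ids: List[str],
    texts: List[str],
    embed: Callable[[List[str]], List[List[float]]],
    k: int,
) -> List[Edge]:
    """Embed chunks, cluster them, and emit chunk -> cluster edges (hypothetical schema)."""
    labels = kmeans(embed(texts), k)
    return [(cid, "BELONGS_TO", f"cluster:{label}") for cid, label in zip(chunk_ids, labels)]
```

With the real model, a call would look like cluster_edges(ids, texts, minilm_embed, k=50), where k=50 is an arbitrary placeholder. For 35,000 chunks, a vectorised k-means from a numerical library would likely be preferable to this pure-Python loop; the edges can then be written to whatever graph or Cassandra tables hold the chunk relationships, keeping the existing chunk IDs.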

Achievements

  • Clarified the integration process of graph-based search with Cassandra.
  • Proposed enhancements to the current data structure for better data management.
  • Identified state-of-the-art NLP models suitable for various text classification tasks.

Pending Tasks

  • Implement the proposed data structure enhancements.
  • Execute the script plan for text chunk classification and integration into the knowledge graph.

Evidence

  • source_file=2025-02-17.sessions.jsonl, line_number=7, event_count=0, session_id=2eebfecf6ccccc36e26a3f619f3ad73a13e1ca3cf406d4b0a204bd26ab688939
  • event_ids: []