📅 2025-02-17 — Session: Integrated Graph-Based Search with NLP Enhancements
🕒 17:10–17:40
🏷️ Labels: Graph Search, Cassandra, NLP, Text Classification, Knowledge Graph
📂 Project: Dev
⭐ Priority: MEDIUM
Session Goal
Explore how a graph-based search system can be integrated with Cassandra storage, and how NLP techniques can enhance data processing.
Key Activities
- Graph-Based Search and Cassandra Storage: Discussed the structure and functionality of a graph-based search system integrated with Cassandra for efficient storage and retrieval of text chunks, focusing on metadata, embeddings, and document relationships.
- Data Structure Enhancements: Reviewed the current data structure, proposing new analysis tables to improve data processing while maintaining ID compatibility.
- Knowledge Aggregation: Explored methods for aggregating fine-grained pieces of knowledge into higher-level units, and discussed strategies for storing and integrating them.
- NLP Annotations to Knowledge Web: Outlined steps to transform NLP annotations into a knowledge web, including preprocessing, classification, and graph construction.
- SOTA Models for Text Classification: Summarized state-of-the-art models from Hugging Face for text classification, providing recommendations based on language and task requirements.
- Script Plan for Text Classification: Developed a high-level plan for a script that uses the all-MiniLM-L6-v2 sentence-transformers model (hosted on Hugging Face) to generate embeddings for 35,000 text chunks, cluster them, and integrate the resulting clusters into a knowledge graph.
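A minimal sketch of the script plan in the last item above, under stated assumptions. The session only fixed the model (all-MiniLM-L6-v2) and the chunk count. The function names, the choice of MiniBatchKMeans, the cluster count of 50, the `BELONGS_TO` edge label, and the `cluster:<n>` node naming are all hypothetical. The heavy libraries are imported inside the functions so the graph-building step can be tried without them.

```python
from collections import defaultdict


def embed_chunks(texts, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Return one embedding per text chunk (requires sentence-transformers)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Normalized vectors make Euclidean k-means behave like cosine clustering.
    return model.encode(texts, batch_size=64, normalize_embeddings=True)


def cluster_embeddings(embeddings, n_clusters=50):
    """Assign each embedding a cluster label (requires scikit-learn).

    n_clusters=50 is an illustrative guess, not a value from the session.
    """
    from sklearn.cluster import MiniBatchKMeans

    kmeans = MiniBatchKMeans(n_clusters=n_clusters, random_state=0)
    return kmeans.fit_predict(embeddings).tolist()


def build_cluster_edges(chunk_ids, labels):
    """Turn cluster labels into (chunk, relation, cluster-node) edges.

    Chunk IDs are reused unchanged so the edges stay compatible with
    whatever IDs the existing storage already uses.
    """
    if len(chunk_ids) != len(labels):
        raise ValueError("chunk_ids and labels must have the same length")
    edges = []
    members = defaultdict(list)
    for chunk_id, label in zip(chunk_ids, labels):
        node = f"cluster:{label}"  # hypothetical naming for cluster nodes
        edges.append((chunk_id, "BELONGS_TO", node))
        members[node].append(chunk_id)
    return edges, dict(members)
```

In a full run, the output of `embed_chunks` would feed `cluster_embeddings`, and the resulting labels would feed `build_cluster_edges` together with the stored chunk IDs. How the edges are then written into the knowledge graph was not decided in this session.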
Achievements
- Clarified the integration process of graph-based search with Cassandra.
- Proposed enhancements to the current data structure for better data management.
- Identified state-of-the-art NLP models suitable for various text classification tasks.
Pending Tasks
- Implement the proposed data structure enhancements.
- Execute the script plan for text chunk classification and integration into the knowledge graph.
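For the first pending task, one possible way to keep new analysis tables ID-compatible is to key every analysis row by the existing chunk ID and check for rows that point at unknown chunks. The sketch below is an in-memory illustration only. The `AnalysisRow` fields and the `orphaned_rows` helper are hypothetical and do not reflect the actual Cassandra schema, which the session did not record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRow:
    """One analysis result attached to an existing chunk (hypothetical shape)."""
    chunk_id: str        # same ID the chunk already has in storage
    analysis_type: str   # e.g. "topic" or "cluster"
    label: str
    score: float


def orphaned_rows(existing_chunk_ids, rows):
    """Return analysis rows whose chunk_id is not a known chunk ID."""
    known = set(existing_chunk_ids)
    return [row for row in rows if row.chunk_id not in known]
```

An empty result from `orphaned_rows` would indicate that every analysis row can be joined back to a stored chunk.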