Enhanced Clustering with HDBSCAN and UMAP Techniques

  • Day: 2025-05-17
  • Time: 22:30 to 23:25
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: In Progress
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: HDBSCAN, UMAP, Clustering, Data Visualization, Parameter Tuning

Description

Session Goal

The session aimed to improve clustering results by tuning HDBSCAN and by using UMAP projections to explore latent structure in the data.

Key Activities

  • Parameter Tuning for HDBSCAN: Adjusted parameters like ‘min_cluster_size’, ‘min_samples’, and ‘cluster_selection_epsilon’ to optimize cluster detection.
  • UMAP Projections: Explored multiple UMAP projections to visualize latent structures, emphasizing the use of more than two dimensions.
  • Visualization and Conflict Resolution: Addressed size conflicts in UMAP visualizations and iterated through axis pairs to enhance data representation.
  • Clustering Function Development: Implemented a customizable HDBSCAN function for time-based data, including parameter tuning insights.
  • Exploratory Analysis: Designed loops to analyze cluster results, reporting on cluster sizes and noise to compare configurations.
  • Hyperparameter Explorer Design: Proposed a multivariate hyperparameter explorer for HDBSCAN to optimize clustering outcomes.
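
The exploratory loop and the axis-pair iteration listed above can be sketched in a few lines. This is a minimal illustration, not the session's actual code: the helper names `summarize_labels` and `axis_pairs` and the example labels are hypothetical. The only library convention it relies on is that HDBSCAN assigns the label -1 to noise points.

```python
from collections import Counter
from itertools import combinations


def summarize_labels(labels):
    """Report cluster sizes and noise for one clustering result.

    Assumes the HDBSCAN convention that noise points carry the label -1.
    """
    labels = [int(label) for label in labels]
    counts = Counter(labels)
    noise = counts.pop(-1, 0)  # remove the noise bucket from the cluster counts
    total = len(labels)
    return {
        "n_clusters": len(counts),
        "sizes": dict(sorted(counts.items())),
        "noise": noise,
        "noise_fraction": noise / total if total else 0.0,
    }


def axis_pairs(n_components):
    """List every pair of embedding axes to plot, in index order."""
    return list(combinations(range(n_components), 2))


# Hypothetical labels standing in for a real HDBSCAN result.
example_labels = [0, 0, 1, -1, 1, 1, -1, 2]
summary = summarize_labels(example_labels)
print(summary)

# For a 3-component UMAP embedding, each pair would feed one 2-D scatter plot.
for x_axis, y_axis in axis_pairs(3):
    print(f"plot component {x_axis} against component {y_axis}")
```

Calling `summarize_labels` once per configuration yields rows that can be compared side by side. The number of axis pairs grows quadratically with the number of UMAP components, so a small component count keeps the plots manageable.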

Achievements

  • Successfully adjusted HDBSCAN parameters and explored UMAP projections, improving clustering insights.
  • Resolved visualization conflicts and enhanced data representation through iterative plotting.
  • Developed a framework for comparing clustering configurations by cluster size and noise; the HDBSCAN hyperparameter explorer itself is still at the design stage.

Pending Tasks

  • Finalize the implementation of the hyperparameter explorer for HDBSCAN and test its effectiveness on real-world datasets.
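
One possible shape for the explorer is a plain grid search over the three parameters tuned in this session. The sketch below is an assumption about how it might be structured, not the design proposed in the session: `explore_grid`, `hdbscan_fn`, and the grid values are hypothetical, and the adapter assumes the `hdbscan` package is installed.

```python
from itertools import product


def explore_grid(cluster_fn, data, param_grid):
    """Run cluster_fn once per parameter combination and collect summaries.

    cluster_fn(data, **params) must return one label per sample,
    using -1 for noise (the HDBSCAN convention).
    """
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        labels = list(cluster_fn(data, **params))
        cluster_ids = {label for label in labels if label != -1}
        results.append({
            "params": params,
            "n_clusters": len(cluster_ids),
            "noise": sum(1 for label in labels if label == -1),
        })
    return results


def hdbscan_fn(data, **params):
    """Adapter around hdbscan.HDBSCAN; assumes the hdbscan package is available."""
    import hdbscan  # imported lazily so explore_grid has no hard dependency
    return hdbscan.HDBSCAN(**params).fit_predict(data)


# Illustrative values only; sensible ranges depend on the dataset.
param_grid = {
    "min_cluster_size": [5, 10, 20],
    "min_samples": [1, 5],
    "cluster_selection_epsilon": [0.0, 0.5],
}
```

With a feature matrix `X` in hand, `explore_grid(hdbscan_fn, X, param_grid)` would return one summary per combination (12 for the grid above), which can then be sorted by noise count or cluster count. Passing the clustering step in as a function keeps the explorer testable with a stub and makes it easy to swap in a different HDBSCAN implementation.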

Evidence

  • source_file=2025-05-17.sessions.jsonl, line_number=1, event_count=0, session_id=da5f6f16346cd76701105743d0d4183d87c89ee233235e3c4aa40c867825258e
  • event_ids: []