Developed AI prompt and evaluation strategy for Onyx Hammer

  • Day: 2025-06-18
  • Time: 23:30 to 00:00
  • Project: Dev
  • Workspace: WP 2: Operational
  • Status: In Progress
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: AI, Prompt Engineering, Onyx Hammer, Rubric Design, Model Evaluation

Description

Session Goal

The session aimed to develop a comprehensive strategy for AI prompt engineering and evaluation, focused on the Onyx Hammer project and on probing AI robustness in the Physics domain.

Key Activities

  • Reviewed the structural breakdown of the Onyx Hammer project, including its mission, compensation model, and current status.
  • Outlined a strategy for prompt engineering and AI robustness evaluation, targeting weaknesses in AI models within the Physics domain.
  • Crafted a complex AI prompt on atmospheric entry dynamics that demands expert-level reasoning and modeling (see the modeling sketch after this list).
  • Analyzed quiz items from Step 3 of the Onyx Hammer framework to understand AI model performance and failure criteria.
  • Constructed a rubric for evaluating AI responses, emphasizing clarity, objectivity, and alignment with prompt demands.
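
For context on the kind of expert modeling the atmospheric-entry prompt demands, a minimal sketch of the classical Allen-Eggers ballistic-entry approximation follows. The log does not record the exact quantities the prompt covers, so the symbols here (entry speed V_E, flight-path angle γ, scale height H, ballistic coefficient β) are illustrative assumptions.

```latex
% Illustrative only: classical Allen-Eggers ballistic-entry model,
% not necessarily the formulation used in the session's prompt.
\begin{align}
\rho(h) &= \rho_0 \, e^{-h/H}
  && \text{exponential atmosphere with scale height } H \\
V(h)   &= V_E \exp\!\left(-\frac{\rho(h)\, H}{2\beta \sin\gamma}\right),
  \qquad \beta = \frac{m}{C_D A}
  && \text{velocity decay along a straight-line trajectory} \\
a_{\max} &= \frac{V_E^{2} \sin\gamma}{2\, e\, H}
  && \text{peak deceleration, independent of } \beta
\end{align}
```

A prompt built around results like these can probe whether a model reasons through the derivation rather than pattern-matching the final formula, in line with the robustness focus described above.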

Achievements

  • Developed a detailed strategy for prompt engineering and AI robustness evaluation.
  • Created an expert-level AI prompt for Physics modeling.
  • Analyzed and clarified quiz responses and failure criteria for AI models.
  • Constructed effective rubric criteria for AI evaluation (see the rubric sketch after this list).
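
Below is a minimal sketch of how rubric criteria like those described above might be encoded and applied; the criterion names, weights, and 0-4 grading scale are assumptions for illustration, not the rubric actually produced in the session.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str          # what the criterion measures
    description: str   # guidance for the grader
    weight: float      # relative importance; weights sum to 1.0

# Assumed criteria mirroring the qualities named above (clarity,
# objectivity, alignment with prompt demands); weights are illustrative.
RUBRIC = [
    Criterion("clarity", "Reasoning is explicit and easy to follow.", 0.3),
    Criterion("objectivity", "Claims are justified, not merely asserted.", 0.3),
    Criterion("prompt_alignment", "Every demand in the prompt is addressed.", 0.4),
]

def score_response(grades: dict[str, int], max_grade: int = 4) -> float:
    """Combine per-criterion grades (0..max_grade) into a weighted 0-1 score."""
    return sum(c.weight * grades[c.name] / max_grade for c in RUBRIC)

# Example: clear (4) and well aligned (3) but weakly justified (2)
print(score_response({"clarity": 4, "objectivity": 2, "prompt_alignment": 3}))
# 0.3*1.0 + 0.3*0.5 + 0.4*0.75 = 0.75
```

A weighted linear combination keeps grading transparent: each criterion's contribution to the total is visible, which makes it easy to see where a response failed and to adjust weights later.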

Pending Tasks

  • Implement the developed strategy and prompts in real-world testing scenarios.
  • Monitor AI model performance using the constructed rubric and adjust criteria as needed (see the monitoring sketch after this list).
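
As a sketch of the monitoring task above, the snippet below flags responses whose weighted rubric score falls under a pass threshold so the failure rate can be tracked over time; the threshold and sample scores are assumed values, not data from the session.

```python
PASS_THRESHOLD = 0.7  # assumed cutoff, not taken from the session log

def failure_rate(scores: list[float], threshold: float = PASS_THRESHOLD) -> float:
    """Fraction of responses scoring below the pass threshold."""
    return sum(s < threshold for s in scores) / len(scores)

batch = [0.92, 0.75, 0.41, 0.68, 0.88]  # hypothetical weighted rubric scores
print(f"failure rate: {failure_rate(batch):.0%}")  # -> failure rate: 40%
```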

Evidence

  • source_file=2025-06-18.sessions.jsonl, line_number=1, event_count=0, session_id=e66b7ff4255dd1c7e68f5c460064d99629d607343090e825ce95a704f5e128f7
  • event_ids: []