📅 2025-06-18 — Session: Developed AI prompt and evaluation strategy for Onyx Hammer

🕒 23:30–00:00
🏷️ Labels: AI, Prompt Engineering, Onyx Hammer, Rubric Design, Model Evaluation
📂 Project: Dev
⭐ Priority: MEDIUM

Session Goal

The session focused on developing a comprehensive strategy for AI prompt engineering and evaluation, centered on the Onyx Hammer project and on probing AI robustness in Physics.

Key Activities

  • Reviewed the structural breakdown of the Onyx Hammer project, including its mission, compensation model, and current status.
  • Outlined a strategy for prompt engineering and AI robustness evaluation, targeting weaknesses in AI models within the Physics domain.
  • Crafted a complex AI prompt on atmospheric entry dynamics, designed to demand expert-level reasoning and quantitative modeling.
  • Analyzed quiz items from Step 3 of the Onyx Hammer framework to understand AI model performance and failure criteria.
  • Constructed a rubric for evaluating AI responses, emphasizing clarity, objectivity, and alignment with prompt demands.
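
The atmospheric entry prompt above targets exactly the kind of quantitative modeling an expert would show. As a rough illustration of that territory, here is a minimal ballistic-entry sketch (flat Earth, exponential atmosphere); every parameter value and the function name are illustrative assumptions, not material from the actual prompt:

```python
import math

# Illustrative constants (assumed, not from the session's prompt).
RHO0 = 1.225      # sea-level air density, kg/m^3
H_SCALE = 7200.0  # atmospheric density scale height, m
G = 9.81          # gravitational acceleration, m/s^2

def entry_trajectory(v0, h0, gamma_deg, beta, dt=0.1):
    """Integrate a simple ballistic entry and report peak deceleration.

    v0: entry speed (m/s); h0: entry altitude (m);
    gamma_deg: flight-path angle below horizontal (degrees);
    beta: ballistic coefficient m / (Cd * A), kg/m^2.
    Returns (peak deceleration in g, altitude of that peak in m).
    """
    gamma = math.radians(gamma_deg)
    v, h = v0, h0
    peak_a, peak_h = 0.0, h0
    while h > 0 and v > 0:
        rho = RHO0 * math.exp(-h / H_SCALE)       # exponential atmosphere
        drag_a = rho * v * v / (2.0 * beta)       # drag deceleration, m/s^2
        if drag_a > peak_a:
            peak_a, peak_h = drag_a, h
        # Gravity component along the (descending) velocity vector adds speed;
        # drag removes it. Simple forward-Euler step.
        v += (G * math.sin(gamma) - drag_a) * dt
        h -= v * math.sin(gamma) * dt
    return peak_a / G, peak_h
```

For a steep low-Earth-orbit-like entry (e.g. 7.8 km/s at 10 degrees), this reproduces the classic result that peak deceleration lands in the tens of g well above the surface, which is the sort of limit-checking reasoning a strong response should exhibit.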

Achievements

  • Developed a detailed strategy for prompt engineering and AI robustness evaluation.
  • Created an expert-level AI prompt for Physics modeling.
  • Analyzed and clarified quiz responses and failure criteria for AI models.
  • Constructed effective rubric criteria for AI evaluation.
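
The rubric criteria above (clarity, objectivity, alignment with prompt demands) can be operationalized as a weighted score. A hypothetical sketch follows; the criterion names, weights, and 0–4 scale are illustrative assumptions, not the actual Onyx Hammer rubric:

```python
# Hypothetical weighted rubric (names and weights are illustrative).
RUBRIC = {
    "physical_correctness": 0.4,    # equations and limiting cases correct
    "reasoning_transparency": 0.3,  # steps shown, assumptions stated
    "prompt_alignment": 0.2,        # answers what was actually asked
    "clarity": 0.1,                 # organized, unambiguous prose
}

def score_response(ratings):
    """Combine per-criterion ratings on a 0-4 scale into a 0-100 score."""
    missing = set(RUBRIC) - set(ratings)
    if missing:
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    raw = sum(RUBRIC[c] * ratings[c] for c in RUBRIC)  # weighted, 0..4
    return round(100.0 * raw / 4.0, 1)
```

Keeping weights explicit like this makes the rubric easy to adjust during the monitoring phase noted under Pending Tasks, without rewriting the scoring logic.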

Pending Tasks

  • Implement the developed strategy and prompts in real-world testing scenarios.
  • Monitor AI model performance using the constructed rubric and adjust criteria as needed.