📅 2025-06-18 — Session: Developed AI prompt and evaluation strategy for Onyx Hammer
🕒 23:30–00:00
🏷️ Labels: AI, Prompt Engineering, Onyx Hammer, Rubric Design, Model Evaluation
📂 Project: Dev
⭐ Priority: MEDIUM
Session Goal
The session aimed to develop a comprehensive strategy for AI prompt engineering and evaluation, with a specific focus on the Onyx Hammer project and on probing AI model robustness in the Physics domain.
Key Activities
- Reviewed the structural breakdown of the Onyx Hammer project, including its mission, compensation model, and current status.
- Outlined a strategy for prompt engineering and AI robustness evaluation, targeting weaknesses in AI models within the Physics domain.
- Crafted a complex AI prompt on atmospheric entry dynamics, designed to require expert-level reasoning and quantitative modeling.
- Analyzed quiz items from Step 3 of the Onyx Hammer framework to understand AI model performance and failure criteria.
- Constructed a rubric for evaluating AI responses, emphasizing clarity, objectivity, and alignment with prompt demands.
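The rubric described above can be thought of as a weighted checklist of pass/fail criteria. Below is a minimal sketch of that idea in Python; the criterion names, descriptions, and weights are illustrative assumptions, not the actual Onyx Hammer rubric:

```python
from dataclasses import dataclass

@dataclass
class RubricCriterion:
    name: str
    description: str
    weight: float  # relative importance; weights should sum to 1.0

@dataclass
class RubricScore:
    criterion: RubricCriterion
    passed: bool  # objective pass/fail judgment for this criterion

def evaluate(scores: list[RubricScore]) -> float:
    """Weighted pass rate across criteria; 1.0 means every criterion passed."""
    total = sum(s.criterion.weight for s in scores)
    earned = sum(s.criterion.weight for s in scores if s.passed)
    return earned / total if total else 0.0

# Hypothetical criteria mirroring the qualities named in the session notes
criteria = [
    RubricCriterion("clarity", "Response is unambiguous and well organized", 0.3),
    RubricCriterion("objectivity", "Criterion can be judged without rater opinion", 0.3),
    RubricCriterion("prompt_alignment", "Response addresses every demand in the prompt", 0.4),
]

scores = [
    RubricScore(criteria[0], True),
    RubricScore(criteria[1], True),
    RubricScore(criteria[2], False),
]
print(round(evaluate(scores), 2))  # 0.6
```

Keeping each criterion binary (pass/fail) rather than scored on a scale is one way to enforce the objectivity goal: two raters applying the same rubric should reach the same verdict.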
Achievements
- Developed a detailed strategy for prompt engineering and AI robustness evaluation.
- Created an expert-level AI prompt for Physics modeling.
- Analyzed and clarified quiz responses and failure criteria for AI models.
- Constructed effective rubric criteria for AI evaluation.
Pending Tasks
- Implement the developed strategy and prompts in real-world testing scenarios.
- Monitor AI model performance using the constructed rubric and adjust criteria as needed.