Developed AI prompt and evaluation strategy for Onyx Hammer
- Day: 2025-06-18
- Time: 23:30 to 00:00
- Project: Dev
- Workspace: WP 2: Operational
- Status: In Progress
- Priority: MEDIUM
- Assignee: Matías Nehuen Iglesias
- Tags: AI, Prompt Engineering, Onyx Hammer, Rubric Design, Model Evaluation
Description
Session Goal
The session aimed to develop a strategy for AI prompt engineering and evaluation, focusing on the Onyx Hammer project and on probing AI robustness in Physics.
Key Activities
- Reviewed the structural breakdown of the Onyx Hammer project, including its mission, compensation model, and current status.
- Outlined a strategy for prompt engineering and AI robustness evaluation, targeting weaknesses in AI models within the Physics domain.
- Crafted a complex AI prompt focused on atmospheric entry dynamics, highlighting the need for expert reasoning and modeling.
- Analyzed quiz items from Step 3 of the Onyx Hammer framework to understand AI model performance and failure criteria.
- Constructed a rubric for evaluating AI responses, emphasizing clarity, objectivity, and alignment with prompt demands.
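The atmospheric entry prompt above targets expert-level physical modeling. As a hypothetical illustration of the kind of reasoning such a prompt demands (not the actual prompt content; all parameter values are assumptions), a minimal ballistic-entry simulation with an exponential atmosphere and drag-only deceleration might look like:

```python
import math

def simulate_entry(v0=7800.0, gamma_deg=6.0, h0=120_000.0,
                   beta=300.0, dt=0.1):
    """Hypothetical ballistic entry sketch: constant flight-path angle,
    exponential atmosphere, drag deceleration plus gravity along the path.

    v0        entry speed [m/s]          (assumed)
    gamma_deg flight-path angle below horizontal [deg]  (assumed)
    h0        entry altitude [m]         (assumed)
    beta      ballistic coefficient m/(Cd*A) [kg/m^2]   (assumed)
    Returns (peak drag deceleration [m/s^2], time to ground [s]).
    """
    rho0, H, g = 1.225, 7200.0, 9.81  # sea-level density, scale height, gravity
    gamma = math.radians(gamma_deg)
    v, h, t = v0, h0, 0.0
    peak_decel = 0.0
    while h > 0.0 and v > 0.0:
        rho = rho0 * math.exp(-h / H)             # exponential atmosphere
        drag_decel = rho * v * v / (2.0 * beta)   # drag deceleration [m/s^2]
        peak_decel = max(peak_decel, drag_decel)
        v += (-drag_decel + g * math.sin(gamma)) * dt  # gravity speeds the descent
        h -= v * math.sin(gamma) * dt
        t += dt
    return peak_decel, t
```

For these assumed parameters the simulated peak deceleration lands near the classical Allen-Eggers estimate of v0² sin(γ) / (2eH), which is the sort of cross-check an expert response could be graded on.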
Achievements
- Developed a detailed strategy for prompt engineering and AI robustness evaluation.
- Created an expert-level AI prompt for Physics modeling.
- Analyzed and clarified quiz responses and failure criteria for AI models.
- Constructed effective rubric criteria for AI evaluation.
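A rubric of the kind described can be represented as weighted criteria with explicit failure conditions. The sketch below is illustrative only; the criterion names and weights are assumptions, not the actual Onyx Hammer rubric:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float          # relative importance; weights sum to 1.0
    critical: bool = False # a score of 0 on a critical criterion fails the response

def score_response(scores: dict[str, float], rubric: list[Criterion]) -> tuple[float, bool]:
    """Weighted rubric score in [0, 1] plus a pass/fail flag.

    scores maps criterion name -> grade in [0, 1].
    """
    total = sum(c.weight * scores[c.name] for c in rubric)
    failed = any(c.critical and scores[c.name] == 0.0 for c in rubric)
    return total, not failed

# Illustrative rubric echoing the session's emphasis on clarity,
# objectivity, and alignment with prompt demands (weights assumed).
RUBRIC = [
    Criterion("physical_correctness", 0.4, critical=True),
    Criterion("alignment_with_prompt", 0.3, critical=True),
    Criterion("clarity", 0.2),
    Criterion("objectivity", 0.1),
]
```

Usage: grading a response at 1.0 on everything except 0.5 on alignment yields a weighted score of 0.85 and a pass; a 0.0 on any critical criterion fails the response outright, which keeps the failure criteria unambiguous when monitoring model performance.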
Pending Tasks
- Implement the developed strategy and prompts in real-world testing scenarios.
- Monitor AI model performance using the constructed rubric and adjust criteria as needed.
Evidence
- source_file=2025-06-18.sessions.jsonl, line_number=1, event_count=0, session_id=e66b7ff4255dd1c7e68f5c460064d99629d607343090e825ce95a704f5e128f7
- event_ids: []