Refactored LLM Evaluation to Tutoring System

Day: 2025-05-13
Time: 00:05 to 00:20
Project: Teaching
Workspace: WP 1: Strategic / Growth & Development
Status: In Progress
Priority: MEDIUM
Assignee: Matías Nehuen Iglesias
Tags: LLM, Jinja2, Python, Tutoring, Pedagogy

Description

Session Goal

The primary goal of this session was to refactor the existing LLM evaluation system to a more modular and tutoring-focused approach, enhancing the educational experience for computer science students.

Key Activities

Streamlined LLM Evaluation Design: Implemented a modular design using Jinja2 templates and a Python evaluator class to improve maintainability and robustness.
Modularization of evaluator.py: Proposed separating the instantiation of the system from student input, allowing for character customization and instruction reuse without hardcoding.
Refined Evaluation Prompt Structure: Enhanced the evaluation prompt’s readability and alignment with ChatCompletion formats, ensuring clear separation between system instructions and user inputs.
Transformation to Tutoring Focus: Adjusted the evaluation prompt to a tutoring focus, promoting active understanding and critical thinking in computer science students.
Pedagogical Shift: Proposed a pedagogical shift from evaluator to tutor, emphasizing guidance and student support over direct correction.

Achievements

Successfully designed a modular and reusable evaluation system.
Developed a refined prompt structure that supports both evaluation and tutoring.

Pending Tasks

Further testing and integration of the new tutoring-focused prompts with existing LLM tools to ensure compatibility and effectiveness.

Evidence

source_file=2025-05-13.sessions.jsonl, line_number=1, event_count=0, session_id=2c183da0a9a50d7142e5561c53184540650217944658ad26efdf3e0e968b2d16
event_ids: []

M.I. Journal

Journal Entries

Frequent Keywords

Refactored LLM Evaluation to Tutoring System

Refactored LLM Evaluation to Tutoring System

Description

Session Goal

Key Activities

Achievements

Pending Tasks

Evidence

Graph View

Table of Contents

Backlinks