Evaluation Results
LLM Response Quality Evaluation
Date: Jan 20, 2024, 10:30 AM
ID: eval-2024-01-20-001
Total Tests: 25
Pass Rate by Provider
gpt-4: 96.0%
claude-3: 92.0%
llama-3: 76.0%
Passed Tests by Provider
gpt-4: 24/25
claude-3: 23/25
llama-3: 19/25
Total Tests by Provider
gpt-4: 25
claude-3: 25
llama-3: 25
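The pass rates listed above follow from the passed and total counts. A minimal sketch of that arithmetic, assuming the rate is simply passed divided by total, expressed as a percentage with one decimal place; the dictionary layout and the `pass_rate` helper are illustrative names, not part of the evaluation tool:

```python
# Counts copied from the report above. The data layout and the function
# name are assumptions made for illustration only.
passed = {"gpt-4": 24, "claude-3": 23, "llama-3": 19}
total = {"gpt-4": 25, "claude-3": 25, "llama-3": 25}


def pass_rate(passed_count: int, total_count: int) -> float:
    """Return the pass rate as a percentage, rounded to one decimal place."""
    if total_count == 0:
        # Avoid division by zero for a provider with no tests.
        return 0.0
    return round(100.0 * passed_count / total_count, 1)


rates = {name: pass_rate(passed[name], total[name]) for name in total}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

The same helper reproduces the per-category figure in the report: 1 passed out of 2 tests gives 50.0%.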
Test Results
Comparing 3 providers: gpt-4, claude-3, llama-3
Uncategorized: 2 tests • 1 passed (50.0%)
Showing 2 test results across 1 category