LLM Comparison is a web application that allows users to compare responses from leading Large Language Models (LLMs) such as OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude. Users can input prompts, generate responses from various LLMs, and vote on their favorite responses. The app supports multiple versions of each model and includes visualizations of voting data using charts. To ensure fair usage, query limits are enforced for premium models, allowing up to three queries per day.
OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude were selected for this project because they are among the most popular and powerful Large Language Models currently available to the general public. The capabilities of each model and its versions are listed below, with descriptions taken directly from each provider's documentation.
Models shown below in yellow text are premium models. They are considered premium because querying them through their APIs costs more than querying the other models, which is why users may query premium models only three times per day.
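The daily limit on premium queries can be pictured with a small sketch. This is not the app's actual implementation: the class name `PremiumQueryLimiter`, the method `try_consume`, and the in-memory counter are all assumptions made for illustration. A real deployment would more likely keep the counts in a database or cache keyed by user.

```python
from collections import defaultdict
from datetime import date

# From the text: users may query premium models three times per day.
PREMIUM_DAILY_LIMIT = 3


class PremiumQueryLimiter:
    """Hypothetical per-user, per-day counter for premium model queries."""

    def __init__(self, limit=PREMIUM_DAILY_LIMIT):
        self.limit = limit
        # Maps (user_id, day) -> number of premium queries made that day.
        self._counts = defaultdict(int)

    def try_consume(self, user_id, is_premium, today=None):
        """Return True if the query may proceed, False if the limit is reached."""
        if not is_premium:
            # Non-premium models are not limited in this sketch.
            return True
        day = today or date.today()
        key = (user_id, day)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


limiter = PremiumQueryLimiter()
for attempt in range(1, 5):
    allowed = limiter.try_consume("alice", is_premium=True)
    print(f"premium query {attempt}: {'allowed' if allowed else 'blocked'}")
```

Because the key includes the date, the count resets naturally when a new day starts, with no scheduled cleanup needed for correctness.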
Model Descriptions from OpenAI:
| Model | Capabilities |
|---|---|
| GPT-4o mini | Affordable and intelligent small model for fast, lightweight tasks |
| GPT-4o | The fastest and most affordable flagship model |
| GPT-4 Turbo and GPT-4 | The previous set of high-intelligence models |
| GPT-3.5 Turbo | A fast, inexpensive model for simple tasks |
Model Descriptions from Google:
| Model | Capabilities |
|---|---|
| Gemini 1.5 Pro | Complex reasoning tasks such as code and text generation, text editing, problem solving, data extraction and generation |
| Gemini 1.5 Flash | Fast and versatile performance across a diverse variety of tasks |
| Gemini 1.0 Pro | Natural language tasks, multi-turn text and code chat, and code generation |
Model Descriptions from Anthropic:
| Model | Capabilities |
|---|---|
| Claude 3.5 Sonnet | Most intelligent model, highest level of intelligence and capability |
| Claude 3 Opus | Powerful model for highly complex tasks, top-level performance, intelligence, fluency, and understanding |
| Claude 3 Sonnet | Balance of intelligence and speed, strong utility, balanced for scaled deployments |
| Claude 3 Haiku | Fastest and most compact model for near-instant responsiveness, quick and accurate targeted performance |