Finding the best Large Language Model (LLM) for a specific problem can be challenging: each model has different strengths and weaknesses, and a model's output can change significantly depending on the prompt. Experimenting with multiple prompts by hand and comparing the results is tedious, which makes it hard to identify the most effective combination.

To simplify this process, llm-model-comparison lets you define Experiments that manage multiple model configurations and prompt variations. An Experiment file outlines the desired models, style prompts, and additional text; from it, the tool automates generating an output file for each combination.
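
As an illustration, an Experiment file might look something like the sketch below. The file format and field names (`name`, `models`, `style_prompts`, `text`) are assumptions for the example, not necessarily the tool's actual schema:

```yaml
# Hypothetical Experiment file -- field names are illustrative only.
name: summarize-release-notes
models:
  - llama3
  - mistral
style_prompts:
  - "Summarize the following text in three bullet points."
  - "Explain the following text to a non-technical reader."
text: |
  The text that every model/prompt combination will be asked to process.
```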

The tool uses 🦙 Ollama to run LLMs locally, so models can be tested in isolation without relying on cloud-based services or external resources.
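
Conceptually, this boils down to iterating over every model/prompt pair and asking Ollama for a completion. Here is a minimal sketch of that loop using the official `ollama` Python client; the experiment structure and output file naming are assumptions carried over from the example above, not the tool's actual code:

```python
import ollama  # pip install ollama; requires a running local Ollama server

# Hypothetical experiment definition (mirrors the example file above).
experiment = {
    "models": ["llama3", "mistral"],
    "style_prompts": [
        "Summarize the following text in three bullet points.",
        "Explain the following text to a non-technical reader.",
    ],
    "text": "The text every model/prompt combination should process.",
}

# Generate one output file per (model, style prompt) combination.
for model in experiment["models"]:
    for i, style in enumerate(experiment["style_prompts"]):
        response = ollama.chat(
            model=model,
            messages=[
                {"role": "user", "content": f"{style}\n\n{experiment['text']}"}
            ],
        )
        # Illustrative file naming scheme, e.g. "llama3_prompt0.md".
        with open(f"{model}_prompt{i}.md", "w") as f:
            f.write(response["message"]["content"])
```

Each model must be pulled beforehand (for example with `ollama pull llama3`) so the local server can serve it.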

The resulting outputs are then compiled into a single Markdown document, with a distinct subheading per model and style prompt. This final file presents a concise summary of each experiment's results, making it easy to compare and contrast the performance of different LLMs and configurations.
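
The compiled document might be structured roughly like this (headings and wording are illustrative; the actual layout may differ):

```markdown
# Experiment: summarize-release-notes

## llama3

### Summarize the following text in three bullet points.
...model output...

### Explain the following text to a non-technical reader.
...model output...

## mistral

### Summarize the following text in three bullet points.
...model output...
```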