Abstract: Real-time tasks often exhibit correlated execution-time distributions due to common factors such as shared caches, resources, and inputs. Yet state-of-the-art probabilistic analysis still ...
Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.