Organizations embracing agents often fail to estimate the costs of testing their output, with the non-deterministic nature of results often leading to complex and expensive evals.