Naive LLM judges are inconsistent. Run the same poem through twice and you get different scores (unsurprisingly, because of sampling). Lowering the temperature doesn’t help much either, since that’s only one of many technical issues. So I developed a full scoring system based on the details of the logit outputs. It can get remarkably tricky. Think about a score from 1-10:
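To make the 1-10 case concrete, here is a minimal sketch of one way to turn logit outputs into a score: instead of sampling a single number, take the probability-weighted mean over all valid score tokens. This is an illustration, not the scoring system described in this post. The function name `expected_score` and the input format (a dict mapping candidate tokens to log-probabilities at the score position) are assumptions. It also assumes each score is emitted as a single token, which may not hold for "10" under some tokenizers.

```python
import math

def expected_score(logprobs: dict[str, float], lo: int = 1, hi: int = 10) -> float:
    """Probability-weighted mean score from per-token log-probabilities.

    `logprobs` is a hypothetical input: candidate tokens at the score
    position mapped to their log-probabilities.
    """
    probs: dict[int, float] = {}
    for token, lp in logprobs.items():
        text = token.strip()  # merge variants such as "7" and " 7"
        if text.isascii() and text.isdigit() and lo <= int(text) <= hi:
            score = int(text)
            probs[score] = probs.get(score, 0.0) + math.exp(lp)

    total = sum(probs.values())
    if total == 0:
        raise ValueError("no valid score tokens among the candidates")

    # Renormalize over valid scores only, ignoring non-score tokens.
    return sum(score * p for score, p in probs.items()) / total


# Made-up numbers for illustration only.
example = {
    "7": math.log(0.5),
    " 7": math.log(0.1),
    "8": math.log(0.3),
    "The": math.log(0.1),
}
print(round(expected_score(example), 2))
```

Because the result is computed from the distribution rather than from one sampled token, two runs over identical logits give the same score.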