Hacker News
Testing LLM Agents Like Software – Behaviour Driven Evals of AI Systems
(aclanthology.org)
21 points by PranoyP 3 months ago
11 comments
mlop99
3 months ago
Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), i.e. one LLM agent testing another. Could that lead to a self-improving loop?
shailendra145
3 months ago
A powerful move beyond benchmarks — this paper redefines LLM evaluation through realistic, behavior-driven testing.
jlukecarlson
3 months ago
I appreciate the details shared in this paper, but it'd be great if they open-sourced their implementation!
papz2k
3 months ago
Very interesting work.
ajay_shastry
3 months ago
Interesting work
raj_maddipati
3 months ago
Excellent work
harshv_03
3 months ago
Interesting
ankush9812
3 months ago
Nice Work
ashyash518
3 months ago
Nice work
saurabh_xen
3 months ago
Great work
quanta9
3 months ago
interesting