The biggest issue with using LLMs in this way is how alien the failure modes are.

Interpretable models with transparent loss functions are easy to grok.

How an LLM might fail on a classic task is (afaict, right now) difficult to predict.



What is not transparent in the cross-entropy loss used in a large number of deep nets?


I think there was a breakdown in communication here.

If I train a classic deep net as a classifier and there are 5 possible classes, it will only ever output those 5 classes (unless there's a bug).

With ChatGPT, for example, it could theoretically introduce a 6th class even if you explicitly told it not to - that's what I would call an alien failure mode.
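The contrast above can be sketched in a few lines. The class names, logits, and validation helper below are all hypothetical, but they show why the classic classifier's output space is closed by construction while an LLM's is only closed by post-hoc checking:

```python
import math

CLASSES = ["cat", "dog", "bird", "fish", "horse"]

def classify(logits):
    # Softmax + argmax over exactly 5 logits: the result is structurally
    # guaranteed to be one of the 5 class indices. A "6th class" is
    # impossible unless there's a bug.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return max(range(len(exps)), key=lambda i: exps[i] / total)

def validate_llm_label(reply):
    # An LLM asked to "answer with one of: cat, dog, bird, fish, horse"
    # returns arbitrary text, so the closed output space has to be
    # enforced after the fact.
    label = reply.strip().lower()
    if label not in CLASSES:
        raise ValueError(f"out-of-vocabulary label: {label!r}")
    return label

print(CLASSES[classify([0.1, 2.3, -1.0, 0.5, 0.0])])  # always one of the 5
```

Here `validate_llm_label` would reject a reply like "lizard" or "it looks like a dog", which is exactly the failure mode a fixed-head classifier can't exhibit.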

I think formally / provably constraining the output of LLM APIs will help mitigate these issues, as an alternative to using an embedding API (i.e., treating the LLM as a featurizer) and training another model on top of it.
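One way to get that constraint is at decoding time rather than by prompting: mask every candidate that isn't an allowed label before picking the output. This is a minimal sketch; the candidate/log-prob dict stands in for whatever per-token scores a hypothetical LLM API exposes:

```python
ALLOWED = {"cat", "dog", "bird", "fish", "horse"}

def constrained_pick(candidate_logprobs):
    """Pick the highest-scoring candidate, restricted to allowed labels.

    candidate_logprobs: dict mapping candidate output strings to
    log-probabilities (a stand-in for an LLM API's top-k scores).
    Masking everything outside ALLOWED makes a 6th class impossible
    by construction, instead of merely asking the model nicely.
    """
    legal = {c: lp for c, lp in candidate_logprobs.items() if c in ALLOWED}
    if not legal:
        raise ValueError("no allowed label among candidates")
    return max(legal, key=legal.get)

# "puppy" scores highest but is not a legal label, so it is masked out.
print(constrained_pick({"puppy": -0.1, "dog": -0.2, "cat": -1.5}))  # dog
```

The same idea scales up to grammar- or schema-constrained decoding, where the mask is driven by a formal grammar instead of a flat label set.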


Formal proof is problematic because English has no formal specification. Some people are working on this; it's a nascent area bringing formal methods (model checking) to neural-network models of computation. But an interesting fundamental issue arises there: if you can't even specify the design intentions, how do you prove anything about them?



