I think you're both right, but you're pointing at different ways to approach the intelligence problem. This is essentially the connectionist vs. symbolic debate.
The fundamental question is: is a representational (contextual) bootstrap required, in the long run, for a contained computational system to perform at human level across a large number of domains? This isn't a solved problem.
So yes, AI would be better if it could "figure things out by itself." But humans don't "figure things out by themselves" either: they come pre-wired with a lot out of the box, and they get a lot of help cleaning the data (parents, teachers, literal labels, etc.).