
Here's the most recent talk by Karpathy about this topic: https://www.youtube.com/watch?v=hx7BXih7zx8 (there are more if you google for "karpathy talk")

Here's a summary via two examples.

1. Example of gathering data that needs further labeling

To build a neural network (NN) that recognizes stop signs, they program the cars to flag things that look like stop signs and deploy that detector to a fleet of over 600 thousand cars on the road. The cars send those "might or might not be a stop sign" images back to Tesla, where they are manually labeled and added to the training set. Once enough images have been gathered, the "recognize stop signs" feature is done. The same logic applies to other recognition tasks: cars, pedestrians, animals, speed signs, traffic lights, etc.
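
Roughly, the trigger running on each car could look something like this (just a sketch in Python; the names, thresholds and upload API are my guesses, not Tesla's actual code):

    def maybe_upload_for_labeling(frame, detector, uploader,
                                  low_conf=0.3, high_conf=0.8):
        """Send ambiguous stop sign detections back for manual labeling."""
        for det in detector.detect(frame):          # candidate boxes with scores
            if det.label == "stop_sign" and low_conf < det.score < high_conf:
                # Confident hits and clear misses stay on the car; only the
                # "might or might not be a stop sign" crops get uploaded.
                uploader.enqueue(frame.crop(det.box), meta=det)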

Yes, labeling is expensive but everyone else (including Waymo) has the same cost.

But Waymo has an even bigger cost: driving around just to collect those images.

Tesla makes a gross profit on every car it sells, and it has 600 thousand people driving for it for free. Waymo has 600 cars, each reportedly costing over $200k, and it has to pay drivers at least minimum wage for every hour of driving.

That's why Google is reportedly spending $1 billion a year on Waymo and why Waymo needs $3 billion of additional investment to keep going.

2. Example of gathering data that doesn't need labeling

Consider implementing a NN to recognize cut-ins, i.e. other cars entering your lane in front of you.

They deploy a first version trained on a small sample, running in shadow mode, i.e. it makes predictions about cut-ins but doesn't act on them.
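
In shadow mode the new predictor just rides along with the production stack, something like this (again a sketch; the names are mine, not Tesla's):

    def shadow_step(frame, production_planner, shadow_model, logger):
        """Run the new cut-in predictor alongside the production stack."""
        action = production_planner.plan(frame)         # this is what the car actually does
        predicted_cut_in = shadow_model.predict(frame)  # logged, never acted on
        logger.record(frame.timestamp, predicted_cut_in)
        return action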

When it makes a wrong prediction, the car sends the clip of the (non-)cut-in back to Tesla. This doesn't need manual labeling: a few seconds later they know whether the other car actually cut in or not, so they can rewind time and automatically label the earlier frames.
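
The "rewind time" step can be as simple as comparing where each neighboring car was with where it ended up a few seconds later (sketch; the per-frame data layout and method names are assumptions for illustration):

    def autolabel_cut_ins(frames, horizon_s=3.0):
        """Label each frame with whether a neighboring car enters the ego lane
        within horizon_s seconds, using the recorded future as ground truth."""
        labeled = []
        for i, frame in enumerate(frames):
            future = [f for f in frames[i + 1:]
                      if f.timestamp - frame.timestamp <= horizon_s]
            cut_in = any(car.lane == "ego" and frame.lane_of(car.id) != "ego"
                         for f in future for car in f.cars)
            labeled.append((frame, cut_in))             # no human labeler in the loop
        return labeled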



Thank you.



