I see what you mean: you're saying you can find 79 million cat videos on YouTube, so you have 79 million examples of cat behaviour.
That might sound like a lot, but it's still not nearly enough to model a cat's behaviour. To convince yourself that this is the case, subtract 79 million from infinity. The number remaining is the number of cat behaviours that a model trained on 79 million YouTube videos would never have seen, and would therefore not know how to deal with.
See, the point is not how much data you have; it's how much data you're missing. If the data you have is a tiny part of the whole, then you can't model the whole very well.
It's already hard enough to train an image classifier to recognise still images (video frames) of cats. You're proposing to train some kind of model (it wouldn't be a classifier anymore) to recognise, and reason about, not only the likeness of a cat, but the relation of a cat to its environment: arbitrary environments, with arbitrary entities in them, and the cat interacting with those entities in arbitrary ways.
Seriously, you're looking at an extravagantly large number that nothing we have right now can handle.
The quote is “to lose one is unfortunate; to lose two looks like carelessness.” One person misunderstanding me I can ignore; two misunderstanding me in the same way in a short window is definitely a sign that I communicated poorly.
Why do you believe there is infinite cat behaviour? Why would they evolve that?
Even if they did, the point of learning is to narrow a probability distribution, starting from “everything is equally likely”: from this cat pawing at a toy to it pushing a pen in exactly the right way to forge my signature, from hunting a mouse to mugging an old lady for a voting card and using it to cast a fraudulent vote in her name for the Natural Law Party at the next election. Learning shrinks that distribution — your expectations — until it fits in a finite brain and matches everything one has seen (70 million videos only need to average 36 seconds each to add up to a lifetime of nothing but cats).
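For what it's worth, the “lifetime of cats” arithmetic roughly checks out; a quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope check: 70 million videos averaging 36 seconds each.
total_seconds = 70_000_000 * 36
seconds_per_year = 60 * 60 * 24 * 365
years = total_seconds / seconds_per_year
print(round(years, 1))  # roughly 80 years, i.e. about a human lifetime
```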
That being the case, all one really needs for a “common sense” understanding of cats is the set of things humans are not surprised by cats doing.
As I said before, I don’t like the phrase “common sense”, because it’s such a bad model for reality. That being the case, it doesn’t matter what a cat would do when, say, elected governor of a small town; common sense, right or wrong, would say “eat, sleep, meow” or similar. It probably varies by person, too, given how many people complain that “people just don’t have any common sense these days”.
Edit: why do you think it’s hard to train a classifier to recognise cats? Google did that the unnecessarily hard way six years ago; now we have GANs that imagine cat pictures into existence, as a student project to help apply for a PhD: https://ajolicoeur.wordpress.com/cats/
>> Why do you believe there is infinite cat behaviour? Why would they evolve that?
An infinity of behaviours is not a distinct ability that has evolved to
fulfill some purpose. Rather, it's the result of the animal interacting with
its environment. The number of possible such interactions is infinite - or,
well, most likely infinite.
An infinite number of combinations can arise from very simple processes. For
example, an automaton that generates strings in the aⁿbⁿ grammar (n a's
followed by n b's) can go on forever. There is no reason to assume that a
complex mind in a complex environment will ever run out of combinations of
mind-states and world-states. Accordingly, there is no reason to believe we
will ever be able to collect examples of all those combinations and represent
them in computer memory.
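The aⁿbⁿ automaton is simple enough to sketch in a few lines: the code is finite, yet the set of strings it can produce is unbounded.

```python
def anbn():
    """Yield strings of the a^n b^n grammar: ab, aabb, aaabbb, ...
    A trivially simple generator that never runs out of new strings."""
    n = 1
    while True:
        yield "a" * n + "b" * n
        n += 1

gen = anbn()
print([next(gen) for _ in range(3)])  # ['ab', 'aabb', 'aaabbb']
```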
Edit: I'm not talking about a cat being elected governor here. Just ordinary
real-world behaviours, like all the ways a cat may chase a mouse, say. Try to
observe a cat and systematise its behaviour and see how well you can do. Then
try to do it with a computer.
With a computer, it should be easier, right?
* * * *
I wouldn't say that the point of learning is to reduce infinity to something
manageable. I think it's more like animal minds, like ours, have some kind of
ability to pick out what is relevant to a learning task from the infinity of
available experiences (incidentally, that's the subject of my PhD thesis :).
However, even as our minds are able to perform this one simple trick, we have
no clue how we do it and can therefore not yet reproduce it with our machines.
The result is the current state of the art in machine learning: data hungry
algorithms that require loads and loads of computing power to reach top
performance. This reliance on large datasets and compute limits progress: so
far we've seen results only in situations where there is sufficient data and
computing power, and always in restricted domains (cat videos vs. cats in the
wild). In problems where either there is not enough data, or the data is not
sufficient because the domain is too large and too unconstrained, like natural
language or modelling individual behaviour, progress has been much slower.
In short, modern machine learning substitutes quantity for quality, which has
proven successful in the short term but looks to be self-limiting in the long
run (even before the time when we're all dead). Eventually, we'll need to find
an alternative or progress will stall.
* * * *
>> Edit: why do you think it’s hard to train a classifier to recognise cats?
Yep, that's a good example of what I'm talking about.
The model in the link was trained on 9,304 examples and it shows: you can see
the smudges and deformations in the high-res image (and in the last one, sent
by another person). I can't find the original dataset, but the results look
very homogeneous, so the model is basically reproducing the training examples
faithfully without generalising well; in other words, it's overfitting. Which
makes sense: 9,304 examples are maybe OK for a school project, but nowhere
near enough for a real-world application.
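The overfitting point is easy to demonstrate on a toy problem (my own synthetic sketch, nothing to do with the cat GAN itself): a high-capacity model given too few examples fits them perfectly but does much worse on data it hasn't seen.

```python
# Overfitting in miniature: an unconstrained decision tree memorises
# a tiny noisy training set, then fails to generalise to held-out data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))            # only 30 training examples
y = np.sin(X).ravel() + rng.normal(0, 0.3, 30)  # noisy target

model = DecisionTreeRegressor()  # unlimited depth: can memorise the noise
model.fit(X, y)

X_test = rng.uniform(-3, 3, size=(1000, 1))
y_test = np.sin(X_test).ravel()

train_err = np.mean((model.predict(X) - y) ** 2)
test_err = np.mean((model.predict(X_test) - y_test) ** 2)
print(train_err, test_err)  # train error ~0; test error much larger
```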
Not that I can see a real-world application for generating faces of cats, but
the point is that if you just want to train a small model to see how this sort
of thing works, then you can certainly do it with a few examples; but if you
want something useful that approaches state-of-the-art performance, then you
need access to a lot more data and a lot more computing power.
I think you're underestimating how hard it is to make machine learning
algorithms work well. It is worth reading announcements in the lay press and
claims in scientific papers with a critical, even strongly skeptical attitude.
Just because Google says that deep learning is the bee's knees, don't just
accept it as fact. Try to repeat their feats on your own. See how far you get.
I'm assuming you haven't, otherwise you wouldn't be asking that question :)
This is getting too long and detailed to use my mobile to keep replying in as much detail as it deserves. :)
I get the impression that either (1) you have a very different definition of “common sense” to me, or (2) you are no longer talking about it. Does this seem like a fair representation? If so, can you explicitly describe what you mean by “common sense”?
As for reproducing results: limited experience of simple things only. My full-time job has gone from software to being a full-time carer for a parent with Alzheimer’s, so I don’t have time for anything more complex than, e.g., {train scikit-learn to read digits from scratch, then read all the digits in Shakuntala Devi’s number, then calculate the answer to her famous question} and timing the whole thing as faster than the human visual system takes to go from a number appearing to conscious awareness of it.
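That kind of toy exercise can be sketched roughly as follows, using scikit-learn’s built-in digits dataset (the model choice and timing details here are my assumptions, not the original setup):

```python
# Sketch of the toy exercise: train a digit classifier on scikit-learn's
# built-in dataset and time how long a single prediction takes.
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

clf = SVC(gamma=0.001).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

start = time.perf_counter()
clf.predict(X_test[:1])          # classify one digit image
elapsed = time.perf_counter() - start
print(f"one prediction: {elapsed * 1000:.2f} ms")
```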
You know, fun toy examples for whiteboard interviews.
Mainly I’m keeping up to date with the “Two Minute Papers” YouTube channel. Hopefully I’ll be able to apply to start a PhD when my family can take over care duties for me…
Quick edit: I think your definition of intelligence is equivalent to mine. Please elaborate why you disagree?
Sorry for the comment size! I always use HN from a PC and I tend to forget that's probably not the most common way to use it.
You're right, I'm not talking about common sense. It was another commenter who mentioned it. I interjected that it's very hard to collect enough examples of what cats do, how they respond to things, and how they interact physically with their environment to build a good model of cat behaviour.
I'll be honest and say I have no idea what is "common sense" in the context of cat behaviour. Not to mention that it's very difficult to agree on a definition of "common sense". Despite that, I think you'll find there's general agreement that machine learning models don't have anything that could be recognised as "common sense". One reason for that is that it's extremely difficult to collect training examples of "common sense", exactly because it's so very hard to define it.
Apologies if that was too much of a sidetrack from what you wanted to discuss!
I actually don't have a definition of intelligence :) I'm working off an assumption that there is such a thing, that it's one process or one set of processes and that we may be able to reproduce it on computers, at some point in the future. But not in my life time.
The great advantage of doing a PhD is that you have plenty of time to read up and experiment to your heart's content. I hope it all goes well for you and you can soon start your studies.
I'm sorry to hear about your parent. You both have my sympathy. Hang in there.
Thanks! I think we’re basically in agreement then, as all of my responses were predicated on the incorrect belief that you were using common sense as an argument against AI.
I certainly agree that humans can accurately extrapolate — for example what a cat is likely to do next — with what seems like less data than any current machine learning system.
I have my suspicions as to why, but to keep this short I’ll only say “catastrophic forgetting”, and separately that the normal approach in ML seems to be like teaching kids “by asking them random questions from the set of things we expect them to know at 18”, to almost-quote one of the podcasts I listen to.
“Hey Siri show me cat videos”
“Here are some videos I found of ‘cat’ on the web:” «Link to YouTube page saying “About 79,900,000 results”»