But your examples are way harder than they sound. Speech and non-speech analysers alike have a hard time with context. What do you mean by "recent" photos? And what percentage: the percentage of people wearing a mask in each photo, the percentage of photos with at least one masked person, or the percentage of photos in which everyone is wearing a mask? We humans make a lot of deductions from context. We spent 25 years failing to teach computers this; only recently, with deep learning, has it started to show potential.
Absolutely, but you could start with reasonable assumptions and refine from there. Just the percentage of individual humans wearing a mask. Recent = past week. Start somewhere. Give me an answer first, or ASK for clarification instead of "Sorry, I couldn't understand your query."
I understand it's a hard problem, but with current software capabilities, and given Google's compute infrastructure, I honestly think some of these things are well within the realm of what a team of several hundred Google engineers can do.
I'm not asking it to write Shakespeare; I'm asking it to crunch data from a sentence that could fairly easily be parsed into a graph and turned into a MapReduce query. I thought they were good at that stuff.
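For what it's worth, a minimal sketch of that map/reduce in Python (not an actual MapReduce job), assuming a hypothetical photo store where person/mask detections have already been computed upstream, which is of course the genuinely hard part:

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class Photo:
        # Hypothetical record: assumes mask detection already ran upstream.
        taken_at: datetime
        people: int          # people detected in the photo
        masked_people: int   # of those, how many wear a mask

    def map_phase(photo, cutoff):
        # Map: emit a (masked, total) pair for in-window photos with people.
        if photo.taken_at >= cutoff and photo.people > 0:
            yield (photo.masked_people, photo.people)

    def reduce_phase(pairs):
        # Reduce: sum the pairs into a percentage of individuals masked.
        masked = total = 0
        for m, t in pairs:
            masked += m
            total += t
        return 100.0 * masked / total if total else None

    def percent_masked_past_week(photos):
        cutoff = datetime.now() - timedelta(days=7)
        return reduce_phase(pair for p in photos for pair in map_phase(p, cutoff))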
Context? I know it's hard, but I thought Google had been working on that. I'm very much adjusting my expectations to what I think Google should be able to accomplish in a decade, and I expect some basic context capability by now, at the very least for these data-crunching use cases.
There are two ways to look at this problem: (1) what is hard and what is easy for our current tech to do, and (2) what are the things that humans actually want to pawn off to assistants?
The problem is that those two questions have very different answers. I don't think I agree with the grandparent comment that some of those things should be easy, but I do think it contains good examples of the level of sophistication that would make an AI assistant more than just a curiosity for the couple of times each day you'd rather not hit the button yourself (or when you listen to the radio while you shower).
Exactly, which is why these assistants aren't very useful beyond simple tasks you could just do yourself. If they're only good at things that are easy for you anyway, what's the point, besides not needing your hands for them?
Recognizing masks is the hard part, indeed. But it can be done semi-decently by just analyzing tags (#wearamask or whatever).
The rest is easy; it could even be done completely on-device if you have a recent high-end chipset. That's how powerful phones are nowadays.
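A toy version of that tag-based shortcut, assuming a made-up schema where each photo carries user-supplied hashtags (much noisier than a real vision model, but trivially cheap to run on-device):

    # Assumed tag vocabulary; a real list would be larger and curated.
    MASK_TAGS = {"#wearamask", "#maskup", "#masked"}

    def looks_masked(tags):
        # Cheap on-device proxy for mask detection: trust the user's hashtags.
        return any(t.lower() in MASK_TAGS for t in tags)

    def percent_tagged_masked(photos):
        # `photos` is assumed to be an iterable of per-photo tag lists.
        flags = [looks_masked(tags) for tags in photos]
        return 100.0 * sum(flags) / len(flags) if flags else None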
Tbf, it's easier to just run your own server and pre-program everything yourself: a truly personalized experience.
Though Google could easily do it for the millions of people using Android. I really wish they allowed custom modules or something for their Assistant; the voice recognition is unmatched.
Context is hard, but it seems like “recent” means (99% of the time) order by date descending, grab the first 15 or so, and then count how many of those photos contain a person with a mask.
Maybe the difficult part is whether you look at the 15 most recent photos containing people, or the 15 most recent photos of anything.
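As a sketch, assuming a made-up per-photo schema with a timestamp and pre-computed detections, that heuristic could be as little as:

    def recent_mask_share(photos, n=15, require_people=True):
        # "Recent" as defined above: newest N photos, then the share that
        # contain at least one masked person.
        # `photos`: dicts like {"taken_at": ..., "people": int, "masked_people": int}
        pool = [p for p in photos if p["people"] > 0] if require_people else list(photos)
        pool.sort(key=lambda p: p["taken_at"], reverse=True)
        recent = pool[:n]
        hits = sum(1 for p in recent if p["masked_people"] > 0)
        return hits / len(recent) if recent else None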
No, it doesn't mean that 99% of the time. It's more like 99% contextual.
If I ask for recent wildfire news and I'm in a state that doesn't experience wildfires often, are you going to return 15 news articles about wildfires spread out over the 200-year history of the state? I almost certainly want 15 articles about the current wildfires in other parts of the country. Your algorithm doesn't really say what to do here.
If I ask for "recent relatively rare astronomical event" recent might mean hundreds of years or more. If I ask for "recent PC game releases" it might mean a month or the current year. If I ask for "recent public events in my town" it might mean over the last week.
In many cases, "there are no recent events" is a better answer than "here are the last 15 events."
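Concretely, an assistant would need per-topic windows rather than one global cutoff, plus the option of an empty answer. A sketch with entirely made-up windows:

    from datetime import datetime, timedelta

    # Invented, per-topic notions of "recent"; the values are illustrative only.
    RECENT_WINDOWS = {
        "rare_astronomical_event": timedelta(days=365 * 300),
        "pc_game_release":         timedelta(days=30),
        "local_public_event":      timedelta(days=7),
        "wildfire_news":           timedelta(days=14),
    }

    def recent_items(items, topic, now=None):
        # `items`: dicts like {"date": datetime, ...}; the topic picks the window.
        now = now or datetime.now()
        window = RECENT_WINDOWS.get(topic, timedelta(days=30))  # arbitrary fallback
        hits = [it for it in items if now - it["date"] <= window]
        # An empty list is itself the right answer: "there are no recent events."
        return hits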