This is a great one. The manipulation is hard, but we're probably on a trajectory to be able to do it in 1-3 years if you were tolerant of some risk to the baby, but, of course, your tolerance for injuring babies is basically zero. I think 'risk & reliability' is a good potential category: there is the bar of 'got it to do a task reliably enough that we got a video' and the bar of 'got it to do a task reliably enough that I'd risk an infant in its grippers.'
> but we're probably on a trajectory to be able to do it in 1-3 years
This is wildly optimistic. I quit working in robotics because I got tired of all the bullshit promises everybody made all the time. I'm not saying robotics isn't advancing or the work is unimportant, but the spokespeople are about as reliable as Musk when it comes to timelines.
I doubt it will happen in 10 years, even with a constrained environment and hardware that costs well into 6 digits.
I think GP was basically talking about doing it on a doll. As in, a robot in 1-3 years might be able to change diapers with occasional success, but half the tries will result in a dismembered diaper user: we'd use dolls in this scenario, since dismembering babies is taboo and generally frowned upon within the robotics community.
It will have to be a robot doll. Changing my baby's diaper was a piece of cake until he learned how to escape midway through the process. Babies can be surprisingly hard to restrain!
After fiddling for 10 minutes with the baby, while being late for the day job, because it KEEPS ON MOVING while you're changing the FRIGGING diaper that is full of FECES, I can assure you that my tolerance is clearly above zero :-)
> your tolerance for injuring babies is basically zero.
Um, no, it's not. It's absolutely zero tolerance. There are no weasel words out of this. If a robot were to cause any pain to the baby, there would be no remorse. There would be no front-of-mind thoughts about not repeating the same thing the next time. There would be no guilt for causing pain to the baby.
Why you would "basically" this the way you have is disturbing.
Sorry, this is me communicating like an engineer. In a technical sense, the risk of anything can only approach zero; it never actually gets there. I meant that there should be essentially zero chance, similar to holding a baby in your arms or putting it in a high chair, and probably less chance of injury than driving in a car with a baby in a car seat. Basically zero.
I don't think the parent comment advocates for hurting babies. It just states, probably correctly, that cherry-picked examples won't be representative of robot safety with infants in the next few years, but that true safety will improve over time as well.
Real-world treatment of babies is very different from the zero tolerance you've described. From pregnant mothers smoking/drinking to medical care unavailability to doctor errors to various toxin-contaminated baby products and the environment (Flint's leaded water comes to mind) to babies left in hot cars and other abuse to poor availability of daycare (even less availability of daycare good for mental development) to ...
Granted, most of this is unintentional. The same goes for injuries by robots: we're presumably talking about unintentional injuries here. So, if robots save money/time/effort (like the Flint water switch), I'm not sure that society would suddenly change its current approach to unintentional baby injuries and implement zero tolerance.
To illustrate: an Uber self-driving car killed a woman, and another self-driving car maimed a woman in SF. The Uber case was obvious criminal gross negligence (running with emergency braking explicitly disabled), and the company wiggled out of it in part by having to shut down its self-driving program. Whereas in SF it was an obvious case of technology limitations and teething issues, so there were no really severe consequences, as we're much more tolerant of honest technological accidents (at least when they don't happen to us personally).
> From pregnant mothers smoking/drinking to medical care unavailability to doctor errors to various toxin contaminated baby products and the environment [...]
You don't even need to go so extreme. Driving involves risk. And so does getting out of bed at all (or staying in bed..)
If the chance of the robot hurting your kid becomes orders of magnitude smaller than the chance of getting hit by a freak asteroid, you can probably call that safe enough, even if it's not strictly speaking zero.
Well that is simply not possible. Even mothers drop their babies to the floor sometimes (very infrequently, I hope). Even for humans the tolerance isn't zero.
I'd go for something in the manipulation of ropes or wires.
State of the art seems to be that they can untangle a loosely knotted cord.
Untying a short rope with a tightly pulled overhand knot in the middle seems like it's decades away. You have to be able to grip it well enough, then twist the rope and push (even though every physicist says pushing a rope is impossible).
Interesting. Futurism is super hard, but "decades" seems too far away to me. I think with strong two-finger grippers this is probably close to state of the art, especially with a wrist force sensor, like the TRI setup.
Standard evening family home tidy/reset - toys, books, clothes, shoes away in their places. All over the house.
Oh, and load/unload dishwasher. Same with laundry machines. Along with folding laundry, these are the domestic robot equivalents of 'de-mining' and 'search and rescue': the classic motivating use cases for mobile autonomous robots.
You need to manipulate a large sheet, and you probably need to move around, bend down and lean over to reach all the corners. Bonus points for neat hospital corners on a flat sheet.
Putting pillows in pillowcases is another fun one. Usually pretty easy, probably a bronze medal.
Gold medal: put a UK super king size duvet inside a duvet cover. It's huge and awkward, there are buttons, and it's almost but not quite square (why??) so there's a good chance you'll get it round the wrong way and have to rotate it 90 degrees.
I think this is my favorite suggestion. All of them are super hard, but
- Bronze: pillow in pillow case
- Silver: Make a bed with fitted sheet, flat sheet and blanket
- Gold: put a duvet in a duvet cover
Feels like it would be a good option for a round 2...
Consider examples using building tools, like screwing in a drywall screw, hammering a nail, using a paint roller, caulking a sink, or a minor plumbing repair with a torch and solder. These differ enough in terms of forces, state changes, and combined dexterity/acuity (two-handed proprioception) from the Windex, sandwich, and key examples.
How about something with unpacking items from a shopping bag? I suspect the differences in bags (standard plastic, reusable, etc.) and certain items can really crank up the difficulty.
It can also make for a good story arc: open the door to get the grocery delivery, unpack the delivery, etc.
Thanks for putting this together. This IMO is 1000x better than any other AI challenges to date. ARC-AGI is bullshit and has nothing to do with reality.
I’ll likely use some of your tests for our robotics (not just humanoid) testing in the future at least as some baselines.
Also I really liked you dressing up as a robot - that’s very fun and really reflects the point of robotics: replace human action for all tasks.
My suggestion: Identify a collapsed person in the home and render first aid (I need this because I have epilepsy and live alone)
Bronze: ID collapse and call emergency services
Silver: Bronze tasks + manipulate person into appropriate recovery position
Gold: Communicate details to emergency personnel and playback previous hours of interactions
Tie your shoelaces? Pet a cat. Open one of those pairs of scissors sold in fused plastic packaging that itself requires scissors to open. Open a packet of tissues (wet ones for extra challenge). Cook rice. Throw out the trash.
Maybe careful application of large amounts of force? Opening a jar, peeling garlic, splitting a squash, opening a soda can. This category seems like a good test of "grip" strength + force feedback + sense of touch.
I love your list and it makes me think we are so far away from these things ever being feasible/cost effective compared to just hiring a poor person to do it. And the world is making a lot of poor people right now.
Something requiring navigating stairs while holding something full, like a laundry basket. Bronze: straight stairs. Silver: one 90-degree turn. Gold: spiral.
Something requiring coordination between two robots. Think of a relay race, which the Olympics has. So, say, moving a couch together.
BTW, love the idea and the silver body suit. Good stuff.
Ooh. I like full body manipulation. Humans use hips & elbows to move laundry baskets. Two robot collaboration is good too. I wonder who I can convince to wear another silver suit.... :)
(HN link on Substack points at empty page instead of this one, at least before I made this comment.)
What I think is missing is marathon events. Biathlons and triathlons.
We all know LLMs have a rather limited context window. Thus seeing robots do longer chains of events would be a more interesting demonstration of what they're capable of than a possibly rigged demo.
Something like: move a stack of boxes from one room to another, where the boxes also need to be stacked up again at the end. Or how about: pick up a box, go up some stairs, open a door, and put the box on a shelf on the other side.
Also, the real world is sloppy and messy and dirty and, to be real, kinda janky sometimes. Gold for unlocking a door with a key at a well-maintained office complex (and opening it, and walking through it) is one thing: facilities is going to replace the lock before it wears out, and we can assume the door fits its frame properly, so it doesn't need to be shoved or lifted up or yanked in order to be opened. But the real world is messy and sloppy, and you gotta jiggle the key in just the right way in order to get it to work.
Closing the door (assuming the robots weren't raised in a robot barn) is also harder than it looks if the door is shitty and needs a proper slam in order to be fully closed. Also, the robot locking the door behind itself after it comes in.
Scanning a key card and opening a door, but the first try fails.
We're a long way from a general robot that can screw a simple screw together like you would to assemble Ikea furniture.
Object recognition.
Gather only the dishes from a messy coffee table and put them in the dish bin.
Pick up only the clothes from a messy floor and bed, and put them in the hamper.
Dump a hamper of clothes onto a table, and sort out stuff that doesn't want to go into the washing machine.
Terrain traversal.
Just walk 500 ft, but there are increasing levels of obstacles in the way.
We all saw the Boston Dynamics robot parkour videos, but what I want to see is a robot make it from the front door of the Simpsons' house to the kitchen in the back. It's got to go through the living room, and it's hella messy, with Maggie's and Bart's and Lisa's crap strewn all over, and Homer's got some beer bottles, some empty, some full, all over the floor and on the table. All the robot has to do is walk from one side of the room to the far side without stepping on anything or knocking anything over. (The Simpsons merely being a home layout that's familiar to most people. Doesn't need to actually be them.)
Duck under a low ceiling. Climb over barriers of varying shapes and sizes.
Other locomotion: how much weight can it carry in its arms in front of it? Can it hold a 5-lb briefcase with one hand while walking? Can it carry something on its back? What's the limit? Can it give piggyback rides?
A category for simulation. Let companies show off their robots' kinematic control systems, so have something on the level of CoppeliaSim, where the motors and the gears and the actuators are themselves simulated, vs. a simple 3D video game where they are not. Plug their model into the simulated robot and see how well it just walks. If you remember QWOP, it's harder than it looks!
Obviously it's not going to be 100% accurate to the real world. The benefit is that it lets people compete from all over the world without having to replicate a very specific setup in the physical world or fly to your facility to test, opening up a whole new world of contestants who can now afford to compete.
At the end of the day, the most important challenge is, can it pick up a battery from the shelf, swap it with one of the two in its chassis, and put the dead one it just pulled out onto the charger?
Yeah, having been there during Andy's firing, I can vouch that the Thunderdome era of Replicant actually intensified after he left (and lasted until Boston was sold off and the rest of us moved into X), but nuance is hard and less funny, and I did warn folks that the history is mostly wrong... ;)
There is exactly one joke by ChatGPT. For Stability AI's CEO's prediction that there will no longer be any human programmers by 2028, I liked the idea of referencing Stanford's intro to CS class, which is semi-famously a bellwether for the tech industry. My original replacement class was "Growing Food for Sustenance," which kind of worked, but I thought it was weak. I asked ChatGPT for alternatives and it gave me about 15, of which "Barter Economics and Goat Management" was clearly the funniest, and annoyingly funnier than mine.
I had a dickens of a time with the ending. Having it end at the present seemed super abrupt (as it really feels like we are in the middle of a big shift), but I didn't really want to venture into my own predictions. One of my early readers suggested using prominent AI CEOs' and VCs' predictions about the near future and treating them seriously as if they were inevitable fact, which I found very funny. And really this is all about amusing myself.
My favorite Le Guin quote on science fiction is very appropriate:
SCIENCE FICTION IS OFTEN DESCRIBED, AND EVEN DEFINED, as extrapolative. The science fiction writer is supposed to take a trend or phenomenon of the here-and-now, purify and intensify it for dramatic effect, and extend it into the future. “If this goes on, this is what will happen.” A prediction is made. Method and results much resemble those of a scientist who feeds large doses of a purified and concentrated food additive to mice, in order to predict what may happen to people who eat it in small quantities for a long time. The outcome seems almost inevitably to be cancer. So does the outcome of extrapolation. Strictly extrapolative works of science fiction generally arrive about where the Club of Rome arrives: somewhere between the gradual extinction of human liberty and the total extinction of terrestrial life.
This may explain why many people who do not read science fiction describe it as “escapist,” but when questioned further, admit they do not read it because “it’s so depressing.”
Thanks, Benjie! Great to see you here. I hope it's OK if I plug your excellent writings on robotics that I think everyone should check out: https://generalrobots.substack.com/