Teaching the iPhone to Drive
The coming singularity in machine vision.
Posted December 3, 2012
*This article was co-authored with Los Alamos neurophysicist Michael Ham.
This is a story about a coming singularity.
For those unfamiliar, the term “singularity” comes from astrophysics, where it’s technically the point inside a black hole where matter is crushed to infinite density and zero volume, and, metaphorically, an event horizon: a point beyond which we cannot see.
In the 1950s, mathematician John von Neumann applied this metaphor to technology, writing: "[The] ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue."
Ray Kurzweil, author of The Singularity Is Near and the term’s greatest popularizer, referred to the singularity as the point in time when computers become smarter than humans.
The singularity we’re describing is nothing quite this dramatic, but no less revolutionary. Very soon, arguably within the next five years, we will cross a line and computers will begin to see better than humans.
What does this mean? Well, right now computers are mostly trapped in a digital universe—they can’t yet make direct sense of our analog world. Some sort of human intervention is still required.
The iPhone’s Siri is an example. When you talk to your iPhone, Siri converts an analog input (your voice) into a digital response, but the process, while amazing, still requires a human.
In machine vision, outside of systems like LIDAR, the main eyes for Google’s autonomous car, the ability to do without human involvement doesn’t yet exist in any realistic capacity.
By realistic, we mean that LIDAR is a) very expensive and b) rather cumbersome. In other words, it doesn’t fit in your iPhone.
But if the iPhone could process the data from its camera with the same accuracy as a human, it could drive your car. In short, this is the visual singularity.
And we’re getting closer. Both license plate detection and facial recognition are computer vision tricks that already work. But they’re limited algorithms—they do one thing very well, but not lots of things. You can’t plug your iPhone into your Roomba and tell it to clean up the dirt, but not the Legos.
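For a sense of just how narrow these tricks are, here is a minimal sketch of one of them, face detection, using OpenCV’s stock pre-trained classifier (the image filename is only a placeholder). It finds faces and nothing but faces; ask it about anything else in the frame and it has nothing to say.

```python
# Minimal single-purpose detector: finds frontal faces and nothing else.
# Illustrative sketch; assumes opencv-python is installed and "photo.jpg" exists.
import cv2

# Load the pre-trained frontal-face Haar cascade that ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("photo.jpg")                 # placeholder input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # cascades work on grayscale

# Returns bounding boxes for faces only; everything else in the scene
# is invisible to this algorithm.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"Found {len(faces)} face(s); the Legos and the dirt go unnoticed.")
```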
Two forces are changing this and, as it turns out, these are the same two forces that drive all singularities.
The first is exponential curves: Moore’s Law, Butters’ Law, and the like. The same acceleration in computational horsepower that drove Siri’s breakthrough is powering machine vision’s evolution. The difference is that speech recognition is a megabyte problem, while machine vision is a gigabyte problem. But as our computers continue to get faster, that gap closes.
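A rough back-of-the-envelope sketch of that gap (the numbers here are illustrative, assuming Moore’s Law’s commonly cited 18-24 month doubling period):

```python
# Back-of-the-envelope sketch: how many doublings of horsepower separate a
# "megabyte problem" (speech) from a "gigabyte problem" (vision)?
import math

gap_in_doublings = math.log2(1_000_000_000 / 1_000_000)   # ~10 doublings
print(f"Gap between MB- and GB-scale problems: ~{gap_in_doublings:.0f} doublings")

# Waiting on a single chip, at Moore's Law's historical pace:
for months_per_doubling in (18, 24):
    years = gap_in_doublings * months_per_doubling / 12
    print(f"  {months_per_doubling} months per doubling -> ~{years:.0f} years of waiting")

# Or don't wait: gang processors together. 16,000 processors is roughly
# 2**14, i.e. about 14 doublings' worth of horsepower available today.
print(f"  16,000 processors ~ 2**{math.log2(16_000):.0f}, bought with parallelism")
```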
The second is data—a critical mass of data.
We have found that the easiest way to ape human abilities is to train them up on examples. It was the massive growth in websites (i.e., digitized text) that allowed the text-reading singularity (the point at which machines could read as well as humans) to occur. Similarly, huge amounts of digitized human speech were required to achieve the speech singularity (aka Siri). Likewise, without YouTube and the 72 hours of video uploaded to it each minute, the coming visual singularity would be impossible.
Along these lines, last June, Google connected 16,000 computer processors into a giant machine-learning neural network for vision and let it loose on YouTube. The result, as the New York Times pointed out, was that the network taught itself to recognize cats.
Why? Simple: there are tons of cat videos on YouTube, so cats were one of the things it saw a lot of, just the way an infant learns to recognize the objects it sees every day.
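Google’s actual system was a billion-connection deep network, but the underlying idea, learning structure from unlabeled data with no “cat” label in sight, can be sketched in a few lines. The toy autoencoder below uses synthetic “frames” in place of YouTube and is purely illustrative:

```python
# Toy sketch of unsupervised feature learning: a tiny autoencoder that learns
# to reconstruct unlabeled "frames". No labels, no "cat" category anywhere.
# Google's system was vastly larger; this only illustrates the principle.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for millions of YouTube frames: 2,000 random 8x8 "patches"
# that secretly share a handful of underlying patterns.
patterns = rng.normal(size=(4, 64))
frames = rng.normal(size=(2000, 4)) @ patterns + 0.1 * rng.normal(size=(2000, 64))

# One hidden layer of 16 "neurons" trained only to reconstruct its input.
W1 = rng.normal(scale=0.1, size=(64, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 64)); b2 = np.zeros(64)
lr = 0.01

for epoch in range(200):
    h = np.tanh(frames @ W1 + b1)          # encode
    out = h @ W2 + b2                      # decode (reconstruct the frame)
    err = out - frames                     # reconstruction error
    # Backpropagate the reconstruction error (plain gradient descent).
    gW2 = h.T @ err / len(frames); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = frames.T @ dh / len(frames); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print("final reconstruction error:", float((err ** 2).mean()))
# The hidden neurons end up responding to the recurring patterns in the data,
# much as a neuron in Google's network ended up responding to cat faces.
```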
The cat story got around. What most people missed in that Times piece was that Google’s machine vision algorithm performed far better than anything that had come before, roughly doubling the accuracy of its best predecessors (while recognizing objects from a list of some 20,000 categories) on its way to cat detection.
And that doubling? That’s exponential growth. Visible exponential growth. What it means is that while machine vision has been on an exponential curve for a while, it’s been below the knee of the curve, where those doublings are mostly invisible. Google’s success puts the arc much closer to the knee; it means we’re getting ever closer to sight as we humans know it.
From a different perspective, when we’re talking about sight as we humans know it, we’re talking about an acceptable error threshold. The human visual system is pretty good. Not great, but more than enough to keep us around these past 200,000 years. For that very reason, its error rate is acceptable to us.
But it has limits. Human vision gets tired. In object-recognition experiments performed at Los Alamos National Laboratory, sessions were kept under an hour because beyond that point subjects could no longer focus on the task. Google’s machine ran for a week over millions of images, long past the point any human could hope to keep up.
Once this threshold is crossed, the impact on society will be significant.
Right now, for example, we have the Da Vinci surgical robot. Amazing invention. Da Vinci helps surgeons perform everything from cardiac bypasses to gastric bypasses with far more precision and less collateral damage than an unaided human. But the Da Vinci still needs human involvement. Its hands are far better at the actual surgery than our hands, but it needs to borrow our eyes. And when machine vision becomes better than human vision, the surgeon becomes obsolete.
Okay, not completely obsolete; we’ll still need their knowledge and research skills. Yet IBM has sent Watson (the Jeopardy-winning supercomputer) to medical school. It’s being loaded with as much medical data as possible. The results will put an incredibly powerful diagnostic device into the cloud. Couple that diagnostic device to better-than-human machine vision (and lab-on-a-chip microfluidic analysis) and it’s not just surgeons who are out of a job.
Doctors too. Right now, the diagnostic error rate for human doctors is 45 percent. That means that if you go to your doctor three times, the percentages say something went wrong on at least one of those visits. Watson is already here; the lab-on-a-chip tech is only a few years out (see the Qualcomm Tricorder X Prize). Machine vision will complete the triumvirate. The results will change health care forever.
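Taking that 45 percent figure at face value, the three-visit arithmetic works out like this (just the math behind the claim above, not new data):

```python
# Quick arithmetic behind the "three visits" claim, taking the quoted
# 45 percent diagnostic error rate at face value.
error_rate, visits = 0.45, 3

expected_wrong = error_rate * visits                # ~1.35 visits with an error
p_at_least_one = 1 - (1 - error_rate) ** visits     # ~0.83

print(f"Expected visits with an error: {expected_wrong:.2f}")
print(f"Chance at least one of {visits} visits goes wrong: {p_at_least_one:.0%}")
```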
Truthfully, it’s not just health care. Once machines are capable of visually interacting with the world, a whole trove of technologies that today are only science fiction becomes possible.
So, Siri, drive me to work while I finish watching the last twenty minutes of Terminator.