Seen in → No.152
If you want a good sense (or fresh proof) of the kind of early-stage (half-baked?) AI being created and put to use in society right now, this is a great one. The problem known as data shift (a mismatch between testing and real-world data) is already well known; now 40 researchers at Google are finding that underspecification is a big issue too. In something reminiscent of the discussions around black boxes, they’ve found that many models can test successfully yet prove completely ineffectual, or vary greatly in performance, in the real world. Sometimes simply because the random training data was slightly different. Unsurprising conclusion: they need to test a lot more and get better at specifying requirements ahead of time. “D’uh!” comes to mind.
The training process can produce many different models that all pass the test but—and this is the crucial part—these models will differ in small, arbitrary ways, depending on things like the random values given to the nodes in a neural network before training starts, the way training data is selected or represented, the number of training runs, and so on.
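The effect is easy to reproduce at toy scale. The sketch below (my own illustration, not code from the Google paper; it assumes numpy) trains two tiny logistic-regression models that differ only in their random initial weights. Because the training data contains two perfectly redundant features, both models fit it equally well and score identically on test data drawn the same way. But each one leans on the redundant features in a different, arbitrary way, so when one feature goes dead “in the real world” (say, a broken sensor), their behavior diverges:

```python
import numpy as np

def train_logreg(X, y, seed, steps=500, lr=0.5):
    """Tiny logistic regression trained by gradient descent.
    Only the random initial weights depend on `seed`."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)   # gradient step
    return w

def accuracy(w, X, y):
    return ((X @ w > 0) == y).mean()

# Training data where feature 2 is an exact copy of feature 1:
# any split of weight between the two features fits equally well,
# so the data underspecifies the model.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X_train = np.column_stack([x1, x1])
y_train = (x1 > 0).astype(float)

# Two training runs, identical except for the random seed.
w_a = train_logreg(X_train, y_train, seed=1)
w_b = train_logreg(X_train, y_train, seed=2)

# On held-out data from the same distribution, both models look perfect
# and indistinguishable ...
x1_test = rng.normal(size=200)
X_test = np.column_stack([x1_test, x1_test])
y_test = (x1_test > 0).astype(float)
print("test accuracy:", accuracy(w_a, X_test, y_test),
      accuracy(w_b, X_test, y_test))

# ... yet their weights differ in an arbitrary, seed-dependent way,
# which only shows up once the redundancy breaks (feature 2 goes dead).
X_shift = np.column_stack([x1_test, np.zeros(len(x1_test))])
print("weights:", w_a, w_b)
print("shifted accuracy:", accuracy(w_a, X_shift, y_test),
      accuracy(w_b, X_shift, y_test))
```

Nothing in the test set distinguishes the two models; only the shifted data does, which is exactly why a successful test run tells you little about which model you actually got.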
In other words, the process used to build most machine-learning models today cannot tell which models will work in the real world and which ones won’t.