Computers Are Learning to Read—But They’re *Still* Not So Smart

The Mad-Libs-esque pretraining task that BERT uses — called masked-language modeling — isn’t new. In fact, it’s been used as a tool for assessing language comprehension in humans for decades. For Google, it also offered a practical way of enabling bidirectionality in neural networks, as opposed to the unidirectional pretraining methods that had previously dominated […]
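To make the fill-in-the-blank idea concrete, here is a minimal sketch of masked-language modeling in action. It uses the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint, neither of which is named in the article; they simply make the demonstration easy to run.

```python
# A minimal sketch of masked-language modeling as a fill-in-the-blank task.
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint are available (not specified in the article).
from transformers import pipeline

# BERT was pretrained to predict the token hidden behind [MASK],
# reading the words on *both* sides of the blank (bidirectional context).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The delivery truck pulled up to the [MASK] to drop off the package."
for prediction in fill_mask(sentence, top_k=3):
    # Each prediction includes the filled-in token and a confidence score.
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```

Because the model sees the entire sentence around the blank, not just the words to its left, this setup captures the bidirectionality the paragraph describes, in contrast to left-to-right pretraining that predicts each word from the preceding context alone.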