Note — Jun 06, 2021

The Race to Understand the Thrilling, Dangerous World of Language AI

Seen in → No.176

Source →

The excellent Karen Hao at MIT Technology Review looks at Google’s new LaMDA large language model (LLM), existing ones like GPT-3, and the numerous pitfalls and problems we already know about. Hao also touches on the implications of Gebru and Mitchell’s firing by Google, Safiya Noble’s Algorithms of Oppression, and the fantastic (from what little I know) scientific collaboration by BigScience “for an open-source LLM that could be used to conduct critical research independent of any company.”

[S]tartups are creating dozens of products and services based on the tech giants’ models. Soon enough, all of our digital interactions—when we email, search, or post on social media—will be filtered through LLMs. […]

Another will focus on developing responsible ways of sourcing the training data—seeking alternatives to simply scraping data from the web, such as transcribing historical radio archives or podcasts. The goal here is to avoid toxic language and nonconsensual collection of private information. […]

Every data point and every modelling decision is being carefully and publicly documented, so it’s easier to analyze how all the pieces affect the model’s outcomes. “It’s not just about delivering the final product,” says Angela Fan, a Facebook researcher. “We envision every single piece of it as a delivery point, as an artifact.”