Inside Meta’s race to beat OpenAI: “We need to learn how to build frontier and win this race”
In its bid to outdo OpenAI in the AI arms race, Meta has been searching urgently for new data sources to feed its AI models. The company’s recent reliance on LibGen (Library Genesis), a shadow library of pirated books and scientific papers, has raised concerns about copyright infringement and regulatory risks.
An internal email conveys the urgency, stating “we need to learn how to build frontier and win this race” against OpenAI, and shows that Meta is keenly aware of these risks. The urgency is fueled by the realization that there are limits to how much copyrighted content can be used, a concern echoed by other AI pioneers such as OpenAI cofounder Ilya Sutskever.
As The Verge previously reported, Meta’s efforts to train its AI models have been hindered by a lack of available data. However, this has not deterred the company from exploring unconventional methods. According to an internal document, Meta employees suggested removing copyright headers and metadata to avoid potential legal complications.
The email also highlights regulatory concerns, citing “policy risks” related to the use of pirated content. It warns that regulators might respond negatively to media coverage suggesting Meta’s involvement with copyrighted material, which could compromise Meta’s negotiating position on issues such as bioweapons and CBRNE (chemical, biological, radiological, nuclear, and explosives) threats.
The revelations have raised concerns not only about legal exposure for AI developers but also about the position of creators and publishers, whose work risks being used without permission.
Source: www.theverge.com