"OpenAI's Hunger for Data Is Coming Back to Bite It"

Following a temporary suspension in Italy and several investigations in other European Union countries, OpenAI has just over a week to comply with European data protection laws. Failure to comply could result in costly fines, forced data deletion, or even a ban. Experts say, however, that compliance will be nearly impossible for OpenAI, because the data used to train its artificial intelligence (AI) models was collected by scraping the Internet for content. The dominant principle in AI development is that more training data is better: the data set for OpenAI's GPT-2 model consisted of 40 GB of text, while GPT-3, on which ChatGPT is based, was trained on 570 GB of data. OpenAI has not disclosed the size of the data set for its most recent model, GPT-4, but the company's appetite for ever-larger models is now coming back to haunt it. In recent weeks, several Western data protection authorities have launched investigations into how OpenAI collects and processes the data that powers ChatGPT, on the suspicion that it has extracted and used individuals' personal information, such as names and email addresses, without permission. This article continues to discuss how OpenAI's AI services may be breaking data protection laws and why it could be impossible for the company to comply with data protection rules.

MIT Technology Review reports "OpenAI's Hunger for Data Is Coming Back to Bite It"