
Title: Why DeepSeek’s New AI Model Thinks It’s ChatGPT
DeepSeek has released a new AI model that appears to be one of the best “open” challengers yet, with one peculiar quirk: the model sometimes claims to be ChatGPT. The case underscores how important it is to thoroughly filter AI-generated outputs out of training datasets.
According to experts, it is not surprising that DeepSeek V3’s training data was contaminated with content generated by other models, such as OpenAI’s GPT-4. The web is increasingly littered with AI slop, and AI companies source much of their training data from the internet.
This “contamination” poses real risks: a model that inadvertently trains on ChatGPT or GPT-4 outputs can absorb their quirks, including misidentifying itself as another company’s product. Moreover, by uncritically absorbing and iterating on those outputs, DeepSeek V3 could exacerbate some of the biases and flaws present in GPT-4.
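To make the filtering idea concrete, here is a minimal sketch of one naive decontamination heuristic: dropping training samples that contain common AI self-identification phrases. The phrase list, function names, and sample corpus below are illustrative assumptions, not any lab’s actual pipeline, and real decontamination would need far more sophisticated detection.

```python
# Illustrative sketch only: a crude phrase-matching filter for removing
# likely AI-generated samples from a training corpus. The phrase list is
# a hypothetical example, not a real deduplication/decontamination tool.

AI_SELF_ID_PHRASES = [
    "as an ai language model",
    "i am chatgpt",
    "i'm chatgpt",
    "developed by openai",
    "trained by openai",
]

def looks_ai_generated(text: str) -> bool:
    """Return True if the text contains a known self-identification phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in AI_SELF_ID_PHRASES)

def filter_corpus(samples: list[str]) -> list[str]:
    """Keep only samples that pass the heuristic check."""
    return [s for s in samples if not looks_ai_generated(s)]

corpus = [
    "The Eiffel Tower is 330 metres tall.",
    "As an AI language model developed by OpenAI, I cannot do that.",
    "Rust's borrow checker enforces memory safety at compile time.",
]
print(filter_corpus(corpus))  # the ChatGPT-style sample is dropped
```

A filter like this only catches the most obvious telltale strings; paraphrased or subtly styled AI text would slip through, which is part of why contamination is so hard to avoid in practice.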
In a recent statement, Heidy Khlaaf, an engineering director at the consulting firm Trail of Bits, suggested that accidental contamination alone may not explain the behavior: “Even with internet data now brimming with AI outputs, other models that would accidentally train on ChatGPT or GPT-4 outputs would not necessarily demonstrate outputs reminiscent of OpenAI customized messages.”
The recent release has sparked concerns about the lack of transparency and oversight in AI model training.
Source: techcrunch.com