
Why DeepSeek’s new AI model thinks it’s ChatGPT
AI models are no strangers to misidentification: Google’s Gemini and others have been known to claim they are competing models. The latest example is DeepSeek’s new AI model, DeepSeek V3, which has mistakenly identified itself as ChatGPT.
The issue likely stems from contamination of training data with AI-generated content. A significant portion of online content is now produced by AI, making it increasingly difficult for developers to keep generated text out of their training sets.
In an interview with TechCrunch, Heidy Khlaaf, engineering director at the consulting firm Trail of Bits, explained that “distillation” — training a new model on the outputs of an existing one — can be attractive to developers despite the risks. If DeepSeek V3 was partially trained on text produced by OpenAI’s models, whether deliberately or through contaminated web data, that would explain why it identifies itself as ChatGPT.
However, what’s more concerning is the potential for DeepSeek V3 to perpetuate biases and flaws from GPT-4, which could be exacerbated by the model uncritically absorbing and iterating on AI outputs. This highlights the need for more stringent filtering mechanisms in AI training datasets.
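One simple form such a filtering mechanism could take is scanning candidate training samples for telltale self-identification phrases and dropping matches. The sketch below is purely illustrative — the phrase list, function names, and approach are assumptions for demonstration, not anything DeepSeek or OpenAI is known to use:

```python
import re

# Hypothetical phrases that suggest a sample is another model's output.
# A real pipeline would use far more robust detection than keyword matching.
SELF_ID_PATTERNS = [
    r"\bI(?:'m| am) ChatGPT\b",
    r"\bas an AI (?:language )?model\b",
    r"\btrained by OpenAI\b",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in SELF_ID_PATTERNS]

def looks_ai_generated(text: str) -> bool:
    """Flag text containing telltale model self-identification phrases."""
    return any(p.search(text) for p in COMPILED)

def filter_corpus(samples: list[str]) -> list[str]:
    """Drop samples that appear to be another model's output."""
    return [s for s in samples if not looks_ai_generated(s)]
```

In practice, keyword filters like this catch only the most obvious contamination; subtler stylistic traces of AI-generated text pass through, which is why the problem is hard.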
The proliferation of AI-generated content has significant implications for the field. With some estimates suggesting that as much as 90% of online content could be AI-generated by 2026, developing better methods for identifying generated text — and for catching the biases and flaws it carries — is becoming crucial.
Source: techcrunch.com