AI Model Trained on Contaminated Data: Concerns Raised Over Bias and Flaws

News, December 27, 2024

Why DeepSeek’s New AI Model Thinks It’s ChatGPT

DeepSeek, a Chinese AI lab, has unveiled its latest AI model, DeepSeek V3. However, TechCrunch reports that the model frequently identifies itself as ChatGPT. This suggests it may have been trained on data containing large amounts of text generated by OpenAI's highly popular chatbot.

Part of the problem is the web itself, which is increasingly contaminated with AI-generated content, including clickbait and other low-quality material. As a result, it has become difficult for developers to thoroughly filter AI outputs from their training datasets. A model trained on such data can absorb text in which ChatGPT describes itself and then repeat those self-descriptions. This makes it harder for the model to identify itself accurately as a distinct system.

According to Heidy Khlaaf, engineering director at consulting firm Trail of Bits, the cost savings from "distilling" an existing model's knowledge (training a new model on another model's outputs) can appeal to developers despite the risks. However, she emphasized that even with AI-generated content proliferating online, other models that accidentally trained on ChatGPT or GPT-4 outputs would not necessarily produce outputs reminiscent of OpenAI's customized messages.

DeepSeek V3's case is more concerning: if the model uncritically absorbed and iterated on GPT-4's outputs, it may exacerbate some of the biases and flaws found in ChatGPT itself.

Source: techcrunch.com
