ChatGPT Maker Suspects China’s Dirt Cheap DeepSeek AI Models Were Built Using OpenAI Data — and the Irony Is Not Lost on the Internet

Author: Patrick, Feb 22, 2025

OpenAI suspects that DeepSeek, a Chinese AI company whose models are significantly cheaper than their Western counterparts, may have trained them using OpenAI's data. The suspicion follows DeepSeek's rapid rise in popularity, which triggered a sharp decline in the stock prices of major AI companies. The hardest hit was Nvidia, which suffered the largest single-day loss in market value in U.S. stock market history.

DeepSeek's R1 model reportedly cost about $6 million to build, compared with the billions invested by American tech giants. That gap has raised questions about whether their current AI development strategies are sustainable. DeepSeek says R1 is built on its open-source DeepSeek-V3 model and requires less computing power than Western alternatives.

OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service by using its API or by employing "distillation", a technique in which a smaller model is trained on the outputs of a larger one, to train its own models. OpenAI said it is aware of efforts by Chinese and other companies to replicate leading U.S. AI models and that it is committed to protecting its intellectual property.

David Sacks, President Trump's AI czar, said the evidence suggests DeepSeek used distillation to extract knowledge from OpenAI's models. He expects leading AI companies to take countermeasures to prevent the practice.

The situation carries an obvious irony: OpenAI, which has itself been accused of using copyrighted internet content to train ChatGPT, is now accusing DeepSeek of a similar violation. Many on social media have been quick to point out the hypocrisy. OpenAI's earlier assertion that it would be "impossible" to create AI models like ChatGPT without copyrighted material only adds fuel to the debate.

The controversy also underscores the ongoing legal battles over the use of copyrighted material to train AI models. Lawsuits against OpenAI and Microsoft from the New York Times and from 17 authors, including George R.R. Martin, show how contested "fair use" claims remain in the rapidly evolving field of generative AI. A previous U.S. Copyright Office ruling that AI-generated art cannot be copyrighted adds another layer of complexity to the legal landscape.

DeepSeek is accused of using distillation to train its competitor on OpenAI’s model. Image credit: Andrey Rudakov/Bloomberg via Getty Images.