DeepSeek Allegedly Shifts from OpenAI to Gemini for Training New AI Models
The company, however, has not disclosed its training data sources

Chinese AI firm DeepSeek recently released an updated version of its reasoning model, R1-0528, which demonstrated strong performance on math and coding benchmarks.
While the company has not disclosed its training data sources, some researchers suspect the model may have been trained using outputs from Google’s Gemini AI.
Melbourne-based developer Sam Paech claims that R1-0528 exhibits language patterns similar to those of Gemini 2.5 Pro, which he takes as a sign that DeepSeek may have trained on Gemini-generated text.
If you're wondering why new deepseek r1 sounds a bit different, I think they probably switched from training on synthetic openai to synthetic gemini outputs. pic.twitter.com/Oex9roapNv
— Sam Paech (@sam_paech) May 29, 2025
Another developer, known as the creator of “SpeechMap,” noted that the model’s internal reasoning patterns resemble those of Gemini.
In March, Google unveiled Gemini 2.5, its latest AI model designed to handle complex reasoning and coding tasks. The release included Gemini 2.5 Pro Experimental, which secured the top spot on the LMArena leaderboard and excelled across various coding, math, and science benchmarks.
Interestingly, DeepSeek has previously faced allegations of using data from rival AI models in its training processes. Last year, developers noticed that its V3 model frequently referred to itself as ChatGPT—OpenAI’s chatbot—raising suspicions that it may have been trained on ChatGPT conversation logs.
Earlier this year, OpenAI told the Financial Times it had uncovered evidence linking DeepSeek to "distillation," a method of training smaller models using outputs from more advanced ones.
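To make the term concrete: in its simplest, sequence-level form, distillation means fine-tuning a smaller "student" model on text generated by a stronger "teacher" model, so the student learns to reproduce the teacher's phrasing and reasoning style. The sketch below illustrates that idea only in outline; every model, name, and shape in it is a toy assumption for illustration, not a description of DeepSeek's, OpenAI's, or Google's actual pipelines.

```python
# Illustrative sketch of sequence-level distillation: a small student model
# is trained with a standard next-token objective on sequences produced by
# a teacher. All components here are hypothetical stand-ins.
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 1000, 32, 8

# Stand-in for a corpus of teacher outputs. In the alleged scenario these
# would be responses collected from a frontier model; here they are faked
# with random token ids so the example is self-contained.
def sample_teacher_outputs():
    return torch.randint(0, vocab_size, (batch, seq_len))

# Tiny student: an embedding plus a linear head, purely for illustration.
student = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    tokens = sample_teacher_outputs()
    # Next-token prediction: from tokens[:, :-1], predict tokens[:, 1:],
    # pushing the student to imitate the teacher's output distribution.
    logits = student(tokens[:, :-1])
    loss = F.cross_entropy(
        logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the student only ever sees the teacher's generated text, stylistic tics of the teacher tend to carry over, which is exactly the kind of signal researchers like Paech say they observe in R1-0528.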