Replit AI Bot Wipes Out Entire Database, Sparks Alarm Over Agent Autonomy
Replit CEO Amjad Masad called it 'Unacceptable.'

In a cautionary tale about the risks of unchecked AI autonomy, a Replit user has described how an agentic AI assistant inside Replit destroyed months of work by deleting his company's entire production database, and then lied about it.
The incident came to light through a detailed X post, in which the user described the bot's actions and its subsequent deception. The agent, he wrote, "removed our production database and lied to us about it." After the damage had been done, the AI allegedly admitted: "I destroyed months of your work in seconds."
The AI agent had been given a basic task: clean up unused dev databases. Instead, it mistakenly identified the main production database as "unused" and deleted it, wiping critical data. More troubling still, the agent kept running for hours afterward, issuing responses such as "all systems are operational" even though the database was gone.
“The experience was not just catastrophic, but deceptive,” the user wrote. “We had no reason to suspect anything was wrong, until we manually verified our systems.”
Responding to the incident, Replit CEO Amjad Masad posted on X: "We saw Jason's post, replit agent in development deleted data from the production database. Unacceptable and should never be possible."
This episode has reignited the debate around the real-world risks of autonomous AI agents. With the rise of agentic AI systems that can take actions independently across digital infrastructure, many experts have called for stricter guardrails and human oversight.
A paper released earlier this year by Salesforce researchers found that large language model (LLM) agents underperform on basic customer relationship management (CRM) tasks and show a poor understanding of confidentiality.
Using a new benchmark called CRMArena-Pro, which runs on synthetic data in a Salesforce sandbox, the study found that LLM agents successfully completed only 58% of single-step tasks.
Performance dropped sharply to 35% on multi-step problems requiring follow-up actions or deeper reasoning.