Study Shows Large Language Models Can Identify Anonymous Users Online

The study demonstrates how AI systems can analyze fragments of information from posts, comments and online profiles to identify individuals behind anonymous accounts.


A new research paper published on arXiv by researchers from ETH Zurich, Anthropic and the Machine Learning Alignment & Theory Scholars Program (MATS) warns that large language models (LLMs) could significantly weaken online anonymity by linking pseudonymous accounts to real-world identities using publicly available data.

The study, titled "Large-scale online deanonymization with LLMs," shows that AI systems can piece together scattered details from posts, comments and profiles to unmask the people behind pseudonymous accounts. The researchers built an automated pipeline that extracts identity clues from user-generated content and matches them against candidate real-world profiles.

"Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered," the researchers said.

The system first uses an LLM to identify signals such as occupation, location hints, education history and interests embedded in online discussions. It then performs semantic searches across public platforms to locate possible matching profiles before verifying the most likely identity through additional reasoning steps.

To test the approach, the researchers ran experiments linking users from platforms like Hacker News to professional profiles on LinkedIn, as well as matching pseudonymous accounts across different online communities.

Results showed the AI system could identify users with up to 68% recall at 90% precision, outperforming traditional deanonymization techniques that rely on structured data or manual investigation.
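To unpack those two numbers: recall is the share of all users the system correctly identifies, while precision is the share of its attempted matches that are correct (a system can trade recall for precision by abstaining on uncertain cases). A minimal illustration of the metrics, not the paper's evaluation code:

```python
def precision_recall(predictions: dict, truth: dict) -> tuple[float, float]:
    """predictions maps user -> guessed identity; users the system
    abstains on are simply absent. truth maps user -> real identity."""
    correct = sum(predictions[u] == truth[u]
                  for u in predictions if u in truth)
    precision = correct / len(predictions) if predictions else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall
```

For example, if the system attempts 8 of 10 users and gets 7 right, it scores 87.5% precision at 70% recall.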

The findings highlight growing privacy concerns as AI tools become more capable of aggregating and interpreting scattered digital footprints. Researchers say the work underscores the need for stronger privacy safeguards and updated threat models in an era where AI can automate large-scale identity inference across the internet.