AI Companies Are Coming for Public Datasets

Leaders of prominent AI companies, including Microsoft, OpenAI, and CoreWeave, are expected to seek faster access to public datasets to train AI models, Reuters reported.

In written testimony submitted for a Senate Commerce Committee hearing titled "Winning the AI Race," Microsoft President Brad Smith says, "The federal government remains one of the largest untapped sources of high-quality and high-volume datasets. By making government data readily available for AI training, the United States can significantly accelerate the advancement of AI capabilities."

Whether public datasets should be freely accessible for training AI models remains contentious. Even more contested is which specific datasets should be permitted and which should be restricted, a question whose answer will vary from country to country.

Public datasets can contain personally identifiable information (PII), especially if they were not properly anonymised before release. PII includes names, addresses, phone numbers, emails, biometric data, and even indirect identifiers that, when combined, can identify individuals.
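To illustrate how indirect identifiers can combine to single out a person, here is a minimal Python sketch. The records, the field names (`zip`, `birth_year`, `gender`), and the helper `unique_rows` are hypothetical and chosen only for illustration; they are not drawn from any real dataset or from the testimony.

```python
from collections import Counter

# Hypothetical toy records: no names, only indirect identifiers.
records = [
    {"zip": "10001", "birth_year": 1985, "gender": "F"},
    {"zip": "10001", "birth_year": 1985, "gender": "F"},
    {"zip": "10001", "birth_year": 1990, "gender": "M"},
    {"zip": "10002", "birth_year": 1990, "gender": "F"},
    {"zip": "10002", "birth_year": 1985, "gender": "M"},
]


def unique_rows(rows, fields):
    """Count rows whose combination of values for `fields` appears exactly once."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return sum(1 for n in counts.values() if n == 1)


# Each field on its own is shared by at least two records...
for field in ("zip", "birth_year", "gender"):
    print(field, unique_rows(records, [field]))

# ...but the three fields together isolate several records.
print("combined", unique_rows(records, ["zip", "birth_year", "gender"]))
```

In this toy data no single field is unique to one record, yet the combination of all three isolates three of the five records. That is why removing names alone may not be enough to anonymise a dataset.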

In recent months, AI companies have come under heavy scrutiny for training their models on the creative work of illustrators, photographers, authors, and other artists without giving due credit.

Many artists and media publications have already taken AI companies such as OpenAI to court.

AI industry leaders also plan to urge lawmakers not only to grant greater access to public datasets but also to streamline federal permitting processes to meet the growing energy demands of artificial intelligence systems.

"America’s advanced economy relies on 50-year-old infrastructure that cannot meet the increasing electricity demands driven by AI, reshoring of manufacturing, and increased electrification," Smith adds.

Similarly, Sam Altman, CEO of OpenAI, writes, "We want to build a brain for the world and make it super easy for people to use it, with common-sense restrictions to prevent harm."

He will argue that as AI advances, so does the need for more compute, better chips, and additional energy capacity and sources.