AI startup creates fresh data for businesses

AI companies are running short of new data to feed their models, and one solution may be to have other AI models generate that data. You could call it circular AI training. Gretel AI co-founder Ali Golshan explains how the company creates synthetic data to feed hungry AI models.

Since OpenAI's ChatGPT kicked off the generative AI boom in 2022, it has been clear that the right data, and enough of it, is essential to create an AI model that is accurate, reliable and effective. The problem? The best data, especially specialized "expert" data in specific fields like health and finance, is rare.

AI companies have scoured the internet for fresh information, but their models keep needing more. San Francisco-based startup Gretel AI has long believed that the most satisfying solution is synthetic data: artificially generated data that mimics the characteristics of real data. Gretel helps clients such as EY, Google and the US Department of Justice generate it.

And it is getting easier to do. Today, for example, Gretel announced the general availability of a generative AI-powered system that lets users create synthetic tabular datasets (text and numbers arranged in rows and columns, as in an Excel spreadsheet) from a single natural-language prompt.

For example, a bank that wants a synthetic dataset resembling its own customer data can use Gretel's Navigator product to create millions of fictitious names, IDs, dates, dollar amounts and account balances. Because the computer-generated data contains no real customer information, it does not violate customer privacy, and the bank can generate enough of it to train a powerful, accurate model.
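
To make the bank example concrete, here is a minimal Python sketch of what such a table of fictitious customers might look like. It is not Gretel's Navigator API: Navigator uses generative AI to produce data that resembles a company's real data, whereas this sketch simply draws each column from hand-picked distributions, so it shows the shape of the output rather than how a real system learns it. The names `SyntheticCustomer` and `make_dataset`, the name pools and the distributions are all hypothetical.

```python
import random
import string
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical name pools; a real generator would learn such values from data.
FIRST_NAMES = ["Anna", "Lars", "Maria", "Jonas", "Sofie", "Peter"]
LAST_NAMES = ["Jensen", "Nielsen", "Hansen", "Pedersen", "Larsen"]


@dataclass
class SyntheticCustomer:
    customer_id: str
    name: str
    opened: date
    last_deposit: float
    balance: float


def make_customer(rng: random.Random) -> SyntheticCustomer:
    """Build one fictitious customer record; no field comes from a real person."""
    customer_id = "".join(rng.choices(string.digits, k=10))
    name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    # Account opening date somewhere in the ten years starting 2015-01-01.
    opened = date(2015, 1, 1) + timedelta(days=rng.randrange(3650))
    last_deposit = round(rng.uniform(10, 5000), 2)
    # Assumption: log-normal balances, giving a long right tail.
    balance = round(rng.lognormvariate(8, 1.2), 2)
    return SyntheticCustomer(customer_id, name, opened, last_deposit, balance)


def make_dataset(n: int, seed: int = 0) -> list[SyntheticCustomer]:
    """Generate n independent fictitious rows from a seeded generator."""
    rng = random.Random(seed)  # seeded so the same call gives the same table
    return [make_customer(rng) for _ in range(n)]


if __name__ == "__main__":
    for row in make_dataset(3):
        print(row)
```

Seeding the generator makes the output reproducible, which helps when comparing models trained on the same synthetic set. A production generator would also have to preserve correlations between columns, for example between account age and balance, which independent sampling like this ignores.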

In an era of data scarcity, when companies must look to other sources to build general models or fine-tune them for specific tasks, synthetic data is having a moment in 2024, Gretel co-founder and CEO Ali Golshan told Fortune. Gretel's latest product lets companies generate data even when they have little or no data of their own.

The technology focuses on highly specific data intended to improve individual tasks within a client's internal systems. Gretel is not alone in trying to conquer the market for generating synthetic data for AI models. Startups like SynthLabs, Synthetaic and Clearbox AI are all vying to provide businesses with all the data they need, computer-generated, of course.

Data is the driving force behind AI, and Gretel AI is positioning itself as one of the most innovative players in the field.

This has led Golshan and his co-founders to think about the future. He says companies will soon be able to make money by letting others buy synthetic data generated by models trained on the organization's unique datasets. For example, organizations that have lots of data but don't build AI models could sell others access to synthetic data derived from it.

With this in mind, Golshan said Gretel's next big step is to build an exchange of synthetic data and models. "We want to enable companies and customers to train models on their data, get mathematical guarantees that the data is secure, and someone can come and 'subscribe' to that model, generate data and pay as you go," he explained.
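
Golshan does not specify which "mathematical guarantees" he means. One widely used formal privacy guarantee is differential privacy, and as an illustration only (an assumption, not a description of Gretel's system) the Python sketch below shows its simplest building block, the Laplace mechanism: a count computed over private records is released with random noise of scale 1/ε, which bounds how much any single customer can change the published result. The function names `laplace_noise` and `private_count` are hypothetical.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample from a Laplace(0, scale) distribution by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    while u == -0.5:  # resample the one value that would make log(0) fail
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count satisfying epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon is sufficient.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    balances = [120.0, 5400.0, 22000.0, 750.0, 13100.0]  # hypothetical balances
    rng = random.Random(7)
    # Noisy answer to "how many customers hold more than 1,000?"
    print(private_count(balances, lambda b: b > 1000, epsilon=0.5, rng=rng))
```

Smaller values of ε give stronger privacy but noisier answers. Training a whole generative model under a differential-privacy guarantee (for example with DP-SGD) is considerably more involved than this single query.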

This, he added, will take Gretel to the next level to "become the secure interface for private data, removing this exploitative approach to mining and harvesting data." It would also mean that companies like Anthropic and OpenAI, which have built huge AI models based on massive amounts of data, wouldn't have to enter into licensing agreements with every single company they want to get data from, he said.

As for funding, Gretel has raised a total of US$68 million, including a Series B round in 2021. Golshan said the startup still has plenty of cash, with "about two years of operations ahead of us." But in this "moment" for synthetic data, he says he sees an opportunity to build the next Databricks or Snowflake, two of the biggest data-cloud platforms, or even the next OpenAI.

"We're going into it pretty aggressively because we have a big draw," he said. "We envision building the next secure, high-quality data business, which, if you think about the needs, is a pretty significant opportunity."

Our Partners