Musk accepts AI data limit has been reached

Elon Musk shares the view of many AI experts that the pool of real-world data available to train AI models is now nearly exhausted. During a live stream on X on Wednesday, in a discussion with Stagwell chairman Mark Penn, Musk remarked, “We’ve essentially tapped out the totality of human knowledge for AI training. This milestone was reached roughly last year.”

Musk, who heads the AI company xAI, echoed points made by former OpenAI chief scientist Ilya Sutskever at the NeurIPS machine learning conference last December.

Sutskever said the AI sector had hit “peak data”, forecasting that the scarcity of training data will force a change in how models are developed.

Musk proposed that the future now lies in synthetic data, meaning data generated by AI systems themselves. He explained, “The only way to enhance [real-world data] is through synthetic data, where the AI produces [training data]. With synthetic data, the AI will essentially evaluate itself and engage in a process of self-improvement.”
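To make the idea concrete, here is a minimal, hypothetical sketch of that self-improvement loop: a model generates candidate training examples, scores its own output, and fine-tunes on whatever passes the filter. The function names, prompts, and threshold below are illustrative stand-ins, not any company’s actual pipeline.

```python
# Hypothetical sketch of a synthetic-data self-improvement loop.
# generate, score, and fine_tune are stand-ins, not a real API.
import random

def generate(prompt: str) -> str:
    # Stand-in for a model completion call (an LLM producing an answer).
    return f"synthetic answer to: {prompt}"

def score(example: str) -> float:
    # Stand-in for the model grading its own output ("the AI will
    # essentially evaluate itself"); a real system might use a reward
    # model or a grading rubric instead of a random score.
    return random.random()

def fine_tune(dataset: list[str]) -> None:
    # Stand-in for a training step on the filtered synthetic examples.
    print(f"fine-tuning on {len(dataset)} synthetic examples")

prompts = ["explain photosynthesis", "summarise the French Revolution"]
keep_threshold = 0.5  # illustrative quality cutoff

synthetic_dataset = []
for prompt in prompts:
    example = generate(prompt)
    if score(example) >= keep_threshold:   # self-evaluation filter
        synthetic_dataset.append(example)

fine_tune(synthetic_dataset)               # self-improvement step
```

In practice the filtering step is where most of the engineering effort goes, since a model that grades its own outputs too leniently simply amplifies its own mistakes.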

Tech leaders including Meta, Microsoft, Anthropic, and OpenAI are already using synthetic data to train their flagship AI models. Gartner has also predicted that by 2024, 60% of the data used for AI and analytics projects would be synthetically generated.

Microsoft’s Phi-4, which was open-sourced on Wednesday, was trained on a mix of synthetic and real-world data, and Google took the same approach with its Gemma models.

Anthropic used synthetic data in developing some of its most capable models, while Meta fine-tuned its latest Llama series of models with AI-generated data.

Training with synthetic data also offers financial benefits. AI startup Writer claims that its Palmyra X 004 model was developed mostly from synthetic sources and cost only about $700,000 to create, significantly less than the estimated $4.6 million for a similarly sized OpenAI model.
