AI Market Sees Modal Fragmentation

Nvidia Bullish but Under Pressure

Posted on March 22nd, 2025

Summary


Google has released two AI models for robotics based on Gemini 2.0. One is a vision-language-action (VLA) model, which means that model inputs are images and model outputs are actions that a robot can take. A second model focuses on spatial reasoning – which includes detecting and pointing at objects. When spatial reasoning is combined with code generation capabilities, the model is able to develop new capabilities on the fly, like understanding how to hold a coffee mug. An MIT Technology Review article reviews the approach taken by Canadian company Waabi to developing driverless trucks. The idea is to create a digital twin of each real truck that receives the same sensor data as inputs. The simulation can then test out emergency situations like animals crossing the road or extreme weather. Many experts believe that this type of simulation is the only viable manner to validate the safety of autonomous vehicles.

Nvidia CEO Jensen Huang gave a keynote address at the recent GTC 2025 conference and suggested that emerging reasoning models still require high-performance GPUs for runtime inference. This comes after the success of DeepSeek’s R1 model questioned the hypothesis that high-grade GPUs are needed to train large models. Hugging Face is calling on the US government to prioritize open-source in the forthcoming AI Action Plan. They argue that the new action plan must strengthen existing open-source initiatives, encourage the participation of organizations from all sectors in the development of new models, and leverage practices from cybersecurity to ensure interoperable standards for safe and secure AI systems. Microsoft is creating a new research project whose goal is to identify the influence that any particular training data has on content generated by models. This comes at a time of lawsuits against AI companies for having used copyrighted material to train AI models, and Big Tech companies calling on the new Trump administration to weaken copyright protections to allow for AI model training. Research from Poe on the evolution of the AI market in early 2025 suggests that modal fragmentation now exists, with users preferring different model providers depending on the modality (voice, video, text, …) they seek to create.

Another article presents an initial review of the AI agent Manus, a much talked about Chinese chatbot. The agent uses several AI models, including Anthropic's Claude 3.5 Sonnet and fine-tuned versions of Alibaba's open-source Qwen. The Italian newspaper, Il Foglio, published a whole edition that was completely generated using AI. The role of journalists was limited to entering prompts into an AI tool. Elsewhere, a report from the San Francisco-based company Harness finds that AI tools are leading to an increased volume of code being shipped, but the “blast radius” of this code is also increasing, with 67% of developers spending more time debugging AI-generated code and 68% resolving extra security vulnerabilities.

In cybersecurity, a report from the Google Threat Intelligence Group summarizes how state-sponsored APT actors have been making use of Google Gemini to launch cybersecurity attacks. The report finds no evidence of novel AI-enabled cyberattack or influence-operation methods, but bad actors are using AI to assist in all phases of attacks. APT actors were detected from more than 20 countries, with Iran, China, North Korea and Russia being the most frequent users.

1. Everyone in AI is talking about Manus. We put it to the test.

This MIT Technology Review article presents an initial review of the AI agent Manus, released by the Chinese company Butterfly Effect. The agent uses several AI models, including Anthropic's Claude 3.5 Sonnet and some fine-tuned versions of Alibaba's open-source Qwen. This distinguishes Manus from existing AI chatbots. According to the author, Manus seems good at breaking down goals into smaller tasks and at autonomously searching the Web for information it requires for each task. The journalist evaluated Manus by giving the agent several tasks. One task was to propose an apartment for rent in New York from the available listings, which the agent completed satisfactorily within 30 minutes. The agent was then asked to compile a list of candidates for the MIT Technology Review’s annual Innovators Under 35 awards. Here, the agent exhibited a bias towards academic institutions and “Chinese media darlings”. The journalist concludes that the agent is best suited to analytic reasoning and searches on the open Internet (without the need to extract information from behind paywalls), and that working with Manus “feels like collaborating with a highly intelligent and efficient intern”.

2. Waabi says its virtual robotrucks are realistic enough to prove the real ones are safe

This article reviews the approach taken by Canadian company Waabi to developing driverless trucks. Waabi is collaborating with Uber Freight and Volvo. The idea is to create a digital twin of each of their real trucks. A twin receives the same sensor data as inputs from radar and lidar (Light Detection and Ranging, used to detect objects and their distances). The digital twin simulates the truck’s behavior, from the effects of gear changes to precise reaction times when braking and accelerating. Waabi’s “sim-first” approach differs from other autonomous vehicle companies where the approach is to maximize the number of kilometers spent on the road. However, the latter approach is inefficient for trucks, where most of the distance travelled is in straight lines over interstate highways with few dangerous situations. Waabi’s digital twin can test the behavior of the truck by simulating dangerous situations caused by other vehicles, extreme weather or interference from other objects (e.g., items falling from overhead bridges, animals crossing, etc.). Waabi claims that its simulator is 99.7% accurate, which is thought to be a very good result, especially since many experts believe that this type of simulation is the only viable manner to validate the safety of autonomous vehicles.

3. Gemini Robotics brings AI into the physical world

Google has released two new AI models for robotics based on Gemini 2.0: Gemini Robotics and Gemini Robotics-ER. Gemini Robotics is a vision-language-action (VLA) model, which means that model inputs are images and model outputs are actions that a robot can take. Three key properties of robot models are identified. First, generality is the ability of a robot model to execute a task never seen before, and Google claims that Gemini Robotics doubles performance on a generalization benchmark. Second, interactivity is the ability of a model to interact with people and objects in an environment, adapting to changes (for example, if an object slips from the robot’s hand, it can replan and carry on with its task). This requires the model to respond to a wide range of natural language instructions. Third, dexterity is the ability of a robot to perform complex, precise tasks – like making a paper airplane. Gemini Robotics-ER focuses on spatial reasoning – which includes detecting and pointing at objects. When spatial reasoning is combined with Gemini’s code generation capabilities, the model is able to develop new capabilities on the fly – like understanding how to hold a coffee mug. Finally, the researchers have developed a framework where robot safety rules are expressed in natural language. The goal is a robot development framework for “robots that are safer and more aligned with human values”. Google is partnering with the robot provider Apptronik, which is developing the Apollo humanoid robot using the models.

4. Major AI market share shift revealed: DALL-E plummets 80% as Black Forest Labs dominates 2025 data

This VentureBeat article reports on research from Poe – a platform hosting over 100 AI models – on the evolution of the AI market in early 2025. The results show a modal fragmentation – meaning that users prefer different models depending on the modality (voice, video, text, …) to be created. Further, new models quickly “cannibalize” older models (e.g., users leaving GPT-4 for GPT-4o, or Claude-3 for Claude 3.5, etc.). These developments suggest that it will be complicated for organizations to maintain a stable AI ecosystem. An example of modal fragmentation is Google’s ecosystem, where the Gemini text model family has been in decline since late last year, while the Imagen3 family has 30% market share and the Veo-2 video generation model has 40% market share. Runway is leading the video generation market with up to 50% market share. Black Forest Labs’ Flux family is now the leader in image generation with nearly 40% market share. For text generation, OpenAI and Anthropic share 85% of the market with their GPT and Claude model families. However, DeepSeek’s R1 went from zero to 7% very quickly, which illustrates how tenuous market hold is for AI models.

5. The State of Software Delivery 2025

This report from Harness, a San Francisco-based company developing an AI-based software delivery platform, surveys current developer attitudes towards AI. The company interviewed 500 software development experts. The report finds that 80% of developers are working more than 40 hours per week, and half mention worries about unhealthy work-life balance, increased stress and risk of burnout. 98% of developers believe that AI tools can help with this, while at the same time, 90% of developers believe that AI tools will eventually replace programmers. Another finding is that AI tools are leading to an increased volume of code being shipped, but the “blast radius” of this code is also increasing: 67% of developers are spending more time debugging AI-generated code and 68% resolving security vulnerabilities. 59% of developers have experienced problems with deployments of AI-generated code. The issues encountered include outdated dependencies, insecure coding patterns, and also the time it takes to understand the code. So while AI accelerates code production, it increases the time spent on other aspects of the software development lifecycle. Finally, the report also finds that AI is creating a new form of shadow IT: 52% of developers are using AI tools not provided by the IT department, and 60% of organizations have not evaluated the effectiveness of the AI tools being used.

6. Italian newspaper says it has published world’s first AI-generated edition

The Italian newspaper, Il Foglio, published a whole edition that was completely generated using AI tools. The role of journalists was limited to entering prompts into an AI tool. All headlines, articles and even letters from readers were generated using AI. The front-page articles included one on Donald Trump and how admirers of Trump in Italy turn a blind eye when “their idol in the US behaves like the despot of a banana republic”. The article has a good deal of irony. Another article is extremely critical of Russian president Vladimir Putin. No article directly quotes human beings. The editor of Il Foglio created the AI edition as part of an experiment to create awareness around the impact of AI on journalism. He is quoted as saying “It is just another [Il] Foglio made with intelligence, don’t call it artificial”.

7. Adversarial Misuse of Generative AI

This report from the Google Threat Intelligence Group summarizes how state-sponsored APT actors have been making use of Google Gemini to launch cybersecurity attacks. Much of the research into the potential misuse of generative AI by bad actors is largely theoretical, so the report aims to complement this with observations from the field. The authors conclude that there is no evidence yet of any significant impact by AI on cyberattacks. Rather, AI is being used to assist in existing types of cyberattacks and information influence operations. APT actors are using AI to assist in all phases of a cyberattack: reconnaissance (e.g., research on potential victim companies across multiple sectors and countries), weaponization (e.g., rewrite publicly available malware into different programming languages), delivery (e.g., generating phishing content for targeting a US defense organization), exploitation (e.g., reverse engineer endpoint detection and response software), installation (e.g., explaining how to add a self-signed certificate to Active Directory), and command and control (C2) (e.g., looking up Active Directory management commands). Use of generative AI for influence operations included the creation of social media personas, translation, and comment brigading (where a large number of comments are generated for a social media post to make it appear more popular than it actually is).

Google observed APT actors from more than 20 countries, with Iran, China, North Korea and Russia being the most frequent users. Iran’s use focused on defense organizations (e.g., jamming F-35 fighter jets, anti-drone systems, and Israel's missile defense systems), dissidents, and the Israel-Hamas conflict. China’s influence operations were strongly related to the issue of Taiwan. North Korea is using generative AI to create fake résumés and place clandestine IT workers at Western companies. Google believes that several hundred full-time and freelance jobs are held by such people, who send salary and proprietary information back to North Korea. Finally, Russian APT groups seem to minimize their use of Western language models for fear of Western surveillance, but Google observes a significant market for jail-broken models on Russian markets.

8. GTC felt more bullish than ever, but Nvidia’s challenges are piling up

This TechCrunch article reports on the keynote address by Nvidia CEO Jensen Huang at the recent GTC 2025 conference in San Jose, attended by 25,000 people. Nvidia seemed to be in an unassailable position at the turn of the year, with high profit margins and a dominant market position. Since then, DeepSeek released R1 – a high-performing reasoning model trained with lower-grade GPUs. This questioned the hypothesis that high-grade GPUs are needed to train large models. However, Huang suggested that reasoning models still require high-performance GPUs for runtime inference, and presented the next-generation Nvidia Vera Rubin GPUs, which he claims perform inference at twice the rate of Nvidia’s best Blackwell chip. The company also launched DGX Spark and DGX Station to allow users to develop and fine-tune their own AI models in edge-computing contexts.

Another challenge facing Nvidia comes from the trade tariffs imposed by the new Trump administration. Nvidia’s chips come mostly from Taiwan, and the company must invest heavily to relocate manufacturing to the US. Also, Big Tech companies are looking to reduce their dependence on Nvidia by developing their own chips (AWS’ Graviton, Google’s TPUs, Microsoft’s Cobalt 100), and even OpenAI and Meta have chip development programs. In a quest for new research and development avenues, Nvidia announced that it is opening a center for quantum computing in Boston where researchers can simulate quantum systems (on Nvidia hardware). The article notes that Nvidia’s share price fell 4% after the keynote address.

9. Hugging Face calls for open-source focus in the AI Action Plan

This article reviews the call by Hugging Face to the US Office of Science and Technology Policy (OSTP) to promote and prioritize open-source in the forthcoming US AI Action Plan. Hugging Face currently has seven million users, and its platform hosts 1.5 million open-source models. For Hugging Face, a new action plan must strengthen existing open-source initiatives, encourage the participation of organizations from all sectors in the development of new models, and leverage practices from cybersecurity to ensure interoperable standards for safe and secure AI systems. The organization also argues that current commercial models are built on years of open research, and from an economic standpoint, claims that open technical systems have a 2000x multiplier effect. It claims that a country would lose 2.2% of its GDP without recourse to open-source developments, and that open-source contributed between 65 and 95 billion EUR to the European Union's GDP in 2018. In the case of AI models, improvements come from cost efficiency since developing models from scratch is costly, from being able to customize models to particular environments, from avoiding vendor lock-in, and from using models that often outperform similarly sized proprietary models. In the context of the forthcoming US AI Action Plan, Hugging Face is calling for fair protocols to allow developers access to existing datasets and the development of new datasets. Further, as the energy footprint of AI remains very high (with data centers’ electricity consumption doubling from 2022 levels to 1,000 TWh by 2026, which is Japan’s entire electricity demand), they call for the development of frameworks that permit hardware and software optimizations when training and running models.

10. Microsoft is exploring a way to credit contributors to AI training data

Microsoft is creating a new research project whose goal is to identify the influence that particular training data has on the content generated by models trained on it. The project is to consider multimodal models. Microsoft is the subject of several lawsuits for using data to train its models without the explicit consent of copyright owners, such as a lawsuit from the New York Times newspaper and a lawsuit from software developers who claim that GitHub Copilot was trained on software without developer permission. One impact of being able to identify the contribution of individual training data content is the possibility of compensating authors. Language models are based on artificial neural network architectures where it is hard to identify individual data contributions. The article cites an AI company, Bria, that raised 40 million USD in venture capital funding to develop a similar type of system. In any case, AI firms are still claiming that use of copyrighted material to train AI models should fall under the fair use doctrine of copyright law. Further, several top Tech firms, including both Google and OpenAI, are recommending to the new Trump administration that new regulation be enacted to weaken copyright protections to allow for AI model training.