Summary
MIT Technology Review has published an in-depth analysis of the huge energy costs of US data centers. AI is the principal reason for big rises in energy usage: consumption was flat between 2005 and 2017, despite the advent of social media and streaming services. Big Tech has expressed interest in nuclear power, but the US grid still relies heavily on fossil fuels, so AI is driving a significant increase in carbon emissions. For instance, xAI's supercomputing center near Memphis is using methane gas generators in violation of the Clean Air Act.
Questions continue to be asked about the validity of the benchmarks used to evaluate AI models. Big Tech is accused of “conducting undisclosed private testing and releasing their scores selectively” so that its models perform well on popular benchmarks. Another question is whether benchmarks can really measure general intelligence. SWE-Bench, the popular coding benchmark, does not measure downstream tasks such as how well code is documented or a model's ability to resolve flagged issues, even though these tasks are part of a software developer's job. Elsewhere, research from Google on retrieval-augmented generation (RAG) investigates how RAG effectiveness depends on the quality of the retrieved data passed into a model query. The researchers define the notion of sufficient context to denote when the retrieved data gives the model enough information to produce a correct answer. A language model classifier was designed to label RAG output as sufficient or not. When the classifier identifies sufficient context, the proportion of correct model answers increases by 2% to 10% for Gemini, GPT, and Gemma.
An InfoWorld article reviews LiteLLM – a popular Python orchestration framework that simplifies the development of applications using multiple large language models. LiteLLM provides a uniform API for accessing each of its more than 100 supported models, and has a proxy server that implements access control, budget limits, and real-time monitoring of API calls. Meanwhile, there were several AI announcements at the recent Google I/O 2025 conference. These include new versions of the AI image and video creation tools (Imagen 4 and Veo 3), a rollout of Project Mariner (the AI agent framework), Stitch (an AI tool to create Web and app UIs), and the continued integration of AI into existing Google products. Elsewhere, TechCrunch presents an introduction to Mistral AI – the French AI startup behind the Le Chat AI assistant. Founded in 2023, the startup has raised around 1 billion EUR in funding and is currently valued at 6 billion USD.
On the geopolitical front, the US has signed an agreement with the United Arab Emirates (UAE) that allows the Gulf country to import 500'000 high-grade Nvidia GPUs each year for a huge AI data center in Abu Dhabi. The deal is a significant departure from President Biden's policy, which restricted the export of such chips for national security reasons. Nonetheless, the US is insisting that, for security reasons, the chips in the UAE data center are managed by US companies.
In cybersecurity, a consortium of European and North American police forces has dismantled a cybercriminal network of 20 individuals, mostly living in Russia, who were behind the Qakbot, Danabot and Trickbot malware operations. Danabot caused malware intrusions on 300'000 computers, mainly in the US and India. One of the individuals named in relation to Trickbot is Russian national Vitalii Nikolayevich Kovalev, who is also alleged to be behind the Conti ransomware blackmail group and whose crypto-wallet is said to hold 1 billion EUR.
Table of Contents
1. How to build a better AI benchmark
2. Trump agrees deal for UAE to build largest AI campus outside US
3. LiteLLM: An open-source gateway for unified LLM access
4. Google I/O 2025: Everything announced at this year’s developer conference
5. We did the math on AI’s energy footprint. Here’s the story you haven’t heard.
6. What is Mistral AI? Everything to know about the OpenAI competitor
7. Russian-led cybercrime network dismantled in global operation
8. Sufficient Context: A New Lens on Retrieval-Augmented Generation Systems
1. How to build a better AI benchmark
This article poses several questions about the validity of the benchmarks that Big Tech companies, including OpenAI, Anthropic, and Google, use to evaluate their AI models. The first question is whether models are being specifically designed to score well on benchmarks. For instance, SWE-Bench has become the primary benchmark for evaluating coding skills. Launched in November 2024, it is composed of 2'000 programming challenges drawn from 12 public GitHub repositories of Python projects. Some models that perform well on this benchmark have been shown to perform relatively poorly on problems in other programming languages. Across leading benchmarks, Big Tech is accused of “conducting undisclosed private testing and releasing their scores selectively”. A second question is whether benchmark tasks can really measure general intelligence. SWE-Bench, for instance, does not measure downstream tasks like how well the code has been documented. Generating code is just one of the tasks software developers engage in, so fully evaluating a model would require a benchmark for each such task.
The article suggests borrowing an approach from the social sciences, where researchers measure contested concepts like ideology, democracy, and media bias. Social scientists establish rigorous definitions of each concept and then create a list of questions related to it. Applied to AI benchmarks, core concepts like “ability to resolve flagged issues in software”, “reasoning” and “math proficiency” would first be defined and then translated into sets of questions that can evaluate models.
2. Trump agrees deal for UAE to build largest AI campus outside US
The US has signed an agreement with the United Arab Emirates (UAE) that allows the Gulf country to build a huge AI data center in Abu Dhabi using high-grade Nvidia GPUs. The data center is expected to cover an area of 26 square kilometers and consume 5 gigawatts of power. The deal allows the UAE to import 500'000 AI chips each year. It is a significant departure from the Biden administration's policy, which restricted chip exports over fears in Washington that the Chinese could gain access to the technology; the US believes there is a significant amount of AI chip smuggling into China. Under the new deal, the US is insisting that, for security reasons, the chips in the UAE data center are managed by US companies. Nvidia's CEO Jensen Huang, who was present in the UAE, and OpenAI's Sam Altman have openly supported the deal.
3. LiteLLM: An open-source gateway for unified LLM access
This article introduces LiteLLM – a Python orchestration framework that simplifies the development of applications using multiple large language models. LiteLLM provides a uniform API for accessing each model, hiding the differences in naming conventions and data formats between them. It also offers a proxy server through which calls to the different models are routed; the proxy implements cost tracking (via per-model budgets), access control, and real-time monitoring of API calls. Developers can specify fallback options for when a model is unavailable, and the Python Pydantic library can be used to validate data. LiteLLM supports over 100 large models, including models from Anthropic, AWS Bedrock, AWS SageMaker, Azure OpenAI, DeepSeek, Google Vertex AI, OpenAI, and Ollama. The project's GitHub repository has over 23'000 stars and 2'700 forks. Organizations using LiteLLM include Netflix, Lemonade, and Rocket Money.
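The uniform-API idea above can be sketched as follows. This is a minimal sketch based on LiteLLM's documented `completion` function and OpenAI-style response shape; the model ID and prompt are illustrative, and the real call is guarded behind an environment check since it requires provider credentials.

```python
# Minimal sketch of LiteLLM's unified call pattern (hedged: model IDs and
# the prompt are illustrative; `completion` is the documented entry point).
import os

# One OpenAI-style message format for every provider; LiteLLM maps the
# model string ("gpt-4o", "claude-...", "ollama/llama3", ...) to the
# right backend and normalizes the response.
messages = [{"role": "user", "content": "Summarize RAG in one sentence."}]

# Guarded so this sketch runs without credentials; a real call needs the
# provider's API key set in the environment.
if os.environ.get("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(model="gpt-4o", messages=messages)
    # The response is normalized to the OpenAI format regardless of provider.
    print(response.choices[0].message.content)
```

Swapping providers then only means changing the model string; the message format and response handling stay the same, which is the main appeal of the gateway approach.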
4. Google I/O 2025: Everything announced at this year’s developer conference
This TechCrunch article summarizes the key announcements made at the Google I/O 2025 conference that took place recently. The following are the main points relating to AI.
- Deep Think in Gemini 2.5 Pro. This integrates a reasoning mode (where the model considers several possible answers to a query before choosing one) into the Gemini 2.5 Pro model. The service is currently only available to “trusted testers”.
- The latest AI video generation model, Veo 3, is available. It allows sound effects and dialogue to be added to the video. Imagen 4, its latest AI image creation tool, is available and allows “fine details” to be included in images (e.g., water droplets, fabrics, …). Both Veo 3 and Imagen 4 are used by Flow – an AI tool for filmmaking.
- Stitch is an AI tool to design Web and mobile app front-ends, and to generate the HTML and CSS code for the designs.
- Project Mariner is Google’s AI agent framework that is now rolling out to users. Its agents can concurrently execute several tasks like “purchase tickets to a baseball game or buy groceries online”.
- Project Astra is Google’s multimodal AI experience. One of the associated projects is the building of glasses in partnership with Samsung and Warby Parker. Another project is enhanced interaction with Google Search.
- Gemini is being integrated into Chrome, and AI is coming to the Workspace applications (Gmail, Google Docs, Google Vids), enabling features like smart inbox cleaning and new ways to create and edit content. The Gemma 3n model was also announced, designed to run on phones and tablets.
5. We did the math on AI’s energy footprint. Here’s the story you haven’t heard.
MIT Technology Review has published an analysis of the huge energy costs of AI in the US. Data centers in the country consumed about 200 terawatt-hours of electricity in 2024 – 4.4% of total US energy consumption – and this figure is expected to reach 12% by 2028. AI is the principal reason for this rise: data center consumption was flat between 2005 and 2017, despite the advent of social media and streaming services. The rise in energy needs is pushing Big Tech to invest in power. For instance, Meta and Microsoft want to use new nuclear power plants, Apple is spending 500 billion USD on manufacturing and data centers, and Google will spend 75 billion USD. President Trump's Stargate program will see the construction of 10 data centers that each consume 5 gigawatts. Despite an expressed interest in nuclear and solar power, the US grid still relies heavily on fossil fuels. xAI's supercomputing center near Memphis is currently using methane gas generators (in violation of the Clean Air Act).
One challenge in accounting for AI's consumption is the large cost of training a model – a cost that model providers need to recoup, since customers only pay for inference. It is estimated that OpenAI spent 100 million USD and 50 gigawatt-hours to train GPT-4. Nonetheless, inference is now estimated to account for 80% to 90% of AI energy usage. The energy consumed depends on the type of request.
- Text. For simple queries, Llama 3.1 8B (an 8 billion parameter model) uses around 57 joules per response, or 114 joules when cooling is accounted for. This is equivalent to running a microwave for one-tenth of a second. Llama 3.1 405B has roughly 50 times as many parameters and consumes correspondingly more energy: an average of 6'706 joules per request, which is like running a microwave for eight seconds.
- Images. Image diffusion models have fewer parameters than most big text models. Creating a 1024 by 1024 pixel image with Stable Diffusion costs 2'282 joules – the same as running a microwave for five and a half seconds. For scale, OpenAI says its service creates 78 million images per day.
- Video. Creating a 5 second video with OpenAI’s Sora costs around 3.4 million joules, which is equivalent to running a microwave for over an hour.
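The microwave comparisons above are a simple joules-to-seconds conversion. A back-of-envelope check, assuming a 1'000 W microwave (a typical rating; the article does not state the wattage behind its figures, which imply something closer to 800 W):

```python
# Back-of-envelope check of the microwave comparisons above.
# Assumption (not from the article): a 1,000 W microwave.
MICROWAVE_WATTS = 1_000  # 1 watt = 1 joule per second

def microwave_seconds(joules: float, watts: float = MICROWAVE_WATTS) -> float:
    """Seconds a microwave of the given wattage runs on `joules` of energy."""
    return joules / watts

print(microwave_seconds(114))               # Llama 3.1 8B query: ~0.1 s
print(microwave_seconds(6_706))             # Llama 3.1 405B query: ~6.7 s at 1 kW
print(microwave_seconds(3_400_000) / 3600)  # Sora 5 s video: ~0.9 hours at 1 kW
```

The slight differences from the article's figures (eight seconds, over an hour) simply reflect the assumed wattage.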
Energy usage for AI will increase with features like immersive AI and agentic AI. Moreover, the cost will ultimately fall on the consumer: a 2024 report by the Virginia state legislature found that residents could pay an extra 37.50 USD per month for energy to cover the increased usage by data centers.
6. What is Mistral AI? Everything to know about the OpenAI competitor
This regularly updated TechCrunch article presents Mistral AI – the French AI startup behind the Le Chat AI assistant. Founded in 2023, the startup has raised around 1 billion EUR in funding and is currently valued at 6 billion USD. Mistral AI has released several models, including Mistral Large 2 (its main large language model), Pixtral (a family of multimodal models), Devstral (an AI coding model available under the open-source Apache 2.0 license), Les Ministraux (a family of models for edge devices), and Mistral Saba (a model focused on Arabic). The Le Chat AI assistant got 1 million downloads in the two weeks after its release.
Mistral AI’s revenue model is based on tiered access to Le Chat, with Le Chat Pro costing 14.99 USD per month, as well as on enterprise licenses for its larger models. The company signed a deal with Microsoft in 2024 to distribute its models via Microsoft’s Azure platform, receiving 15 million EUR in investment. In January, it signed a content deal with Agence France-Presse (AFP) giving it access to AFP content dating back to 1983, and, in a joint venture with the UAE investment firm MGX and with Nvidia, it is participating in the building of an AI campus outside Paris. The company’s leadership says that Mistral AI is not for sale, but an IPO plan is being prepared.
7. Russian-led cybercrime network dismantled in global operation
A consortium of British, Canadian, Danish, Dutch, French, German and US police has dismantled a cybercriminal network of 20 individuals, mostly living in Russia. The individuals are believed to lead the Qakbot, Danabot and Trickbot malware operations. Police said that Danabot operations led to malware intrusions on 300'000 computers, mainly in the US and India, and that the group also had an “espionage variant used to target military, diplomatic, government and non-governmental organizations”. One of the individuals named in relation to Trickbot is Russian national Vitalii Nikolayevich Kovalev, who is also alleged to be behind the Conti ransomware blackmail group. German police have described him as the “most successful blackmailer in the history of cybercrime”, and his crypto-wallet is said to hold 1 billion EUR. Conti was active in many ransomware attacks on US hospitals, peaking during the Covid-19 pandemic.
8. Sufficient Context: A New Lens on Retrieval-Augmented Generation Systems
Retrieval-augmented generation (RAG) is a technique that adds retrieved information to a large language model query so the model can handle information that post-dates its training data, and that can reduce hallucinations. In this study from Google, Duke University and UC San Diego, researchers investigate how the effectiveness of RAG depends on the quality of the data it passes to a model query. They define the notion of sufficient context as the RAG output providing enough information for the model to give a correct answer. An LLM-based auto-rater was designed to classify RAG data as sufficient or not. When the auto-rater identifies sufficient context, the proportion of correct answers increases by 2% to 10% for Gemini, GPT, and Gemma. The researchers also found that common language models give correct responses 35% to 62% of the time even when the context is insufficient, showing that models rely heavily on their trained knowledge. Finally, the research shows that when the context is insufficient, models tend to hallucinate rather than abstain from answering. Abstention is the desirable outcome in this case, but models appear to treat the retrieved context as license to answer, using it to compensate for gaps in their trained knowledge. One suggested approach is to fine-tune models to encourage abstention when prompt contexts are insufficient.
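The abstention idea can be sketched as a simple gate in front of the generator. This is a hedged illustration, not the paper's implementation: `rate_context` stands in for the paper's LLM-based auto-rater (a real system would prompt a model such as Gemini there), and the keyword heuristic and generator lambda are toy placeholders.

```python
# Sketch of selective answering with a sufficient-context gate.
# `rate_context` is a hypothetical stand-in for the paper's LLM auto-rater.

def rate_context(question: str, context: str) -> bool:
    """Toy auto-rater: True if the context looks sufficient to answer.
    Illustration only: requires the context to mention the question's
    final key term. The paper uses an LLM classifier instead."""
    key_term = question.rstrip("?").split()[-1].lower()
    return key_term in context.lower()

def answer_with_abstention(question, context, generate):
    """Answer only when the context is rated sufficient; otherwise abstain."""
    if not rate_context(question, context):
        return "I don't know."  # abstain rather than risk a hallucination
    return generate(question, context)

# Toy generator standing in for the real model call.
reply = answer_with_abstention(
    "What year was Mistral AI founded?",
    "Mistral AI, founded in 2023, is a French startup.",
    lambda q, c: "2023",
)
print(reply)  # → 2023
```

With an insufficient context, the same call returns the abstention string instead of a guess, which is the behavior the fine-tuning proposal aims to encourage in the model itself.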