Summary
One major announcement was Google’s quantum computing chip called Willow. The chip incorporates real-time error correction for qubits that enables the overall computational error rate to drop as the number of qubits increases. It also performed one of quantum computing’s toughest benchmarks in under 5 minutes, a benchmark that would take today’s most powerful supercomputer 10 septillion years to execute.
On ecology, a research report calculates that 56% of the 3,318 power plants that feed US data centers use fossil fuels, and that the CO2 emissions of these data centers in 2024 were 105.59 million metric tons (MT), corresponding to 2.18% of total US emissions. On the question of AI investments, a survey suggests that 40% of company executives believe that unprecedented investment in data management will be required in 2025, primarily due to the large number of data silos in organizations. This figure exceeds the number of executives who believe that investment in data infrastructure, compute power or talent needs to be prioritized for AI success.
On the technical side, Tokyo startup Sakana AI has developed a technique called universal transformer memory that significantly reduces the memory footprint of language models. It achieves this by training a model to throw away non-useful information in the context window, e.g., comments and whitespace from code samples, duplicate frames from videos, and grammatical redundancies in natural language text.
On the application side, an MIT Technology Review journalist reviews the latest developments around Google’s Astra project – Google’s agent or universal assistant project. Another article describes the experience of having an avatar (deepfake) of oneself created using Synthesia’s latest technology, and looks at the implications of avatars becoming increasingly realistic. Elsewhere, an InfoWorld article surveys a number of LLM application frameworks available to developers. These allow developers to build systems that orchestrate individual AI services and connect them to local data sources for RAG (retrieval-augmented generation) or to other standard services like email or Web search for agents.
In cybersecurity, researchers attribute the Salt Typhoon attack (the hack on several telecommunications companies) to Chinese hackers, though the Chinese government denies involvement. One US senator called it the “worst telecom hack in our nation’s history”, at a time of increased tension between the US and China around the race to lead AI and the future of TikTok in the US. Meanwhile, cybersecurity specialists discovered several vulnerabilities in the infotainment unit of Skoda’s Superb III sedan that give attackers unrestricted code execution on the unit by connecting via Bluetooth without authentication. They did not find any vulnerabilities around the car’s critical zones like steering, braking and acceleration.
Table of Contents
1. Google’s new Project Astra could be generative AI’s killer app
2. An AI startup made a hyperrealistic deepfake of me that’s so good it’s scary
3. Why did China hack the world’s phone networks?
4. Surveying the LLM application framework landscape
5. Meet Willow, our state-of-the-art quantum chip
6. Researchers find security flaws in Skoda cars that may let hackers remotely track them
7. 2024 Data Complexity Report Reveals AI’s Make or Break Year Ahead
8. Environmental Burden of United States Data Centers in the Artificial Intelligence Era
9. New LLM optimization technique slashes memory costs up to 75%
1. Google’s new Project Astra could be generative AI’s killer app
This article enumerates several new AI products from Google DeepMind. These include Gemini 2.0 (the latest version of the company’s multimodal large language model), a new coding assistant called Jules, a video generation model called Veo, and a new version of the image generation model (Imagen 3). Gemini 2.0 is purported to be twice as fast as Gemini 1.5, and performs better on several benchmarks, notably MMLU-Pro – a large set of multiple-choice questions from several subjects, including math, physics, health, psychology, and philosophy. For the author, the performance of Gemini 2.0 is comparable to the other large models from Anthropic and OpenAI.
One of the principal use cases of Gemini 2.0 is Project Astra – Google DeepMind’s agent framework, though the Google term for agent is “universal assistant”. An MIT Technology Review journalist tried out the latest version of Astra and was generally impressed. The agent can show glitches in its understanding, though these are often corrected through oral commands. An agent can take orders in text, image, speech and video formats and interact with Google Apps like Lens, Search and Maps depending on the task. One of the default agents is Mariner, which is designed to search the Web on behalf of its user. The journalist was impressed by the ability of Astra agents to remember events, which relies on a long context window. This increases the range of support that the agent can offer, e.g., spontaneously reminding the user of a door code when arriving at a door, having been asked for that code on a previous occasion. One concern with this capability relates to privacy, because the agent may be remembering personal data captured during interactions without the explicit consent of the person to whom the data belongs.
2. An AI startup made a hyperrealistic deepfake of me that’s so good it’s scary
In this article, a journalist describes her experience having a deepfake of herself created by the company Synthesia using their latest technology. The deepfake is a full-body avatar of a person which can move around in a virtual space. The process starts with the help of an artist who ensures that the person’s clothes look good on camera and that make-up is appropriate. The person then goes through a data collection phase where facial features and mannerisms are captured. This is done by having the person recite phrases in various tonalities. For instance, the English sentence “All the boys ate a fish” is recited several times since speaking the sentence mobilizes all of the facial muscles. Reciting phrases with different emotions is also important since building a convincing avatar goes beyond mouth movements and needs to include gestures like eyebrow movements and shoulder shrugs. The human brain rejects avatars after a moment because it detects features that are unreal or unnatural – the phenomenon known as the uncanny valley – but techniques developed at Synthesia are making significant progress. Commenting on the realism of the avatar created, a person close to the journalist said that the avatar looked fairly real, but “the voice sometimes sounds exactly like you, and at other times like a generic American and with a weird tone”.
The article also discusses some ethical issues around deepfakes. Interestingly, Synthesia uses the term synthetic media rather than deepfake because of the negative connotations associated with deepfakes (e.g., many deepfakes have been created from images taken from social media without consent to depict sexual content, and deepfakes have also been created for political campaigns to spread disinformation). The worry about deepfakes in the political context is that people are no longer able or willing to distinguish between unreal and real information – a phenomenon known as the liar’s dividend. Synthesia refuses to create avatars of anyone without their explicit permission, and content creation requests concerning cryptocurrencies or sexual health are verified by human moderators. Watermarks are also included in generated content so that humans know that they are interacting with avatars. Synthesia says that its technology is used by 56% of Fortune 100 companies for internal communication purposes.
3. Why did China hack the world’s phone networks?
In an attack named Salt Typhoon, several telecommunications companies around the world were hacked. Independent researchers and US Intelligence attribute the attack to Chinese hackers, though the Chinese government denies involvement. The latest breaches in the US gave hackers unprecedented access to message contents as well as logs of who had been calling whom. It is reported that even the US’s wiretapping program was breached and that hackers may have succeeded in hacking the phones of Donald Trump and Kamala Harris. US Intelligence estimates that Salt Typhoon has been active for nearly two years, and a US senator called it the “worst telecom hack in our nation’s history”. US government employees are being urged to use encrypted messaging apps such as Signal, WhatsApp, and FaceTime.
The attack is active at a time when relations between China and the US are strained. On the one hand, there is the race to lead AI. This requires powerful chips, and the US-based Nvidia is currently the world leader. China has brought an antitrust case against Nvidia and has banned the export to the US of minerals like gallium and germanium, which are needed for chip manufacture. Another touchy subject is the future of TikTok in the US, where there are 170 million users. US lawmakers are proposing a bill to ban TikTok – or force its sale – primarily because of the fear that the app is used to steal sensitive data and spread Chinese government propaganda. A court of appeals has upheld the ban. TikTok claims that there is no evidence of Chinese government interference via the platform and that the courts should support free speech. The judges said that “Preventing covert content manipulation by an adversary nation also serves a compelling governmental interest”. The success of the Salt Typhoon attacks could work against TikTok.
4. Surveying the LLM application framework landscape
The end of the year sees a number of LLM application frameworks available to developers. These allow developers to build systems that orchestrate individual AI services and connect them to local data sources for RAG (retrieval-augmented generation) or to other standard services like email or Web search. Well-known frameworks include LangChain, LlamaIndex, Semantic Kernel from Microsoft (an open-source SDK for AI orchestration), and the open-source Haystack framework.
Agents are one of the main use cases for LLM application frameworks – though Microsoft calls agents copilots and Google calls them universal assistants. Agents can take autonomous actions like processing email and purchasing items over the Web. Chatbots are another use case. A chatbot is designed to mimic human conversation, and uses an LLM and RAG to keep track of the conversation’s history. RAG can be implemented in different ways. One way is to have the orchestration framework retrieve data from a database and add that data to the prompt of requests sent to a language model. Another approach is to use an embedding model for local data, such as Word2vec for text documents or DeViSE for mixed text and media, and to store the resulting embeddings in a vector database (e.g., Qdrant, Elasticsearch). This approach allows for better RAG results, since documents are matched by semantic similarity rather than by exact keywords.
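The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of any framework named in the article: the documents are invented, the bag-of-words `embed` function is a toy stand-in for a learned embedding model, and an in-memory list stands in for a vector database.

```python
import math
from collections import Counter

# Hypothetical local documents; a real system would hold these in a vector database.
DOCUMENTS = [
    "Invoices are archived in the finance share after thirty days.",
    "The VPN requires a hardware token for every login.",
    "Holiday requests must be approved by a line manager.",
]

def embed(text):
    """Toy embedding: word counts (a stand-in for a learned embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question before calling a language model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How do I login to the VPN?", DOCUMENTS)
print(prompt)
```

The resulting prompt would then be sent to a language model. Swapping the toy `embed` for a real embedding model changes the quality of the ranking but not the overall shape of the pipeline.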
5. Meet Willow, our state-of-the-art quantum chip
Hartmut Neven, founder and lead at Google Quantum AI, has announced Willow, a new quantum computing chip which Google says marks two landmark achievements in the field of quantum computing. The first achievement relates to error correction. The basic unit of information in quantum computing is the qubit, but a physical qubit can lose information quickly. This means that several physical qubits are needed to represent each logical qubit, together with an error-correcting algorithm. Historically, increasing the number of qubits increased the number of errors introduced into computations. However, Willow uses a real-time quantum error detection technique that cuts error rates. The technique even allows error rates to be reduced as the number of qubits increases. For instance, scaling from grids of 3x3 qubits, to 5x5, to 7x7 cuts error rates on each iteration. For Google, the processor is the first compelling example of beyond break-even – where arrays of qubits have longer lifetimes than individual physical qubits.
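Why adding qubits can either help or hurt may be easier to see with a classical analogy. The sketch below is not Willow's error-correction scheme: it models a simple repetition code in which one logical bit is stored in n independent copies and decoded by majority vote. The per-copy error probabilities (0.01 and 0.6) are hypothetical values chosen only to show the two regimes, and n counts copies rather than grid size.

```python
from math import comb

def logical_error_rate(n, p):
    """Probability that a majority of n independent copies are flipped,
    given that each copy flips with probability p (n is assumed odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Hypothetical per-copy error rates: one well below 50%, one above it.
rates_below = [logical_error_rate(n, 0.01) for n in (3, 5, 7)]
rates_above = [logical_error_rate(n, 0.6) for n in (3, 5, 7)]

print("p = 0.01:", rates_below)  # shrinks as n grows
print("p = 0.60:", rates_above)  # grows as n grows
```

When the individual components are reliable enough, more redundancy suppresses the logical error rate; when they are not, more components only add more errors. This is the intuition behind the threshold that an error-corrected quantum chip has to get below.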
A second major achievement for Willow is performance. It performed one of quantum computing’s toughest benchmarks – random circuit sampling (RCS) – in under 5 minutes. The world’s most powerful supercomputer, Frontier, would need 10^25 (10 septillion) years to execute this benchmark. This number vastly exceeds the age of the universe (a mere 13.7 billion years). With 105 qubits, the next challenge for Willow is to demonstrate a first "useful, beyond-classical" benchmark problem which is relevant to a real-world application such as finding efficient electrical batteries and accelerating research in fusion and alternative energies.
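To put the scale of that comparison in perspective, the two figures quoted above can be divided directly. The ratio is derived here for illustration and is not a number stated in the article.

```python
# Figures taken from the text above.
rcs_years_classical = 10**25      # 10 septillion years
universe_age_years = 13.7e9       # 13.7 billion years

# How many times the age of the universe the classical run would take.
ratio = rcs_years_classical / universe_age_years
print(f"{ratio:.2e} times the age of the universe")
```

The classical run would therefore last several hundred trillion times the age of the universe.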
6. Researchers find security flaws in Skoda cars that may let hackers remotely track them
Cybersecurity specialists have discovered several vulnerabilities in the infotainment unit of Skoda’s Superb III sedan that give attackers unrestricted code execution on the unit by connecting via Bluetooth without authentication. Malware running on the device would allow attacks such as getting real-time access to the vehicle’s GPS coordinates and speed data, recording conversations using the car’s microphone, taking screenshots of the infotainment display, and playing arbitrary sounds in the car. Also, an attacker can steal information from smartphones that have synchronized with the infotainment unit, since the unit does not store records in an encrypted format. The article estimates that there are potentially more than 1.4 million vulnerable vehicles currently in circulation. However, the number might be even higher because of the second-hand sale of infotainment units on platforms like eBay. The security researchers stress that they did not find any vulnerabilities around the car’s critical zones like steering, braking and acceleration. Skoda, which is owned by Volkswagen, is working to fix the issues.
7. 2024 Data Complexity Report Reveals AI’s Make or Break Year Ahead
This report from NetApp presents the results of a survey of over 1,300 senior company executives worldwide on how they expect their organizations to spend on AI in 2025. 40% of executives believe that unprecedented investment in data management will be required for their companies in 2025. This figure exceeds the number of executives who believe that investment in data infrastructure, compute power or talent needs to be prioritized for AI success. The main data-related challenge is data silos – data repositories controlled by one company department or computing application, which are isolated from the remainder of the organizational data. 79% of executives believe removing silos is very important for achieving AI success. 41% of executives predict that cybersecurity threats will significantly increase due to AI, and 50% believe that AI will have a big impact on their companies’ carbon footprints. Nonetheless, there was a 12% decline this year in the number of executives who stated that carbon footprint reduction was a top priority.
8. Environmental Burden of United States Data Centers in the Artificial Intelligence Era
This research article reports on the level of CO2 emissions from US data centers and on their carbon-intensity (which depends on the nature of fuel used by the power stations providing electricity to the data centers). Data centers are energy-hungry because of the large compute power required, and also due to cooling systems which are needed to maintain the stability and performance of processors. The International Energy Agency (IEA) estimates that data centers worldwide consumed between 240 and 340 terawatt hours (TWh) in 2022, which is around 1% to 1.3% of global electricity consumption. Data center consumption is expected to reach 480-680 TWh worldwide by 2026 – which exceeds the total energy consumption of Canada. There are 7,945 data centers worldwide, of which 2,990 are in the US.
Data center energy consumption in the US is expected to rise to 260 TWh in 2026, which would account for between 4% and 6% of total US consumption. The researchers identified 3,318 power plants that feed US data centers. 56% of these power plants use fossil fuels, with 16% using coal, and the CO2 emissions of the connected data centers were 105.59 million metric tons (MT), which corresponds to 2.18% of total US emissions. The cleanliness of energy is measured by carbon-intensity. The carbon-intensity of the data centers was 548 grams of CO2 per kilowatt hour (kWh), which is 48% higher than the national average. This is because data centers tend to be located near carbon-intensive power plants. The researchers note that research and development around AI is forcing companies to increase carbon emissions. They cite the case of Google, whose 2023 emissions were up 13% on the previous year, and Microsoft, whose emissions have increased by 29% since 2020.
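The figures quoted above imply a few further values that the summary does not state. The sketch below back-calculates them; the results are derived approximations that inherit the rounding of the inputs, and the implied electricity figure assumes that the 548 g/kWh intensity applies to the same period and the same data centers as the emissions total.

```python
# Figures taken from the text above.
dc_emissions_mt = 105.59      # data center CO2 emissions, million metric tons
dc_share_of_us = 0.0218       # 2.18% of total US emissions
dc_intensity_g_per_kwh = 548  # grams of CO2 per kWh
premium_over_national = 0.48  # 48% higher than the national average

# Implied total US emissions (million metric tons).
implied_us_total_mt = dc_emissions_mt / dc_share_of_us

# Implied national average carbon-intensity (g CO2 per kWh).
implied_national_intensity = dc_intensity_g_per_kwh / (1 + premium_over_national)

# Implied electricity behind the emissions:
# million metric tons -> grams (x 1e12), grams / (g per kWh) -> kWh, kWh -> TWh (/ 1e9).
implied_dc_twh = dc_emissions_mt * 1e12 / dc_intensity_g_per_kwh / 1e9

print(f"Implied total US emissions: {implied_us_total_mt:.0f} million metric tons")
print(f"Implied national average intensity: {implied_national_intensity:.0f} g CO2/kWh")
print(f"Implied data center electricity: {implied_dc_twh:.0f} TWh")
```

These come out at roughly 4,840 million metric tons for total US emissions, about 370 g CO2 per kWh for the national average, and a little under 200 TWh of electricity.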
9. New LLM optimization technique slashes memory costs up to 75%
This article reports on a technique developed by Tokyo startup Sakana AI, called universal transformer memory, that enables language models to significantly reduce their memory footprint when running. The context window of a language model is its working memory. It contains the prompt and all documents that the prompt engineer believes will improve the quality of a model’s response to a query. The tendency has been to extend the size of the context window to improve response quality, but this hurts the performance of the model. With the universal transformer memory technique, a neural attention memory model (NAMM) is trained to throw away non-useful information in the context window, e.g., comments and whitespace from code samples, duplicate frames from videos, and grammatical redundancies in natural language text. Use of NAMMs requires access to the language model’s internals, so they can only be used with open-source language models. Researchers tested the approach on the Meta Llama 3-8B model and found that the NAMM allowed the language model to save up to 75% of its memory consumption.
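The general idea of scoring items in the context and evicting the low-scoring ones can be illustrated with a toy example. This is not Sakana AI's method: a NAMM is a trained neural network that works on the language model's internals, whereas the hypothetical `score_line` function below is a hand-written rule applied to lines of text, chosen to mirror the examples in the article (comments, whitespace, duplicates).

```python
def score_line(line, seen):
    """Hypothetical stand-in for a learned scorer.
    A positive score means keep; a non-positive score means evict."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return -1.0  # blank lines and comments carry little information here
    if stripped in seen:
        return -1.0  # exact duplicates of content that was already kept
    return 1.0

def prune_context(lines):
    """Keep only the lines that the scorer rates as useful."""
    kept, seen = [], set()
    for line in lines:
        if score_line(line, seen) > 0:
            kept.append(line)
            seen.add(line.strip())
    return kept

# Invented code sample standing in for the contents of a context window.
context = [
    "# load the configuration",
    "config = load()",
    "",
    "config = load()",
    "run(config)",
]

pruned = prune_context(context)
saved = 1 - len(pruned) / len(context)
print(pruned)
print(f"Fraction of context evicted: {saved:.0%}")
```

In this invented sample three of the five lines are evicted. The savings reported in the article come from a learned scorer on real workloads, so this toy figure says nothing about them.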