Nuclear Energy for AI

Dissecting RAG and VC Investment

Posted on October 21st, 2024

Summary

Several articles this week consider the huge increase in energy that data centers require to support AI. Google has agreed to buy power from six or seven small modular reactors (SMRs) to be built by California-based Kairos Power, which should be operational between 2030 and 2035, while Microsoft has struck a deal with the Three Mile Island nuclear power plant. An opinion article in MIT Technology Review notes that in addition to developing power plants based on clean energy, there is also the challenge of building the infrastructure to connect these power stations to the grid.

Several articles look at the question of model deployment by organizations. A VentureBeat article gives an overview of the elements and costs that need to be considered when deploying a language model. A research paper from Microsoft looks at the challenge of integrating domain-specific and up-to-date knowledge into language models. Retrieval-augmented generation (RAG) and fine-tuning are different solutions, but the paper shows that there is no one-size-fits-all approach to external data access. Meanwhile, AMD has announced the new generation of its EPYC CPU, which Hugging Face has been testing.

On the adoption of AI, a presentation by Air Street Capital gives a very detailed review of developments in the field over the last 12 months and offers some predictions for 2025. These include that an app created by someone with no coding ability will become one of the most popular App Store apps, and that an open alternative to OpenAI o1 will emerge and even outperform it on some reasoning benchmarks. A TechCrunch article mentions that synthetic data generation is an emerging business that could be worth 2.34 billion USD by 2030. AI companies hope that models could one day be trained largely on synthetic data, which would avoid the IP issues of real data and significantly lower the cost of training. Elsewhere, PitchBook, the financial services firm, reports that venture capitalists invested 3.9 billion USD in generative AI startups in Q3 of 2024. Meta has released the Open Materials 2024 (OMat24) data set as open source, by far the largest data set in the materials science domain.

Finally, TechCrunch reports that the number of user records stolen in data breaches this year has already exceeded 1 billion. Data for nearly all of AT&T’s 110 million customers was stolen, and the company reportedly felt obliged to pay a ransom to the criminals to delete their copies.

1. The promise and perils of synthetic data

This TechCrunch article looks at the increasing importance that synthetic data will take on in training AI models. This is data that is artificially created but which has the same statistical properties as data from the real world. Part of the training data for Claude 3.5 Sonnet and OpenAI’s Orion model is synthetic. The main driver for synthetic data is the shortage of high-quality real data. For instance, the article estimates that 35% of the 1’000 most popular websites block OpenAI’s web scraper, and Shutterstock is asking AI companies for tens of millions of dollars for access to its content. Further, AI companies are worried about using data that could lead them to be sued for copyright violations. The challenge with synthetic data is model collapse – the phenomenon whereby a model trained on AI-generated data degrades in performance – so we are currently a long way from models trained exclusively on synthetic data. Finally, biases are more difficult to detect in synthetic data, and considerable effort is needed to create quality data.

If models can one day be trained largely on synthetic data, the cost of building them would fall significantly. The AI firm Writer has created a model, Palmyra X 004, that is almost entirely trained on synthetic data. The company claims that it cost 700’000 USD to train, compared to 4.6 million USD for a comparably sized model from OpenAI. The article mentions that synthetic data generation is an emerging business that could be worth 2.34 billion USD by 2030. In any case, model quality depends on high-quality training data, and annotating training data is known to improve its quality. Dimension Market Research estimates that the annotation market is worth 838.2 million USD today and could rise to 10.34 billion USD in ten years. On the flip side, human annotators are relatively slow and error-prone, and their biases can propagate to the training data and the resulting model.

2. State of AI Report – Air Street Capital

This presentation from Air Street Capital gives a very detailed review of developments over the last 12 months in the field of AI. One of the key technical developments was OpenAI’s o1 (“Strawberry”) model, which shifts more computation to inference time so that the model can carry out chain-of-thought reasoning more effectively. As a result, the model performs relatively well on logical problems, such as those in mathematics, science and coding. Another technical improvement has been seen in embedding models – the way models represent information items internally. The improvements are designed to make it easier for the model to determine whether it should continue inference at a given point, or hand off the request to an outside system, as is required for retrieval-augmented generation. Elsewhere, the report highlights how models like Claude 3.5 Sonnet, Gemini 1.5, and Grok 2 have been closing the performance gap on the GPT family, and how the performance of open models like Llama 3 is catching up with proprietary models. Finally, as organizations increasingly seek to run models on smaller devices, quantization – the process of converting weights and internal parameters from floating-point numbers to low-precision integers to reduce model size – has become essential.
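As a toy illustration of that last idea (not a technique described in the report), the sketch below quantizes a float32 weight tensor to int8 with a single per-tensor scale; the tensor and rounding scheme are illustrative only:

```python
# Symmetric per-tensor int8 quantization: store weights as 8-bit integers
# plus one float scale, giving roughly a 4x size reduction over float32.
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                      # map the largest weight to +/-127
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale                           # approximate reconstruction of the weights

w = torch.randn(4, 4)                                  # stand-in for a layer's weight matrix
q, scale = quantize_int8(w)
print("max reconstruction error:", (w - dequantize(q, scale)).abs().max().item())
```

Production schemes typically use per-channel scales and calibration data, but the storage saving – one byte per weight instead of four – is the same idea.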

In other developments, there has been a greater focus on data this year than in previous years. Many websites are blocking data scraping by AI companies, partly in an effort to prevent IP violations on their data. Synthetic data is beginning to be used, and there is greater awareness of the risks of training-data contamination and of how well-curated training data can lead to improvements in model performance. On the industrial side, Nvidia’s market cap reached 3 trillion USD in June, making it only the third US company to reach this figure, after Microsoft and Apple. Chinese language models continue to dominate leaderboards, despite US sanctions.

Finally, the report makes some predictions for 2025. These include the prediction that an app created by someone with no coding ability at all will become one of the most popular App Store apps, that an open alternative to OpenAI o1 will emerge and even outperform it on some reasoning benchmarks, and that Apple’s on-device research will accelerate breakthroughs in personal on-device AI.

3. Why artificial intelligence and clean energy need each other

This MIT Technology Review article argues that the huge energy demands that AI places on data centers are an opportunity to invest in clean energy technologies. The article cites impressive figures on data center energy consumption. For instance, in the US alone, the increase in demand from data centers between now and 2026 is equivalent to three times the energy consumption of New York City. Data centers will consume 9% of the electrical grid’s energy by 2030. A ChatGPT query consumes 3 watt-hours of electricity, compared to 0.3 watt-hours for a Google search. OpenAI’s CEO has reportedly spoken of the need for data centers that consume 5 gigawatts, roughly the power drawn by 3 million US homes.

Another challenge is providing the infrastructure to connect clean energy sources, including nuclear, to the grid. The authors estimate that there are 1’500 gigawatts of capacity waiting to be connected, but that the infrastructure to connect it could take 10 years to complete. They cite the example of the Three Mile Island nuclear power plant, which will take longer to connect to the grid than to restart operations on site. Finally, the authors cite technical advances that can contribute to the increased provision of clean energy in the near future. These include advanced nuclear fission, which enables smaller reactors that can be deployed relatively quickly; advanced conductors for transmission lines, so that energy can be moved at greater capacity; improved cooling technologies for data centers; and next-generation transformers that enable the efficient use of higher-voltage power.

4. The biggest data breaches in 2024: 1 billion stolen records and rising

This TechCrunch article reports that data breaches in 2024 are more numerous than ever, with over 1 billion user records already stolen. The US telephone company AT&T had data records stolen for nearly all of its 110 million customers. The data contained easy-to-decrypt passcodes, leaving 7.6 million user accounts open to hijacking, and it also contained telephone numbers of non-customers. This has raised concerns for high-risk individuals such as domestic abuse survivors. AT&T reportedly paid a ransom to the hackers to have the data deleted.

Also in the US, Change Healthcare was hacked and lost personal, medical and billing information for possibly one third of the US population. The system was taken down for several weeks, which caused problems in hospitals, pharmacies and healthcare practices nationwide. The company reportedly paid the hacker group a ransom to recover customer data. Another criminal group used the stolen credentials of engineers with access to corporate Snowflake accounts to steal data from several companies that use the platform. The theft included 560 million user records from Ticketmaster, 79 million records from Advance Auto Parts and 30 million records from TEG. The security firm Mandiant believes that around 165 Snowflake corporate customers had data stolen.

In the UK, the pathology laboratory Synnovis, which tests blood samples for hospitals, lost data covering 300 million patient interactions dating back a “significant number” of years, leading to the declaration of a “critical incident” across the UK health sector. The laboratory refused to pay a 50 million USD ransom to a Russian ransomware gang.

5. Google to buy nuclear power for AI data centers in ‘world first’ deal

There have been several high-profile deals recently between Big Tech and energy providers to cope with increasing data center energy consumption. Microsoft has signed a deal for energy from the Three Mile Island nuclear power plant, while Amazon has bought a data center powered by nuclear energy in Pennsylvania. Meanwhile, Google has agreed to buy power from six or seven small modular reactors (SMRs) to be built by California-based Kairos Power, which should be operational between 2030 and 2035. An SMR is a reactor with a capacity of up to 300 megawatts, enough to produce more than 7 million kilowatt-hours of electricity a day. For Google, this nuclear option provides “a clean, round-the-clock power source that can help us reliably meet electricity demands”. Proponents of the technology argue that SMRs provide a more flexible approach to constructing nuclear plants, require less cooling water and have a smaller footprint, thus allowing a greater variety of potential site locations. Opponents say that the technology remains unproven.
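As a quick check of that daily output figure (assuming the reactor runs continuously at its full 300-megawatt capacity):

$$ 300\,\text{MW} \times 24\,\text{h} = 7{,}200\,\text{MWh} = 7.2\ \text{million kWh per day}. $$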

6. The race to find new materials with AI needs more data. Meta is giving massive amounts away for free.

Meta will release Open Materials 2024 (OMat24) – a massive data set, together with models, for use by materials scientists. The data set has around 110 million data points, making it much larger than any data set previously seen in the domain – and existing data sets are proprietary. Materials scientists discover materials by taking elements from the periodic table and simulating different combinations. This is a long process, given the huge number of possibilities, so the OMat24 data set is of significant value to researchers. Only a company with huge compute resources could produce such a data set. The search for new materials is important for areas such as new fuels and nonpolluting building materials. The data set and models are available on Hugging Face.

7. What’s the minimum viable infrastructure your enterprise needs for AI?

This VentureBeat article is part of a series entitled “Fit for Purpose: Tailoring AI Infrastructure”. It provides a high-level overview of the elements and costs that need to be considered when deploying a language model. The first consideration is what data needs to be shared with the model. When no company data needs to be shared, readily available chatbots such as Google’s Gemini, OpenAI’s ChatGPT, and Anthropic’s Claude can often suffice. If the company agrees to share data with a model provider, then solutions like Google Vertex AI are interesting, especially since the Google ecosystem already has access to the company’s structured and unstructured data in Gmail and Drive. Otherwise, the company must ensure that the model’s API is suited to integration into the company’s workflows. In highly regulated sectors, an in-house deployment is more appropriate, which means choosing an open-source model. Choosing the right model can be done using one of the Hugging Face leaderboards, e.g., the LMSYS Chatbot Arena leaderboard.

Accurate and up-to-date responses from the model require a retrieval-augmented generation (RAG) framework. This in turn requires a vector database like Pinecone or Milvus, which stores embeddings of company documents so that relevant passages can be retrieved and passed to the language model. The article estimates that simple chatbot deployments can take less than two weeks, but that any custom development requires experienced machine-learning engineers or data scientists. For proprietary models, subscription costs can reach 5’000 USD per month.
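As a rough sketch of what that retrieval step looks like in practice – using an open embedding model and an in-memory index in place of a managed vector database such as Pinecone or Milvus; the documents and query are invented for illustration:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy "company documents" standing in for the contents of a vector database.
documents = [
    "Expense claims must be submitted within 30 days of travel.",
    "The VPN client is mandatory when working outside the office.",
    "Quarterly security training is required for all staff.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Embed the documents once and keep the normalized vectors in memory.
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                 # cosine similarity, since vectors are normalized
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# The retrieved passages would then be prepended to the prompt sent to the language model.
print(retrieve("How long do I have to file travel expenses?"))
```

A managed vector database plays the same role as doc_vectors here, but adds persistence, filtering and scaling to millions of documents.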

8. Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

This research paper from Microsoft looks at the challenge of integrating domain-specific and up-to-date knowledge into language models – a challenge for which retrieval-augmented generation (RAG) and fine-tuning are seen as solutions – and concludes that there is no one-size-fits-all approach. The challenge for a model is to retrieve relevant external data, correctly interpret user intent, and fully exploit its reasoning capabilities. The authors propose a four-level classification of requests that require access to up-to-date or domain-specific data. The first levels focus on data retrieval, whereas the later ones focus on the model’s ability to learn and infer.

  • Explicit Facts: e.g., “Where will the next soccer World Cup be held?”
  • Implicit Facts: here, the necessary information requires simple inference, e.g., “What is the current majority party in the country where Canberra is located?”
  • Interpretable Rationales: for example, in the finance domain, the language model must consider that the organization must adhere to FINMA regulations. These regulations are an external rationale in the model’s decision making.
  • Hidden Rationales: here, the rationales are not explicitly documented but must be inferred from patterns and outcomes observed in external data. An example from software engineering is a project’s debugging history, which provides important information for a model whose role is to assist software development.

The paper concludes that RAG is essential for explicit fact queries, where the challenge is to locate the facts in an external system. Iterative RAG searches address implicit fact queries, and these methods can be complemented with tools like text-to-SQL. Interpretable rationale queries are best handled with prompt engineering and chain-of-thought prompting over the external rationale document. Hidden rationale queries pose the greatest challenge. Here the authors propose offline learning (where another model extracts rules and insights from a dataset offline and supplies them to the primary model at query time), in-context learning (where as much relevant information as possible is included in the prompt), and fine-tuning (where the model’s weights are updated).
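As a simplified sketch (not the paper’s own algorithm) of what an iterative RAG loop for implicit fact queries might look like, the function below alternates between retrieval and generation until the model judges the evidence sufficient; retrieve and llm are hypothetical stand-ins for a vector-store search and a language-model call:

```python
from typing import Callable, List

def iterative_rag(question: str,
                  retrieve: Callable[[str], List[str]],
                  llm: Callable[[str], str],
                  max_rounds: int = 3) -> str:
    """Answer a question that needs several retrieval hops, one sub-query at a time."""
    evidence: List[str] = []
    query = question
    for _ in range(max_rounds):
        evidence.extend(retrieve(query))          # fetch passages for the current search query
        prompt = (
            "Question: " + question + "\n"
            "Evidence so far:\n" + "\n".join(evidence) + "\n"
            "If the evidence is sufficient, answer the question directly. "
            "Otherwise reply with exactly: FOLLOW-UP: <next search query>"
        )
        reply = llm(prompt)
        if reply.startswith("FOLLOW-UP:"):
            query = reply[len("FOLLOW-UP:"):].strip()   # refine the search and loop again
        else:
            return reply                                # the model judged the evidence sufficient
    # Give up on further retrieval and answer with whatever was gathered.
    return llm("Question: " + question + "\nEvidence:\n" + "\n".join(evidence)
               + "\nAnswer using only this evidence.")
```

In practice, retrieve would be backed by a vector database such as those mentioned in section 7, and llm by a hosted or locally deployed model.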

9. Investments in generative AI startups topped $3.9B in Q3 2024

PitchBook, the financial services firm, reports that venture capitalists invested 3.9 billion USD in generative AI startups in Q3 of 2024 – a figure that excludes the 6.6 billion USD recently raised by OpenAI. US firms took 2.9 billion USD of the total. The main beneficiaries of funding were Magic (320 million USD for its coding assistant), Glean (260 million USD for its enterprise document search tool), Hebbia (130 million USD for work on business analytics), Moonshot AI in China (300 million USD for developing commerce platforms) and Sakana AI in Japan (214 million USD for its AI Scientist). This shows that VCs are still optimistic about the uptake of generative AI in the enterprise, notably for tasks related to summarization and problem-solving. This optimism is corroborated by a Forrester report predicting that 60% of companies currently skeptical of AI will eventually adopt the technology, and by recent AI models that perform particularly well on coding and scientific tasks. The article underlines that VCs are keen to invest despite open problems, notably around powering data centers: new data centers are needed that consume 5 to 20 times the power of today’s centers, and this demand is prolonging the life of coal-fired power stations. It mentions that Morgan Stanley believes global greenhouse gas emissions could be three times higher in 2030 than they would have been had generative AI never happened.

10. Introducing the AMD 5th Gen EPYC™ CPU

AMD has unveiled Turin, the new generation of its EPYC CPU, which scales up to 192 cores and 384 threads. Hugging Face has been testing the processor’s performance by running multiple instances of Meta’s Llama 3.1 8B model spread over several cores, for use cases related to summarization, chatbots, translation, essay writing, and live captioning. Work has also been going on to support torch.compile on the processor – the compiler framework for PyTorch, the open-source AI library initially developed by Facebook’s AI Research lab. Hugging Face’s goals in the partnership are to reduce latency in model deployment, increase throughput on supported architectures, and lower operational costs.
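The snippet below sketches the general pattern of such a CPU deployment – loading an open Llama-family checkpoint with Hugging Face Transformers and compiling its forward pass with torch.compile. The model name (a gated checkpoint), thread count and generation settings are assumptions for illustration, not AMD’s or Hugging Face’s benchmark configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"   # assumes access to the gated checkpoint

torch.set_num_threads(32)                        # pin this instance to a subset of the CPU cores

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# torch.compile traces the forward pass and generates kernels optimized for the host CPU.
model.forward = torch.compile(model.forward)

prompt = "Summarize in one sentence: AMD has unveiled the fifth generation of its EPYC CPU."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Running several such instances, each pinned to its own group of cores, is how a many-core server like Turin can serve multiple summarization or chatbot workloads in parallel.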