Data Centers Too Greedy for Grids

AI and programmer productivity and lie detection

Posted on July 11th, 2024

Summary

This week’s articles look at the use of generative AI (GenAI) and examine limits that AI firms may be unable to overcome.

A Wikipedia article contains a very detailed and extensive list of domains where AI is being used. A study from Ludwig-Maximilians University in Munich looks at the impact of AI assistants on programmer productivity. One conclusion is that there is no objective improvement in the quality of the code produced, though programmers subjectively feel that there is. Another article looks at the use of AI to detect lies – a study at the University of Würzburg in Germany found that AI might be better than humans at this, though perhaps only because humans are inherently bad at lie detection.

Two major limitations of current GenAI models are discussed. First, there is a lack of high-quality training data, and AI firms have few options for finding new data: people are increasingly refusing to have their personal data scraped by AI firms, and several lawsuits are underway over copyright infringement in training data already used. The second limitation is that data centers require a huge amount of energy. Current data center demand exceeds the availability of renewable energy in some countries.

Finally, an MIT article describes GenSQL – a framework that extends standard SQL with primitives to extract probabilistic and predictive data.

1. MIT researchers introduce generative AI for databases

This article describes GenSQL – a generative AI system for databases from MIT. The tool extends SQL, the relational database query language, with the ability to run probabilistic queries. For the very large datasets used by generative AI platforms, probabilistic queries are needed to make predictions, detect data anomalies, guess missing values, fix errors, or generate synthetic data. An example prediction from the medical context mentioned in the article is detecting a low blood pressure reading for a patient: even if a blood pressure reading appears normal in isolation, a combination of other health indicators for that patient could imply that it is too low. Synthetic data generation is also valuable for large datasets: such data is representative of the original dataset but contains no personally identifiable information or other sensitive or proprietary data. The original research paper can be found here.
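GenSQL’s actual query syntax is defined in the paper; as a rough illustration of the idea behind the blood pressure example, the sketch below flags a reading as anomalous only when it is unlikely *conditional on* another indicator, even though it looks normal in isolation. The records, the heart-rate conditioning window, and the z-score threshold are all invented for illustration.

```python
import statistics

# Toy patient records: (heart_rate_bpm, systolic_blood_pressure).
# Values are made up for illustration only.
records = [
    (60, 118), (62, 120), (58, 115), (61, 119), (59, 117),
    (95, 135), (98, 138), (96, 136), (97, 140), (94, 134),
]

def is_anomalous(heart_rate, blood_pressure, records, threshold=2.0):
    """Flag a blood-pressure reading that is unlikely *given* the
    patient's heart rate, even if it looks normal in isolation."""
    # Condition on similar patients (heart rate within +/- 10 bpm).
    similar = [bp for hr, bp in records if abs(hr - heart_rate) <= 10]
    mean = statistics.mean(similar)
    stdev = statistics.stdev(similar)
    z = (blood_pressure - mean) / stdev
    return abs(z) > threshold

# A systolic reading of 120 looks normal overall, but is unusually
# low for the high-heart-rate subgroup in this toy data.
print(is_anomalous(96, 120, records))  # True
print(is_anomalous(60, 120, records))  # False
```

GenSQL expresses this kind of conditioning declaratively in the query itself, over a learned probabilistic model rather than a hand-picked subgroup as here.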

Source MIT News

2. Meta AI removes block on election-related queries in India while Google still applies limits

Meta has removed its restriction on Indian election-related queries to its Meta AI chatbot. The Indian elections took place in April and May, and during that time the chatbot referred users to the Election Commission website when asked about candidates and political parties. Google, which recently launched the Gemini AI app for Android in India in nine local languages, still has restrictions on election-related queries in place. Nearly half of the world’s population goes to the polls in elections this year, and Big Tech is under much scrutiny over how its platforms are being used to influence voters.

3. AI lie detectors are better than humans at spotting lies

This MIT Technology Review article describes an experiment at the University of Würzburg in Germany to evaluate the use of AI to detect lies. The goal is a tool that is better than humans at detecting lies, which the researchers note is not hard since humans are inherently bad at lie detection. In the experiment, 768 people were asked to write about their weekend plans, with half of the subjects asked to lie. The results were then used to train a BERT-based model. The tool was able to detect lies with a success rate of 67%, whereas humans’ success rate is around 50%. Another interesting result is that subjects were more likely to accuse others of lying based on the AI results, whereas without AI they would assume that the other person was telling the truth. Compared to the polygraph lie detector test, now widely discredited in US courts, AI can have a greater impact since there is no limit to the number of tests that can be performed in one day. AI systems are also being developed that visually analyze “micro-gestures” in a person’s facial expression for signs of lies.
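The study fine-tuned a BERT model on the 768 labeled statements; as a minimal stdlib-only stand-in for that pipeline, the sketch below trains a naive Bayes word-count classifier on a handful of invented statements and labels. The training sentences and the query are entirely made up, and a bag-of-words model is far cruder than BERT – this only shows the train/classify shape of the experiment.

```python
import math
from collections import Counter

# Toy labeled statements (the real study used 768 participant-written
# texts and a fine-tuned BERT model; these examples are invented).
train = [
    ("we will definitely absolutely visit my honest grandmother", "lie"),
    ("honestly I swear we are truly going hiking believe me", "lie"),
    ("we are going hiking on saturday", "truth"),
    ("I plan to read and cook at home", "truth"),
]

def train_counts(examples):
    """Accumulate per-label word counts from labeled texts."""
    counts = {"lie": Counter(), "truth": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(text, counts, smoothing=1.0):
    """Naive Bayes with add-one smoothing over a shared vocabulary."""
    vocab = set(counts["lie"]) | set(counts["truth"])
    best, best_score = None, float("-inf")
    for label in counts:
        total = sum(counts[label].values())
        score = 0.0
        for word in text.split():
            p = (counts[label][word] + smoothing) / (total + smoothing * len(vocab))
            score += math.log(p)
        if score > best_score:
            best, best_score = label, score
    return best

counts = train_counts(train)
print(classify("I swear we are honestly going to visit", counts))  # lie
```

The 67% accuracy reported in the study comes from held-out evaluation of the BERT model, not from a toy classifier like this one.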

4. Applications of artificial intelligence – Wikipedia

This Wikipedia article compiles the wide range of uses of AI today. Despite some warnings about the article not conforming to Wikipedia standards, the content is quite detailed, with many links to ongoing projects across a very large number of domains.

Source Wikipedia

5. AI companies are finally being forced to cough up for training data

In this blog post, Melissa Heikkilä of MIT Technology Review looks at some of the challenging issues facing AI firms. One of the most fundamental is the quest for high-quality training data, especially as models grow bigger. On the one hand, people are less inclined to contribute their personal data, as is evident from the opt-out clauses appearing on social media platforms. On the other hand, training data already used may have infringed copyright law, and AI firms currently face lawsuits from the music industry as well as from major newspapers like the New York Times. The author believes it could take years for this legal situation to become clear. Even if AI platforms do cite their training data sources in generated content, they are still capable of hallucinating those sources. The scarcity of high-quality data can create leverage for the users who supply it. Coupled with energy consumption concerns, the result of all this may be that smaller, more efficient models become the only viable option for AI firms.

6. AI Is Already Wreaking Havoc on Global Power Systems

This article discusses the huge strain that AI is putting on energy provision across the world. AI is increasing data centers’ demand for energy, and this demand exceeds the available power supply in many regions. The article cites a Bloomberg analysis noting that the energy required for data centers in Ireland, Saudi Arabia and Malaysia exceeds the supply of renewable energy in those countries. In the US, data centers will require 8% of total power supplied by 2030, compared to 3% in 2022. In Ireland, a third of the country’s energy will be used by data centers by 2026. There are currently 7,000 data centers around the world, in operation or under development. Combined, they consume 508 terawatt-hours a year, which is greater than the annual power production of Australia. A single terawatt is said to be equivalent to the output of 1,000 nuclear power plants. Big Tech companies are researching energy-saving techniques in chip design and cooling, though Sam Altman believes an energy breakthrough on the scale of the discovery of nuclear power will be required. There is much to be done: the Nvidia H100 chip, used in many AI cloud centers, consumes eight times as much power as a 60-inch flat-screen TV.
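A quick back-of-envelope check shows how the article’s figures fit together. The ~1 GW per nuclear plant figure is an assumption (a common rule of thumb, not stated in the article); the 508 TWh and 7,000-data-center figures are the article’s own.

```python
# "A terawatt ... output of 1,000 nuclear power plants" holds if a
# typical plant produces about 1 GW (assumed, not from the article).
plant_gw = 1.0
plants_per_tw = 1000 / plant_gw
print(plants_per_tw)  # 1000.0

# 508 TWh per year across ~7,000 data centers implies an average
# continuous draw of roughly 58 GW, or about 8 MW per facility.
twh_per_year = 508
hours_per_year = 365 * 24  # 8760
avg_draw_gw = twh_per_year * 1000 / hours_per_year
print(round(avg_draw_gw, 1))                 # ~58.0 GW worldwide
print(round(avg_draw_gw * 1000 / 7000, 1))   # ~8.3 MW per center
```

The ~8 MW-per-center average hides a wide spread: many facilities are small, while the hyperscale AI centers driving the Bloomberg analysis draw far more.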

7. Significant Productivity Gains through Programming with Large Language Models

This article details a study from Ludwig-Maximilians University in Munich on the productivity gains of AI-assisted programming. The study considered a group of 24 Python programmers. The researchers compared two AI-supported approaches with traditional coding in which only Internet browsing was allowed. The two AI approaches were AI-supported auto-completion in the editor using GitHub Copilot, and a conversational system using GPT-3. The auto-complete feature performed best for short code snippets, whereas chatbot interactions were used for longer code developments. The longer interactions with the chatbot, with their inherent context-switching, lessened participants’ appreciation of the tool, pointing to a need for more ergonomic tool design. The participants were generally positive about their experience with AI but tended to over-estimate the quality of the produced code: objective measures showed no quality improvement over non-AI-produced code, though participants subjectively felt the code was better. Also, code created with AI-assisted tools tended to be more voluminous.