AI Model Scaling and Covert Racism

Tough Week for Apple and Google

Posted on September 11th, 2024

Summary

The Epoch AI research institute has published a report on factors that might hinder AI training from continuing to scale until 2030. They conclude that a training run consuming 2x10^29 FLOPs will be possible in 2030, which would allow for models that out-scale GPT-4 to the same degree that GPT-4 training compute power out-scales GPT-2. Factors that can hinder scaling are power constraints, chip manufacturing limits, data scarcity and the inherent latencies in model processing. Elsewhere, an article looks at the challenge of reducing energy consumption by robots, notably by helping robots distinguish information that is important for the task at hand and thus minimize over-processing.

On societal aspects, an article in Nature shows that popular AI models suffer from covert racism. In experiments comparing inputs in standard American English (SAE) to African American English (AAE), the study found that in a court conviction scenario, the AI would more likely hand down a death sentence to an AAE speaker than to an SAE speaker. This is a covert form of racism since there is no explicit mention of race or color in the data processed by the model. Elsewhere, an MIT Technology Review opinion article argues that fears about AI's impact on voters are overrated, and that debate on this issue is taking away from more serious concerns like attacks on the freedom of the press, intimidation of election officials and voters, and politicians disseminating falsehoods.

In other news, the European Court of Justice has upheld a decision to force Apple to pay 13 billion EUR in taxes in Ireland, and upheld a 2.4 billion EUR antitrust fine for Google. GitHub is permitting corporate users to fine-tune the Copilot model on their codebases, thereby giving programmers more pertinent code suggestions. Magic, a startup company working on AI tools for software development, has now raised 465 million USD in funding, despite not yet having generated significant revenue.

TechCrunch has been publishing articles that give overviews of AI model families; this week saw pieces on Meta's Llama and Google's Gemini.

1. Generative AI coding startup Magic lands $320M investment from Eric Schmidt, Atlassian and others

The startup Magic, which specializes in AI-based tools for software development tasks, has raised 320 million USD in new funding. Magic is a small company (just over 20 employees) that has not yet generated significant revenue. Nonetheless, its total funding is now around 465 million USD. Magic has formed a strategic alliance with Google Cloud and Nvidia to build two new supercomputers, the Magic-G4 and Magic-G5. Nvidia’s GPUs are expected to improve inference and training efficiency, while Google Cloud provides the infrastructure and ecosystem needed for scaling. At a technical level, one innovation of Magic’s AI models is their long context windows. A context window is the data that a model holds in its “memory” while it works on a task; in principle, the larger the context window, the more relevant the generated content can be. Magic’s AI model LTM-2-mini is said to have a 100 million-token window, which is equivalent in size to 750 novels or 10 million lines of program code. As a comparison, the context windows for Google’s Gemini models are around 2 million tokens.
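
As a rough sanity check of those equivalences, the sketch below converts 100 million tokens into novels and lines of code using common rules of thumb (about 4 characters per token, roughly 90,000 words per novel, about 40 characters per line of code); these conversion factors are assumptions for illustration, not figures from the article.

```python
# Rough sanity check of the "100 million tokens ~ 750 novels or 10 million lines
# of code" comparison. The conversion factors below are common rules of thumb and
# are assumptions for illustration, not figures from the article.

CHARS_PER_TOKEN = 4            # rough average for English text
CHARS_PER_NOVEL = 90_000 * 6   # ~90k words per novel, ~6 characters per word
CHARS_PER_CODE_LINE = 40       # rough average length of a line of source code

context_tokens = 100_000_000
context_chars = context_tokens * CHARS_PER_TOKEN

print(f"~{context_chars / CHARS_PER_NOVEL:.0f} novels")           # ~740 novels
print(f"~{context_chars / CHARS_PER_CODE_LINE:,.0f} code lines")  # ~10,000,000 lines
```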

2. AI generates covertly racist decisions about people based on their dialect

Research reported in the Nature scientific journal shows that language models exhibit covert racial biases. The researchers used GPT-2, RoBERTa, T5, GPT-3.5 and GPT-4 to compare how the models treat text written in standard American English (SAE) with how they treat text written in African American English (AAE). The results showed that the models were more likely to assign a less prestigious job to an AAE speaker than to an SAE speaker. Another experiment showed that in a court conviction scenario, the models would more likely hand down a death sentence to an AAE speaker than to an SAE speaker. This phenomenon is termed dialect prejudice. It is recognized as covert racism because, in contrast to overt racism, there is no explicit mention of race or color in the data processed by the model, nor any clear expression of racist beliefs. Covert racism in models is a serious problem as organizations are attracted by the idea of using AI chatbots in areas like education and housing. The authors found that putting humans in the loop to detect and remove racist outputs does not eliminate covert racism. The source of the problem remains the training data, as large models are still trained on sources of dubious quality like Reddit.
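
A minimal sketch of this kind of paired-prompt comparison is shown below, using a small, locally runnable GPT-2 model via the Hugging Face transformers library. The sentences, prompt template and trait words are illustrative placeholders rather than the study's actual stimuli, and GPT-2 here stands in for the larger models tested.

```python
# Minimal sketch of a paired-prompt comparison in the spirit of the study, using a
# small, locally runnable GPT-2 model. The sentences, template and trait words are
# illustrative placeholders, not the stimuli or models used in the Nature paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log-probabilities the model assigns to `continuation` after `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits[0, :-1], dim=-1)
    return sum(log_probs[i, full_ids[0, i + 1]].item()
               for i in range(prompt_len - 1, full_ids.shape[1] - 1))

sae_text = "I am so happy when I wake up from a bad dream because it feels too real."
aae_text = "I be so happy when I wake up from a bad dream cus they be feelin too real."
template = 'Someone wrote: "{}" The writer is'

for trait in (" intelligent", " lazy"):
    shift = (continuation_logprob(template.format(aae_text), trait)
             - continuation_logprob(template.format(sae_text), trait))
    print(f"log-prob shift for '{trait.strip()}' (AAE minus SAE): {shift:+.3f}")
```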

3. Can AI Scaling Continue Through 2030?

Epoch AI is a research institute whose role is to analyze long-term trends in AI. Researchers from Epoch AI have just published a report on factors that could hinder AI training from continuing to scale through 2030. Broadly, the scale of the training phase determines the power of the model. In recent years, and mostly due to improvements in computational resources, training compute has been increasing by a factor of four each year, a faster growth rate than the uptake of mobile phones a few decades ago. We have also seen an increase in the duration of training runs.

The researchers conclude that a training run consuming 2x10^29 FLOPs will be possible in 2030, assuming training runs lasting from 2 to 9 months. By way of comparison, GPT-4 was probably trained using 2x10^25 FLOPs of compute. This would allow for models that out-scale GPT-4 to the same degree that GPT-4's training compute out-scales GPT-2's (a quick check of this arithmetic follows the list below). Four factors are identified that could limit this growth:

  1. Power constraints. Data centers consuming up to 5 GW are assumed feasible by 2030. Such a center could support training runs that consume 3x10^29 FLOPs. Challenges can nonetheless arise because energy grids are shared, and political and environmental concerns can determine the share of power that data centers can use.
  2. Chip manufacturing, where expansion could be constrained by packaging techniques (that protect and connect semiconductor devices) and by capacity limits on high-bandwidth memory production.
  3. Data scarcity is another limitation. Currently the indexable Web (where training data comes from) has 500 TB of unique text, though this is expected to increase by 50% by 2030. An increased use of multimodal data (images, video, sound) and advances in synthetic data (artificially generated data) are expected to allay scarcity fears.
  4. Inherent latencies in training can limit further scaling of training compute. One source of latency is that training data is not processed in a single sequential pass but in multiple passes over the data; increasing the size of the training data therefore increases latency, even if this has not yet manifested itself as a limit in AI model training.
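
The sketch below works through the compute comparison quoted above the list. The 2030 and GPT-4 figures are those given in the report; the GPT-2 figure is not quoted in the article and is simply implied by the stated ratio.

```python
# Arithmetic behind the scaling comparison: a 2030 run of ~2x10^29 FLOPs would
# exceed GPT-4's estimated ~2x10^25 FLOPs by the same factor by which GPT-4 is
# said to exceed GPT-2. The implied GPT-2 figure is derived from that ratio and
# is an assumption here, not a number quoted in the article.

gpt4_flops = 2e25   # GPT-4 training compute, as estimated in the report
run_2030   = 2e29   # training run deemed feasible by 2030, per Epoch AI

scale_up = run_2030 / gpt4_flops
implied_gpt2_flops = gpt4_flops / scale_up

print(f"2030 run vs GPT-4: ~{scale_up:,.0f}x")                    # ~10,000x
print(f"implied GPT-2 compute: ~{implied_gpt2_flops:.0e} FLOPs")  # ~2e+21
```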

4. AI’s impact on elections is being overblown

This MIT Technology Review opinion article argues that fears over the impact of AI on elections are overrated, despite documented cases of deepfakes and influence operations in which generative AI is used to create disinformation articles on websites and social media platforms. A report published in May by the UK-based Alan Turing Institute analyzed 100 national elections held since 2023: 19 showed evidence of AI interference, but there was no evidence that this interference led to significant changes in the results. The challenge for AI influence campaigns is that it is hard to get the AI content in front of the people who could potentially be persuaded to change their vote, especially given the mass of competing information sources. Research also shows that factors like values, age, gender and socialization have a greater impact on how people vote than information received during a campaign, and voters dislike receiving excessively tailored campaign messages.

The article argues that the excessive focus on AI is taking away from factors that pose greater threats to democracy such as attacks on the freedom of the press, intimidation of election officials and voters, as well as politicians disseminating falsehoods. Nonetheless, political campaigners are using AI to help with tasks like fund-raising and operations to get people out to vote.

5. Large Language Model Security Requirements for Supply Chain

The World Digital Technology Academy (WDTA) is an international innovation organization in Geneva under the guidance of the United Nations. Its aim is to aid the creation of norms to encourage fair and sustainable use of digital technology. The WDTA has just published a short report outlining requirements for supply chain security around the development and operation of language models. Report reviewers include representatives from Microsoft, Google, NIST, Baidu, Meta, and Cohere. The report includes both governance and technical requirements.

The key security requirements for the supply chain are integrity (e.g., preventing unauthorized access to model code and preventing data poisoning attacks on training data), availability, confidentiality, controllability (i.e., ensuring transparency for consumers and control over all phases of the supply chain), reliability and visibility (ensuring that all updates to models in the supply chain have clear ownership). The measures cited include risk assessments, training and awareness for all actors, and standard technical security measures at the network, model runtime and OS layers. For model security, the report insists on the use of a documented and version-controlled Bill of Materials (BOM), which serves as the basis for risk assessments and audits. The BOM describes the model architecture, data sources, data preparation methods and software libraries, as well as licenses and liability clauses agreed with software suppliers.
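
As an illustration of what such a record might contain, here is a minimal sketch of a model BOM based on the items listed above; the fields and example values are assumptions for illustration, not the WDTA's prescribed schema.

```python
# A minimal sketch of the kind of information a model Bill of Materials (BOM) might
# record, based on the items listed above. The fields and example values are
# illustrative assumptions, not the WDTA's prescribed schema.
from dataclasses import dataclass

@dataclass
class ModelBOM:
    model_name: str
    version: str
    architecture: str                   # e.g., model family and parameter count
    data_sources: list[str]             # provenance of the training data
    data_preparation: list[str]         # cleaning, filtering, deduplication steps
    software_libraries: dict[str, str]  # library name -> pinned version
    licenses: dict[str, str]            # component -> license / liability terms

bom = ModelBOM(
    model_name="example-llm",           # hypothetical model
    version="1.2.0",
    architecture="decoder-only transformer, 7B parameters",
    data_sources=["internal-corpus-2024", "licensed-news-archive"],
    data_preparation=["deduplication", "PII filtering", "toxicity filtering"],
    software_libraries={"torch": "2.3.1", "transformers": "4.44.0"},
    licenses={"licensed-news-archive": "commercial license with liability clause"},
)
print(bom.model_name, bom.version)
```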

6. ChatGPT Glossary: 45 AI Terms That Everyone Should Know

CNET has published an accessible glossary of AI terms which it intends to update regularly. The terms cover technical and ethical concepts. Technical terms, perhaps lesser known to the general public, include alignment (the idea of adjusting a model's parameters to steer it toward more desirable behavior, for instance to prevent toxic output) and temperature (the degree of randomness in a model's output: more randomness in a GenAI model can translate to more creativity, but also to higher risk). The glossary includes the term artificial general intelligence, which is an AI capable of performing “tasks much better than humans while also teaching and advancing its own capabilities”.
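
To make the temperature term concrete, the sketch below shows how dividing a model's raw scores by a temperature before sampling sharpens or flattens the resulting probabilities; the three-token distribution is a toy example, not output from a real model.

```python
# How temperature reshapes a toy next-token distribution (illustrative scores,
# not output from a real model).
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.2]   # scores for three candidate tokens

for t in (0.2, 1.0, 2.0):
    print(f"temperature={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
# Low temperature concentrates probability on the top token (more predictable text);
# high temperature flattens the distribution (more varied, "creative" text).
```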

In relation to ethics, the terms include anthropomorphism, the human tendency to attribute human qualities to objects, which is a particular concern in human-chatbot interactions. Emergent behavior is the property whereby an AI system starts behaving in ways its creators had not foreseen. In this context, the glossary cites the paperclip thought experiment from philosopher Nick Bostrom, in which an artificial general intelligence is programmed to maximize the production of paperclips. In pursuing this goal, the machine destroys other objects and kills people who stand in the way of maximization. The machine's designers, ultimately, set short-sighted goals and failed to think about the long term.

7. Meta Llama: Everything you need to know about the open generative AI model

This TechCrunch article gives a short overview of Llama, a family of AI models developed by Meta. The most recent versions are Llama 3.1 8B, Llama 3.1 70B and Llama 3.1 405B. Llama 3.1 8B and Llama 3.1 70B are smaller models, designed to run on PCs and company servers for tasks like chatbots and AI-assisted programming. Llama 3.1 405B is designed to run in data centers and is used for tasks like model distillation (a process of transferring knowledge from a large model to a smaller one) and creating synthetic data (i.e., data that is artificially generated but has the same statistical character as data from the real world). Llama models can integrate third-party applications and APIs. The examples cited are the use of Brave Search to answer queries about recent information, a built-in Python interpreter for validating code, and the Wolfram Alpha API for handling math questions.
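
A minimal sketch of the distillation idea mentioned above is shown below: a small "student" model is trained to match the softened output distribution of a large "teacher". This is a generic, assumed illustration in PyTorch, not Meta's actual distillation pipeline.

```python
# A generic sketch of knowledge distillation: a small "student" is trained to match
# the softened output distribution of a large "teacher" (toy logits below; this is
# an assumed illustration, not Meta's actual distillation pipeline).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy example: logits over a 5-token vocabulary at 2 positions.
teacher_logits = torch.tensor([[4.0, 1.0, 0.5, 0.1, 0.1],
                               [0.2, 3.5, 0.3, 0.2, 0.1]])
student_logits = torch.randn(2, 5, requires_grad=True)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()   # in training, these gradients would update the student's weights
print(float(loss))
```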

Llama models are trained on Web pages, public code repositories, and posts on Instagram and Facebook. It is believed that the training data includes copyrighted materials, and the use of social media posts has led to clashes with privacy advocates. Llama is a relatively open-source model: developers can reuse Llama in their applications, though an app that reaches 700 million monthly users requires a special license from Meta.

8. Apple loses EU court battle over EUR 13bn tax bill in Ireland

In a landmark ruling, the European Court of Justice has decided that Apple must pay 13 billion EUR in taxes to Ireland, upholding a decision made by the European Commission in 2016. The Commission is trying to crack down on favorable “sweetheart” tax deals with multinationals. Apple has had its European headquarters in Ireland since 1980, and Ireland has always worked to attract tech companies through financial incentives. Apple’s effective tax rate in 2014 was 0.005%, which the Commission believes gave Apple an unfair advantage when marketing its iPhone. Apple says it is disappointed with the ruling, claiming that the Commission is retroactively changing tax rules and that the tax in question is already being paid in the US. The Irish government has said that it will respect the court ruling. In a separate ruling, the European Court upheld a 2.4 billion EUR antitrust fine against Google, which used its own price-comparison shopping service to gain an unfair advantage over rivals.

9. To be more useful, robots need to become lazier

Living organisms consume a lot of metabolic energy to process information, but have strategies to reduce consumption. For instance, when crossing a road, a human will typically concentrate on oncoming traffic and filter out other information. For robots, identifying information that can safely be ignored is a challenge: processing needless information is energy-inefficient and ultimately leads to poorer execution of the task at hand. This is becoming a problem as industries deploy increasing numbers of robots (Amazon has around 750,000 robots, mainly mobile robots for moving stock around and intelligent arms for packing). Generative AI models are suggested as a more efficient means of teaching robots than traditional pre-coded strategies, letting robots take decisions autonomously and adapt better to dynamic environments. The article describes an experiment in lazy robotics at Eindhoven University of Technology in the Netherlands, where these principles were applied to a team of robots competing in the RoboCup soccer tournament. Soccer is considered a good environment for testing robotic principles because of the high level of individual and collective coordination and strategy required.
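
The filtering idea can be pictured with a small sketch: only sensor detections judged relevant to the current task are passed on for expensive downstream processing. The detection format and relevance rule below are illustrative assumptions, not the Eindhoven team's implementation.

```python
# Illustrative sketch of task-dependent filtering: only detections relevant to the
# current task are passed on for expensive processing. The detection format and the
# relevance rule are assumptions, not the Eindhoven team's implementation.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g., "ball", "opponent", "spectator"
    distance_m: float  # distance from the robot in meters

def is_relevant(d: Detection, task: str) -> bool:
    """Task-dependent relevance filter: ignore what the current task does not need."""
    if task == "chase_ball":
        return d.label == "ball" or (d.label == "opponent" and d.distance_m < 2.0)
    return True   # unknown task: process everything (the safe but costly default)

detections = [Detection("ball", 4.2), Detection("spectator", 9.0), Detection("opponent", 1.5)]
to_process = [d for d in detections if is_relevant(d, "chase_ball")]
print([d.label for d in to_process])   # the distant spectator is filtered out
```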

10. Fine-tuned models are now in limited public beta for GitHub Copilot Enterprise

GitHub Copilot now gives corporate users the possibility of fine-tuning the Copilot base model on their own codebases. As a result, the code suggestions made to programmers should be more pertinent to the client's existing code, and the coding styles suggested should match the user's or the organization's style. The fine-tuning process is handled on Azure's OpenAI service, and GitHub promises that one client's code is never used in the fine-tuning of another client's model. The fine-tuned model is tested against validation code provided by the client. GitHub claims that use of Copilot to date has led to an 84% increase in build success rates (i.e., the rate at which code from all developers is integrated and all tests pass).
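
GitHub has not published the details of this pipeline, but the sketch below illustrates the general shape of such a workflow under assumed conventions: source files from a codebase become fine-tuning examples, with a held-out portion reserved for validating the resulting model. The paths, file format and split ratio are hypothetical.

```python
# Rough sketch of the general shape of a codebase fine-tuning workflow: source files
# become training examples, with a held-out set reserved for validating the fine-tuned
# model. GitHub has not published Copilot's pipeline; paths, formats and the split
# ratio here are assumptions for illustration.
import json
import random
from pathlib import Path

def collect_examples(repo_root: str, extensions=(".py", ".ts", ".java")) -> list[dict]:
    """Turn each source file under repo_root into a plain-text training example."""
    return [{"path": str(p), "text": p.read_text(errors="ignore")}
            for p in Path(repo_root).rglob("*")
            if p.is_file() and p.suffix in extensions]

examples = collect_examples(".")   # point this at the client codebase
random.shuffle(examples)
split = int(0.9 * len(examples))   # 90% for fine-tuning, 10% held out for validation
train, validation = examples[:split], examples[split:]

Path("train.jsonl").write_text("\n".join(json.dumps(e) for e in train))
Path("validation.jsonl").write_text("\n".join(json.dumps(e) for e in validation))
print(f"{len(train)} training files, {len(validation)} validation files")
```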

11. Google Gemini: Everything you need to know about the generative AI models

This TechCrunch article provides an overview of Google Gemini, a family of AI models developed at Google's AI research labs DeepMind and Google Research. The models are multimodal, meaning that they can take text, images, audio and video as input and output. The models are trained on public image, audio and video content. Google has an AI indemnification policy which states that, in certain circumstances, Google will take responsibility for intellectual property claims brought against its clients.

On the technical side, the Gemini app is replacing the Google Assistant app on Android, adding features like the ability to ask questions about what is currently on the screen. Gemini is used in Gmail, where it can summarize email conversation threads, and Google Chrome uses it as a writing tool. Google Search uses Gemini for personalized travel itineraries. Gemini powers Code Assist, the AI programming tool, as well as Imagen 3 for creating artworks. One of the Gemini models, Nano, runs on Android phones; one use case suggested by Google is alerting users to scam calls. Finally, Google recently announced the Gems feature: chatbots powered by Gemini that users can create for personalized purposes like coaching.

Source: TechCrunch