Google's Quantum Breakthrough

Election Interference and Small Models

Posted on September 21st, 2024

Summary

Google has announced a breakthrough in quantum computing. The basic unit of data in quantum computing is the qubit, but errors quickly accumulate in physical qubits. To counteract this, quantum computers use logical qubits, each composed of several physical qubits, with error-correcting algorithms used to recover the logical qubit's value. Until now, adding more physical components has tended to introduce more errors, but Google claims to have found a technique that adds more physical qubits while producing fewer errors.

OpenAI has released a new family of models, o1 and o1-mini, which scored 83% on a qualifying exam for the International Mathematics Olympiad (compared to 13% for GPT-4o). The Center for AI Safety is alarmed because the model answered questions on bioweapon creation better than PhD students in biology and chemistry. Google announced DataGemma, a framework designed to improve the factual accuracy of models by using retrieval-augmented generation (RAG) and retrieval-interleaved generation (RIG) to extract data from trusted sources such as the United Nations and the World Health Organization (WHO).

MIT published a paper on Co-LLM, in which a general-purpose model learns to recognize, from its internal parameters, when it should delegate response generation to an expert model. This means that supervision within a collection of models becomes automatic. Finally, a Computerworld article looks at the likely increase in the use of small language models by enterprises. These models can often run on a single GPU and are specialized for very specific tasks, e.g., text-to-SQL generation.

With regard to social issues, a Science article shows that language models are effective at persuading people who hold conspiracy beliefs to reduce those beliefs. An MIT Technology Review article reports that generative AI has not had an impact on changing people's votes in recent European elections. On the other hand, generative AI has been used to destabilize candidates, with women candidates in particular being the victims of sexual deepfake content. A TechCrunch article describes how a jailbreaker managed to get ChatGPT to explain how to create fertilizer bombs. Finally, a study has shown further evidence of covert racism in common chatbots: when shown home surveillance videos, a chatbot was more likely to recommend calling the police for a house in a predominantly white area than in a predominantly minority area.

1. Google says it’s made a quantum computing breakthrough that reduces errors

Researchers at Google have announced a breakthrough in quantum computing error correction. A key problem for quantum computers is that the hardware components are sensitive, and errors quickly creep into operations. This means that quantum computations can only run for a very short time. Error-correcting techniques are therefore necessary for longer computations, but engineering limitations have so far meant that adding more components introduces more errors. The basic unit of information in a quantum computer is the qubit (a 0 or 1, or a superposition of these values). A qubit is stored in a physical qubit device. In Google's case, several physical qubits are used to represent one logical qubit, and an algorithm called the surface code is used to determine the value of the logical qubit from the physical ones. Google showed that a logical qubit built from 105 physical qubits suppressed errors more effectively than one built from 72 physical qubits. Adding more components can thus improve error correction, contradicting previous observations, and the logical qubit was observed to retain information 2.4 times longer than a physical qubit. The article notes that the next milestone for quantum computing is a machine with 100 logical qubits, which QuEra may reach in 2026.
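Google's result relies on the surface code, which is far more involved than can be shown here. As a rough intuition only, the toy Python sketch below uses a simple repetition code (not the surface code) to illustrate the general idea: a logical value is stored redundantly across several noisy "physical" bits and recovered by a decoder, and, below an error threshold, adding more physical copies lowers the logical error rate. All names and numbers are illustrative assumptions, not Google's setup.

```python
import random

def noisy_copies(logical_bit, n_physical, p_error):
    """Store one logical bit in n_physical noisy 'physical' bits,
    each of which flips independently with probability p_error."""
    return [logical_bit ^ (random.random() < p_error) for _ in range(n_physical)]

def decode_majority(physical_bits):
    """Recover the logical bit by majority vote (a toy stand-in for a
    real decoder such as the one used with the surface code)."""
    return int(sum(physical_bits) > len(physical_bits) / 2)

def logical_error_rate(n_physical, p_error, trials=50_000):
    errors = 0
    for _ in range(trials):
        if decode_majority(noisy_copies(0, n_physical, p_error)) != 0:
            errors += 1
    return errors / trials

# Below the threshold, more physical bits means fewer logical errors.
for n in (3, 7, 15):
    print(n, logical_error_rate(n, p_error=0.05))
```

In this toy model the logical error rate drops as the code grows; Google's claim is the analogous behaviour for real surface-code logical qubits.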

2. The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers

This article describes a research study evaluating the productivity of software developers using AI-assisted programming. A total of 4'867 developers from Microsoft, Accenture and an unnamed Fortune 500 company took part in the study; the AI tool used was GitHub Copilot. The study differs from previous work in that it measures productivity on the job rather than in lab experiments. Productivity is measured via the number of completed tasks, the number of pull requests (usually the basic unit of development work) and the number of successful builds (when all features are put together and compiled). The results show a productivity increase of 26.08% for developers using Copilot, a 13.55% increase in the number of committed pull requests and a 38.38% increase in successful builds. Interestingly, junior developers experienced a larger productivity improvement than senior staff.

3. Durably reducing conspiracy beliefs through dialogues with AI

The growth of the Internet has helped conspiracy theories spread and persist. It has generally been thought that people who hold conspiracy beliefs cannot be persuaded to abandon them through rational discussion or the presentation of facts, because the belief is built on an underlying psychological need. This Science article presents the results of an experiment in which an AI model, GPT-4 Turbo, held dialogues with 2'190 Americans who believed in some conspiracy theory; after a three-round conversation, participants' belief in the conspiracy was reduced by about 20% on average. The conspiracy theories covered related to the assassination of John F. Kennedy, the Illuminati, COVID-19, the 2020 US presidential election, Princess Diana, September 11th, aliens, and the moon landing. The claims put forward by the AI to rebut the conspiracies were validated by a professional fact-checker. The results go against conventional thinking by suggesting that a direct factual discussion can work. The factual approach is also believed to produce longer-lasting belief change than persuasion based on emotional cues.

4. OpenAI o1 Model Warning Issued by Scientist: 'Particularly Dangerous'

OpenAI has released a new family of models, o1 and o1-mini, that are claimed to have improved reasoning. An OpenAI scientist said that the models are trained to try different strategies and to recognize their mistakes. Concretely, the o1 model is reported to have scored 83% on a qualifying exam for the International Mathematics Olympiad (compared to 13% for GPT-4o), reached the 89th percentile in Codeforces coding competitions, and outperformed PhD students on physics, chemistry and biology benchmarks. This is an issue for Dan Hendrycks, director of the Center for AI Safety, because the model outperforms those students on questions relating to bioweapons. OpenAI says that the model has been stress-tested, scoring 84% on the hardest of the company's jailbreaking tests (compared to 22% for GPT-4o). The o1 model is available to developers via an API and to ChatGPT Plus users. A VentureBeat article notes that the model is particularly good at coding tasks, drafting legal documents and creating action plans.
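For developers with API access, calling the new models should look much like any other chat completion request. The sketch below uses the official openai Python client; the model identifier "o1-preview" is the name used at launch and is an assumption here, since exact names and access tiers depend on OpenAI's rollout.

```python
# Minimal sketch using the official openai Python client (v1.x).
# Assumes API access to the o1 family; the model name "o1-preview"
# may differ for your account or over time.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user",
         "content": "Draft an action plan for migrating a monolith to microservices."}
    ],
)

print(response.choices[0].message.content)
```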

Source Newsweek

5. Hacker tricks ChatGPT into giving out detailed instructions for making homemade bombs

This TechCrunch article reports how a hacker going by the name of Amadon managed to bypass ChatGPT's guardrails and get the chatbot to explain how to create mines and other improvised explosive devices. An explosives expert confirmed that the methods suggested by ChatGPT were very dangerous. The hacker jailbroke the chatbot (got around the guardrails that should normally prevent it from responding with dangerous content) by convincing it to play a game in which the normal guardrails were unnecessary. Amadon reported his findings to OpenAI's bug bounty program, which replied that jailbreaking issues were not in its purview. A jailbreak is not a hack in the traditional sense, where a modification to the system's code (a patch) can suffice to fix the problem. The article also mentions how the fertilizer industry has been working over the past few years to make its products less dangerous.

6. DataGemma: Using real-world data to address AI hallucinations

Google has announced DataGemma, a family of language models that relies on the Data Commons platform for up-to-date statistical data. Data Commons is an open-source initiative led by Google to build a knowledge graph from trusted sources, including the United Nations (UN), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC) and census bureaus. The database includes more than 250 billion data points and over 2.5 trillion triples from hundreds of global sources. Closely related to the Gemini family of language models, the Gemma models can query Data Commons for fact-checked information. Two approaches are used. In retrieval-interleaved generation (RIG), the model queries the database while generating a response, whenever it is about to state a piece of statistical data. In retrieval-augmented generation (RAG), the model first loads relevant facts from the database into its context window before generating its response. These techniques should improve response accuracy; indeed, Google claims that the models produce fewer hallucinations in its laboratory testing.
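Google's orchestration code is not reproduced in the article, but the RAG flow can be sketched: retrieve relevant statistics from a trusted store (Data Commons, in DataGemma's case) and place them in the prompt before the model answers. The Python below is a minimal, hypothetical illustration of that flow only; fetch_statistics and generate are placeholder functions, not the DataGemma or Data Commons APIs, and the example statistic is illustrative.

```python
# Hypothetical sketch of a RAG flow in the spirit of DataGemma: fetch
# trusted statistics first, then let the model answer with those facts
# in its context. fetch_statistics() and generate() are placeholders.

def fetch_statistics(question: str) -> list[str]:
    """Placeholder for a Data Commons lookup keyed on the question.
    A real implementation would map the question to statistical
    variables and places, then query the knowledge graph."""
    return ["Example statistic: population of Kenya (2022) ~ 54 million"]

def generate(prompt: str) -> str:
    """Placeholder for a call to a Gemma-family model."""
    return "..."

def answer_with_rag(question: str) -> str:
    facts = fetch_statistics(question)
    prompt = (
        "Answer using only the statistics below and cite them.\n\n"
        "Statistics:\n" + "\n".join(f"- {f}" for f in facts) +
        f"\n\nQuestion: {question}"
    )
    return generate(prompt)

print(answer_with_rag("How many people live in Kenya?"))
```

RIG works the other way around: the model drafts its answer and issues a Data Commons query whenever it is about to state a statistic, substituting the retrieved value into the draft.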

Source Google

7. Learning to Decode Collaboratively with Multiple Language Models

Every language model is inherently limited in its knowledge to what it can learn from its training data. Several techniques are used to overcome this limitation. For instance, the model can be linked to an external database or search engine for up-to-date facts (the basis of retrieval-augmented generation), or to code execution platforms to help answer software queries. Another approach is to connect several specialized models together, with each request forwarded to the model most pertinent to its subject matter. The challenge with this approach is that a supervision framework must be developed to decide when a model calls an external API and to collate the responses from the different models.

In new research from MIT, an algorithm called Co-LLM has been developed in which a general-purpose model forwards a request to an expert model when it estimates that the expert can provide a better reply – the so-called "phone a friend" scenario. In one example, in response to a question about the number of bear species that have become extinct, the base model (Llama-7b) named two species of bear, and the expert model (Llama-70b in this case) corrected the dates proposed by the first model and added the Latin species names. The base model is trained to know when to delegate part of a request, so the reply to the user is composed of tokens generated by both the base and expert models. No direct supervision is required, which makes the overall execution more efficient than combining the full responses of different language models. The researchers tested their approach on GSM8k and MATH for reasoning and math problem solving, and on BioASQ for medical question answering. The models used are from the Llama and Llemma families.
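The paper's training procedure is beyond a short summary, but the decoding loop can be sketched: at each step the base model produces both a next-token proposal and a deferral signal, and when that signal crosses a threshold the expert's token is used instead. The Python below is a simplified, hypothetical rendering of that idea rather than the authors' implementation; base_step, expert_step and the 0.5 threshold are stand-ins.

```python
# Simplified sketch of token-level deferral in the spirit of Co-LLM.
# base_step() and expert_step() stand in for real model forward passes;
# in the paper the deferral probability comes from a learned head.

def collaborative_decode(prompt_tokens, base_step, expert_step,
                         threshold=0.5, max_tokens=128):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        base_token, defer_prob = base_step(tokens)   # base model proposal
        if defer_prob > threshold:
            token = expert_step(tokens)              # "phone a friend"
        else:
            token = base_token
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens
```

The key design point is that the expert is only invoked for the tokens where the base model expects it to help, rather than for entire responses.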

8. AI-generated content doesn’t seem to have swayed recent European elections

The use of generative AI to spread misinformation is being closely watched, as half of the world's population lives in countries going to the polls this year. Research from the Alan Turing Institute in the UK suggests that AI-generated misinformation did not affect the results of the recent elections for the European Union parliament, or of the UK and French parliamentary elections. Misinformation campaigns using generative AI were active, but the evidence suggests these efforts did not significantly sway voter opinion. Domestic interest groups and foreign actors, notably Russia, are reported to be behind AI-generated misinformation efforts, which generally take the form of news articles and social media comments carrying misinformation. The article notes that the people who read this content are mostly people who already hold the beliefs it promotes.

Rather than causing a mass shift in public opinion, a more pressing concern about AI-generated content is its use to target and destabilize politicians. A number of female candidates were targeted with sexual deepfake content in which they were depicted. The article argues that such targeted efforts can have a greater impact on the democratic process in the long run. In addition, research shows that people are finding it increasingly difficult to distinguish between real and fake political content. This was exploited in one instance in the French parliamentary elections, where members of the far-right party shared deepfake content with a strong anti-immigration narrative.

9. With GenAI models, size matters (and smaller may be better)

A current trend around language models is the development of smaller versions. Compared to common large language models (LLMs), a small language model (SLM) has far fewer parameters and is trained on a smaller volume of data. An SLM with around 5 billion parameters can run on a single GPU, which makes it more accessible to corporate environments, in part because the model can be run on premises. In comparison, a large language model can require up to 10'000 GPUs to run. Whereas LLMs are designed for general-purpose tasks, an SLM is designed for very specific tasks – optical character recognition (OCR) and text-to-SQL transformation being two examples cited by the article. It is suggested that future company IT infrastructure could be 50% traditional applications and 50% SLMs. Another advantage of SLMs is that they are less costly to train. LLMs are increasingly expensive to train – it has even been suggested that LLM training costs will reach 22 trillion USD in 2026, roughly the size of the US national debt. Current examples of SLMs include Google's Gemini Nano, Microsoft's Orca-2-7b and Orca-2-13b, and Meta's Llama-2-13b.
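As a concrete illustration of the text-to-SQL use case, the sketch below runs a small instruction-tuned model on a single GPU with the Hugging Face transformers library. The checkpoint name is only an example of a sub-10-billion-parameter model; the prompt format and output quality depend entirely on the model chosen, so treat this as a sketch rather than a recipe.

```python
# Sketch: text-to-SQL with a small language model on one GPU, using the
# Hugging Face transformers library. The checkpoint name is an example;
# any small instruction-tuned model could be substituted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/Orca-2-7b"  # example ~7B-parameter model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = (
    "Schema: employees(id, name, department, salary)\n"
    "Task: write a SQL query that returns the average salary per department.\n"
    "SQL:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```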

10. Study: AI could lead to inconsistent outcomes in home surveillance

This article describes research from MIT and Penn State University that evaluated how different language models behave when applied to home surveillance videos. The models studied were GPT-4, Gemini, and Claude, and the videos analyzed came from Amazon Ring home surveillance cameras. One result of the study was to identify inconsistencies across the models: one model would flag a vehicle break-in while another would not. Another result was that the models were more likely to suggest calling the police in predominantly white areas than in predominantly minority areas. Likewise, the term "delivery worker" was suggested more frequently in white areas, and "burglary tools" more frequently in predominantly minority areas. The researchers call this norm inconsistency – the idea that a model does not behave the same way across all deployments. These results were surprising because the models were given no explicit information about the demographics of the area. On the other hand, the study found that the skin color of people on camera did not influence the models' decisions. The researchers attribute this to model developers mitigating skin-tone bias during the models' development and training phases.