Brain-scanning GenAI Models

GPT-4o, Project Astra and poor Chinese Language Training Data

Posted on May 30th, 2024

Summary

This week features several articles discussing how generative AI models "reason" when creating content. Understanding this process is crucial: when an AI model makes a decision, human operators need to be able to understand how the AI arrived at it. This is the foundation of Explainable AI, which is now considered a core requirement for Responsible AI usage. Understanding the decision-making process also matters because, if we can identify the conditions under which models generate toxic output (e.g., dangerous content), we might be able to prevent such content by intervening on the parameters during generation. Two articles describe research from Anthropic where models were "brain-scanned" to observe the reasoning process.

An interesting result is that chain-of-thought prompting – a form of machine psychology where the machine is asked to explain each step of its reasoning process – has flaws: like humans, machines often fabricate logic to justify their responses.

One article highlights the improving ability of chatbots to discern irony and intentions in conversations with humans, a capability previously thought to be unique to humans. Another article describes an experiment with health chatbots designed to coach behavioral changes in humans, such as quitting smoking. The research found that chatbot advice is less effective in the initial stages of behavioral change because the chatbots cannot recognize when users are hesitant or ambivalent about change and therefore fail to guide them appropriately.

Elsewhere, an article addresses the quality issues of Chinese language training data, which is primarily sourced from state-sponsored media or numerous pornographic and gambling websites. Another article summarizes an interview with a Meta director who asserts that there is no evidence yet of systemic electoral disinformation campaigns and defends Meta’s decision to allow advertisements claiming the US 2020 election was "stolen."

One article examines the steps medium to large-scale enterprises should take to adopt generative AI. Another explores the primary uses of generative AI by criminal groups.

Google has launched LearnLM, a learning platform integrated into their ecosystem. Personalized chatbots are also discussed, including GPT-4o from OpenAI and Project Astra from Google.

1. Anthropic’s Generative AI Research Reveals More About How LLMs Affect Security and Bias

Anthropic researchers have created a detailed map of the inner workings of its Claude 3 Sonnet model. This map allows researchers to examine how neuron-like data points, called features, influence the generative AI's output. Some features are "safety relevant," which means that identifying them can help tune the AI so that dangerous responses are not generated by the model. Features were identified when the model was questioned about security vulnerabilities in code or asked to produce dangerous content (such as instructions for bioweapons). Features were also identified when bias, lying, deception, and sycophancy appeared in generated content. An experimental goal was to determine whether models that appear safe during training will actually be safe in deployment. The Anthropic researchers experimented with clamping to increase or decrease the intensity of specific features, aiding in tuning models to handle sensitive security topics appropriately. The original Anthropic research paper can be found here.
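
To make the clamping idea concrete, here is a minimal sketch of the general technique, not Anthropic's actual tooling: a forward hook on one transformer layer of an open model (GPT-2 as a stand-in) sets the activation along a chosen feature direction to a fixed value during generation. The feature vector below is random purely for illustration; in the research it comes from a sparse autoencoder trained on the model's activations.

```python
# Minimal sketch of "feature clamping": fix the activation along a feature
# direction in a transformer's residual stream during generation.
# The feature vector here is random for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # open stand-in model; Claude's internals are not public
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

hidden_size = model.config.hidden_size
feature = torch.randn(hidden_size)          # hypothetical feature direction
feature = feature / feature.norm()
clamp_strength = 5.0                        # value to pin the feature activation to

def clamp_hook(module, inputs, output):
    # the output of a GPT-2 block is a tuple; the first element is the hidden state
    hidden = output[0]
    # current activation along the feature direction, shape (batch, seq)
    coeff = hidden @ feature
    # shift the hidden state so its component along `feature` equals clamp_strength
    hidden = hidden + (clamp_strength - coeff).unsqueeze(-1) * feature
    return (hidden,) + output[1:]

layer = model.transformer.h[6]              # a middle layer, chosen arbitrarily
handle = layer.register_forward_hook(clamp_hook)

prompt = "The Golden Gate Bridge is"
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20)
print(tok.decode(out[0]))

handle.remove()
```

Setting clamp_strength to a large positive value amplifies whatever concept the feature direction encodes, while a negative value suppresses it; this is the sense in which clamping can help tune how a model handles sensitive topics.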

2. OpenAI’s latest blunder shows the challenges facing Chinese AI models

This article argues that AI companies need to invest more effort into curating models’ Chinese training data and filtering out inappropriate content. The article cites the case of OpenAI’s new GPT-4o model and its handling of the Chinese data sources used to train the model’s tokenizer. This data was polluted by Chinese spam websites, leaving the Chinese token library filled with phrases related to pornography and gambling. These spam websites constitute up to 90% of the Chinese language data sources used. The alternative Web sources are Chinese state media sites, whose training data quality is questionable for Western AI companies because of their rigid, party-controlled political orientation.
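
As an illustration of how such pollution can be spotted from the outside, the sketch below uses the tiktoken library's o200k_base encoding (the tokenizer used by GPT-4o) and scans the vocabulary for unusually long all-Chinese tokens, which typically indicate that whole phrases were frequent enough in the training data to earn their own token. The length threshold is an arbitrary choice for illustration.

```python
# Minimal sketch: scan GPT-4o's tokenizer vocabulary (the "o200k_base"
# encoding in the tiktoken library) for long Chinese tokens, which is roughly
# how observers noticed the spam and gambling phrases described in the article.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def is_cjk(ch: str) -> bool:
    # basic CJK Unified Ideographs block
    return "\u4e00" <= ch <= "\u9fff"

long_chinese_tokens = []
for token_id in range(enc.n_vocab):
    try:
        text = enc.decode_single_token_bytes(token_id).decode("utf-8")
    except (UnicodeDecodeError, KeyError):
        continue  # skip byte fragments that are not valid UTF-8 on their own
    stripped = text.strip()
    # long multi-character Chinese tokens suggest whole phrases were memorized
    if len(stripped) >= 5 and stripped and all(is_cjk(c) for c in stripped):
        long_chinese_tokens.append((token_id, stripped))

print(f"found {len(long_chinese_tokens)} long Chinese tokens")
for token_id, text in long_chinese_tokens[:20]:
    print(token_id, text)
```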

3. Meta says AI-generated election content is not happening at a “systemic level”

Nick Clegg, President of Global Affairs at Meta, said in an interview that while Meta has observed attempts at interference in events like the Taiwanese election, the scale of this interference remains “manageable.” He defended the company’s efforts to prevent violent groups from organizing on the platform, noting that 200 such groups have been removed since 2016. In relation to upcoming elections, Meta now relies on fact-checkers and AI technology to identify unwanted groups on its platforms. As part of Meta’s commitment to the Partnership on AI, the company has begun adding visible markers to AI-generated images published on Facebook, Instagram, and Threads. However, Clegg acknowledged that not all AI-generated content can be detected. Clegg also defended the company’s decision to allow advertisements claiming that the 2020 US election was stolen. He noted that such claims are common worldwide. Many US politicians are concerned by this decision, as it threatens public trust in elections and the safety of individual election workers.

4. 10 Questions to Ask About Generative AI

This article, aimed mainly at management in medium-sized to large organizations, puts forward ten principles to help in adopting GenAI. These are:

  1. Identify the business opportunity. This involves identifying concrete use cases where GenAI, despite the associated risks, can bring measurable business value.
  2. Define an AI Strategy. This entails being able to measure the value that AI services can generate, and implementing a program for employees to develop and use the technology.
  3. Understand the regulatory and ethical issues. Several countries have defined regulations to oversee the use of AI and most major tech companies have defined Responsible AI charters.
  4. Have a policy to manage or source data produced by AI. Data produced by GenAI should be included in the organization’s data governance plan. For instance, if the company creates its own models which are trained on proprietary company data, then plans need to be made to protect the content generated. Further, the provenance of external training data may need to be clarified so that its trustworthiness can be estimated.
  5. Cultivate talent to implement GenAI. The talent required includes data scientists, developers, operations people as well as compliance officers.
  6. Establish a Governance Framework for Experimentation. A governance framework defines organizational rules for ethical use, risk management, legal compliance and policies, as well as standards and controls for all divisions within the organization. For instance, the framework can define the business lines where GenAI must never be used or the classes of company data that should never be entered as prompts (a minimal prompt-screening sketch follows this list).
  7. Implement Monitoring of AI Usage. Monitoring can take the form of self-check procedures that employees apply to themselves, human review of generated content, and regular audits of company processes that use GenAI.
  8. Set Accountability. The model’s behavior should be supervised with the same diligence as that of any employee. GenAI brings a business risk since content generated can engage the company’s liability. Someone needs to be appointed AI supervisor.
  9. Manage Risks. As with all other risks the company is exposed to, C-suite executives must understand the risks of GenAI and manage them throughout the organization with equal rigor.
  10. Define Management Issues. GenAI in the company is a game changer, but so are initiatives in relation to environmental, social and governance (ESG) as well as diversity, equity and inclusion (DEI). These initiatives must all be addressed in a coherent manner.
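
As a concrete illustration of principle 6, here is a minimal, purely illustrative sketch of a prompt screen that checks outgoing prompts against data classes a policy might forbid. The class names and regular expressions are invented; a real deployment would rely on proper DLP tooling and an approved policy catalogue rather than a handful of regexes.

```python
# Illustrative sketch of a governance control: block prompts that contain
# data classes the company forbids sending to an external GenAI service.
# The patterns and class names below are invented for illustration.
import re

FORBIDDEN_DATA_CLASSES = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "internal_project_code": re.compile(r"\bPRJ-\d{4}\b"),  # hypothetical format
}

def screen_prompt(prompt: str) -> list[str]:
    """Return the list of forbidden data classes detected in a prompt."""
    return [name for name, pattern in FORBIDDEN_DATA_CLASSES.items()
            if pattern.search(prompt)]

violations = screen_prompt("Summarize the roadmap for PRJ-2031 and mail it to j.doe@example.com")
if violations:
    print("Prompt blocked; contains:", ", ".join(violations))
else:
    print("Prompt allowed")
```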

5. Five ways criminals are using AI

WormGPT is a large language model service that appeared in 2023, built on GPT-J and trained on malware-related data. WormGPT was designed to assist hackers and had no ethical rules or restrictions. Over the last year, cybercriminals have shifted from developing their own AI models to exploiting existing tools. This article identifies the five most popular uses of GenAI in the criminal world.

  1. Phishing. The arrival of ChatGPT saw a huge increase in phishing emails, as criminals could enhance or translate their messages to victims. ChatGPT has been integrated into spam-generating services like GoMail Pro.
  2. Deepfake Audio Scams. Cybercriminals have used deepfake technology to scam significant amounts of money. For example, an employee in Hong Kong was reportedly tricked into transferring 25 million USD by a deepfake of the company's CFO. There have also been reports of distressing voice messages sent to people where the voices mimicked those of loved ones asking for money to be transferred to help them out of a difficult situation.
  3. Bypassing Identity Checks. Criminals are selling apps on platforms like Telegram that bypass the identity verification processes used by many banking services, in which clients identify themselves by having their photo taken. These services are offered for cryptocurrency exchanges like Binance for as little as 70 USD.
  4. Jailbreak-as-a-Service. Services like EscapeGPT and BlackhatGPT provide access to language-model APIs together with jailbreak prompts that bypass GenAI platform safeguards. The jailbreaks are updated regularly.
  5. Doxxing and Surveillance. Doxxing is the process of revealing private and sensitive information about someone online. GenAI models can infer sensitive personal details such as ethnicity, location, and occupation from mundane chatbot conversations, making them powerful tools for malicious actors.
6. AI models can outperform humans in tests to identify mental states

Theory of mind is considered to be a cornerstone of emotional and social intelligence. It is what allows us to infer people's intentions, and to engage with and empathize with one another. Most children develop these skills between the ages of three and five. This article reports on research published in the journal Nature Human Behaviour showing that some GenAI models are sometimes better than humans at tracking people's mental states. Psychologists previously believed this ability was unique to humans. The research involved models such as GPT-3.5, GPT-4, and three versions of Llama, along with 1,907 human participants. Both the models and humans were tested with the same questions to assess various abilities, such as inferring someone else’s intentions and comprehending irony. Naturalistic responses from models increase humans' tendency to attribute a mind and intentions to these models. (This anthropomorphizing effect of GenAI has been noted elsewhere.)

7. Why semantic search is a better term than understanding for gen AI

Semantic search has been an area of research since the early 2000s, and the introduction of GenAI marks a significant evolution. The idea of semantic search is to locate conceptually relevant items; for example, a search for "place to eat" should yield a list of nearby bars and restaurants. Earlier semantic search approaches used older machine learning techniques and were principally for textual searching. With GenAI, semantic searching can include text, audio and video, and also integrate connections between elements of these types (or modalities). It is still a research challenge for GenAI "multimodal diffusion models" to generate semantically consistent audio for video, where the connection between sounds and visual events is correct. An example cited in the article is creating a video of waves hitting the shoreline and ensuring that the crashing sound occurs only when the waves hit. Another challenge is to improve models’ understanding of tone, volume, and rhythm in voice audio streams. When audio is converted to tokens, aspects such as humor or sarcasm can be lost.
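
For the text-only case, the core mechanism can be sketched in a few lines: embed the documents and the query with a sentence-embedding model and rank by cosine similarity. The model name and toy documents below are illustrative; multimodal systems apply the same idea with embeddings that place text, audio and video in a shared space.

```python
# Minimal sketch of text-only semantic search: embed documents and a query,
# then rank documents by cosine similarity to the query embedding.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

documents = [
    "Luigi's Trattoria serves pasta and pizza until midnight.",
    "The city library is open on weekends.",
    "Corner Deli offers sandwiches and hot soup nearby.",
]

query = "place to eat"
doc_emb = model.encode(documents, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# cosine similarity between the query and each document
scores = util.cos_sim(query_emb, doc_emb)[0]
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.3f}  {doc}")
```

With these toy documents, the trattoria and deli sentences should rank above the library entry for the query "place to eat", even though none of them contains the word "eat" – which is the point of semantic rather than keyword search.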

8. How Does ChatGPT Think?

A great challenge of GenAI currently is that humans cannot always understand why the AI model gave a specific reply to an input prompt. The ability to understand how a decision is taken by AI is known as Explainable AI, which is now considered fundamental for responsible use of AI. Though it is possible to highlight those parts of an image of a cat that led the AI model to identify the image as a cat, explainability is still in its infancy as a domain. The challenge stems from the billions of parameters that GenAI models process. This article reports on an experiment by researchers at Anthropic who probed the parameters of an LLM when a human asked the LLM to shut down. The researchers observed that the LLM drew on Arthur C. Clarke’s book 2010: Odyssey Two to plead with the human not to be shut down. Another approach to explainability is to use machine psychology, where the human engages with the GenAI to understand its output, also known as chain-of-thought prompting. However, this approach is seen as limited because models, like humans, can be shown to fabricate logic to justify their responses. Yet another approach to understanding how models think is a kind of brain scan, where the behavior of specific parameters is observed as different prompts are entered, much like a brain specialist observes neuronal behavior in humans – but without the need for the patient’s permission.
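
The machine-psychology approach is easy to illustrate. The sketch below assumes an OpenAI API key is available in the environment and uses an arbitrary model name; it asks the same question with and without a request to explain each reasoning step. The article's caveat still applies: the stated reasoning may be a plausible-sounding justification rather than a faithful trace of the model's actual computation.

```python
# Minimal sketch of chain-of-thought prompting ("machine psychology"):
# ask the same question directly, then with a request to show each step.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = ("A bat and a ball cost 1.10 euros together. The bat costs "
            "1 euro more than the ball. How much does the ball cost?")

direct = client.chat.completions.create(
    model="gpt-4",  # arbitrary model choice for illustration
    messages=[{"role": "user", "content": question}],
)

chain_of_thought = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": question + " Explain each step of your reasoning "
                                      "before giving the final answer."}],
)

print("Direct answer:\n", direct.choices[0].message.content)
print("\nWith chain of thought:\n", chain_of_thought.choices[0].message.content)
```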

9. Google Introduces New LearnLM Models for Students, Gemini Features for Education

Google is introducing a new AI model for students called LearnLM and is integrating it, along with its Gemini AI model, into Google Workspace for Education. The goals of LearnLM include promoting active learning, providing well-structured information in various modalities, adapting to learner goals and needs, stimulating curiosity, and helping learners track their progress. On Android devices, LearnLM will assist with complex math problems via the “Circle to Search” feature (where a user can quickly search by encircling the element to search on the screen). Additionally, integration with the YouTube app for Android will allow users to interact with videos by asking questions or seeking clarifications.

10. Study: Large language models can’t effectively recognize users’ motivation, but can support behavior change for those ready to act

Researchers from the ACTION Lab at the University of Illinois Urbana-Champaign found that large language model-based chatbots, such as ChatGPT, Google Bard, and Llama 2, have potential for promoting healthy behavioral changes but are fundamentally limited for people in the early stages of behavioral change. The study examined chatbot responses to various health scenarios, including physical activity, nutrition, mental health, cancer screening, sexually transmitted diseases, and substance dependency. The researchers categorized user questions according to the five stages of motivational change in behavior (cf. the Transtheoretical Model). The results showed that while chatbots can identify motivational states and provide relevant information for users who have established goals and committed themselves to taking action, they struggle in the early stages of behavioral change. Specifically, chatbots are unable to recognize when users are hesitant or ambivalent about change and therefore fail to guide them appropriately. Consequently, when users are resistant to habit change, chatbots do not effectively help them evaluate their behavior, understand its causes and consequences, or assess environmental influences on their behavior. The original research paper can be found here.
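
One way to address the gap the study identifies would be to have a chatbot classify the user's stage of change before giving advice. The sketch below lists the five Transtheoretical Model stages with invented example utterances and builds a prompt that asks a model to classify first and advise second; the wording is illustrative and is not the study's protocol.

```python
# Illustrative sketch: the five Transtheoretical Model (TTM) stages of change,
# with invented example messages, and a prompt that asks an LLM to identify
# the user's stage before tailoring its advice.
TTM_STAGES = {
    "precontemplation": "I don't see why my smoking is anyone's problem.",
    "contemplation":    "I know I should probably quit, but I'm not sure I can.",
    "preparation":      "I've picked a quit date for next month.",
    "action":           "I stopped smoking two weeks ago and use nicotine gum.",
    "maintenance":      "It's been eight months since my last cigarette.",
}

def stage_classification_prompt(user_message: str) -> str:
    """Build a prompt asking an LLM to identify the user's stage of change first."""
    stages = ", ".join(TTM_STAGES)
    return (
        f"The user says: '{user_message}'\n"
        f"Step 1: Classify the user's stage of change ({stages}).\n"
        "Step 2: Tailor your advice to that stage. If the user is in "
        "precontemplation or contemplation, explore their ambivalence instead "
        "of prescribing an action plan."
    )

print(stage_classification_prompt("I know I should probably quit, but I'm not sure I can."))
```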

11. OpenAI and Google are launching supercharged AI assistants. Here’s how you can try them out.

Google and OpenAI have both recently announced new AI assistants with advanced capabilities. OpenAI's GPT-4o can converse in real time with a response delay of about 320 milliseconds, matching natural human conversation speed according to OpenAI researchers. It can interpret objects pointed at by a smartphone camera and assist with tasks like coding and translating text. Google's Gemini Live, launching in the coming months, offers similar features and is marketed as a “do-everything” chatbot. Further, Google presented Project Astra at its annual I/O developer conference; Astra will be accessible via smartphones and potentially desktop computers, with future plans to embed it into smart glasses and other devices. According to this Technology Review article, GPT-4o excels in audio capabilities, including realistic voices and singing, while Project Astra boasts superior visual capabilities, such as remembering the location of objects like glasses.