AI Startup Challenges

Open-source licensing and the Hiroshima AI Policy Framework

Posted on May 5th, 2024

Summary

This week brought more articles on the adoption of generative AI. One explores AI's potential impact on the retail industry, while WIRED presents perspectives from AI startups, highlighting the economic challenges they face compared with startups of the Web era. The high costs of training and operating models mean the economic advantage still lies with Big Tech, prompting startups to focus increasingly on niche services and applications.

On the regulatory front, Japan continues to seek support for its Hiroshima AI policy.

An MIT Technology Review article emphasizes the ongoing need for personal data among AI companies. It raises concerns about the growing use of social media profiles of deceased individuals for training purposes.

NIST has initiated a program to evaluate generative AI, initially aiming to promote tools that can detect AI-generated content. Cleanlab, a startup from MIT, has developed a tool to assess the reliability of AI-generated text. A problem cited in connection with fake news is the liar's dividend: the phenomenon whereby doubts can easily be raised about factual content.

An editorial in the International Journal of Information Management discusses generative AI's impact on the creative industries, believing there will always be a "yearning for the unmistakable human touch" and suggesting a research agenda to oversee AI-human co-creation.

Finally, a post discusses the complexity of defining open source in AI. The topic is more complicated than traditional open-source software licensing because an AI license must consider the openness of code, model parameters, and training data.

1. The Unsexy Future of Generative AI Is Enterprise Apps

Following the release of ChatGPT, startups in the AI sector often aimed to create models and applications that offered multiple services. This was the case, for instance, with Tome, a startup backed by venture capitalists including LinkedIn co-founder Reid Hoffman and former Google CEO Eric Schmidt. Today, the company focuses on a single service, called “PowerPoint-on-GenAI”. Two market factors explain this shift in focus.

  1. Cost of OpenAI Services: establishing and operating AI startups is expensive because of the significant computing power required; training AI models for diverse tasks demands both time and money. OpenAI's ChatGPT Enterprise charges 60 USD per employee per month, and its developer API is priced by the number of tokens processed, with the most powerful model costing up to 0.12 USD per 1,000 tokens (a rough cost sketch follows this list). This economic reality contrasts with that of the tech startups of the past two decades.
  2. Narrow Applications: organizations continue to favor "narrow" applications, where each application addresses a very specific problem. This philosophy is what enables smaller tech companies to carve out market niches for themselves.
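
To give a feel for how token-based pricing adds up, here is a minimal back-of-the-envelope sketch in Python. The per-1,000-token rate is the figure cited above; the token volumes and the function name are hypothetical, chosen purely for illustration:

    # Rough cost estimate for token-based API pricing.
    # Rate: the 0.12 USD per 1,000 tokens cited above for the most
    # powerful model; the traffic figures below are hypothetical.

    PRICE_PER_1K_TOKENS = 0.12  # USD

    def monthly_api_cost(tokens_per_request: int,
                         requests_per_day: int,
                         days: int = 30) -> float:
        """Estimate a month of API spend from token throughput."""
        total_tokens = tokens_per_request * requests_per_day * days
        return total_tokens / 1000 * PRICE_PER_1K_TOKENS

    # Example: 2,000 tokens per request, 10,000 requests per day
    # -> 600 million tokens per month -> 72,000 USD per month.
    print(f"{monthly_api_cost(2000, 10000):,.0f} USD per month")

At that scale, a modest product with steady traffic spends tens of thousands of dollars a month on inference alone, which illustrates why the economics favor Big Tech.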

The article also notes a decline in venture capital funding for AI companies in real terms: although total investment was higher in 2023, much of it came from a few large corporate deals, such as Microsoft's capital investment in OpenAI and Amazon's funding of Anthropic.

2. Japan's Kishida unveils a framework for global regulation of generative AI

Japanese Prime Minister Fumio Kishida recently announced a new international framework for the regulation and use of generative AI. The announcement was made during a speech at the Organization for Economic Cooperation and Development (OECD) in Paris. The framework, named the Hiroshima AI Process Comprehensive Policy Framework, sets out guiding principles and a code of conduct for AI developers. This voluntary framework has already seen participation from 49 countries and regions, known as the Hiroshima AI Process Friends Group. Key principles of the framework include the requirement to tag AI-generated content, to evaluate the societal risks of deploying AI systems, and to prioritize the development of systems that tackle global challenges such as education and climate change.

Source ABC News

3. My deepfake shows how valuable our data is in the age of AI

A journalist had herself cloned by the AI video startup Synthesia, which created a hyperrealistic deepfake of her appearance and voice. After just a year of refining its generative AI technology, Synthesia can produce AI avatars that are strikingly humanlike. The article explores two concerns raised by this progress. The first is the phenomenon known as the liar's dividend: as disinformation proliferates, people who become overly skeptical of the media they consume may start to distrust all content, creating a vacuum of trust that malicious actors and politicians can exploit to discredit authentic content. The second is the protection of personal data. One expert predicts that, within the next few decades, Facebook could hold the profiles of over a billion deceased individuals; this vast pool of data might be used to train new AI models or to draw inferences about the descendants of those users. The article also highlights a looming scarcity of freely available online training data. Consequently, AI companies are eagerly negotiating with news organizations and publishers to secure access to their vast repositories of data, or buying data from defunct social media sites.

4. NIST launches a new platform to assess generative AI

NIST has launched a new initiative called NIST GenAI to evaluate generative AI technologies. The program aims to establish benchmarks, support the creation of systems for validating “content authenticity” (such as deepfake detection), and foster the development of tools to identify the origins of fake or misleading AI-generated information. According to the official NIST GenAI website, the program will introduce a series of challenge problems designed to “evaluate and measure the capabilities and limitations of generative AI technologies”. The inaugural project of NIST GenAI is a pilot study focused on developing systems capable of distinguishing between human-created and AI-generated media. NIST is calling on academic, industry, and research lab teams to participate by submitting either “generators” (AI systems that produce content from given topics and documents) or “discriminators” (systems engineered to identify whether a summary is AI-generated). The article cites data from Clarity, a deepfake detection firm, indicating a 900% rise in deepfakes this year.

Source TechCrunch
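
To make the pilot's “discriminator” category concrete, here is a minimal sketch of the kind of interface such a system might expose: take a summary, return a score indicating whether it is AI-generated. The repetition-based heuristic and all names are toy assumptions for illustration, not part of NIST's specification:

    # Illustrative sketch of a NIST GenAI-style "discriminator": a system
    # that takes a summary and judges whether it is AI-generated. The
    # heuristic here (low lexical diversity as a weak AI signal) is a toy
    # assumption, not NIST's actual methodology.

    def ai_generated_score(summary: str) -> float:
        """Return a score in [0, 1]; higher means more likely AI-generated."""
        words = summary.lower().split()
        if not words:
            return 0.0
        # Toy signal: assume AI text reuses a narrower vocabulary.
        lexical_diversity = len(set(words)) / len(words)
        return max(0.0, min(1.0, 1.0 - lexical_diversity))

    def is_ai_generated(summary: str, threshold: float = 0.5) -> bool:
        return ai_generated_score(summary) >= threshold

    print(is_ai_generated("The report says the report says the report says"))

A real submission would replace the heuristic with a trained classifier; the point is only that the pilot evaluates systems with this text-in, verdict-out shape.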

5. The Power of Algorithms: 10 Ways Generative AI is Revolutionizing the Way We Shop Online

This article looks at how generative AI and advanced algorithms can impact the shopping experience. These technologies process various data, including users' past purchases, browsing histories, and demographics, to suggest products that customers might find appealing. One novel application is virtual try-on technology, which allows customers to see how products would look on them without physically trying them on; this also minimizes returns due to sizing errors. Additionally, AI facilitates hyper-personalized advertisements and more accurate predictions of which products customers are likely to purchase, based on past buying behavior, current trends, and seasonality. Improvements in search and product discovery also matter, since customers often face an overload of irrelevant products on shopping portals. Further, consumers can upload images or screenshots of desired products instead of relying solely on text-based searches. From a retailer's perspective, advanced algorithms utilize real-time data from various sources, such as inventory levels and customer demand trends, to generate precise stock replenishment forecasts. They also enhance 24/7 customer service capabilities and assist in pricing optimization by analyzing market trends and competitor pricing strategies. Finally, AI might significantly improve fraud detection.
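
As a simple illustration of the stock-replenishment idea described above, here is a minimal sketch that turns recent demand plus a seasonality factor into a reorder quantity. All numbers, names, and the averaging method are hypothetical; production systems would use far richer demand models:

    # Minimal sketch of an algorithmic stock-replenishment forecast:
    # recent demand plus a seasonality factor drives the reorder quantity.
    # All figures and names below are hypothetical.

    def reorder_quantity(recent_daily_sales: list[float],
                         seasonality_factor: float,
                         lead_time_days: int,
                         current_stock: int) -> int:
        """Units to reorder so stock covers expected demand over the lead time."""
        avg_daily_demand = sum(recent_daily_sales) / len(recent_daily_sales)
        expected_demand = avg_daily_demand * seasonality_factor * lead_time_days
        return max(0, round(expected_demand - current_stock))

    # Example: ~40 units/day lately, 1.5x seasonal uplift, 7-day lead
    # time, 150 units on hand -> reorder 270 units.
    print(reorder_quantity([38, 42, 40, 41, 39], 1.5, 7, 150))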

6. Chatbot answers are all made up. This new tool helps you figure out which ones to trust.

Cleanlab, an AI startup from an MIT quantum computing lab, has developed a new tool known as the Trustworthy Language Model. This tool evaluates the reliability of outputs from large language models, assigning each a trustworthiness score from 0 to 1 so that users can judge how reliable a response is. The technology builds on Cleanlab's earlier work from 2021, where they identified errors in popular datasets used for training machine learning algorithms by analyzing inconsistencies across different models. The Trustworthy Language Model extends this principle to evaluate chatbot responses, using disagreements among models as an indicator of reliability. Cleanlab's approach was tested in a practical scenario by Berkeley Research Group, which needed to review tens of thousands of corporate documents for health-care compliance – a task that could take weeks if done manually. By applying the Trustworthy Language Model, the firm was able to focus on documents that the chatbot flagged as less reliable, reducing the workload by about 80%.
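
The core idea (disagreement among models as a reliability signal) can be sketched in a few lines. The agreement measure below and the 0-to-1 normalization are simplifying assumptions for illustration; Cleanlab's actual scoring is considerably more sophisticated:

    # Minimal sketch of the disagreement idea behind a trust score: ask
    # several models (or several samples) the same question and score how
    # much their answers agree. The normalization is a toy assumption;
    # Cleanlab's Trustworthy Language Model is more elaborate.

    from collections import Counter

    def trust_score(answers: list[str]) -> float:
        """Return a score in [0, 1]: 1.0 when all answers agree."""
        normalized = [a.strip().lower() for a in answers]
        most_common_count = Counter(normalized).most_common(1)[0][1]
        return most_common_count / len(normalized)

    # Three models agree, one disagrees -> 0.75.
    print(trust_score(["Paris", "paris", "Paris", "Lyon"]))

In the Berkeley Research Group workflow, a low score of this kind is what flags a document for human review, which is how the 80% workload reduction arises.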

7. Open Source AI – definition and selected legal challenges

This post discusses the complexities of defining and implementing an open-source approach to AI. The post follows a panel discussion organized by the Global Partnership on AI (GPAI). The conversation highlighted several issues. The first is the definition of Open Source AI. Here, questions arise about which components should be open-source (training data, model weights, architectural details of the model, usage). Another issue is the variability in open-source licenses. For example, Meta’s Llama 2 model is considered less open because its custom license imposes additional terms once monthly active users exceed 700 million. Similarly, ChatGPT has been criticized for its low degree of openness, which prompted legal action by Elon Musk, who claims it functions as a de facto closed-source model. Training data is another important issue for open-source licenses. In particular, not all training data used for AI are licensed permissively; some data, while publicly available, may not be legally reusable due to copyright restrictions. According to the Open Source Initiative (OSI), an open-source AI license should include terms that allow the “four essential freedoms” (to use, study, modify, and share) for the three main components: data (including training data and methodologies), code (including model architecture), and model parameters (including weights).
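
The OSI criteria lend themselves to a simple checklist: four freedoms checked against three components. The sketch below encodes that as a toy audit; the example license entries are illustrative assumptions, not official OSI rulings on any real model:

    # Toy checklist for the OSI-style view of open-source AI: the four
    # freedoms (use, study, modify, share) checked against the three
    # components (data, code, model parameters). Example values are
    # illustrative, not assessments of any real license.

    FREEDOMS = ("use", "study", "modify", "share")
    COMPONENTS = ("data", "code", "parameters")

    def is_open_source_ai(grants: dict[str, set[str]]) -> bool:
        """True only if every component grants all four freedoms."""
        return all(set(FREEDOMS) <= grants.get(c, set()) for c in COMPONENTS)

    # Hypothetical license: weights and code are fully open, but the
    # training data may only be used and studied, not modified or shared.
    example = {
        "data": {"use", "study"},
        "code": {"use", "study", "modify", "share"},
        "parameters": {"use", "study", "modify", "share"},
    }
    print(is_open_source_ai(example))  # False: data lacks modify/share

This all-components view is exactly why models with open weights but restricted training data fall short of the OSI's definition.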

8. The impending disruption of creative industries by generative AI: Opportunities, challenges, and research agenda

This editorial article examines the impact of generative AI on the creative industries (art, music, film, fashion, design, advertising, and IT services like gaming). Generally, AI seeks to automate repetitive tasks and thereby reduce human intervention; in the creative industries, however, its role should be to support creative processes. The authors argue that despite the power of AI, “a yearning for the unmistakable human touch endures” which AI cannot replicate. The authors propose a research agenda around the themes of governance (copyright, plagiarism, and the authenticity of works co-created by humans and AI), human-AI collaboration (how to foster co-creation while maintaining emotional resonance with consumers), helping the creative workforce adapt to AI, designing new business models, and understanding consumer perception of AI-generated works so that the creative industry can evolve.