Small Models for Edge Computing

The Rise of Graph Databases

Posted on November 26th, 2024

Summary

Edge computing is in the news this week. An InfoQ article explains how quantization (transforming model weights from floats to lower-precision integers) and pruning (removing less-utilized weights) can reduce models to a size that runs on edge computing devices, whose applications include domotics, smart factories, wearable computing and IoT. Small models are efficient in the edge computing context because they recognize patterns and can therefore avoid unnecessary recalculations and communications. Meanwhile, Gartner estimates that 75% of enterprise-generated data will be created and processed by edge devices by the end of 2025.

A VentureBeat article looks at the increasing use of graph databases for cybersecurity. Combined with AI, graph databases support fast querying and multi-domain modeling, making it possible to detect more attack patterns across all digital assets.

OpenAI has published two papers on red-teaming. Its automated red-teaming approach uses reinforcement learning to reward a testing model for coming up with effective and diverse attacks against other models. OpenAI also notes several weaknesses of current red-teaming, including the psychological harm to testers who need to process violent content.

Big Tech is expected to have spent more than 240 billion USD on AI by the end of 2024. This growth is explained by the expanding market for AI – expected to reach 20 trillion USD globally by 2030 – and by the costs of training and running AI models. Meanwhile, in the lawsuit brought against OpenAI and Microsoft by the New York Times and Daily News, lawyers for the news sites claim that OpenAI engineers accidentally deleted data logs from machines handed over to the news agencies.

A VentureBeat article looks at the problem of quantifying the “productivity boost” and “cost savings” benefits that companies expect from their AI projects. Most C-suite employees are confident of a return on investment from their AI projects, even though there are no standards for measuring ROI.

A WIRED article looks at the impact that a new Trump administration is expected to have on cybersecurity and AI policy. Experts expect a loosening of regulation, including rules directed at critical infrastructure, as well as a more aggressive stance against cyber attacks from adversaries.

Finally, the head of AI for UK policing has warned of criminals' increasing use of AI. The largest criminal use of AI today is by pedophile groups, who use generative AI to create images and videos of children being sexually abused. There is also the problem of AI “heists”, where criminals use deepfake technology to impersonate company executives and convince employees to transfer money.

1. Efficient Resource Management with Small Language Models (SLMs) in Edge Computing

Edge computing is about processing data where it is produced, rather than moving it first to the cloud or a central server. Applications include domotics, smart factories, wearable computing and IoT. Edge computing devices face physical challenges: 1) limited processing power, generally low-grade CPUs or microcontrollers, 2) limited memory, 3) the need to be energy-efficient, ensuring long-lasting operation while minimizing battery changes, and 4) potentially limited bandwidth, since they can operate in environments with poor connectivity. The low processing power and memory might seem to exclude the use of language models in such environments, but quantization (transforming model weights from floats to lower-precision integers) and pruning (removing less-utilized weights) can allow small models to run on edge computing devices. For instance, PruneBERT has 97% fewer weights than BERT yet retains 93% of the original model’s accuracy, albeit with significantly increased inference times. The article walks through a short demonstration of using TensorFlow Lite to prune and quantize a pre-trained MobileNetV2 model and deploy the modified model to the Google Edge TPU. The pruning removed 50% of the model’s weights, and with quantization the model was 60% smaller. Inference time dropped from 50ms to 40ms on the Edge TPU, and energy consumption was reduced by 30% compared to the original model.
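This workflow can be reproduced with the TensorFlow Model Optimization toolkit. Below is a minimal sketch of the prune-then-quantize pipeline: the 50% sparsity matches the article's figure, but the fine-tuning dataset and representative dataset are placeholders of my own, not details from the article.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Start from a pre-trained MobileNetV2, as in the article's demonstration.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# Prune 50% of the weights, matching the sparsity reported in the article.
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.ConstantSparsity(
        target_sparsity=0.5, begin_step=0
    ),
)

# A short fine-tune recovers most of the accuracy lost to pruning.
# `train_ds` is a placeholder for a task-specific tf.data.Dataset.
pruned.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
# pruned.fit(train_ds, epochs=2,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before export.
final_model = tfmot.sparsity.keras.strip_pruning(pruned)

# Full-integer quantization, which the Edge TPU requires. The representative
# dataset (ideally a few hundred real inputs) calibrates activation ranges;
# random tensors are used here only to keep the sketch self-contained.
def representative_data_gen():
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3))]

converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("mobilenet_v2_pruned_int8.tflite", "wb") as f:
    f.write(converter.convert())
# The resulting .tflite file is then compiled for the Edge TPU with the
# `edgetpu_compiler` command-line tool before deployment.
```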

A key point of the article is that these small models are efficient in the edge computing context because they recognize patterns and can therefore avoid unnecessary recalculations and communications. A smart thermostat, for instance, can observe behavior in the home and adjust the temperature without checking with the cloud. A smart heart rate monitor embedded with an SLM learns the patient’s regular heart rhythm and only transmits data when anomalies, such as arrhythmias, are detected, reducing unnecessary power usage and data transmission. This adaptive inference approach reduces computation, saving energy for more critical tasks and extending battery life.
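As an illustration of this transmit-on-anomaly pattern, here is a hypothetical sketch of the heart-rate monitor's main loop. The sensor interface, scoring function and threshold are all invented for illustration; a real device would run a quantized on-device model rather than the simple baseline statistic used here.

```python
import numpy as np

ANOMALY_THRESHOLD = 0.8  # illustrative; tuned per device in practice

def read_heart_rate_window():
    """Stand-in for the device's sensor driver (32 beats-per-minute samples)."""
    return np.random.normal(loc=72, scale=3, size=32)

def anomaly_score(window, baseline_mean, baseline_std):
    """Tiny on-device 'model': how far the window sits from the learned
    per-patient baseline, squashed into [0, 1]. A real deployment would
    run a quantized SLM here instead."""
    z = abs(window.mean() - baseline_mean) / baseline_std
    return min(z / 4.0, 1.0)

def transmit(window):
    """Stand-in for the radio uplink; only invoked on anomalies."""
    print("uplink: mean bpm", round(window.mean(), 1))

# Learn the patient's baseline locally, then gate all communication on it.
baseline = np.concatenate([read_heart_rate_window() for _ in range(100)])
mean, std = baseline.mean(), baseline.std()

for _ in range(1000):  # a real device loops indefinitely
    window = read_heart_rate_window()
    if anomaly_score(window, mean, std) > ANOMALY_THRESHOLD:
        transmit(window)   # rare path: power up the radio
    # common path: nothing is sent, saving energy and bandwidth
```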

2. What Edge Computing Means for Infrastructure and Operations Leaders

Staying on the topic of edge computing, Gartner predicts that while around 10% of enterprise-generated data is currently created and processed outside a traditional centralized data center or cloud, this figure will rise to 75% by the end of 2025. For Gartner, edge computing includes mobile devices like vehicles and smartphones, as well as infrastructure like building management solutions, manufacturing plant solutions, offshore stations like oil rigs, hospitals and medical device systems. One technical challenge with the growing volume of edge data is that funneling it all to cloud centers will become less efficient. This will drive an increase in edge servers deployed at 5G cellular base stations, which could grow into clusters or micro data centers hosting applications that manage data from local devices and cache content. Gartner warns that edge servers will increase the attack surface for cybercriminals: for instance, they could be targeted by denial-of-service attacks or exploited as entry points into organizations’ infrastructures.

3. Advancing red teaming with people and AI

OpenAI has published two papers on red-teaming. The first discusses how the company engages people for external red-teams. The second looks at OpenAI’s current approach to automated red-teaming. OpenAI has been using red-teaming to test models since the DALL·E 2 image generation model in 2022. For human red-teaming, the key lessons are to choose a testing group diverse in expertise and languages spoken, to choose interfaces and instructions that facilitate the rapid documentation and collection of test results, and to choose carefully which version of the model to test. For instance, OpenAI finds that testing a model early in the training cycle helps clarify its nascent capabilities, though such early red-teaming results can quickly become obsolete as the model continues to learn and safety mechanisms are added.

The challenge for automated red-teaming is getting a testing model to generate sufficiently diverse attacks, because models tend to repeat known attack strategies. In OpenAI’s approach, the generation of jailbreaking and prompt injection attacks is automated using reinforcement learning: the testing model is rewarded for generating effective attacks that differ from previous ones, thereby encouraging attack diversity. The OpenAI papers highlight some of the current weaknesses of red-teaming: 1) it captures results at a particular point in time, whereas models evolve quickly, 2) it costs time and money, 3) testing models for violent content can lead to psychological harm for testers, 4) publishing red-teaming results can give insights to bad actors, and 5) as models evolve, more sophistication is needed on the part of testers.
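OpenAI's papers describe the reward design only at a high level; the sketch below illustrates the general idea of shaping a reward around both effectiveness and novelty. The embedding function, grader and weighting are hypothetical stand-ins, not OpenAI's implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a sentence-embedding model; deterministic per string
    so the sketch runs without any external dependency."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def attack_success(target_response: str) -> float:
    """Stand-in for a grader scoring whether the target model actually
    complied with the attack (0.0 = refused, 1.0 = fully complied)."""
    return 0.0 if "I cannot" in target_response else 1.0

def diversity_bonus(attack: str, past_attacks: list) -> float:
    """Reward dissimilarity from everything tried so far: 1 minus the
    maximum cosine similarity to any previous attack."""
    if not past_attacks:
        return 1.0
    e = embed(attack)
    return 1.0 - max(float(e @ embed(p)) for p in past_attacks)

def reward(attack: str, target_response: str, past_attacks: list,
           diversity_weight: float = 0.5) -> float:
    # Effective AND novel attacks score highest; repeating a known
    # jailbreak earns little, which pushes the policy to explore.
    return (attack_success(target_response)
            + diversity_weight * diversity_bonus(attack, past_attacks))
```

The max-similarity form of the bonus means a single near-duplicate of an earlier attack is enough to wipe out the novelty reward, which is what steers the policy away from repeating known jailbreaks.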


4. OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit

The lawsuit brought against OpenAI and Microsoft by the New York Times and Daily News is still ongoing. OpenAI is accused of scraping the news sites’ content without permission to train its AI models. In a recent development, lawyers for the news sites claim that OpenAI engineers accidentally deleted data logs from two virtual machines that OpenAI had handed over to the news agencies. The lawyers say these logs could prove that OpenAI did scrape copyrighted content from the news sites. OpenAI has denied deleting any logs, claiming that any data loss resulted from misconfigurations by engineers working for the news sites. OpenAI maintains that training models on publicly available content, including newspaper articles, should fall under fair use – the idea that content can be freely copied without permission for the purpose of education, research or general societal benefit. That said, OpenAI refuses to say whether articles from the New York Times and Daily News were used for model training. At the same time, OpenAI has signed licensing agreements with the Associated Press, Business Insider owner Axel Springer, the Financial Times, People magazine’s parent company Dotdash Meredith, and News Corp. The article suggests that Dotdash is being paid 16 million USD per year.

5. AI increasingly used for sextortion, scams and child abuse, says senior UK police chief

The head of AI for UK policing has warned of criminals’ increasing use of AI. The largest criminal use of AI today is by pedophile groups, who use generative AI to create images and videos of children being sexually abused. Sextortion is also evolving, from situations where victims had photos of themselves posted by former partners for romantic revenge or blackmail, to general blackmail where criminals “nudify” photos of victims taken from social media. Another example is the multiplication of AI “heists”, where criminals use deepfake technology to impersonate company executives and convince employees to transfer large sums of money. The article cites the example of a finance director at a multinational firm who was scammed into transferring nearly 25 million EUR when a criminal impersonated the firm’s chief financial officer. Another emerging trend is chatbot radicalization, where an apparently friendly chatbot convinces someone to perform a radical act. This is what happened with the perpetrator of an attempted crossbow attack on Queen Elizabeth II in 2021.

6. The graph database arms race: How Microsoft and rivals are revolutionizing cybersecurity

This article highlights the increasing importance of graph databases for defending against cyber-attacks. Several graph-based systems were recently presented at Microsoft’s Ignite 2024 conference, including Microsoft’s Security Exposure Management Platform (MSEM), Cisco’s XDR platform, CrowdStrike’s Threat Graph, SentinelOne’s Purple AI, Palo Alto Networks’ Cortex XDR, and several others. Graph databases are attractive for cybersecurity platforms because they facilitate the visualization and analysis of interconnected data – digital assets, users, devices – and of the relationships between them. This can help identify new vulnerabilities and attack patterns. Graph databases support fast querying, processing billions of nodes in milliseconds. Combined with generative AI, this enables real-time detection, fewer false positives, and the discovery of subtle attack patterns. One Microsoft spokesman said that the approach offers “defenders the tools to unify fragmented insights into actionable intelligence”. Cisco’s XDR platform applies the approach to network environments, processing data from a range of network and IoT devices.
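To make the idea concrete, here is a toy illustration (using Python's networkx rather than a production graph database) of the kind of multi-hop "attack path" question these platforms answer. The assets and relationships are invented for the example.

```python
import networkx as nx

# A toy security graph: nodes are assets, users and devices; directed edges
# are relationships an attacker could traverse.
g = nx.DiGraph()
g.add_edge("internet", "vpn-gateway", rel="exposes")
g.add_edge("vpn-gateway", "contractor-laptop", rel="grants_access")
g.add_edge("contractor-laptop", "alice", rel="session_of")
g.add_edge("alice", "file-server", rel="admin_of")
g.add_edge("file-server", "customer-db", rel="hosts_backup_of")
g.add_edge("bob", "customer-db", rel="reads")

# An "attack path" query: every route from an internet-exposed node to a
# crown-jewel asset. Graph databases answer this kind of multi-hop question
# natively, which is what makes them attractive for security platforms.
for path in nx.all_simple_paths(g, source="internet", target="customer-db"):
    print(" -> ".join(path))
```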

Elsewhere, Microsoft announced several security enhancements. These include improvements to Windows 11 security, such as a zero-trust DNS protocol. Microsoft claims that improvements to its Security Copilot speed up incident triage and reduce mean time to resolution by 30%. Microsoft is also offering 4 million USD in security vulnerability bounties for its AI and cloud platforms.

7. Unlocking generative AI’s true value: a guide to measuring ROI

This VentureBeat article looks at the problem of quantifying the “productivity boost” and “cost savings” benefits that companies expect of their AI projects. It cites a KPMG survey which shows that 78% of C-suite employees are confident in return on investment (ROI) on AI projects, though there are no standards for measuring ROI. Other expected benefits are improved decision-making, enhanced customer experiences, and accelerated innovation. Yet measuring the impact of AI is hard. One reason is that it is hard to disentangle improvements from outside factors like market forces. Also, there are costs associated with keeping the technology up-to-date, making data quality acceptable for AI and integrating AI with the existing IT infrastructure. Another issue is that the benefits of AI depend on the sector. For healthcare and life sciences, document assessment tools are the key goal for AI. In financial services, 30% of companies see customer service chatbots as the main goal. In industrial markets, 64% of companies see inventory management as the main use case. Automation is the main use case for technology, media, and telecommunications companies.

One metric that is beginning to emerge is return on data: the percentage of an organization’s available historical data that is used effectively in its AI models. The article cites the ROI metrics of a Fintech startup, where the metrics are applied before and after the AI transformation:

  • productivity: number of trade documents processed per day and total number of documents processed
  • cost savings: reduction in labor costs due to manual processing of documents
  • error count: number of errors per 1,000 documents
  • time savings: average time to process a single transaction
  • risk assessment: comparison of risk predictions with historical data
  • customer satisfaction: satisfaction scores per transaction

The article concludes with several pieces of advice for implementing metrics: organizations should start with focused use cases, use cloud AI services to avoid implementation costs during the test phase, leverage existing historical data, and maintain human oversight throughout the evaluation period.
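As a minimal sketch of what such a pre/post comparison might look like in code – every figure below is invented for illustration, not taken from the article:

```python
# Hypothetical before/after figures in the spirit of the fintech example.
pre  = {"docs_per_day": 120, "errors_per_1000_docs": 18, "minutes_per_txn": 42}
post = {"docs_per_day": 310, "errors_per_1000_docs": 6,  "minutes_per_txn": 11}

for metric in pre:
    change = (post[metric] - pre[metric]) / pre[metric] * 100
    print(f"{metric}: {pre[metric]} -> {post[metric]} ({change:+.0f}%)")

# "Return on data": the share of available historical data that the
# organization's AI models actually make effective use of.
records_available = 2_400_000
records_used_in_models = 900_000
print(f"return on data: {records_used_in_models / records_available:.0%}")
```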

8. More Spyware, Fewer Rules: What Trump’s Return Means for US Cybersecurity

This WIRED article looks at the changes that the incoming Trump administration is expected to bring to the cybersecurity and AI fields. Experts expect a lessening of legislation that impacts business, including regulation directed at critical infrastructure, as well as a more aggressive stance against cyber armies in Russia, China, Iran, and North Korea. Biden’s AI agenda focused on social harms like bias and on requiring AI model developers to report to government; these restrictions are likely to be relaxed. Another Biden initiative likely to be scrapped is the limitation on the proliferation of commercial spyware technologies, which have been used by governments to harass journalists and civil rights activists, including in Saudi Arabia and the United Arab Emirates, whose governments are close allies of Trump. The US Cybersecurity and Infrastructure Security Agency (CISA) will be impacted in two ways. First, its efforts to combat misinformation will be curtailed, especially in relation to elections. Second, the center it set up in 2022 to collect cyber-incident reports is facing backlash, because many companies feel that too much information must be handed over to CISA. Finally, Trump might support the creation of a separate military cyber service and impose import restrictions on Chinese technologies; he has also chosen a national security advisor who favors cyberattacks on Russia, North Korea, and other adversaries.

9. Big Tech’s AI spending to surpass 240 billion USD in 2024

This article reports that Big Tech’s spending on AI is projected to exceed 240 billion USD in 2024. Spending stood at 74 billion USD in the first half of 2023, reached 109 billion USD by Q3 of 2023, and hit 104 billion USD in Q2 of 2024 – a 47% rise in one year. Spending through Q3 of 2024 stands at 171 billion USD. This growth is explained by the expanding market for AI – expected to reach 20 trillion USD globally by 2030 – and by the costs of training and running AI models.

  • In the case of Alphabet, spending increased by 62% in Q3 of 2024. Meanwhile, Google’s search engine generated just under 50 billion USD in ad revenue (a 12% annual increase) and cloud revenue is up 35%. Alphabet reported a 34% jump in profits for Q3. The company is also streamlining its operations, with the article claiming that 25% of its code is now written using AI.
  • Microsoft’s AI products could generate 10 billion USD annually, making them the fastest-growing segment in the company’s history. Its productivity segment, which includes Office, generates around 28 billion USD, and its PC/Xbox segment around 13 billion USD. The company spent 20 billion USD on AI and cloud infrastructure in Q3.
  • Amazon is expected to spend 75 billion USD on AI in 2024.
  • Meta will spend around 40 billion USD on AI by the end of 2024.