MatterGen - Generative AI for Materials Science

Clock Running out on TikTok

Posted on January 18th, 2025

Summary

AI Agents is a hot topic in AI at the start of this year. In an interview, Anthropic’s co-founder explains how improving interaction with existing tools is key for agents to be able to do anything useful. One key concern is prompt injection attacks, where malicious prompts hidden in web pages visited by an agent confuse it into executing an undesired task or leaking data. Meanwhile, a survey by IBM and Morning Consult found that 99% of AI developers are working on AI agents, mostly for personal assistants, content creation or customer service. Security is one of their main concerns, as are their own lack of expertise and poor quality development tools.

On the technical front, Microsoft published an article on MatterGen – an AI system designed to generate the design of novel useful inorganic materials. MaterGen generates material designs using a diffusion model, similar to the way image-generation models work. The search for new materials is considered fundamental to scientific advancement in energy storage, semiconductor technologies, and carbon capture. An article on retrieval-augmented generation (RAG) looks at the evolution of RAG from simple database queries to the use of language models for creating queries on knowledge graphs to handle complex queries.

On the ecological front, Meta is buying 200 megawatts of solar energy to feed data centers in Texas, bringing its renewable power “portfolio” to over 12 gigawatts. In cybersecurity, an article suggests that security operations centers (SOC) will increase their use of AI from 5% today to 70% in 2028 to keep up with increased threats. 2024 saw the average time it takes hackers to breach a company network fall from 79 minutes to 62 minutes. Meanwhile, US president Joe Biden signed an executive order to increase defenses against cyber-attacks from Russia and China.

Meta is the subject of a lawsuit claiming that it used copyrighted books to train its Llama models. It is claimed that Meta used content from the LibGen and Z-library sites – both of which have been sued and ordered to shutdown. Meanwhile, TikTok is poised to shutdown its operation in the US this Sunday because of a ban made following suspicions that user data is being leaked to the government in China.

1. Anthropic’s chief scientist on 4 ways agents will be even better in 2025

This MIT Technology Review article contains an interview with Jared Kaplan, the co-founder and Chief Scientist at Anthropic, on the evolution of Agentic AI. Agents are chatbots that can do tasks on a user’s behalf, e.g., book a flight, fill out forms, or create a bullet-point summary of a meeting, all with minimal or zero user supervision. OpenAI’s Sam Altman has said that 2025 could be the year that agents “join the workforce. For Kaplan, a crucial requirement for AI agents is to be able to interact with exiting IT tools. For instance, the Claude chatbot is now able to execute on-screen tasks by moving the mouse, clicking buttons and writing text. Many agent tasks will not actually require much reasoning. Kaplan cites the example of earlier AI systems that were “superhuman in terms of how well they could play board games”, but which were not useful because they could not do anything else. IT tool interaction is needed for multimodal functionality and for applications like robotics. Improved IT tool interaction might also improve software development agents because improved browser interaction patterns can make it easier for agents to do interactive debugging. Finally, Kaplan mentions that one of the biggest challenges is defending against prompt injection attacks. In this attack, an attacker includes data in a prompt that confuses the AI into executing a task or revealing information that it should not have revealed. The danger for agents is that malicious prompts may be hidden in the pages of the websites that the agent visits.

2. In AI copyright case, Zuckerberg turns to YouTube for his defense

Meta is the target of a lawsuit by authors where the company is accused of training its Llama models, including the future Llama 4 models, on copyrighted books. Specifically, Meta is accused of training its models from content found on the LibGen and Z-library sites – both of which have been sued and ordered to shutdown. The Russian nationals who allegedly ran Z-library have even been charged with copyright infringement, wire fraud, and money laundering. The sites gave access to copyrighted works from publishers that included Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. Zuckerberg allegedly agreed to the use of LibGen to train at least one Llama model, apparently using the logic that a whole site cannot be ignored just because some of the content is illegally placed there. He compared the site to Youtube in this regard, meaning that it is not reasonable to avoid using Youtube for training purposes even if some videos are pirated. AI companies still claim that training models using copyrighted content should be permitted under the fair use provision of copyright law.

3. Meta adds 200 megawatts of solar to its 12 gigawatt renewable portfolio

Meta announced that it is buying 200 megawatts of solar energy from a solar farm run by Engie to feed its data centers in Texas. This will bring the company’s renewable power “portfolio” to over 12 gigawatts. Meta is also building a 2-gigawatt data center in Louisiana which will be powered by natural gas sourced electricity. Google is working on a 20 billion USD collaboration with Intersect Power and TPG Rise on renewable energies, and Microsoft on a 9 billion USD renewable energy deal with Acadia Infrastructure Capital. After last year’s announcement by Google and Kairos for the deployment of 500 megawatts of small modular nuclear reactors from 2030, Amazon has signed a deal with X-Energy for 300 megawatts of nuclear energy sourced electricity. These announcements all come at a time of extreme concern that data centers will not have enough power for their operation. The article writes that half of all new AI servers could be underpowered by 2027.

4. MatterGen: a generative model for inorganic materials design

This research report from Microsoft Research, also published in Nature, describes MatterGen – a new AI system designed to generate the design of novel useful inorganic materials. The search for new materials is considered fundamental to scientific advancement in energy storage (better batteries and efficient solar cells), semiconductor technologies, carbon capture (systems that capture CO2 emissions from the atmosphere or industrial sites in an effort to combat climate change) and generally S.U.N. materials (for Sustainable, Unique, and Nature-based materials, defined as innovative and based on natural or renewable resources). MatterGen uses a diffusion model to create structures – similar to the way that image-generation AI tools like DALL-E create images from textual descriptions. Compared to previous generative AI approaches, MatterGen is more than twice as likely to create a novel and stable structure that is more than 15 times closer to the local energy minimum (which in materials science, indicates a state where the structure is less susceptible to outside disturbances). MaterGen can generate material designs from across the periodic table while satisfying constraints on properties linked to chemistry as well as electronic and magnetic properties. The MatterGen’s source code has been released under an open-source license on Github.

Source Arxiv

5. Winning the war against adversarial AI needs to start with AI-native SOCs

This article looks at the problems facing security operations centers (SOC) today, and how AI may be helping to address these. According to the article, the average time it takes hackers to breach a company network fell from 79 minutes to 62 minutes in 2024. Attackers are increasingly using a combination of generative AI and social engineering to gain a foothold in companies. There has been an increase in malware-less attacks, where legitimate tools are exploited to exfiltrate data, and multi-domain attacks (where the attack exploits a combination of systems and networks). Companies working in AI and emerging technologies are particularly targeted at the moment, and cloud intrusions surged 75% in the last year. This situation is increasing the strain on SOC operators who are suffering from alert fatigue (the problem that 4 out of 10 alerts produced by SOCs are false positives), “swivel-chair integration” (the problem that legacy systems were not designed to share data, so SOC operators need to move around to view different dashboards), all of which is leading to burnout and churn among operators. The article estimates that the global cybersecurity workforce is short 3.4 million professionals.

AI support for SOC threat detection and incident response is seen as potentially reducing these concerns over the next years, with Gartner predicting that AI support will increase from 5% to 70% by 2028. Example systems today using AI include CrowdStrike’s Charlotte AI, Google’s Threat Intelligence Copilot, Microsoft Security Copilot, Palo Alto Networks’ AI Copilots, and SentinelOne’s Purple AI. A 50% decrease in incident response times is reported for organizations using these systems.

6. The journey towards a knowledge graph for generative AI

This article looks at the evolution of retrieval-augmented generation (RAG) for large language models, drawing a parallel with the evolution of search engines on the Web. Early search engines effected simple keyword matching when looking for information, essentially treating each page as a distinct entity. The first RAG implementations were similar, allowing only for simple and single-point questions (e.g., “who won the 2022 World Cup?”). Search engines since Google view the web as a graph of knowledge, and this concept has been formalized into formal, structured machine-readable networks of entities and relationships. Wikidata, the sister project of Wikipedia, is a prime example of such a graph. Knowledge graphs are used GraphRAG and Knowledge-GraphRAG, because they make it easier to implement more complex queries like multi-point questions (e.g., “who played in the 2022 World Cup final and recently signed for Real Madrid?”).

The author mentions that it remains a challenge to formulate queries on knowledge graph properties when the number of properties is large (Wikidata has over 10’000 distinct properties for instance). An approach taken in QirK (Question Answering via Intermediate Representation on Knowledge Graphs) is to use a language model to create the knowledge graph query. Thus, the language model extends its role from synthesizing content, to creating the graph queries for the accompanying RAG operations.

Source InfoWorld

7. Enterprise AI Development: Obstacles & Opportunities

This survey from IBM and Morning Consult looks at the feelings of developers towards AI tools. The survey questioned over 1000 Enterprise AI Developers in the US. Two key observations are the lack of skills and the lack of quality tools. The survey found that only 24% of developers considered themselves to be experts in generative AI. Their main concerns are the lack of standardized AI processes and the absence of an ethical and trusted lifecycle for transparency and traceability of data. 72% of developers use between 5 and 15 tools in their AI projects and so are reluctant to have to learn more tools. The key requirements for tools that developers mentioned are performance (for 42%), flexibility (for 41%), ease of use (for 40%), and integration (for 36%), yet over one-third of developers say that current tools lack exactly these qualities. Nonetheless, 41% of developers said tools save them 1 to 2 hours per day. 99% of respondents are working on AI projects where the goal is to develop AI agents for personal assistants, content creation or customer service. The highest concern for agents that developers feel are trustworthiness (31%), the introduction of new attack vectors (23%) and regulatory issues (22%).

8. Biden strengthens US cyber defenses against Russia and China threats

US President Joe Biden has signed an executive order to increase defenses against cyber-attacks. The order gives federal agencies the mandate to implement end-to-end encryption on all email and video communications, as well as the use of AI-powered cyber-defense systems. The Cybersecurity and Infrastructure Security Agency’s (CISA) is now authorized to search for threats across all federal networks and to enforce security commitments on all government contractors. From 2027 onwards, federal agencies will only be able to purchase equipment that has a special cyber-trust label. This move is designed to ensure manufacturers of consumer electronics (from baby monitors to home security systems) implement rigorous cybersecurity processes. The approach even applies to companies in the space domain, following Russia’s hacking of Ukraine’s telecommunications systems in the early days of the war. The order was signed following several high-profile attacks in the US, such as the cyber-attacks on the US Treasury and the alleged hacking of Donald Trump’s phone. A White House spokesman specially named Russia, China and Iran as adversaries in protection against whom the measures are being put in place.

9. TikTok prepares to shut down app in US on Sunday, sources say

The social media platform TikTok is poised to shutdown its operation in the US this Sunday, January 19th. This follows a law signed by President Joe Biden last April that requires the platform owner, ByteDance, to sell its U.S. assets or face a nationwide ban. ByteDance is a private company that is 60% owned by institutional investors like BlackRock and General Atlantic, and its founders and employees own 20% each. The ban is being made because of fears that user data is being leaked to the Chinese government. The App will still be usable from Sunday, except that US companies are banned from providing any services on the platform or from distributing or maintaining the App. It is thought that this ban might also impact users worldwide because they depend on service providers in the US. The App was used by 170 million Americans and the company employs over 7’000 people in the US. TikTok has said that it expects to lose one third of its clients if the ban lasts more than a month.