Suggested Reading, 2023

Generative AI and Large Language Models

LLM Safety/Security

  1. OWASP Top 10 for LLM Applications

    The purpose of our group, as outlined in the OWASP Top 10 for LLM Applications Working Group Charter, is to identify and highlight the top security and safety issues that developers and security teams must consider when building applications leveraging Large Language Models (LLMs). Our objective is to provide clear, practical, and actionable guidance to enable these teams to proactively address potential vulnerabilities in LLM-based applications...  This document, Version 0.5, serves as a crucial milestone in our ongoing journey. It encapsulates the collective insights and understanding of our group, at this early stage, of the unique vulnerabilities inherent to applications leveraging LLMs. It's important to note that this is not the final version of the OWASP Top 10 for LLMs. Instead, consider it a 'preview' of what's to come.
  2. Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models
    (PREVIOUSLY: Investigating the Existence of “Secret Language” in Language Models)

    In this paper, we study the problem of generating obstinate (over-stability) adversarial examples by word substitution in NLP, where the input text is meaningfully changed but the model's prediction does not change, even though it should. Previous word substitution approaches have predominantly focused on manually designed antonym-based strategies for generating obstinate adversarial examples, which hinders their application: these strategies can only find a subset of obstinate adversarial examples and require human effort. To address this issue, we introduce a novel word substitution method named GradObstinate, a gradient-based approach that automatically generates obstinate adversarial examples without any constraints on the search space or the need for manually designed principles. To empirically evaluate the efficacy of GradObstinate, we conduct comprehensive experiments on five representative models (ELECTRA, ALBERT, RoBERTa, DistilBERT, and CLIP) finetuned on four NLP benchmarks (SST-2, MRPC, SNLI, and SQuAD) and a language-grounding benchmark (MSCOCO). Extensive experiments show that our proposed GradObstinate generates more powerful obstinate adversarial examples, exhibiting a higher attack success rate compared to antonym-based methods. Furthermore, to show the transferability of obstinate word substitutions found by GradObstinate, we replace the words in four representative NLP benchmarks with their obstinate substitutions. Notably, obstinate substitutions exhibit a high success rate when transferred to other models in black-box settings, including even GPT-3 and ChatGPT.
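    The core gradient trick can be sketched in a few lines. This is an illustrative toy, not the paper's code: under a first-order Taylor approximation, swapping a word's embedding changes the loss by roughly grad . (e_candidate - e_original), so a candidate whose embedding shift is (nearly) orthogonal to the gradient leaves the prediction stable even though the word changes. All names, dimensions, and candidate words below are invented for the example.

    ```python
    import random

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def score_substitutions(grad, e_orig, candidates):
        """First-order estimate of output change per candidate:
        delta ~= grad . (e_cand - e_orig); smaller |delta| => more 'obstinate'."""
        return {w: abs(dot(grad, [c - o for c, o in zip(e, e_orig)]))
                for w, e in candidates.items()}

    rng = random.Random(0)
    grad = [rng.gauss(0, 1) for _ in range(4)]     # loss gradient w.r.t. the word embedding
    e_orig = [rng.gauss(0, 1) for _ in range(4)]   # embedding of the original word
    candidates = {w: [rng.gauss(0, 1) for _ in range(4)]
                  for w in ["cheap", "loud", "blue"]}

    # Construct one candidate whose shift from e_orig is orthogonal to the
    # gradient, so its first-order effect on the output is ~0.
    orth = [rng.gauss(0, 1) for _ in range(4)]
    scale = dot(grad, orth) / dot(grad, grad)
    orth = [o - g * scale for o, g in zip(orth, grad)]
    candidates["stealthy"] = [o + d for o, d in zip(e_orig, orth)]

    scores = score_substitutions(grad, e_orig, candidates)
    best = min(scores, key=scores.get)
    print(best)  # "stealthy": its embedding shift barely moves the output
    ```

    GradObstinate searches the full vocabulary with gradients rather than hand-built antonym lists; the scoring above is only the intuition for why gradients find substitutions that the model obstinately ignores.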

High-Capacity Model Architectures (Does Size Determine Capability?)

  1. Emergent Abilities of Large Language Models

    Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.
  2. TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

    Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention).

    In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-old usually understands, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities.
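    A back-of-envelope count shows how a one-block transformer can fit under the 10-million-parameter mark the abstract cites. The sizes below are illustrative choices, not the paper's actual configuration, and the count ignores small terms (layer norms, biases, positional embeddings) and assumes a tied input/output embedding.

    ```python
    def transformer_params(vocab, d_model, d_ff, n_layers):
        """Rough parameter count for a decoder-only transformer."""
        embed = vocab * d_model            # token embedding (tied with output head)
        per_layer = 4 * d_model * d_model  # Q, K, V, and attention output projections
        per_layer += 2 * d_model * d_ff    # feed-forward up- and down-projections
        return embed + n_layers * per_layer

    # Hypothetical one-block model at TinyStories scale:
    n = transformer_params(vocab=8_000, d_model=512, d_ff=2048, n_layers=1)
    print(n)  # 7241728, comfortably under the ~10M threshold
    ```

    Note that at this scale the embedding table dominates: here it accounts for over half the parameters, which is why small-vocabulary synthetic data helps keep tiny models tiny.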

Methods for LLM Augmentation

  1. Toolformer: Language Models Can Teach Themselves to Use Tools

    Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
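    The inference-time mechanic can be sketched as follows. This is a minimal illustration of the general pattern, not Toolformer's implementation: the model emits inline markers such as [Calculator(3*7)], decoding pauses while the tool runs, and the result is spliced back into the text before generation resumes. The marker syntax, tool names, and stubbed tools here are all assumptions for the example.

    ```python
    import re

    # Toy tool registry; the calculator eval is for trusted demo input only.
    TOOLS = {
        "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
        "Calendar": lambda _args: "2023-06-01",  # stub returning a fixed date
    }

    CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

    def run_tools(text):
        """Replace each [Tool(args)] marker with [Tool(args) -> result]."""
        def splice(m):
            name, args = m.group(1), m.group(2)
            return f"[{name}({args}) -> {TOOLS[name](args)}]"
        return CALL.sub(splice, text)

    out = run_tools("The total is [Calculator(3*7)] items.")
    print(out)   # The total is [Calculator(3*7) -> 21] items.
    out2 = run_tools("Today is [Calendar()].")
    print(out2)  # Today is [Calendar() -> 2023-06-01].
    ```

    Toolformer's contribution is the self-supervised training side: the model learns, from a handful of demonstrations per API, where in its own sampled text a call like the above would have reduced loss, and is finetuned on those annotated samples.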
  2. Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT

    In this paper, we aim to develop a large language model (LLM) with the ability to reason over complex graph data. Currently, LLMs have achieved very impressive performance on various natural language learning tasks, and extensions of them have also been applied to vision tasks with data in multiple modalities. However, when it comes to graph learning tasks, existing LLMs present very serious flaws due to their inherited weaknesses in performing precise mathematical calculation, multi-step logical reasoning, perceiving spatial and topological structure, and handling temporal progression.


Cyberpsychology Aspects of Foreign Malign Influence

  1. "Combating Foreign Disinformation on Social Media: Study Overview and Conclusions"

    Abstract: How are state adversaries using disinformation on social media to advance their interests? What does the Joint Force—and the U.S. Air Force (USAF) in particular—need to be prepared to do in response? Drawing on a host of different primary and secondary sources and more than 150 original interviews from across the U.S. government, the Joint Force, industry, civil society, and subject-matter experts from nine countries around the world, researchers examined how China, Russia, and North Korea have used disinformation on social media and what the United States and its allies and partners are doing in response. The authors found that disinformation campaigns on social media may be more nuanced than they are commonly portrayed. Still, much of the response to disinformation remains ad hoc and uncoordinated. Disinformation campaigns on social media will likely increase over the coming decade, but it remains unclear who has the competitive edge in this race; disinformation techniques and countermeasures are evolving at the same time. This overview of a multi-volume series presents recommendations to better prepare for this new age of communications warfare.

  2. CSET (2021) - AI and the Future of Disinformation Campaigns - Part 1: The RICHDATA Framework

    Artificial intelligence offers enormous promise to advance progress, and powerful capabilities to disrupt it. This policy brief is the first installment of a series that examines how advances in AI could be exploited to enhance operations that automate disinformation. Introducing the RICHDATA framework—a disinformation kill chain—this report describes the stages and techniques used by human operators to build disinformation campaigns.
  3. CSET (2021) - AI and the Future of Disinformation Campaigns - Part 2

    This policy brief is the second installment of a series that examines how advances in AI could be exploited to enhance operations that automate disinformation campaigns. Building on the RICHDATA framework, this report describes how AI can supercharge current techniques to increase the speed, scale, and personalization of disinformation campaigns.