"AI Research Team at RIT Publish Findings on Generative Harmful Content"

Faculty and Ph.D. students at the Rochester Institute of Technology's (RIT) ESL Global Cybersecurity Institute identified problems regarding the generation of hate speech by Google's PaLM 2 Large Language Model (LLM), which drives Bard. These issues point to fundamental limitations of LLMs. The team noted that although LLMs have been deployed to the general population, there are no proper guardrails in place to ensure that they cannot be used to generate hate speech and other harmful content. In response, they designed a novel framework called Toxicity Rabbit Hole, which they believe could become standard practice for benchmarking the effectiveness of LLM guardrails. This article continues to discuss the new framework developed to benchmark the effectiveness of LLM guardrails.

Rochester Institute of Technology reports "AI Research Team at RIT Publish Findings on Generative Harmful Content"

Submitted by grigby1