C3E 2022 Challenge Problems | Science of Security Virtual Organization

The Computational Cybersecurity in Compromised Environments Workshops (C3E) have introduced challenge problems (CPs) since 2013. For 2021, the C3E Workshop examined issues in securing the supply chain, with particular interest in software. Previous Challenge Problems can be viewed above.

Challenge Problem as an Analytic Research Follow-up

A follow-on program is available for researchers to address issues raised at the Workshop. For 2021-2022, the Challenge Problems will have three themes from the Workshop: supply chain software static analysis coverage, artificial intelligence applied to cybersecurity for the supply chain, and computational victimology to develop risk models for supply chain cybersecurity. The application of victimology to cybersecurity is a new area of research that attempts to determine whether an organization is more likely to be attacked, by whom, how, and why. We will be engaging 8-10 researchers on a part-time basis to identify and explore specific issues developed on these three themes during C3E and to present their findings at the 2022 C3E workshop. We have an approved NSF grant to pay a small honorarium to these researchers for their efforts over the next 10-12 months.

Overall Challenge. The overall challenge is to improve cybersecurity and software in the supply chain.

Task. The anticipated outcome will include a description of the critical security events taking place and the research process followed for the effort. That process may include details on how the research was conducted and on possible issues or limitations associated with the support provided by automation to address one of the themes. The results might include new models of the supply chain or actual software available via open source for improving the quality of source code.

Deliverables. Researchers are required to prepare a ten-minute video presentation to be presented at C3E 2022, a poster that describes their research, and a technical article suitable for publication in a major academic publication.

Researchers might also provide models, actual software applications (APPs) for open-source static analysis systems, narratives to define improvements to current supply chain processes, or AI tools or techniques for operational assessments and risk analysis.

The C3E Team encourages the researchers to think creatively on how to improve cybersecurity and software in the supply chain. What other deliverables might be useful?

There are three options that developed from presentations and discussions during the C3E Workshop on 27-28 October 2021. Researchers can choose any of the three for their proposed work.

Option 1 - Challenge Problem on Static Analysis Coverage

Current static and dynamic testing products provide detailed indications of potential software vulnerabilities that are either malicious or unintentional. These analyzers help developers and users to identify and mitigate weaknesses in the code.

One of the missing outcomes of most static analysis regimes is an indication of what was and was not actually evaluated for the analysis report. If a tool finds many (real) weaknesses, the user has some confidence that it is thorough. But if few weaknesses are reported, does that mean there aren't many weaknesses, or that the tool did not analyze parts of the code?

To instill confidence in software developed, there needs to be a mechanism to report the actual coverage of static analysis tools. Coverage includes what code segments or basic blocks or modules were examined, as well as what types of vulnerabilities (buffer overflow, memory leakage, SQL injection, hardcoded passwords, etc.) were considered.

Such information would allow the reviewer or integrator or user to perform a more thorough risk assessment of incorporating the software in their environment as part of the supply chain. Based on anticipated threats, and the environment considered for deployment, a reviewer may assess the risk differently. If a static analysis tool did not examine a module at all, using that module has higher risk. In-depth reporting would provide coverage metrics of what was really analyzed, in addition to possible weaknesses.
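As a rough illustration of the kind of report described above, the following Python sketch records, for each module, how many functions an analyzer examined and which weakness classes it checked, then lists the gaps. The class name, the field names, the module names, and the four weakness classes are all hypothetical; no existing analyzer is assumed to emit data in this form.

```python
from dataclasses import dataclass, field

# Hypothetical weakness classes; the names echo the examples in the text.
WEAKNESS_CLASSES = {"buffer_overflow", "memory_leak", "sql_injection", "hardcoded_password"}


@dataclass
class ModuleCoverage:
    name: str
    functions_total: int
    functions_analyzed: int
    classes_checked: set = field(default_factory=set)

    @property
    def function_coverage(self) -> float:
        # Fraction of the module's functions the analyzer actually examined.
        if self.functions_total == 0:
            return 1.0
        return self.functions_analyzed / self.functions_total


def coverage_gaps(modules):
    """Map module name -> (function coverage, unchecked classes) for modules with any gap."""
    gaps = {}
    for m in modules:
        unchecked = sorted(WEAKNESS_CLASSES - m.classes_checked)
        if m.function_coverage < 1.0 or unchecked:
            gaps[m.name] = (m.function_coverage, unchecked)
    return gaps


# Illustrative data only.
report = [
    ModuleCoverage("parser", 40, 40, set(WEAKNESS_CLASSES)),
    ModuleCoverage("net", 20, 15, {"buffer_overflow", "memory_leak"}),
    ModuleCoverage("vendor_blob", 12, 0),
]
gaps = coverage_gaps(report)
for name, (cov, unchecked) in gaps.items():
    print(f"{name}: {cov:.0%} of functions analyzed; not checked for: {', '.join(unchecked) or 'none'}")
```

A reviewer could read such a list as a starting point for risk assessment: a module that appears with zero coverage was not examined at all, which is different from a module that was examined and found clean.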

The notion of coverage in dynamic testing addresses the question “have I tested everything?” in depth. Well-developed coverage metrics include block coverage, MC/DC, and branch coverage. See, e.g., Zhu, Hall, and May, “Software Unit Test Coverage and Adequacy,” 1997, DOI 10.1145/267580.267590.

The Challenge Problem (CP) is to improve static analysis tools to provide “coverage metrics” as part of the analysis results reported. Coverage means what modules, functions, code blocks, etc. the tool actually analyzed. This implies reporting more than just the discovery of a buffer overflow or cross-site scripting weakness.

As an approach to the CP, researchers could begin with an open-source software static analyzer such as Frama-C, Clang, CodeHawk, or SpotBugs. (For more open-source tools, see for instance, “List of tools for static code analysis,” Wikipedia.) One of the tools could be modified to experiment with coverage metrics, such as functions not examined or sites located. “Sites” is a possible fundamental unit of measure similar to statement in test coverage. See Black and Ribeiro, “SATE V Ockham Sound Analysis Criteria,” 2016, Sec 2.2, DOI 10.6028/NIST.IR.8113.

Kestrel's CodeHawk already reports every site of possible memory errors (and its judgement on the site), but this is a huge report! How would you enhance the reporting results to clearly communicate to the user what was really covered in the analysis? Hint: this might involve a graphical map of the software package to show what was evaluated, what types of vulnerabilities were considered, and what vulnerabilities were found.
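One possible first step toward a more digestible report is to collapse per-site results into a per-module summary. The Python sketch below assumes a hypothetical record format of (module, weakness class, verdict) with verdicts "safe", "violation", or "unknown"; this is not CodeHawk's actual output format, and the data are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical site records: (module, weakness class, verdict).
sites = [
    ("parser", "buffer_overflow", "safe"),
    ("parser", "buffer_overflow", "violation"),
    ("parser", "null_dereference", "safe"),
    ("net", "buffer_overflow", "unknown"),
    ("net", "buffer_overflow", "unknown"),
    ("net", "null_dereference", "safe"),
]


def summarize(site_records):
    """Collapse per-site verdicts into per-module verdict counts."""
    summary = defaultdict(Counter)
    for module, _weakness, verdict in site_records:
        summary[module][verdict] += 1
    return summary


def resolved_fraction(counts):
    """Share of sites with a definite verdict (safe or violation)."""
    total = sum(counts.values())
    return (counts["safe"] + counts["violation"]) / total if total else 0.0


summary = summarize(sites)
for module in sorted(summary):
    counts = summary[module]
    print(f"{module:8s} sites={sum(counts.values())} "
          f"violations={counts['violation']} unknown={counts['unknown']} "
          f"resolved={resolved_fraction(counts):.0%}")
```

The same per-module counts could feed a graphical map, for example by coloring each module according to its share of unresolved sites.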

Some specific outcomes are:

Develop code for a static analyzer that reports a coverage metric.
Develop means of presenting coverage reports that clearly communicate information that drives decisions, such as: What was not covered? Which functionality has its implementation sufficiently covered?
Show relations between possible static coverage metrics, analogous to what Zhu, Hall, and May have for testing.
Show relations (domination) between static coverage metrics and test coverage metrics.
Develop a visualization method to aid developers and operators in risk management decisions.
Develop an algorithm in code for additional capability for inclusion in current open-source static analyzer tools.
Apply AI/ML to the coverage metrics to detect emerging trends in vulnerabilities (increase in certain vulnerability types, emergence of new vulnerabilities as CVE/CWE are published, changes to the code, patterns of vulnerabilities discovered across static analysis of multiple software systems, etc.). Describe how to leverage AI/ML to analyze the multiple page static analysis output reports. Graph theory might be a useful tool in understanding the software risk.

Providing enhanced metrics from software analyzers offers the potential user a new tool for more accurate risk analysis of adopting a software application in their environment. Wise decisions based on this information can improve the overall cybersecurity of the software in the supply chain. Specifically, the challenge problem is to investigate possible coverage metrics for static analysis.

Option 2 - Artificial Intelligence Analysis of Supply Chain

AI Challenge Question – Artificial Intelligence

Definition of AI: Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. Leading AI textbooks define the field as the study of "intelligent agents": any system that perceives its environment and takes actions that maximize its chance of achieving its goals. Some popular accounts use the term "artificial intelligence" to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving"; however, this definition is rejected by major AI researchers. History and definition can be found at “What is Artificial Intelligence – AI Definition & Application (intellipaat.com).” A key technology component of artificial intelligence is machine learning. Machine learning has revolutionized many domains, enabling super-human performance through data-heavy programming of complex systems.

Challenge: AI for software supply chain situational awareness. Examine potential uses of AI for cybersecurity applications related to understanding, maintaining, and improving the supply chain. Resting on a foundation of big data analytics, sensors/sensing and possibly the use of quantum computing, develop some specific techniques to apply to the software supply chain. Consider the use of modern machine learning to enable better understanding of the supply chains of complex systems-of-systems and better detection and mitigation of threats to the supply chain.

The AI application(s) should react dynamically, i.e., in near real time, when sensing the cyber environment. Define the big data construct, the data input, and how the data are derived. Define how AI might meet a need for testing individual components, with emphasis on software, as well as system performance (e.g., cloud computing) to enable the discovery of corrupted performance. Although this research effort covers both hardware and software, the suggested emphasis is on the software component. As presented at C3E, there are related research projects performed by IARPA and DARPA.

As a separate requirement, investigate potential AI applications for cyber supply chain monitoring.

Some specific outcomes are:

Develop a system to learn key properties of the supply chain of a complex system from annotated data.
Develop an interface for users to explore and understand a complex supply chain, using novel visualizations or user-assistant textual or verbal interfaces, where the system provides guidance or suggestions regarding robustness or risks in the supply chain.
Apply AI/ML to the problem of understanding multiple dimensions of supply chains. For example, though there may be multiple providers of a component software or hardware module, each of those providers may in turn rely on a single manufacturer or supplier of a critical subcomponent. Through data-driven learning techniques, it may be possible to expose such hidden common points of risk.
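The hidden common point of risk in the last outcome can be made concrete with a small Python sketch. It is a plain graph traversal, not a learning technique, and it assumes the dependency graph is already fully known; in practice that graph would have to be learned or inferred from incomplete data, which is where AI/ML would come in. All supplier names and the graph itself are hypothetical.

```python
def transitive_suppliers(graph, node):
    """All direct and indirect suppliers of node (iterative depth-first search)."""
    seen, stack = set(), list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def hidden_common_suppliers(graph, providers):
    """Suppliers that every listed provider depends on, directly or indirectly."""
    supplier_sets = [transitive_suppliers(graph, p) for p in providers]
    return set.intersection(*supplier_sets) if supplier_sets else set()


# Hypothetical supply graph: each key depends on the suppliers in its list.
supply_graph = {
    "vendor_a": ["foundry_x", "lib_tls"],
    "vendor_b": ["assembler_y"],
    "assembler_y": ["foundry_x"],
    "vendor_c": ["foundry_x", "lib_zip"],
}
shared = hidden_common_suppliers(supply_graph, ["vendor_a", "vendor_b", "vendor_c"])
print(shared)
```

Here three apparently independent vendors all trace back to one upstream supplier, one of them only indirectly. That is the pattern a data-driven approach would aim to surface when the intermediate links are not documented.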

Option 3 - Computational Victimology Risk Models for Supply Chain

Computational Victimology to inform Cybersecurity Risk Models for Supply Chain Attacks

Background: In the past year, supply chain attacks have targeted hospitals, school systems, oil pipelines, and even major meat distributors. The threat escalated significantly with a supply chain attack targeting the IT infrastructure company SolarWinds, which counts many federal institutions among its clients, including the National Nuclear Security Administration (NNSA), whose business computers were affected. Also, REvil, the group blamed for the May 30 ransomware attack on meatpacking giant JBS SA, is believed to be behind hacks on at least 20 managed-service providers, which provide IT services to more than a thousand small- and medium-sized businesses. Who might be the next target of a supply chain attack? What makes a target of a supply chain attack a good target for attackers?

Key Concepts:

Economics of Security

Preventative security measures are costly. Some level of uncertainty will likely have to be accepted and choices need to be made, trading off competing objectives and limited resources.
Security is not merely a technical problem that can be fixed with engineering solutions; it also has important economic and behavioral dimensions that need to be addressed.
Cyber threat intelligence used to inform risk models is dependent upon the expertise and experience of the information security specialists charged with interpreting the intelligence.
A key variable in any risk model that incorporates threat intelligence is the probability that the organization will be a target of a particular attack type.
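A minimal Python sketch of how the targeting probability might enter a risk model, assuming a simple expected-loss formulation (probability of being targeted, times probability the attack succeeds, times impact). This formulation, the function name, and every number below are illustrative assumptions, not a model proposed by the workshop and not drawn from any data set.

```python
def expected_loss(p_target, p_success, impact):
    """Expected loss from one attack type: P(targeted) * P(success | targeted) * impact."""
    for p in (p_target, p_success):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_target * p_success * impact


# Illustrative numbers only.
baseline = expected_loss(p_target=0.2, p_success=0.5, impact=1_000_000)
hardened = expected_loss(p_target=0.2, p_success=0.25, impact=1_000_000)
control_cost = 30_000

# A control is worth buying, in this simple view, if it reduces expected loss by more than it costs.
net_benefit = (baseline - hardened) - control_cost
print(baseline, hardened, net_benefit)
```

In this framing, a victimology-based estimate would refine p_target, while preventative measures act mainly on p_success; the trade-off against limited resources shows up as the net benefit.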

Victimology

A field of study/science most commonly associated with criminology and sometimes considered a subdiscipline of criminology.
Victimology refers to the scientific study of victimization, including the relationships between victims and offenders, investigators, courts, corrections, media, and social movements.
In criminology, victimology is described as studying victims of crimes, the emotional and psychological effects of the crime, and relationships between perpetrators and victims.
The purpose of the study of victimology is to identify what factors may increase someone’s chances of becoming a victim. Criminal statistics and victim demographics such as age, race, gender and social class are often compared for developing victimology profiles.

Cyber Victimology

An organization’s business interests, political action campaigns, vigilance level, protection abilities, and cyber risk tolerance are just some of the characteristics that can determine if an organization is more likely to be attacked, by whom, how, and why.
Taking the time to establish an organization’s victimology can help a CISO and their team align the right protections and determine what risk posture the organization should assume.
The key task for CISOs is to understand the victimological profile of both their organization and their organization’s leadership. Then the CISO must map these to the specific cybersecurity program they build while identifying potential adversaries, commonly used tactics, and the subsequent prioritized protections that need to be put into place for the organization’s defense.
A victimological approach places the business risk into perspective for the Board of Directors. It adds likelihood and impact, which are details that have influence in the boardroom.

Contextual Vulnerabilities

The congruence of a potential target and motivated offender at a particular place and time is a basic premise for a contextual vulnerability as it relates to victimology (spatio-temporality). This doesn’t work well for cyber victimology, but raises the question “what are some contextual vulnerabilities in cyberspace?”
Threat intelligence analogy: Carjacking incidents in Washington DC rose 143% in 2020. Suppose you own a car in Washington DC, and the victimology profile indicates that victims are mostly delivery service workers or patrons of restaurant curbside/take-out service, targeted as they exited or returned to their vehicles. Based on the victim profile, you can determine whether you have a low or high contextual vulnerability to this threat and what security measures you might take to reduce risk.
Are there ways to determine contextual vulnerability and associated weighting/values as part of a cybersecurity risk model?
Contextual vulnerability research in the cyber domain has mostly focused on the individual level (cybercrimes against individuals). Can this research be extended to the organizational level (groups, companies, large corporations, institutions, agencies, etc.)?
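To make the weighting question above concrete, here is one very simple possibility sketched in Python: a weighted average of factor scores, each in the range 0 to 1. The factor names, the weights, and the choice of a linear combination are all hypothetical assumptions for illustration; determining which factors matter and how to weight them is exactly the open research question.

```python
def contextual_vulnerability(profile, weights):
    """Weighted average of factor scores in [0, 1]; higher means more exposed."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for factor, weight in weights.items():
        value = profile.get(factor, 0.0)  # a missing factor counts as no exposure
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{factor} score must lie in [0, 1]")
        score += weight * value
    return score / total_weight


# Hypothetical factors and weights for an organization-level profile.
weights = {"sector_targeted_recently": 3.0, "shared_msp": 2.0, "public_exposure": 1.0}
org = {"sector_targeted_recently": 1.0, "shared_msp": 0.5, "public_exposure": 0.0}
org_score = contextual_vulnerability(org, weights)
print(round(org_score, 3))
```

A score like this could serve as one input to a broader risk model, provided the weights are derived from victim data rather than chosen by hand as they are here.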

Challenge:

How can science be applied to victimology to create ‘Computational Victimology’ approaches that can quantitatively inform supply chain cybersecurity risk models?
Can we develop rigorous systems and methods based on victim profiles to better inform risk models and cybersecurity investment decisions?
What are the relevant data sets? What/Where is the data on victims? Is there data?
Can we conceive low-cost measures to make a target less desirable to an attacker?
How does the risk model for a “supplier” (provider) differ from “the supplied” (customer)?

Proposal Process – Next Steps

If you are interested, please send a short description (1 to 5 pages) of your proposal to Dr. Don Goff, Co-PI with Dan Wolf for the Challenge Problem, at dgoff@cyberpackventures.com by December 17, 2021. The proposals will go through a peer review process, with only 8-10 selected for funding in the range of $2,000 to $10,000 per effort. Approved funding will be announced around January 17, 2022. The awards will be in the form of an honorarium and will not provide sufficient support for full-time engagement.

Please send any questions to the Co-PIs: Don at dgoff@cyberpackventures.com or Dan at dwolf@cyberpackventures.com.

Additional details will be provided via email to the workshop participants and the CPS-VO web site.

(Version 1.0 dtd 6 December 2021)