C3E Cyber Security Challenge Problem 2015

 

Novel Approaches to Avoid Misattribution of Malicious Cyber Activity

Incident, event, and associated forensic analysis of malicious cyber activity has for the most part focused on the malicious executable files, exploits, and attack vectors that directly support a threat actor's operations. The subsequent grouping or categorization of threat actors is largely based on distinguishing features discovered through this focused analysis of a collection of malicious events. Unfortunately, many cyber threat actors, or groups of threat actors, use similar tools and techniques to exploit common system vulnerabilities (and even common victims), making it much harder to distinguish the operations of different threat actors. Certain cyber threats, especially nation-state actors, may even attempt to mislead cybersecurity analysts by using the tools and techniques of other threat actors. This creates an even greater challenge for cybersecurity professionals who are trying to achieve high-confidence attribution.

C3E is looking for innovative methods, other than those traditionally used today, to distinguish cyber threat activities. For instance, one approach might be to look for distinguishing features or patterns (signatures) in the peripheral (non-malware) files that threat actors use (and reuse) to perform routine legitimate system functions or for obfuscation. Another approach might be to examine routine but necessary system (configuration) activities that are associated with a threat actor's operations but are not necessarily viewed as malicious. Elements such as file templates, page settings, linguistics (stylometry), icons, library functions, and scripts, which wouldn't normally be examined as part of incident analysis procedures, could also be distinguishing characteristics for attribution.
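
As one illustration of the stylometry idea, the sketch below compares short text artifacts by their character trigram profiles. It is a minimal example under stated assumptions, not a method prescribed by C3E: the three sample strings are hypothetical, and the function names `char_ngrams` and `cosine_similarity` are introduced here for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical text artifacts (e.g. script comments) recovered from three events.
event_a = "# conect to srv and dowload payload, then cleanup tmp files"
event_b = "# conect to host and dowload config, then cleanup tmp dirs"
event_c = "REM Establishing session; retrieving archive; purging temporary data"

profile_a, profile_b, profile_c = map(char_ngrams, (event_a, event_b, event_c))
sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

Recurring misspellings and phrasing habits push the first two profiles closer together than the third. On real data such a score would only be a starting point, since short artifacts give noisy profiles and an actor can imitate another's style.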

The aforementioned examples are intended to stimulate thinking on non-traditional approaches and should not be viewed as C3E trying to prescribe a research direction. To reiterate, C3E is looking for novel approaches to distinguish between different cyber threat actors that could potentially support the cybersecurity community in determining attribution of high-interest events. The ability to achieve high-confidence attribution can be a critical factor in the formulation of national cyber deterrence strategies.

What are the observable and measurable features associated with malicious cyber events that are not currently part of the standard technical or behavioral forensics analysis procedure?

Are any of these features distinct from one group to another?

Are there any recognizable threat actor procedural biases, quirks, or other subtleties that can be discerned from malicious cyber event data?

Are there any aspects or features of malicious cyber events that can supplement traditional signatures used for making threat attribution assessments?

Using a PREDICT dataset or some other collection of internet traffic, provide fact-based (non-theoretical) evidence that supports your claims/methods for supplementing attribution analysis (looking at non-traditional features). Your evidence doesn't have to be conclusive, just sufficient to warrant further investigation.

How unique are these features and how resilient would these methods be to obfuscation techniques? Provide fact-based evidence that supports your research claims.
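
One simple way to start quantifying how distinct a feature is across groups is to measure, for each group, the share of its observed feature values that no other group exhibits. The sketch below is only an illustration under stated assumptions: the group labels, the choice of feature, and the function name `exclusive_fraction` are all hypothetical, and the measure says nothing by itself about resilience to obfuscation.

```python
from collections import defaultdict


def exclusive_fraction(observations):
    """For each group, return the share of its distinct feature values seen in no other group."""
    groups_by_value = defaultdict(set)
    values_by_group = defaultdict(set)
    for group, value in observations:
        groups_by_value[value].add(group)
        values_by_group[group].add(value)
    return {
        group: sum(1 for v in values if len(groups_by_value[v]) == 1) / len(values)
        for group, values in values_by_group.items()
    }


# Hypothetical (group, feature value) pairs, e.g. the client string of a helper script.
observations = [
    ("group_1", "curl/7.35.0"),
    ("group_1", "custom-fetch/0.1"),
    ("group_2", "curl/7.35.0"),
    ("group_2", "Wget/1.15"),
    ("group_2", "python-requests/2.5"),
]
scores = exclusive_fraction(observations)
print(scores)
```

A score near 1.0 suggests the feature separates that group from the others in the sample, while a score near 0.0 suggests the feature is shared. Recomputing the score on data where the feature has been deliberately altered would be one possible way to probe resilience.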

Data Sets

In support of this research challenge, C3E will facilitate access to the Protected Repository for the Defense of Infrastructure against Cyber Threats (PREDICT), a data repository for cyber security research. PREDICT is supported by the Department of Homeland Security, Science & Technology Directorate. Researchers can browse the catalog and select any datasets that are appropriate for their approach to the problem.

There are more than 400 datasets within the PREDICT repository. The government sponsors strongly encourage researchers participating in this cyber discovery problem to use the PREDICT datasets.

PREDICT technical advisors have suggested the 2015 National Collegiate Cyber Defense Competition data set (NCCDC 2015) for possible use in demonstrating new and innovative research approaches to this discovery problem. Researchers can also browse the catalog and select any other datasets that are appropriate for their approach to the problem. The specific information on the suggested data set is as follows:

  • NCCDC 2015
  • Class:  Unrestricted
  • Category:  Synthetically Generated Data
  • Sub-Category:  Synthetic Cyber Exercise Data
  • Hosted By:  Packet Clearing House
  • Provider Org:  Packet Clearing House
  • Short Description:  2015 NC Cyber Defense Competition
  • Long Description:  These log files are packet captures from the 2015 National Collegiate Cyber Defense Competition (nccdc.org). NCCDC is a multi-day competition that specifically focuses on the operational aspects of managing and protecting an existing "commercial" network infrastructure. Teams of undergraduate/graduate students are provided with a fully functional (but insecure) small business network they must secure, maintain, and defend against a live Red Team. Teams must also respond to business tasks called "injects" throughout the competition. More metadata is provided as a text file accompanying the dataset data files.
  • Comment: The metadata files describe which team is assigned which portions of the network address space, which can serve as useful ground truth for attribution purposes. Datasets from previous years are also available.
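
The team-to-address-space assignments mentioned in the comment above could be turned into ground-truth labels along the following lines. This is a sketch under stated assumptions: the team names and subnets are hypothetical placeholders, the real assignments would have to be read from the metadata text file that accompanies the dataset, and `team_for_address` is a helper name introduced here for illustration.

```python
import ipaddress

# Hypothetical team-to-subnet assignments; the real ones would come from the
# metadata text file that accompanies the dataset.
TEAM_SUBNETS = {
    "team_01": ipaddress.ip_network("10.10.1.0/24"),
    "team_02": ipaddress.ip_network("10.10.2.0/24"),
}


def team_for_address(address: str):
    """Return the team whose assigned subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for team, subnet in TEAM_SUBNETS.items():
        if ip in subnet:
            return team
    return None


# Label a few example source addresses taken from packet records.
labels = [team_for_address(a) for a in ("10.10.1.25", "10.10.2.7", "192.0.2.1")]
print(labels)
```

Addresses outside every assigned subnet come back as None, which keeps unlabeled traffic separate from traffic with a known owner when an attribution method is scored against the ground truth.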

Researchers can access the DHS PREDICT repository via https://www.predict.org. The C3E planning team is available to help researchers register with the system and to respond to any questions. For PREDICT questions or assistance, contact John Drake at John.Drake (at) associates.hq.dhs.gov.

Additional Information:

For more information about the Cyber Discovery Problem, contact Chip Willard at gnwilla (at) nsa.gov or Dan Wolf at dwolf (at) CyberPackVentures.com. Additionally, more information will be posted on the C3E website as it becomes available.

Given the interest in this topic, novel approaches presented at the October C3E 2015 Workshop may result in a continuation of the problem with additional research challenges for the C3E 2016 Workshop.

Relevant Articles: